View Full Version : How did they do that!!!
Jim Morris
12-19-2002, 02:36 PM
I've seen on some shopping carts the following scenario, but can't figure out how to set up the db to do it. Here goes:
You buy something on a cart, then, a box appears stating that... "Customers who purchased this item also purchased ...whatever". It also shows "Customers who looked at this item also looked at ...whatever".
I don't even know where to start. How would you set up relating tables to cover something that seems to be happening on the fly once an item is selected, and then know which item to refer to the customer based on what they chose?
It's an awesome feature on a shopping cart, but I can't even begin to understand how they do it.
Thanks in advance,
Jim:)
It's done with some form of order or viewing history analysis. That is, you track whenever a person is looking at an item; and of course you track orders.
Once the session is complete, you would archive some of the information to a set of tables, then run some analysis on that data (you could do this out of the raw data tables, but I have a feeling this would get a little slow). The simple approach is to use a simple count (the product that was bought by most people who bought the one being looked at). The more elegant (and better) approaches will perform some sort of analysis on the data to find the most similar products, based on more than one variable.
(Personally, I would use multi-dimensional scaling to measure the (mathematical) similarity between products, based on purchase patterns, then suggest the most similar. I think most more sophisticated approaches use some form of clustering). The results from this analysis provide you with an associative table -- that is, what items are most closely associated with what other items. You can then rank them (and list them).
One of the most interesting approaches I've thought of was to perform a decent MDS analysis of both the customer and the product, look at total sales figures then rank the products against a weight index of the three values. This would produce a product that is 1. Related to the one they are looking at; 2. Related to their own purchase patterns; and 3. Related to things people are actually buying. Of course doing this would probably require someone pretty good with higher stats, so that rules me out from actually doing it.
krycek
12-19-2002, 03:09 PM
...applause :thumbsup:
Good explanation :) you beat me to it :p
::] krycek [::
Hmm. I'm also no expert, but you wont find exerts on this in a MySQL forum.
Such functionalitys don't have much ado with databases. More with datamining. Anyway, this is my opinion:
- don't waist your time on it, unless you have realy a lot of customers (1000 a day) (or realy have a lot of spare time on your hands)
- don't start with it if you have many items in your shop (because the more items you hav, the more customers you need before you can, find strong patterns)
-i you do start on this, do some homework about neural networks. MDS could get you started, buth this will only pay off if you have a realy big numer of customers, with a rather small number of products. Also, this woun't take some meaningful relationships into account. It's more straight clustering
With neural nets, you can develop a feedback mechanisme to capture these relationships.
I know that packages like SPSS ans SAS have datamining modules that provide these functionalitys (if you can spend 50 000 $ on them, your troubles or over) buth of coarse this means:
- computing relationships is done off-line
- you only get the build in algorithms
You could write some analyzing tool yourself, but realy, without the big number of customers, you wount find any strong pattern.
You'll probably be better of with scaling your products manualy on a few dimension (= categorizing them manualy) and just display all the items that are in the same category.
Jim Morris
12-23-2002, 02:59 PM
Yea, I think your right. This is WAY!!!! more involved than I even care to deal with. I thought it was a great feature, but certainly not be practical for me. I think I will just do as you suggested and just do it all myself and stick the additional add on items on the html page itself on each product their looking at ( or at least that's what I think your recommending... again, I'm a newbie). I doubt it would make much difference anyway, and the result would be about the same... upsell, upsell, upsell!!!
Thanks for all the input.
Jim:)
what i mean is:
scale each new product that you enter on a few dimensions (for instance, dimension price (with classes like <5$, 6-10%, 11-15%,...), dimension medium (book, CD, ...), dimension author, dimension expertlevel (beginners, advanced, pro's), dimension sector (cars, planes, boats), dimension role (teacher, student, researcher) ....
The dimensions to use, depend on the content for your site.
You could then build the select statement based on the vealues for the newly bought product, and retrieve other product with the same caracteristiks.
If you can find the time, you could also define 'distance-measurements' between the values in each dimension. This way you can compute which products are closest to the latest bought item.
For instance, say these are the weights for these values:
expert =100
advaced-high = 80
advanced = 60
advanced-low = 20
beginner = 10
researcher= 90
teacher=80
student=40
Say someone buys a book vith values "expert" and "teacher". Now could be that there are no other books with these values (the more dimension, the more likely this will happen). With our distance measures we can use some fuzzy logic and start computing witch books have the most similarity.
For instance: other "expert" books for "researchers" will score higher then "expert" books for "students"
You can use different algorythems to compute these scores. Buth a vary simple one is this:
y= (a-a1) x1 + (b-b2) x2 + (c-c3) x3 ...
where
a-a1= value on dimension 1 for the bought item - value on dimension 1 for the item we are compairing
x1 = value for dimension 1, x2 = value for dimension 2, ...
It looks difficult but it isn't. It more a matter of choosing dimensions and classes (beginner, experts, ...) + giving them a value + determining the importance of each dimension (a dimension like price, could influence y more just by giving it a higher value. Say price is x1 and role is x2, then you could give price a higher importance by giving setting x1=1 and x2=0.1)
You see ?
The most similar products are the one with the lowest Y.
It's not that hard to program, buth it takes some time to determine all the values and weights, and to try the best formulas.
Also, this is just a simple formula whith a few disadvantages (for instance, each dimension is treated as a separate thing, were it's most likely that most dimensions aren't independent (an auther could very well write more then one book for the same target audiance etc )
So, setting up a dataming-framework is probably to much overkill, be setting up some fuzzy logic matchingtool is fairly good to handle. And as long as you see it as a sort of "optimized navigation system" (optimized to sell!!), it will work for you
(Neural nets can help you modifying the values for the dimensions or for the distance measures, but that envolves an elaborated tracking system (who look at which product and was it suggested and does he by it etc) and some research)
krycek
12-23-2002, 04:13 PM
What everyone has said is correct, and also quite involved. You have to do some fancy stats to get things going.
Now, there is nothing wrong with that, but I thought that maybe I should point out an easier method,
When anyone buys something, you store their order in a database, OK? That is standard procedure normally. Now follow my logic here:
Let's say that Mr Customer buys a book, a DVD, and a console game from a selection of ten items. The item codes are stored with his order.
Now, when someone else looks at an item, ALL the orders are queried to see who has bought that particular item. So, someone looks at a DVD. The query finds our earlier order and says, "hey! someone bought a DVD... and they bought a book, and a console game too!"
...anyone see where I am going yet? ;)
Ten customers place orders, which contain DVDs. Three of those customers also buy books. Five of them buy console games too. The others buy other stuff :)
So, when someone views a DVD, our query asks the database for all the orders that have contained DVDs, and also sorts those orders by the most common items. So, the most common item that is bought with a DVD would in this case be a console game. Then, we say that we want to display the two most related items to that DVD, and our message is, "people buying this DVD also bought this console game and this book".
You get the picture.
Now, all that querying can get pretty hectic with a big database, which is why you should at some stage introduce more complex methods such as item ranking (making sure that data is effectively cached for most common queries) and suchlike.
However that then gets into the realm of the other posts, and I don't want to repeat what has already been said very well. :) I merely wanted to explain that there is a more simplistic way to do this, with an easy-to-understand concept.
It might help! :p
::] krycek [::
The approach you suggested is basically not practical. If you don't have large numbers of customers, the recommendations don't mean anything (if ten orders contain the item; if three of them contain other items -- a reasonable assumption, given internet purchase patterns -- then those three are going to be ranked. But I might have been buying the DVD for myself, and the other items for something else. Only when you get to large numbers (100+ orders for unpopular items; into the 1000s for populat items) do these variances dissappear). Before you get to numbers large enough for simple queries to make sense, the server load for running the queries will be far, far too high. Basically, it's not useful with small numbers, and not scalable to large numbers.
You would be better off spending the time and money promoting your site and getting more customers, than developing something like that.
redstormaudio
12-24-2002, 12:13 AM
Assuming you know php just take a look at the source for oscommerce (http://www.oscommerce.com) (it's GNU GPL).
Jim Morris
12-24-2002, 12:30 AM
Thanks everyone for all of the input, especially the link you provided to me redstormaudio. That site seems to have tons of neat functions for e-commerce sites like I have. I really appreciate it. Man, I don't think I've ever had this many responses to a thread. CodingForums ROCKS!!!!!!!!!!!
Jim:thumbsup:
krycek
12-24-2002, 01:24 AM
Originally posted by Kiwi
The approach you suggested is basically not practical. If you don't have large numbers of customers, the recommendations don't mean anything (if ten orders contain the item; if three of them contain other items -- a reasonable assumption, given internet purchase patterns -- then those three are going to be ranked. But I might have been buying the DVD for myself, and the other items for something else. Only when you get to large numbers (100+ orders for unpopular items; into the 1000s for populat items) do these variances dissappear). Before you get to numbers large enough for simple queries to make sense, the server load for running the queries will be far, far too high. Basically, it's not useful with small numbers, and not scalable to large numbers.
You would be better off spending the time and money promoting your site and getting more customers, than developing something like that.
hey hey hey! calm down! :D
I never said it was PRACTICAL :p lol
And I do agree with you.
However, what I suggested was a simple concept that can be built upon. I never said it was the best method, and I did hint at its impracticalities. However, I think it is important for people to understand what is going on, and tbh the replies before me, although totally correct and very good, were also a little above what the average beginner can understand.
Not knowing Jim's level, I didn't want to add to it at first, but when he said it was over his head, I thought I would explain a method that he would understand more easily and that would help him to explore what he could do. I would not base a system on that one method, it is not that easy. For a start, the number of queries grows exponentially in proportion to both the number of views AND the number of orders, and that is totally impractical for a large database.
I hoped that what I posted would make things a little clearer and give Jim something to experiment with. At the very least he may now know a little more about the concept, and he may be better placed to understand what he does actually need.
Sometimes you have to introduce a simple yet inefficient method as a starter, and then move on to more complex things :)
::] krycek [::
krycek
12-24-2002, 01:31 AM
Oh yeah and while I remember, two other things:
Firstly the method I described would take VERY LITTLE time and money to set up, because it is very simple. Therefore you could start it off and then when you get bigger change it for a better method (not my recommendation, but hey, people do what they want)
Second, most people that buy on the net are looking for a particular thing and buy that thing, so this kind of target marketing can be of debatable value except for certain kinds of product.
Oh yeah and because I can't count, ;) third, yes you may have bought an item for yourself and also something for someone else, but the probability of someone else doing a similar thing is equivocable to that of someone else wanting to buy those things for themselves. So the accuracy of the data is not a worry, because for small numbers of transactions it will just be a bit haphazard (even advanced methods have weaknesses so this is not necessarily a bad thing in comparison) and for larger numbers, like you said the stats would average out.
Yes a larger amount of transactions should utilise a different method. But I happen to know some successful sites that use the method I outlined, and they have low transaction volume and also are very pleased with the way the system works. (Oh yeah and I didn;t write it lol :))
::] krycek [::
krycek,
doen't want to start a public controverse here ("make code, not war", especially on christmaseve) but you shouldn't scare people away from my simple model.
realy, what i suggested isn't difficult and doesn't imply fancy stats (unless you have diffiulties with subtracting and multiplying:)
it's, like i said, more a question a manual labour.
the big difference with what you're suggesting (as startup-method?) is the granulity (that is, the size of the classes) + the computing. Now, all that computing at runtime isn't going to work very fast.
since you could use classes on each dimension, you could compute all the proximities (distance between two items) in advance, since the distance between two items is the same as the
distance between the two cells they belong to (think of it as a typical MDDB) and all you need to do at runtime, is select the "top 5 cells" and the items that belong to that cell.
A MySQL database shouldn't need a second to run that.
Example: say we have 3 dimensions with each five classes (values). Now you could build a 3d cube with these dimensions. Each cell in that cube, contains items with the same three values
(1 for each dimension). It's not that hard now to compute the distances between all these cell's. OK, it might take 10 seconds of computing time, buth who cares ?
The result should be a table with (WARNING : FANCY STAT COMING UP) 5x5x5 rows and 5X5X5X5X5X5 cels, that you can convert into a table with with at least 3 variabels:
CellID1, distance_to_CellID2, number_in_cellID2 (you can reduce the number of records to !125 - 125, but this will slow down runtime selections)
Now, when someone buys an items, you know the cell it belongs to.
with a query like: select * from distancetable where CellID1 = ProID order by distance_to_cellID2 desc
you can then retrieve to most nearby cells, and use that cellID to select all the items in that cell. (The number of items in that cell can help you to finetune your algorythem)
It could be quite accurate + it doesn't use a lot of runtime-resources + it's fairly simple + when sells realy go through the roof, you can use backward feedback tricks to have an automatic and
continues finetuning of your proximity values.
krycek
12-24-2002, 03:43 PM
heheh raf I think you totally missed my point :)
I was not trying to scare anyone away from your model, indeed yours is an ideal one to try after fiddling with mine. My method was merely intended to introduce people to the concepts involved. I think if someone gets the hang of my method they will be better placed to handle yours (let's face it, mine is extremely simple, and even though the maths in yours are not exactly complex, many people do not like maths ;))
I personally have to deal with complicated maths every working day, often involving fancy stats, so I am quite able to understand your method with no trouble and I agree it is a good and suitable method... so long as it is implemented correctly.
Like I said before, if someone understands enough that they can miss out my example, fine. Otherwise it may well be of some help in introducing the concepts, especially as it is easy to see what is going on.
Merry Christmas Everyone!
::] krycek [::
vBulletin® v3.8.2, Copyright ©2000-2010, Jelsoft Enterprises Ltd.