Go Back   CodingForums.com > :: Server side development > MySQL

Before you post, read our: Rules & Posting Guidelines

Reply
 
Thread Tools Rate Thread
Enjoy an ad free experience by logging in. Not a member yet? Register.
Old 11-25-2012, 10:08 PM   PM User | #1
shoryuken
New Coder

 
Join Date: May 2012
Posts: 17
Thanks: 2
Thanked 0 Times in 0 Posts
shoryuken is an unknown quantity at this point
Architecture: Filters, counts and product searchs

I've been thinking about this for some time and thought this would be the best place to post since you guys know what you're talking about.

For a long time ecommerce sites have had "Filters" in the left nav. But recently, I've been noticing some of them have superquick "stackable" filters (e.g. checkboxes which you can keep on ticking) and a count next to the the number of results if you tick each box (e.g. http://www.zappos.com/womens-casual-shoes~94 - same with Amazon Ebay)

Now, they use ajax pagination to load only the main div rather than the whole page which will speed everything up obviously, not to mention they have some pretty hardcore hardware, but the output just seems crazy quick for everything it's counting in the background (< 0.5 seconds). Even though they have mega high traffic, I still think the number of possible combinations of filters seems to me that if a cache ID was implemented on the counts then hit rate would be very low because so many URLs would be unique and a lot of the queries would be raw. They would also need to expire cache in categories when products are added, I imagine.

So, does anyone have any idea what they are using for this? I'm starting to think it's not mysql + memcached... their stuff is just too fast.. can it be? especially with millions of rows?

e.g. They can't surely be doing one query for every count? e.g. "Slippers (1286)","Comfort (7068)", or are they?

OR perhaps they getting the current result set of product IDs using a 3rd party search tool like amazon A9 and then using dynamic language to parse the results and calculate the counts coupled with a global cache based on the URL?

Or are they using something totally different to mysql which I am not yet aware of?

Sorry to use specific examples of sites (I checked the rules and this seems OK) but just wanted to be clear what I was talking about.

Any input / ideas on how they do this, or the best implementation of this would be appreciated.

Thanks
shoryuken is offline   Reply With Quote
Old 11-25-2012, 11:04 PM   PM User | #2
Old Pedant
Supreme Master coder!

 
Old Pedant's Avatar
 
Join Date: Feb 2009
Posts: 23,188
Thanks: 59
Thanked 3,995 Times in 3,964 Posts
Old Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to all
[quote]They can't surely be doing one query for every count? e.g. "Slippers (1286)","Comfort (7068)", or are they?[/qoute]
They could be.

But you know, they COULD be doing those counts in JS code. If the total number of items is in the low thousands, say, that would kind of make sense. Take the load off the server.
__________________
An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.
Old Pedant is offline   Reply With Quote
Old 11-25-2012, 11:47 PM   PM User | #3
shoryuken
New Coder

 
Join Date: May 2012
Posts: 17
Thanks: 2
Thanked 0 Times in 0 Posts
shoryuken is an unknown quantity at this point
Ah, thanks. I never considered JS - interesting idea. Looking at their code in the example I gave, the counts appear to be prepopulated in the original wget source so in this cause I guess it's server side.

I'm just trying to figure out how this can be done fast with 200k products, each with 50 odd attributes in a joined table with a regularly updated product record set.

I guess the third option would be to perform one mysqlquery, get all attributes for every current product then increment variable variables in the while loop and put the load onto PHP rather than MySQL.

I know there are various ways to achieve this, but if your target was speed without scaling out the hardware to read slaves during peaks how would you guys do it in the example I gave please?
shoryuken is offline   Reply With Quote
Old 11-26-2012, 05:22 AM   PM User | #4
Old Pedant
Supreme Master coder!

 
Old Pedant's Avatar
 
Join Date: Feb 2009
Posts: 23,188
Thanks: 59
Thanked 3,995 Times in 3,964 Posts
Old Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to all
Do all the counts ahead of time and just remember them? In a separate "counts" table?

You would presumably only need to update them when a product or product variant was added, no?
__________________
An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.
Old Pedant is offline   Reply With Quote
Old 12-06-2012, 04:52 AM   PM User | #5
shoryuken
New Coder

 
Join Date: May 2012
Posts: 17
Thanks: 2
Thanked 0 Times in 0 Posts
shoryuken is an unknown quantity at this point
Sorry I didn't reply sooner, and for bumping a now old-ish thread, and thanks again. I have been thinking about your response and testing various things. I now think you are right in that the counts need to be precalculated and stored in a "counts" table, and then just re-calculated when an update occurs to a product involved in the counts.

The bit I'm having trouble getting my head around is how to calculate all of the counts efficiently, even on a nightly cron. Because the number of counts required grows exponentially with both the number of products and the number of count options. I believe there are millions of different counts in my example (Zappos) as they are stackable (i.e. you can select multiple options from each field).

Right now, I am caching on the fly with 3 hour cache after a cache miss. This causes data to be out of date sometimes, but when you get a cache hit its super quick - just not so quick on the cache misses. So I have inconsistent page load times. The DB servers could also be vulnerable to (D)DOS if someone threw rarely-used random parameters at it.

I need to take another step back to think about it. I wish I could find some more info on it... but I'd like to update this thread when I make progress.
shoryuken is offline   Reply With Quote
Reply

Bookmarks

Jump To Top of Thread


Thread Tools
Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +1. The time now is 08:42 AM.


Advertisement
Log in to turn off these ads.