Architecture: Filters, counts and product searchs
I've been thinking about this for some time and thought this would be the best place to post since you guys know what you're talking about.
For a long time ecommerce sites have had "Filters" in the left nav. But recently, I've been noticing some of them have superquick "stackable" filters (e.g. checkboxes which you can keep on ticking) and a count next to the the number of results if you tick each box (e.g. http://www.zappos.com/womens-casual-shoes~94 - same with Amazon Ebay)
Now, they use ajax pagination to load only the main div rather than the whole page which will speed everything up obviously, not to mention they have some pretty hardcore hardware, but the output just seems crazy quick for everything it's counting in the background (< 0.5 seconds). Even though they have mega high traffic, I still think the number of possible combinations of filters seems to me that if a cache ID was implemented on the counts then hit rate would be very low because so many URLs would be unique and a lot of the queries would be raw. They would also need to expire cache in categories when products are added, I imagine.
So, does anyone have any idea what they are using for this? I'm starting to think it's not mysql + memcached... their stuff is just too fast.. can it be? especially with millions of rows?
e.g. They can't surely be doing one query for every count? e.g. "Slippers (1286)","Comfort (7068)", or are they?
OR perhaps they getting the current result set of product IDs using a 3rd party search tool like amazon A9 and then using dynamic language to parse the results and calculate the counts coupled with a global cache based on the URL?
Or are they using something totally different to mysql which I am not yet aware of?
Sorry to use specific examples of sites (I checked the rules and this seems OK) but just wanted to be clear what I was talking about.
Any input / ideas on how they do this, or the best implementation of this would be appreciated.