View Full Version : Database Design - need advice

08-11-2006, 02:01 PM
I have a problem with our current database design and I'm exploring options, I've been exploring them for some time and I cant seem to find any suggestions that would improve our situation (other than more hardware). We're currently using myIsam tables but maybe something else would be better.

Our database currently has 2 million records in it which are being searched constantly throughout the day. At any point during the day our customers can upload their data which can be anything from a few hundred to a few hundred thousand records, which we process and insert. They can also include duplicates which we dont want.

The way we currently handle this is to update to a backup table then select distinct into another backup table then delete the current clients offers from the main table and insert the new offers from the backup table to the main table...
Which means in the case of our biggest client (in number of offers) their offers are offline for 5 minutes - time difference between the delete and the insert. Something which my boss has suddenly decided to be fuming about, despite me telling him about it long ago.

So what I've done is added a grouped unique index on all the fields that we search on and display on the website, this avoids the triple table route and the select distinct, inserts are a little slower but the time that a single client is offline is down to seconds rather than minutes - regardless of file size, but I dont know if this will make searches slower?
Also I still have indexes on the fields within the unique index - would that make things slower - should I remove them?

Or maybe there is some other option I havent thought about? The searches unfortunately are not just simple selects there are lots of DISTINCT, COUNT and MAX queries that run to narrow down the selection criteria - reducing the number of queries with 0 result but I frequently see mysql using 80+% CPU in multiple threads.

08-11-2006, 08:57 PM
Honestly I never trust anyone else's advice on performance. I set it up on my own development server with the exact same versions, servers, code, network configs, and everything I will be doing in production and see what I get.

If it's any help, this is similar to something I have written and the indexes don't cause any discernable delay in the other queries. Now I just check for duplicates before inserting. It takes longer to upload but it's a lot less stress. :)

08-12-2006, 01:48 AM
Adding a unique constraint won't affect selects (negatively) in the slightest. Depending on the selects, the implicit index might even make things faster. Inserts and updates though, will suffer, as there is a check done for compliance, which takes time.

As for the indexes, this depends on how you search. Anything that is commonly searched for (any combination) benefits from having an index on it. The downside, is that updates and inserts get slower (and more disk-space is used, but for your setup I can't imagine this will matter), as when a record is added or changed, the appropriate indexes are added/changed.

Does the data you get sent require processing before it gets inserted? If not, then doing a straight file import, rather than iterating over-rows and inserting, will be quicker.
If it does require processing, and the actual time it takes to perform the process isn't the key factor (and you have a spare machine) you could do whatever processing to the file (resulting on one or more final csv files) to then import directly. This would take longer, but put less strain on the database. This is perhaps a bit drastic for your current situation (even with hundreds of thousands), but perhaps something to think about if that starts to increase.

just noticed this bit

Now I just check for duplicates before inserting. It takes longer to upload but it's a lot less stress.

In what I assume is a hands-off process, checking for duplicates isn't necessary, just attempt the insert and let it fail.

08-12-2006, 08:47 AM
Writting to a CSV and using LOAD DATA INFILE is a possibility, My main concern with the selects is that I dont want my inserts and deletes to cause slowdown in the selects.
We already have indexes on the fields that are searched, with the grouped unique index, should I drop the existing indexes? Or leave them in? Will having both cause a problem?
A big problem with the site that we're seeing is mySQL consistantly taking up 60-90% CPU for a single user - which causes really high load problems. I've seen server load average get over 30. We need to get that down to 1 and I dont think thats going to happen.

08-12-2006, 10:55 AM
Basically, composite indexes aren't a replacement for single-column ones.
If you have, as a simple example, an index on (surname,firstname), then the combination will be indexed, but as a side-effect, so will surname. In this case, you wouldn't need (or want) an index on surname. Firstname, though, won't be individually indexed, and so if sometimes, you want to search on firstname, you'd want to add a separate index.

It's fairly long at 45mins, and I haven't watched it myself (I don't really use mysql very much) but have heard a lot of good reports on this video on tuning mysql that might help:

08-12-2006, 05:53 PM
Great vid GJay, learned a couple of things I didnt know - particularly that caching isnt turned on by default - though ours was and was set to a cache size of 16mb and a cache limit of 1mb.
I've increased them to 128M and 4M but I'm not sure what is appropriate - we have 4Gb RAM - how big should I let the cache get?
Theres no doubt that when the query cache is being used the site is lightning fast. But I dont want to bring the site to a crawl by making it too big.