...

Most efficient way to update multiple rows?



johnnyb
05-04-2009, 04:50 PM
I have a PHP application with a database where lots of rows are updated quite often. Currently, I am updating the rows by sending a bunch of different UPDATE statements. Does anyone know if it would be more efficient to use a switch statement like described here (http://dev.mysql.com/doc/refman/5.0/en/update.html#c6075)?

Essentially, should I replace 500 queries like this:


UPDATE table SET row1=someval WHERE key=keyval;

with 1 query like this:


UPDATE table SET row1=CASE key
WHEN keyval THEN someval
WHEN otherkeyval THEN someotherval
... 500 times....
ELSE row1 END;

johnnyb
05-04-2009, 06:22 PM
After some experimentation and research, it seems that the single query is faster IF you use a WHERE clause of the form WHERE key IN (list,of,key,values) AND the number of rows being updated is small enough. On my local system I was able to update 10,000 rows in around 1 second; however, when I expanded that to 100,000 rows it took almost 200 seconds.

In comparison, issuing 100,000 individual UPDATE queries takes about 15 seconds.
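As a rough sketch of the single-query approach being timed here, the CASE query plus the WHERE key IN (...) clause can be generated from a map of key-to-value pairs. This is a hypothetical helper, not code from the thread; the table and column names ("table", "row1", "key") just mirror the examples above, and values are assumed numeric (real code must escape or use placeholders):

```python
def build_case_update(table, column, key_column, updates):
    """Build one UPDATE that sets a different value per key.

    `updates` maps key values to new column values. Numeric values are
    assumed for brevity; production code needs proper quoting/escaping.
    """
    # One WHEN ... THEN ... clause per key/value pair.
    when_clauses = " ".join(
        f"WHEN {k} THEN {v}" for k, v in updates.items()
    )
    # The IN (...) list restricts the update to affected rows only,
    # which is what made the single query competitive in my tests.
    key_list = ",".join(str(k) for k in updates)
    return (
        f"UPDATE {table} SET {column} = CASE {key_column} "
        f"{when_clauses} ELSE {column} END "
        f"WHERE {key_column} IN ({key_list});"
    )

sql = build_case_update("table", "row1", "`key`", {11: 1, 33: 2})
print(sql)
```

Without the WHERE clause, the CASE expression is evaluated against every row in the table, which is presumably why it mattered so much for the timing.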

Fumigator
05-04-2009, 08:02 PM
Wouldn't this solution also require that the column gets updated with the same value for all 100k rows, whereas the use of CASE WHEN would give you the flexibility to update with a different value for each of the 100k rows?

Old Pedant
05-04-2009, 08:38 PM
I think he means that he does


UPDATE table SET row1 = 'aaa' WHERE key = 11;
UPDATE table SET row1 = 'bbb' WHERE key = 33;
UPDATE table SET row1 = 'ccc' WHERE key = 57;
UPDATE table SET row1 = 'ddd' WHERE key = 85;
... etc. ...

That is, 100K keys with 100K *different* values.

So it probably makes lots of sense, especially if the key is a primary key. Each of those individual operations is about as simple a thing as the DB can possibly do, whereas the CASE WHEN almost surely involves the DB's creating some complex execution plan.

Fumigator
05-04-2009, 10:20 PM
I was referring to the single query WHERE key IN (x,y,z) which would only work if the column was being updated with a single value.

Old Pedant
05-04-2009, 10:28 PM
Oh, DOH on me. Yes, of course.

Yeah, not sure how he expected that to work.

johnnyb
05-05-2009, 02:14 AM
I'm back - and there's been some action!

I did mean this:


UPDATE table SET row1 = 'aaa' WHERE key = 11;
UPDATE table SET row1 = 'bbb' WHERE key = 33;
UPDATE table SET row1 = 'ccc' WHERE key = 57;
UPDATE table SET row1 = 'ddd' WHERE key = 85;
... etc. ...

And you would think that would be the fastest way to do things; however, it *seems* that a single UPDATE query with the more complicated CASE structure is faster.

I'm guessing this is because the overhead involved in initiating 100k individual queries is worse than the cost of the single tricky query.

Old Pedant
05-05-2009, 02:23 AM
Ummmm...that's the opposite of what you said earlier.

I quote your earlier post:


when I expand that to 100,000 rows it takes almost 200 seconds.

In comparison, issuing 100,000 individual UPDATE queries takes about 15 seconds.

I assume you mis-measured the time before?

Yes, there is a *lot* of overhead in making a query from a server-side language to a DB. Each of the queries has to be sent across the "wire" (probably shared memory or an anonymous pipe) from the Web server to the DB server.

Can I ask a question? Is there possibly some way to do the UPDATEs algorithmically?? That is, "if the old value is between 17 and 33 then the new value is 27" or some sort of other rule-based system? If so, and if there are only a (relative) handful of such rules, then maybe doing all this in a Stored Proc would be by far the fastest?

johnnyb
05-05-2009, 02:43 AM
"Ummmm...that's the opposite of what you said earlier."

And that's what happens when I go out for the afternoon - I forget things!

At 10,000 rows the single query is faster; at 100,000, multiple queries are faster. However, it seems that 100,000 rows takes WAY more than 10 times as long as 10,000 rows. So at 10k rows updated a single query is faster, but somewhere between 10k and 100k rows multiple queries become faster. I'm guessing that at some point in there the dataset gets big enough that the disk has to be used as temporary storage, or the CASE structure becomes too complex and is simply slower than the per-query overhead.

Unfortunately there's no algorithmic way to determine which rows need updating. It's recording the results of an outside process and is extremely random.

Fumigator
05-05-2009, 05:12 PM
I would guess the exact same thing: at some point MySQL has to start caching to disk, which slows things down dramatically. It may be worth your while to experiment until you find the sweet-spot number of rows you can update with one query before that caching kicks in, then run a few of those queries; it could save you a few seconds.



EZ Archive Ads Plugin for vBulletin Copyright 2006 Computer Help Forum