PDA

View Full Version : Best way to manually increment a value?


snapplepitchcoc
05-14-2008, 01:54 AM
Hi!

I have a table that contains an auto_increment column as well as another column that I need to increment occasionally.

Specifically, my situation is as follows: I'm developing a simple message board with php and mysql. To boost performance, most content is stored in the same table, including post_id and thread_id. post_id is set to auto_increment. I want to increment thread_id only when someone creates a new thread. What is the best method to achieve this end?

Some options I am considering include:

- INSERT with a sub-SELECT and MAX
- ON DUPLICATE KEY UPDATE :confused:
- CREATE TRIGGER :confused:
- Use a stored procedure :confused:

When a user creates a new thread, my initial instinct is structure the insert statement like this:

INSERT INTO posts(thread_id, posted_on, user_id, thread_title, post_content)
VALUES ((SELECT MAX(thread_id)+1 FROM posts), NOW(), '$author', '$title', '$content')";

I haven't tested the above; I wanted to see what the pros think first. Is there a better, more logical approach? Is it likely that I will need to LOCK and UNLOCK the table for this query to ensure there is no accidental overlap if two new threads are created at the same time?

Thanks in advance.

Mike

snapplepitchcoc
05-14-2008, 02:19 AM
Okay; I ran some basic tests and revised my original query to this:

INSERT INTO posts(thread_id, posted_on, user_id, thread_title, post_content)
SELECT (SELECT MAX(thread_id)+1 FROM posts), NOW(), '$author', '$title', '$content';

The above works---but is it an ideal solution? I would prefer not to lock/unlock the table but I think I would have to to prevent overlap if two threads are created at the same time. If I need to lock/unlock the table, then I wonder if it would be easier to break the above into two separate queries?

Thanks again.

Mike

PappaJohn
05-14-2008, 02:47 AM
If you make thread_id and auto_increment field, you can eliminate your sub-query as the value will be incremented for each new record.

snapplepitchcoc
05-14-2008, 05:15 AM
PappaJohn, thanks for the reply.

Although it's not immediately obvious from the query above, there is another column in my table called 'post_id' and that field is an auto_increment field. To explain the table another way, all posts are unique, but some belong to the same thread.

As far as I know, it's not possible to have two columns in the same table that auto_increment. But I could be off base here, hence my post to this forum. :)

PappaJohn
05-14-2008, 05:26 AM
As far as I know, it's not possible to have two columns in the same table that auto_increment. But I could be off base here, hence my post to this forum. :)
No, you are correct.

snapplepitchcoc
05-14-2008, 07:35 AM
I did some more research on my question. From the manual:

"To ensure that the binary log can be used to re-create the original tables, MySQL does not allow concurrent inserts for INSERT ... SELECT statements."

http://dev.mysql.com/doc/refman/5.1/en/insert-select.html

Thus it seems that there is no need to use LOCK/UNLOCK. However, I still wonder if using two separate queries and LOCK/UNLOCK would be faster than relying on MYSQL to create a temporary table. It doesn't seem that way, but ya never know. Heck, I wouldn't be surprised if there is a solution that is more effective than the one I propose above.

In any eveny, I will continue to test independently and search the web for ideas.

chaosprime
05-14-2008, 06:24 PM
I suggest using a thread table with an auto_increment ID. When a thread is created, add a new row to that table and use the resulting ID in your post table.

snapplepitchcoc
05-14-2008, 06:49 PM
chaosprime, thanks for the reply.

my current architecture is set up like you suggest. generally speaking, i agree that a normalized architecture is more sensible.

however, my message board uses ajax to make the threads seem more like a chat room. I'm trying to find a way to "pre-join" the 'thread' and 'post' tables....my hope is that this will improve performance by reducing the number of table joins required for SELECT queries.

in the past ive gotten burned by painstakingly creating a normalized DB architecture, only to find that a simpler version is faster (even if it means some data is duplicated.) however, i'm not sure if such a change will net a similar improvement in this instance. i guess i will have to do some independent testing. :)

chaosprime
05-14-2008, 06:56 PM
I usually find that performance problems that can accompany normalization aren't entirely the normalization's fault; assuming adequate indexing and such, it's more when you start doing Big Hairy Joins on large tables that mysql grinds to a halt -- anytime mysql resorts to temporary table generation, it's bad news. Usually that can be worked around by decomposing operations up at the application level. Which isn't generally very hard. I find that I and others tend to assume that pushing work down to the database will make it happen faster, so we go to great lengths to be able to write a huge join to get exactly the final results we want, and it turns out that the huge join runs like a dog and we would've been better off just writing regular code.

Fumigator
05-14-2008, 07:10 PM
Wait a minute, I'm confused. You say your design includes a Threads table? Are you not inserting a row into your threads table when a user creates a new thread and thereby obtaining a new thread_id? Why are you creating the new thread_id when inserting into the Posts table? I'm missing something here.

chaosprime
05-14-2008, 07:23 PM
He's trying to eliminate the threads table from his schema so he can do less joining.

snapplepitchcoc
05-14-2008, 07:43 PM
that's right, chaosprime; i'm trying to simplify the architecture....even though it may be a fool's errand.

my next step is to benchmark various schemas against one another to see how they compare.

chaosprime
05-14-2008, 08:20 PM
Sounds reasonable. I actually do not think you need to do additional locking in order to do that INSERT-SELECT you have there, but it's sketchy as hell and probably storage engine dependent.

Fumigator
05-14-2008, 08:54 PM
K that clears it up for me... I'd be interested in your testing conclusions because I've never seen a performance issue with refactoring one table into two using a simple parent-child design. So it'll be interesting to see what you come up with. I would think you'll need to stuff your test database with a minimum of 500,000 rows to see any difference at all.

AFAIK The only issue when getting and inserting MAX(id)+1 without locking the table first is the possibility that two users hit the query at exactly the same moment and happen to pick up the same value. An rare occurance indeed, but you know what they say about events like this-- it will happen eventually :)

Also I may be wrong about this but I'm pretty sure MySQL will not allow a subquery on the same table you are inserting into so you'll have to query for max(id)+1 just prior to the insert.

(edit) I looked it up in the manual and it's true--

Currently, you cannot insert into a table and select from the same table in a subquery.

On this page:

http://dev.mysql.com/doc/refman/5.0/en/insert-select.html

snapplepitchcoc
05-14-2008, 09:07 PM
Fumigator, thanks for the comments. One thing I will be benchmarking is the affects of the architecture on the server (e.g., CPU usage, etc.) I think that's where the main benefits will show up, more so than comparing the speed of the queries. That's just my hunch, though. Again, this could be a fool's errand.

You're right that normally you can't "insert into a table and select from the same table in a subquery." However, my query in Post #2 seems to work okay; at least using dummy values inside of phpmyadmin.

//won't work because you can't sub-select from same table as insert, as far as i know:
INSERT INTO posts(thread_id, posted_on, user_id, thread_title, post_content)
VALUES ((SELECT MAX(thread_id)+1 FROM posts), NOW(), '$author', '$title', '$content')";

//seems to work okay:
//See post #2 above
INSERT INTO posts(thread_id, posted_on, user_id, thread_title, post_content)
SELECT (SELECT MAX(thread_id)+1 FROM posts), NOW(), '$author', '$title', '$content';

Fumigator
05-14-2008, 09:23 PM
I see the difference... interesting and supported by this from the manual:

The target table of the INSERT statement may appear in the FROM clause of the SELECT part of the query. (This was not possible in some older versions of MySQL.) In this case, MySQL creates a temporary table to hold the rows from the SELECT and then inserts those rows into the target table.

As you use a subquery, I assumed the "subquery rule" would apply but your query even using a subquery obviously falls under this rule.

As for benchmarking, wouldn't it stand to reason that the more CPU a query consumes the longer it takes to run? Though it's probably a more pure measurement of a query's performance so by all means :)