PDA

View Full Version : Non-normalized approach for online forum DB?


snapplepitchcoc
07-18-2008, 06:27 PM
I am building an online forum. I've found that performance improves if I use one table for the thread title, author and number of replies and a separate table that contains the thread title, author, respondents and thread content. In other words, my current database has some duplicate data.

Creating a new thread requires two inserts: one into the 'Title' table and one in the 'Content' table. Replies require me to update the number of replies in the 'Title' table and add the content to the 'Content' table. I like this approach because it simplifies the SELECT statements. I anticipate that SELECTS (i.e., viewing pages) will be more common than INSERTS (i.e., posting a reply) so it seems like a good idea. The additional overhead to INSERT into two tables (e.g., I have to lock and unlock the tables) isn't too noticeable since the amount of data to be inserted is small.

I have reference keys in place in case content needs to be removed from the database.

Does this approach sound reasonable to you?

Thanks in advance!!

bazz
07-19-2008, 04:59 AM
Any time duplication is performed, it's pretty likely, there is something wrong.
I would suggest:

take time to get the 'more difficult' queries done and most of your duplication can be removed.
Replies which require you to do something to other tables, says to me that something is wrong there too.

If you have it set up correctly, all that should be automated.

bazz

oesxyl
07-19-2008, 05:13 AM
Any time duplication is performed, it's pretty likely, there is something wrong.
I would suggest:

take time to get the 'more difficult' queries done and most of your duplication can be removed.
Replies which require you to do something to other tables, says to me that something is wrong there too.

If you have it set up correctly, all that should be automated.

bazz
agree, :)

Creating a new thread requires two inserts: one into the 'Title' table and one in the 'Content' table. Replies require me to update the number of replies in the 'Title' table and add the content to the 'Content' table. I like this approach because it simplifies the SELECT statements. I anticipate that SELECTS (i.e., viewing pages) will be more common than INSERTS (i.e., posting a reply) so it seems like a good idea. The additional overhead to INSERT into two tables (e.g., I have to lock and unlock the tables) isn't too noticeable since the amount of data to be inserted is small.

I have reference keys in place in case content needs to be removed from the database.

Does this approach sound reasonable to you?
no, :)
as bazz said, you can use more difficult query or you can create views to optimize some frecvent query.

http://dev.mysql.com/doc/refman/5.0/en/create-view.html

database normalization keep the data safe against update, insertion and delete anomalies

http://en.wikipedia.org/wiki/Database_normalization

regards

snapplepitchcoc
07-19-2008, 07:12 AM
Thanks for the replies.

I will conduct some more formal comparisons using different architectures before proceeding.

guelphdad
07-19-2008, 04:38 PM
with your data properly normalized and the tables correctly indexed you also have to properly tune your server for optimal results. Mysql can easily handle millions of rows of data, people have used mysql for applications storing terabytes of data. spend the time normalizing your data rather than duplicating data.