...

View Full Version : Which model to base database structure on for a directory



Nightfire
08-04-2011, 08:45 PM
Hi, I've been reading for a while on different types of database structures, such as adajacency, nested sets, etc and not sure which is the best or ideal to go for.

The directory I'll be making will be huge, loads of categories and subcategories and subsubcategories, so wanting to keep as much resources free as possible. I read that recursion isn't good for PHP as it keeps it all in the memory (or something like that). Last thing I want is a dead slow site while all the processing is going on.

What structure would you suggest would be ideal?

Old Pedant
08-05-2011, 02:46 AM
My person favorite is one that I call the "path" method.

Where, basically, you actually *do* store the full "directory path" for a record in each and every record.

Except, of course, you do it by ID.

Example:


id : parent : name : path
1 : 0 : main : 0001-
2 : 1 : sub1 : 0001-0002-
3 : 1 : sub2 : 0001-0003-
4 : 2 : sub1-ss1 : 0001-0002-0004-
5 : 2 : sub1-ss2 : 0001-0002-0005-
6 : 4 : sub1-ss2-sss1 : 0001-0002-0004-0006-

And now you can do most anything you want using the "path".

Just for example, you can get all records in usual nested tree order via simply


SELECT * FROM table ORDER BY path
1 : 0 : main : 0001-
2 : 1 : sub1 : 0001-0002-
4 : 2 : sub1-ss1 : 0001-0002-0004-
6 : 4 : sub1-ss2-sss1 : 0001-0002-0004-0006-
5 : 2 : sub1-ss2 : 0001-0002-0005-
3 : 1 : sub2 : 0001-0003-

But you can also find all children of *ANY* node/record via something like this:


SELECT t2.*
FROM table AS t1, table AS t2
WHERE t1.id = ###
AND t2.path LIKE CONCAT( t1.path, '%' )
ORDER BY t2.path

If, for example, you substitute in 2 for the ### there, you would get this:


2 : 1 : sub1 : 0001-0002-
4 : 2 : sub1-ss1 : 0001-0002-0004-
6 : 4 : sub1-ss2-sss1 : 0001-0002-0004-0006-
5 : 2 : sub1-ss2 : 0001-0002-0005-

(To exclude 2, itself, you simply add AND t1.id <> t2.id to the WHERE)

And you can similarly easily find all parents of any child.

And so on.

The trick, I'm sure you can see, is to pad the ID's with zeros (or spaces) all to a constant length when building the PATH. The delimiters (I used dash there) are optional but make it easier for humans to follow. And of course you don't have to just use numbers for the id's and/or you could easily compress the path values. (Though neither is worth it unless you are talking HUGE "trees" and even then...)

Now...

The big disadvantage: The system is not good for dynamic trees. Oh, it can be adapted if the tree isn't too dynamic, but it's best for something like a threaded message forum, where messages are added but never removed.

bullant
08-05-2011, 03:02 AM
I used to use the adjacency list model , which has its disadvantages, for storing hierarchical data in databases but after I read up on and played with the nested set model life became so much easier and I have been using it ever since when required (mainly e-commerce databases)

Once you get your head around the fact that the left and right values are used to simply store the location of the item in the hierarchy and if you are reasonably skilled in sql then the nested set model is fairly straight forward.

Deleting nodes and their children becomes very straight forward and moving a node and its children to another location in the hierarchy requires just resetting the left and right values. Adding new nodes wherever you like in the hierarchy is straight forward by just simply adding a record to the table and setting appropriate left and right values to the new node and adjusting the left and right values of the "downstream" nodes accordingly.

Old Pedant
08-05-2011, 03:36 AM
Yep. If you have a highly dynamic structure, clearly nested set is superior to my "path". But you can't beat "path" for efficiency of queries *IF* you have static data. I do recognize that's a huge "IF".

bullant
08-05-2011, 03:47 AM
Yep. If you have a highly dynamic structure, clearly nested set is superior to my "path". But you can't beat "path" for efficiency of queries *IF* you have static data. I do recognize that's a huge "IF".

yep, that's a very big IF.

Off the top of my head I can't think of a real world situation where a hierarchical data set would always be static and never change.

Old Pedant
08-05-2011, 03:53 AM
Threaded message forums. I first "invented" the technique first time I created a forum. (Of course, I then found out that it had been discussed on SQLTeam.com not more than six months before. Oh, well. Embarrassing.)

A fully threaded forum is what you typically used to see. I don't know why they aren't popular any more. I grew up on USENET, not surprisingly, which is of course fully threaded; and I really much prefer it to "flat" forums like this one. But maybe there are so many flat forums just because the coders couldn't figure out a good way to write threaded ones. Or maybe most people find threaded forums confusing? Whatever. (See, on a fully threaded forum, people uninterested in this sub-topic would skip the sub-thread. This thing requires people to read every message in a thread.)

(And there are other situations. Massive Bill-of-Materials kept only for historical purposes. I remember when BOMPs could chew up half the time on old systems with only a few MB of memory. But I'll grant you that static hierarchies are not as common as dynamic, by a long shot.)

Nightfire
08-05-2011, 02:34 PM
Thanks for that :)

I looked into the nested set, I think I got my head around it. Just looks a PITA when it comes to adding a new subcategory by having to renumber everything after that again.

I'll try some experimenting between the ones suggested and see what I find best. Thanks again

bullant
08-06-2011, 12:37 AM
I looked into the nested set, I think I got my head around it. Just looks a PITA when it comes to adding a new subcategory by having to renumber everything after that again.


Maybe use this quick and simple demo to insert a node as a guide.

This demo inserts a sub category called "Plastic Plates" below an existing category called "Plates".



SELECT @myRight := fldRgt FROM tblcategory
WHERE fldCatName = 'Plates';

UPDATE tblcategory SET fldRgt = fldRgt + 2 WHERE fldRgt >= @myRight;
UPDATE tblcategory SET fldLft = fldLft + 2 WHERE fldLft >= @myRight;

INSERT INTO tblcategory(fldCatName, fldLft, fldRgt)
VALUES ('Plastic Plates', @myRight, @myRight + 1);

Nightfire
08-06-2011, 12:51 AM
Oh nice, that made it look much simpler than I thought it would be. Thanks for that

bullant
08-06-2011, 01:20 AM
you're welcome :)



EZ Archive Ads Plugin for vBulletin Copyright 2006 Computer Help Forum