View Full Version : Relationships
krycek
12-16-2002, 02:39 PM
OK I have a question:
MySQL 4 supports better relations when you use an InnoDB table. However, I am not sure how to go about this properly.
I currently use key-based relations which I handle myself, for instance if I have user comments, I have a field UserID in the comments table, which equates to the ID column of the users table.
What I am wondering is, is there a better way (or another way that I could play with) of doing this?
To display the actual username of the comment poster, I have to use two SELECT queries: one to get the comments, and one to get the user for each comment, based on the field in the comments table. This means that I do a SELECT, then loop through the results doing a SELECT for each row.
I am sure that I can set up a proper relationship in the actual MySQL tables; what I am envisaging is that upon retrieving the comments table, the users will automatically included, maybe as sub-arrays...? I am not sure how to do this.
Obviously the benefits are that I have a better database structure, and can minimize code and processing time, and also I will only be running one query instead of n+1 where n is the number of comments.
Any help on this plus any links to any tutorials, articles etc. on this specific subject (I am by no means new to MySQL, or database theory... I have just never implemented proper relationship structure in MySQL) would be much appreciated! :D
::] krycek [::
What you need is essentially a hack for multi-keys that the bigger databases to automatically. This is a fairly standard datastructure when you have 0+ records in a child relating to a specified parent table.
Assuming that comments are tied uniquely to users, then your comments table should use UserID and CommentID combined as a key field. In the comments table, either field can be repeated, but there can be only one of each combination (I hope this makes sense; it's a concept I never seem to explain clearly). MS SQL does this by creating a hidden key, that the user never sees and enforces the integrity properly. In MySql, you have to do those things in the application, rather than the database.
Your user table should contain the UserID as a key, and the rest of the data. Your Comments table should contain the UserID, the CommentID (both indexed) and a key field, that you will prbably never use.
When you want to use the comments for a user, you specify the UserID, return the UserID, the CommentID and the comment itself. You then loop through the results to present the comment lines. The biggest issue is usually how do you handle a null value.
The comments loop can be embedded in a larger loop that reports on different users.
Doing a joint query is possible. Be careful with this, because if you get the links wrong, it makes for messy results. The simplest syntax is something like:SELECT u.*, c.* FROM users AS u, comments AS c WHERE UserID="a valid value" AND u.UserID=c.UserID;(It's usually quicker to specify the most limiting clause first on your select queries -- most databases create dummy tables for each clause, so you want this table to be a small as possible).
The problem with this query is that each line contains duplication of the user's data. This can be worked around at the application level, but if you're grabbing a lot of data, it might not be a good idea.
krycek
12-16-2002, 04:11 PM
hmmm I read your post carefully kiwi, and thanks for taking the time to reply, but to be honest you *appear* to be doing the same as what I am doing currently, but for some reason using an extra field :confused:
In my data structure, I have the following setup:
--- comments table ---
ID
Comment
UserID
--- users table ---
ID
Username
Of course, I actually have more than that stored but those fields will do for the sake of our example.
I use the comments -> UserID field as a unique key relating to users -> ID. Therefore I can just query the database for the user whose ID matches that on the comments, and get the user's data. I can then use the row returned to display the username, post count, a link to profile etc.
I don't see the need to have a separate UserID field in the users table, when the unique ID will do... or are you saying that in your example, you are calling my ID field UserID, in which case they are one and the same? Whatever way it is, I am confused as to why I should need ANOTHER key... and what it should do... unless I have misunderstood what you said.
I am interested in your line of SQL however I am not sure that it would do what I require. What result would your code obtain and how would it address my problem?
Finally, I started my initial post by pointing out that MySQL 4 supports relationships inside the database definition, hence your remark about having to deal with it inside the application becomes false, and it is precisely that reason why I asked in the first place :) I have seen the specs for MySQL 4 and am using RC2 at the moment, however there is a lack of docs for the new relationship features and I was hoping that someone knew about it.
I know I could probably streamline the method of querying that I outlined (although because of the specifics of the current queries I am working on it has so far proved more efficient to loop queries than use a joint query) and I am interested in any suggestions, however I am most keen on learning about the new relationship features of MySQL 4 and whether they will allow me to do all this within MySQL instead of just the application.
::] krycek [::
In my design, the comment numbers can be organised for each user. You can then have User 001, comment 001; user 001 comment 002; user 002, comment 001; user 003 comment 001. Each refers to a unique record.
It tends to produce a more flexible structure (especially for larger designs). As I said, most high-end databases do this in the background. With MySql, you have to do it at the application level.
krycek
12-16-2002, 06:13 PM
hehehe kiwi are you reading my posts?
My design currently works fine and is scaleable, I am not sure why you think yours is better because as far as I can tell it is the same but more longwinded, for no apparent reason.
In my system, all comments and all users are obviously unique. The link is the UserID field in the comments table. You can find the author of a comment by querying the users table for the UserID of that comment, and similarly you can find all the comments by that user by querying the comments table for all comments with that users ID.
This method works perfectly and is an ideal, scaleable method with no problems. Is there actually a difference between your method and mine? Please explain if so. Otherwise I will have to assume that your method is the same but not as streamlined. (why, I don;t know... the method I use is a very simple, commonplace method).
Now please read what I have already stated twice:
In MySQL 4 there is support for defining relationships within MySQL itself and not having to do this in the application
So please stop telling me it isn;t possible! I am simply wanting to talk to someone who KNOWS about this new feature... if you don't know, fine, but please don't keep telling me something that is incorrect!
::] krycek [::
The innoDB has implemented foreign keys, meaning that mySql is actually a genuine relational databse. If that's what you're referring to, you need to add a constraint to the field in a create or update table command:FOREIGN KEY (UserID) REFERENCES users(UserID)It's normally sensible to add an ON DELETE SET NULL or somesuch in as well.
Given that it's straightforward SQL, I assumed that you were fully aware and already doing this, which is why I skipped over it.
I may be wrong about multi-keying in innoDB tables, but it has not been implemented on mySql in the past. As far as any of the documentation I have seen goes, it has not been implemented on the new tables.
The primary advantages of the approach I suggested are not recognised in mySql (oracle and MS SQL, for example, puts the dual keys into the clustered index, which allows for faster searching than a normal index).
krycek
12-16-2002, 08:03 PM
I think that's along the right lines, i.e. it sounds like what I am after.
However my attempts so far have failed to work... also I am trying to use cascade deletion rather than set null, but agin no success.
How could I solve my initial problem using this approach? i.e. how could I set up my DB so that it will return, upon using a single query, all the comments with user information as a subset of that? and would it handle the relationships automatically.
or am I better off sticking to my 'manual' method.
::] krycek [::
I'm not great on the delete rules -- I usually left that a DBA. I'm not sure how well mySql supports ON DELETE. You have to be a little careful with them, to make sure that the cascade delete operates correctly.
As I said, you can't produce them as a subset in the strict sense with a single query (select queries return uniform lines). A query with an implied join is your best option (the one I posted earlier is an example). You might want to add a GROUP BY clause to the end of it, which ensure that the records will come out in the order specified. If you're getting a lot of users with a lot of comments, then you'd probably be best to do it seperately.
I think it can be done in sqr, but I'm not sure on that and it's not really relevant. That sort of splitting needs to be done in the application engine.
Oh, one thing I missed earlier. UserID is not a new field: it's the primary key form your user table -- whatever that may be. I tend to add the table name to field names, because it makes this sort of task much easier to work out what's happening.
krycek
12-17-2002, 02:51 PM
Thanks kiwi, that was indeed what I was on about. :thumbsup:
Although I have done this with other versions of SQL, for some reason I am having problems doing this properly in MySQL. Can you quickly explain the following, with a couple of SQL statements to illustrate...?
1. How would I return an array, using one single SQL statement, that contained ALL comments and the user details for each comment attached to them? Using your first method, and also the foreign key method.
2. How do I properly set up a foriegn key so that I can do the above?
3. What kind of format will the data be returned in...? Would the user info be as subarrays of the comment info...?
Thanks for your time kiwi :)
::] krycek [::
Originally posted by krycek
1. How would I return an array, using one single SQL statement, that contained ALL comments and the user details for each comment attached to them? Using your first method, and also the foreign key method.The select for the two methods are identical (the difference is behind the scenes).SELECT users.*, comments.*
FROM users AS u, comments AS c
WHERE userid="a valid value"
AND u.userid=c.userid;This will dump all the data from both tables. The details for a user will be dumped for each comment they provide.
2. How do I properly set up a foriegn key so that I can do the above?The obvious things are making sure that the two fields are compatible data types (I'm not sure what datatype mysql uses for autonumber -- I think it's an integer, but I don't know the length; it doesn't seem to matter). Anyway, you then need to add the FOREIGN KEY constraint to the data definition for the child table (comments(userid)). You can do this with an ALTER command or when the tables created. In either case, you need to index the foreign key before you add the constraint (other dbs do this automatically and, for some reason, mySql doesn't and fails to report an error).
Something likeALTER TABLE comments
ADD INDEX userid(userid),
ADD FOREIGN KEY userid
REFERENCES users(userid)
ON DELETE CASCADE;should work. Apparently, this will only work in mySql 3.23.50 or later. There's also a problem with ALTER TABLE removing all foreign key constraints from the table (which sounds messy to me).
3. What kind of format will the data be returned in...? Would the user info be as subarrays of the comment info...?A single array, with each line including all the fields from both tables. Where a user has more than one comment, the details from the user table will be duplicated. For this reason, it's usually sensible to chose the fields to be returned.
A query is essentially a temporary table (literally, in most implementations). So always going to be a simple array -- fields for the columns and records for the rows.
krycek
12-17-2002, 03:56 PM
thanks kiwi... that makes a lot of sense :thumbsup:
I am assuming that any fields which have the same name in both tables will cause a conflict if both are requested?
i.e. would the ID from the users table overwrite the ID from the comments table? (In the data that is returned by the query)
I know I should still be able to access the fields by index, but obviously referencing by key is preferable.
So... may I ask... what would you think would be the best method...? Querying the comments and then looping through, and getting the user details for each comment, or else doing the whole thing with one query? (By the way, I am still not completely sure as to the difference between your method that uses a foreign key and your method that doesn't!)
::] krycek [::
The best method (in terms of minimising the data returned) would be to run two queries -- one on the users table and one of the comments table. Sort both by the user ID, then step through the user records. At each step through the user table, check if the comment user matches. If it does, get the comment info and increment the comment results (loop on this until the comment id doesn't match). When the comment user id and current user id don't match, if the comment's less than the user id, increment comments, otherwise increment the user id and check again if they match.
It might be useful to add a number of comments field to the user table (or you could generate this dynamically when you query the user records). This will provide a check-bit for the results.
The method I suggested at the start builds a composite primary key out of two fields, rather than just one. This is useful as it allows you to navigate the comments results more cleanly. By allowing you to sequentially number the comments for each user, you then have a simple check to see if you're at the start of the user's comments or not and (if you get the number of comments) if you're at the end of the comments for a user or not.
The only other approach I'd consider for doing this is to query the users once. Then as you iterate through the users table, query the comments table for each user. This is only a useful approach if your not looking at all the users and if there are a lot of comments for each user. Technically it makes for an easier algorithm, but the time for the queries make it a far less efficient approach.
krycek
12-17-2002, 04:34 PM
hmmmm I think I would prefer this method:
- grab all comments for current page
- from those comments, grab all userids and put them into a query (making sure that there is no duplication)
- query the users table for all users in that list
- loop through the comments and display each comment and the attached user info which has already been found
that way, there are two queries only, minimal data transfer and the whole thing is efficient.
very similar to your suggested approach but I believe a little more efficient...?
::] krycek [::
I was asuming that the comments were driven by the user. If this isn't the case, then there could be a lot of different ways to approach it.
If you have a seperate query that identifies the records you like, then it might be worth using an inner join with a DISTINCT subquery for the user ids. This will automatically identify the users, without any problem of duplication.
The subquery will look something like:SELECT DISTINCT userid
FROM comments
WHERE #your criteria for the comments
;The pop a join in between the user table and the comments.userid from thsi subquery. I'm not 100% sure on the exact syntax from that, but play around with it and you'll get it right. AT a guess, the whole query will look something like:
SELECT *
FROM users
WHERE userid = (SELECT DISTINCT userid
FROM comments
WHERE # your criteria
);
vBulletin® v3.8.2, Copyright ©2000-2012, Jelsoft Enterprises Ltd.