...

View Full Version : Sorting by joined table



Keleth
04-18-2009, 05:07 AM
I was hoping someone could help me figure out the syntax for something.

I'm developing a rudimentary forum, as much for learning as use. I want to display all threads from a forum by last post in that thread. My current table has no time information except for in posts (normalized), so the simple thing in words is to select from the threads table, joining on the posts table sorted by date. Is the syntax that simple?


SELECT (columns) JOIN LEFT `posts` ON `threads`.`id` = `posts`.`threadID` FROM `threads` (where statement) ORDER BY `posts`.`date`

Does that make sense for what I'm trying to do?

Old Pedant
04-18-2009, 06:59 AM
Well, first, let's make the code readable in the forum:


SELECT (columns) JOIN LEFT `posts`
ON `threads`.`id` = `posts`.`threadID`
FROM `threads` (where statement)
ORDER BY `posts`.`date`

Ummm...no...not even close...badly scrambled order.

Try this:


SELECT (columns)
FROM `threads` LEFT JOIN `posts`
ON `threads`.`id` = `posts`.`threadID`
WHERE [ONLY use fields from threads table here!]
ORDER BY `posts`.`date`

As noted there, your WHERE clause CAN NOT contain any field from the posts table. If it does, you just converted your LEFT JOIN into an INNER JOIN.

Now, that might actually be okay. I'm more than a little curious as to why you think you NEED a LEFT JOIN. Would you *really* have a record in the threads table that had NOT EVEN ONE POST in the posts table???? That's the only reason to use the LEFT JOIN there.

Now that I read what you are after, though, I think this will not work at all.

This will give you *ALL* posts in *ALL* threads. And you just want the information from *ONE* post--the latest one in a given thread, yes??

So let's throw that all out and start over:



SELECT T.field1, T.field2, T.fieldN, MP.latestPost
FROM threads,
( SELECT threadID, MAX(`date`) AS latestPost
FROM posts
GROUP BY threadID ) AS MP
WHERE threads.ID = MP.threadID
ORDER BY MP.latestPost DESC

I think that makes much more sense and is what you want.

Keleth
04-18-2009, 07:26 AM
Thanks for the info... I think its obviously, I'm pretty new to databases, hence why I wasn't aware the order was so important. I'll work on remembering that.


Would you *really* have a record in the threads table that had NOT EVEN ONE POST in the posts table?

While I get the code you posted, that question threw me off a bit on the logic train.

My thread table contains only 3 columns, id, title, and parentID (whatever forum its parent is). The posts table contains id, posterID, threadID, title, body, date. So why would I not use a join? For a thread to exist, there must be a post in it, so... maybe I'm missing something. Isn't that the purpose of a join? Where some data pertinent is contained in another table, which can be extended onto the current table? Since I'm left joining, with threads on the left, why would I get all posts? Wouldn't I get all threads with one post joined on each?

Old Pedant
04-18-2009, 08:23 AM
No no...you do want a JOIN!

But you only need an INNER JOIN, not a LEFT JOIN (also known as a LEFT OUTER JOIN).

When you do


SELECT ...
FROM tableA LEFT JOIN tableB ON ...

it tells SQL that you want *ALL* records form tableA, *even if there are NO matching records in tableB*!!

An example might be a list of classes offered at a college and a list of students signed up for them. If professor Dinky is a crappy teacher and no student signs up for his class, if we did


SELECT classes.prof, students.name
FROM classes LEFT JOIN students ON classes.classID = students.classID

Then we would *still* see Prof Dinky, even though there were no students in his class.

If you do


SELECT classes.prof, students.name
FROM classes INNER JOIN students ON classes.classID = students.classID

then you wouldn't see Dinky. INNER JOIN means "only show records where there *ARE* matches".

There are two forms of INNER JOIN: The one I showed just above, using the keywords INNER JOIN and an (older, predates SQL 92 standard, but still legal) "implicit join" (my choice of words) that would look like this:


SELECT classes.prof, students.name
FROM classes, students WHERE classes.classID = students.classID


I chose to use the "implicit" form of inner join in the code I gave you, but the following code is *100% SEMANTICALLY EQUIVALENT*:


SELECT T.field1, T.field2, T.fieldN, MP.latestPost
FROM threads INNER JOIN
( SELECT threadID, MAX(`date`) AS latestPost
FROM posts
GROUP BY threadID ) AS MP
ON threads.ID = MP.threadID
ORDER BY MP.latestPost DESC


Last point: Generally, database engines can more efficiently perform inner joins than outer joins. (Not always true, but I don't think an inner would ever be slower, at least.) So unless you need an outer join (and there are also RIGHT OUTER JOINs and even FULL OUTER JOINs), stick with the simpler inner joins.

Does that help?

Old Pedant
04-18-2009, 08:26 AM
Also, you asked

Since I'm left joining, with threads on the left, why would I get all posts? Wouldn't I get all threads with one post joined on each?

No, not at all. You would indeed get all threads, but you would *ALSO* get *ALL* posts. A LEFT JOIN doesn't in any way limit the number of joined records. It *ONLY* guarantees that the fields from the left table will show at least once.

Keleth
04-18-2009, 04:34 PM
Ok, so my issue is not that I choose join, but that I choose an OUTER JOIN instead of an INNER JOIN... I'm not trying to vindicate myself, I just wanna make sure I'm following the logic here. That helps, thanks a lot.

Old Pedant
04-18-2009, 10:17 PM
No need for vindication! You just had learned only part of the story about outer joins, so of course that misled you a bit.

But in this case it's not really the difference between INNER JOIN and LEFT JOIN that matters. Your problem here is that you *DO* want ONLY one record from the right table: the one with the MAX(`date`) value.

And I just realized that *I* way over-complicated the answer!

If the *only* piece of info you need from the THREADS table is indeed that MAX(date) value, then it's really dirt simple:


SELECT T.field1, T.field2, T.fieldN, MAX(P.`date`) AS latestPost
FROM threads INNER JOIN posts AS P
ON threads.ID = MP.threadID
ORDER BY P.latestPost DESC

No need for that "inner SELECT" that I used.

I was over-complicating things because I was thinking of a typical forum thread display (such as in this very forum!), where not only do you show the basic thread info, you *also* show the name of the most recent poster and the original poster.

So let's add another table to the mix: Users.

And let's say you have these fields in your tables:


threads
threadID int primary key
forumID int references forums(forumID)
title

posts
postID int
threadID int references threads(threadID)
whenPosted datetime
userID int references users(userid)
message varchar(8000)

users
userid int primary key
username varchar(50)

And now you want to show a list of the 20 most recent threads, indicating all of the following:
-- thread title
-- date/time of first post
-- username of person who made first post
-- date/time of most recent post
-- username of person who made most recent post

Okay?


SELECT MP.title, MP.whenFirstPost, U1.username, MP.whenLastPost, U2.username
FROM (
SELECT T.threadID, T.title, MIN(P.whenPosted) AS whenFirstPost, MAX(P.whenPosted) AS whenLastPost
FROM threads AS T INNER JOIN posts AS P ON T.threadID = P.threadID
GROUP BY T.threadID, T.title
) AS MP
INNER JOIN posts AS P1
ON MP.threadID = P1.threadID AND MP.whenFirstPost = P1.whenPosted
INNER JOIN users AS U1
ON P1.userID = U1.userID
INNER JOIN posts AS P2
ON MP.threadID = P2.threadID AND MP.whenLastPost = P2.whenPosted
INNER JOIN users AS U2
ON P2.userID = U2.userID
ORDER BY MP.whenLastPost
LIMIT 20

...


WHEW! I *think* that is right. And it is the direction I was heading in with my original post, but I just over complicated things too early.



EZ Archive Ads Plugin for vBulletin Copyright 2006 Computer Help Forum