PDA

View Full Version : Very ugly huge join problem


pysan
05-03-2005, 05:23 PM
Hi all, I have thought through this problem and just can't seem to find a solution that will not kill the resources among other things... This is a bit complex of a problem, but I will try to diagram it out as simply as possible.

Basically I have 6 tables A, B, C, D, E, F, G related as follows:
A->B, C->B, D->B, B->E, E->F, G->B

The problem I'm having is how I need the info displayed from these tables...
I have at least one field needed from each of the tables in the returned data, so that requires all the tables, with G being an optional one, so I currently have it under a LEFT JOIN from the rest, and it is working fine. A simplified query for what I am using is

SELECT DISTINCT * FROM A, B, C, D, E, F LEFT JOIN G ON G.BID=BID WHERE AID=B.AID && C.BID=BID && D.BID=BID && B.EID=EID && E.FID=FID && C='value'

I then have 3 loops in PHP as there are multiple needs from those tables
Multiple F records belong to one B record, with multiple B records belonging to 1 A record. Now that much is happening just fine, and I have what I want, except for one thing. The problem is that I need a few additional A records in between the A's I have, meaning I need a few additional records from A that are only from tables A, B, C, rather than all of them, as the rest are. So because they don't need them all, my query doesn't pull them for me. I tried playing with further LEFT JOINs but that just made a mess of it as it returned a ton more records than I need? Now I realize that perhaps subqueries would make this much easier, but I am limited because the server this needs to run on only has MySQL 3.23, and my test server has 4.02 or something, so that doesn't suppport them either.

If there is not a way to pull the required data from that single query, then the only other way I can think of doing it is to pull my primary list from A,B,C and then in the primary outer while loop run the smaller query each time through the loop. But that would take me from 1 db query hit per page to about 20+ queries against the DB, which just doesn't seem like that elegant, or resource friendly, of a solution.

If anyone needs any further clarification I would be glad to try to explain it better, but I have tried to make it as easy as I could, but it is a little difficult for even me to understand...

Thanks in advance!

P.S. for those who want the full deal, here is the actual query I am using:
SELECT DISTINCT ageTitle,TIME_FORMAT(ageStartTime,'%I:%i%p') AS ageStartTime,TIME_FORMAT(ageEndTime,'%l:%i%p') AS ageEndTime,sesID,sesTitle,sesDescription,SessionTitleName,sspModerator,perID,perLastName,perFirstNam e,perTitle,perCompany,perPhoto,spoName,spoLink
FROM Agenda, Event, Session, SessionTitle, SessionParticipant, Person
LEFT JOIN Sponsor ON sesspoID=spoID
WHERE seseveID=eveID && seseveID='$eventID' && sesageID=ageID && sesSessionTitleID=SessionTitleID && sspsesID=sesID && sspperID=perID && eveShowAgenda='1' ORDER BY ageStartTime,sesID,sspModerator,perLastName

Dylan Leblanc
05-05-2005, 11:45 PM
You may be able to improve the performance of the query by duplicating data amongst the tables, so that you don't always need to join on some of the extra tables. Redundant data like this isn't really proper database design, so be carefull, particularly when updating data which may be in more than one table.

pysan
05-06-2005, 02:32 AM
That is an option, but since the primary reason I am using a DB is to ensure accurate data, so I would be afraid that in the future such redundant data would come back to haunt me. I must admit that I learned so much "proper" db design in school that you suggestion made me shudder, but then again, "proper" is not always the best when actual real-world performance actually matters.

However, I think I might have a somewhat elegant solution. Basically I grab the incomplete set that is almost working currently. Then I am doing the full list and looping it into an array. With that array, I then loop that array each time inside the big while(). But I begin the run through the array at a bigger index every time through by using a counter. So that drops me to 2 db calls, and while looping the same array the 20 or so times, at least it isn't 20+ db calls. As an illustration of what I am doing now, here it is:

$x=0;
$count = 0;
while ($row = db(incompleteset)) {
while ($array[$x]) {
$x = $count;
if ($past = $current) {
//code here
$count = $x;
}
$x++;
}
}

I think that sums up what I am trying now. It's not perfect, and not quite working completely yet, but I think I am very close.

Kiwi
05-06-2005, 06:13 AM
When you're running a complex query always specify the table... it makes life so much easier when trying to figure out what's going on.

I'm not sure if this work, but...

The restricted result-set will look like this:

SELECT DISTINCT
A.ageTitle,
TIME_FORMAT(A.ageStartTime,'%I:%i%p') AS StartTime,
TIME_FORMAT(A.ageEndTime,'%l:%i%p') AS EndTime,
C.sesID,
C.sesTitle,
C.sesDescription,
D.SessionTitleName
FROM
Agenda AS A,
Event AS B,
Session AS C,
SessionTitle AS D
WHERE
B.eveShowAgenda = '1'
AND B.eveID = C.seseveID
AND C.seseveID='$eventID'
AND C.sesageID=A.ageID
AND C.sesSessionTitleID=D.SessionTitleID
ORDER BY
StartTime,
C.sesID

Now you need to outer-join the extended result set. The key is to outer-join it properly (I'm not sure if you can outer-join two tables at once - I never do that, because fully featured databases have better alternatives. Unfortunately, I don't have a MYSQL test server to work on, but this would be ideal):

SELECT DISTINCT
A.ageTitle,
A.ageStartTime,
A.ageEndTime,
C.sesID,
C.sesTitle,
C.sesDescription,
D.SessionTitleName
E.sspModerator,
F.perID,
F.perLastName,
F.perFirstName,
F.perTitle,
F.perCompany,
F.perPhoto,
G.spoName,
G.spoLink
FROM
Agenda AS A,
Event AS B,
Session AS C,
SessionTitle AS D
LEFT JOIN (Sponsor AS G ON C.sesspoID = G.spoID)
LEFT JOIN (SessionParticipant AS E, Person AS F
ON E.sspsesID=C.sesID
WHERE E.sspperID=F.perID)
WHERE
B.eveShowAgenda = '1'
AND B.eveID = C.seseveID
AND C.seseveID='$eventID'
AND C.sesageID=A.ageID
AND C.sesSessionTitleID=D.SessionTitleID
ORDER BY
StartTime,
C.sesID

In reality, I'd solve this with a view (which is why I'm not sure this will work). If it doesn't work, I would recommend creating a pair of temporary tables, once containing the left-side data; one containing the right-side, then join the temp tables.

Your inserts into these temp tables would be built on the selects (and these would be the views I would use):

SELECT DISTINCT
A.ageTitle,
TIME_FORMAT(A.ageStartTime,'%I:%i%p') AS StartTime,
TIME_FORMAT(A.ageEndTime,'%l:%i%p') AS EndTime,
C.sesID,
C.sesTitle,
C.sesDescription,
D.SessionTitleName,
G.spoName,
G.spoLink
FROM
Agenda AS A,
Event AS B,
Session AS C,
SessionTitle AS D
LEFT JOIN (Sponsor AS G ON C.sesspoID = G.spoID)
WHERE
B.eveShowAgenda = '1'
AND B.eveID = C.seseveID
AND C.seseveID='$eventID'
AND C.sesageID=A.ageID
AND C.sesSessionTitleID=D.SessionTitleID
ORDER BY
StartTime,
C.sesID


SELECT
E.sspModerator,
F.perID,
F.perLastName,
F.perFirstName,
F.perTitle,
F.perCompany,
F.perPhoto
FROM
SessionParticipant AS E,
Person AS F
WHERE
E.sspperID=F.perID

Then an outer-join between these two temp tables would give you your complete result set. It's a clumsy solution, but it's the best you can do with MYSQL's limitations. Importantly, it gets most of the data work done in SQL, rather than in your application layer.