PDA

View Full Version : Resolved LEFT JOIN 2 tables with multiple ON fields


JohnDubya
05-14-2009, 11:10 PM
Ugh. Once again, I've gotten stumped on a complex query. I'll try to make it as simple as possible. Basically, we have a DB of people's information, and we allow for our clients to add custom categories to be able to add additional fields themselves. So for each custom category, there are multiple options to choose from. And each person can be associated with one of these options under each custom_field. Here is the DB structure:

TABLES

`custom_fields` => (field_id, name) [this table isn't used in the query, but here to show DB structure]

`custom_options` => (option_id, field_id, value)

`custom_attributes` => (option_id, person_id)

`person` => (person_id, name)


PROBLEM

I need to filter on these things:


custom_options.field_id = 4
custom_options.option_id = custom_attributes.option_id
custom_attributes.person_id = p.person_id


And in addition to needing to filter on these things, I also need the query to show people who do NOT have an option entry in `custom_attributes` for field_id = 4. So it would basically need to return NULL in that case but still show the record. Here's what I've got so far:

$query = "SELECT cuo1.value value FROM ( (person p)
LEFT JOIN custom_attributes cua1 ON cua1.person_id = p.person_id
LEFT JOIN custom_options cuo1 ON cuo1.option_id = cua1.option_id )
WHERE ( (cuo1.field_id = 4 OR cuo1.field_id IS NULL) OR
(NOT EXISTS (SELECT 1 FROM custom_attributes tcua1
JOIN custom_options tcuo1 ON tcuo1.field_id = 4
WHERE tcua1.person_id = p.person_id AND tcua1.option_id = tcuo1.option_id)) )
GROUP BY p.person_id ORDER BY p.name";

This query correctly returns 14 rows, but the value is not completely correct. It returns the correct option value for people who only have a `custom_person` entry for an option under field_id = 4, but if they have additional `custom_person` entries, one of them is sometimes returned too.

So how do I get this query to ONLY grab the option's value for a field_id = 4? Thanks for any help you can provide!

Old Pedant
05-15-2009, 12:23 AM
Any time you use a LEFT JOIN and then use fields from the right-side table in the WHERE clause, you have effectively converted the query into an INNER JOIN.

You *MUST* make selections that are dependent on right-side tables as part of the ON clause.

SELECT cuo1.value
FROM person p
LEFT JOIN custom_attributes cua1
ON cua1.person_id = p.person_id
LEFT JOIN custom_options cuo1
ON ( cuo1.option_id = cua1.option_id AND (cuo1.field_id = 4 OR cuo1.field_id IS NULL) )
... etc. ...


I'm not sure I follow all the rest of that query, yet, so I'll let you play with it a while.

But suffice to say you shouldn't be usieng ANY WHERE clause in there, except possibly in the inner SELECT.

Old Pedant
05-15-2009, 12:28 AM
You know, given *ONLY* the conditions you stated in the beginning of your post, it looks to me like this might do it:

SELECT p.person_id, p.name, cuo1.value
FROM person p
LEFT JOIN custom_attributes cua1
ON cua1.person_id = p.person_id
LEFT JOIN custom_options cuo1
ON ( cuo1.option_id = cua1.option_id AND cuo1.field_id = 4 )
GROUP BY p.person_id, p.name
ORDER BY p.name

Or, if you need more info from the person table, why not just

SELECT DISTINCT p.*, cuo1.value
FROM person p
LEFT JOIN custom_attributes cua1
ON cua1.person_id = p.person_id
LEFT JOIN custom_options cuo1
ON ( cuo1.option_id = cua1.option_id AND cuo1.field_id = 4 )
ORDER BY p.name

JohnDubya
05-15-2009, 05:03 PM
Thanks so much for the responses, Old Pedant!

In addition to your suggestion of the two ON statements, I added more in the WHERE clause. This is what ended up working for me:


SELECT p.person_id, p.name, cuo1.value
FROM ( (person p)
LEFT JOIN custom_attributes cua1
ON cua1.person_id = p.person_id
LEFT JOIN custom_options cuo1
ON ( cuo1.option_id = cua1.option_id AND cuo1.field_id = 4 ) )
WHERE ( (cuo1.field_id = 4 OR cuo1.field_id IS NOT NULL) OR
(NOT EXISTS (SELECT 1 FROM custom_attributes tcua1
JOIN custom_options tcuo1 ON tcuo1.field_id = 4
WHERE tcua1.person_id = p.person_id AND tcua1.option_id = tcuo1.option_id)) )
GROUP BY p.person_id, p.name
ORDER BY p.name


That whole NOT EXISTS part searches for people who do not have entries in the custom_attributes table for field_id = 4. I got that code from a separate post and customized it for my query, so I'm assuming that's all it does. lol

Old Pedant
05-15-2009, 08:19 PM
Okay...now I see it. Makes sense.

I wrote up a message a while back explaining why using WHERE on dependent table fields can convert outer joins to inner joins. I you would like to see it, I'll see if I can dig it up.

JohnDubya
05-15-2009, 08:33 PM
Absolutely. This is about fourth time I've run into an issue similar to this one, so the more I learn about it, the easier it will be next time. This stuff makes my head spin. lol Thanks again!

Old Pedant
05-18-2009, 09:13 PM
When you use a dependent table in the WHERE clause after doing any kind of OUTER JOIN, you automatically effectively convert the outer join to an INNER JOIN.

I posted this example WHY that happens way back in 2004 and it's so hard to find, now, that I just spent 15 minutes looking for it. So I'm gonna copy it here and save a link to it.

******************************************
Let's assume we have the following data in two tables:

table: Cities
id -- city
1 -- LAX
2 -- NYC
3 -- SFO
4 -- SEA

table: Banks
cityID -- HQ -- bank
2 -- True -- CitiBank
3 -- True -- Bank of America
3 -- False -- California Bank

And then look at what happens when we use this query:

QUERY:
SELECT C.city, B.bank
FROM cities AS C LEFT JOIN banks AS B
ON ( C.id = B.cityID )
WHERE B.HQ = True

First, do the query and get all the records *WITHOUT THE WHERE* clause.
Thus:

id - city- cityID - HQ - bank
a) 1 - LAX - null - null - null
b) 2 - NYC - 2 - True - CitiBank
c) 3 - SFO - 3 - True - Bank of America
d) 3 - SFO - 3 - False- California Bank
e) 4 - SEA - null - null - null

*NOW* apply the WHERE clause to that:
... WHERE HQ = True

Well, for record "d", the right thing is done: The FALSE in HQ rejects that record.

But now look at records "a" and "e". Is the HQ field in those records TRUE? Nope. It's NULL.

SO THOSE RECORDS ALSO GET REJECTED!!!!

So the final results will be:

NYC - Citibank
SFO - Bank of America

**************

Now, compare that to this query:

SELECT C.city, B.bank
FROM cities AS C LEFT JOIN banks AS B
ON ( C.id = B.cityID AND B.HQ = True )

Since the test on HQ is now part of the ON, it doesn't affect whether or not a "cities" record will be part of the result.

And, indeed, you will correctly get final results of

LAX - null
NYC - Citibank
SFO - Bank of America
SEA - null

**************

TRY IT!

A comment I left out of the original post: *IF* the test you make in the WHERE clause checks ONLY for NULL, then it's okay! That is, if you use
... WHERE DependentTable.SomeField IS NULL
(or IS NOT NULL, of course).

You can see why, I assume.