PDA

View Full Version : To JOIN or not to JOIN... or am I missing something...?


krycek
04-18-2003, 06:15 AM
Right, I was always under the impression that it was 'better' to use JOINs, partly because it is 'faster'.

I'm now wondering if that is simply a myth.

Take these two SQL statements:

SELECT DISTINCT p.ProductID, p.Image, p.Price
FROM products AS p
RIGHT JOIN category_links AS c_l
ON c_l.ProductID = p.ProductID
INNER JOIN categories AS c
ON c.CategoryID = c_l.CategoryID
RIGHT JOIN brands AS b
ON p.BrandID = b.BrandID
RIGHT JOIN size_links AS s_l
ON s_l.ProductID = p.ProductID
INNER JOIN sizes AS s
ON s.SizeID = s_l.SizeID
RIGHT JOIN colour_links AS co_l
ON co_l.ProductID = p.ProductID
INNER JOIN colours AS co
ON co.ColourID = co_l.ColourID


SELECT DISTINCT p.ProductID, p.Image, p.Price
FROM products AS p,
category_links AS c_l,
categories AS c,
brands AS b,
size_links
AS s_l,
sizes AS s,
colour_links AS co_l,
colours AS co
WHERE c_l.ProductID = p.ProductID
AND c.CategoryID = c_l.CategoryID
AND p.BrandID = b.BrandID
AND s_l.ProductID = p.ProductID
AND s.SizeID = s_l.SizeID
AND co_l.ProductID = p.ProductID
AND co.ColourID = co_l.ColourID

The first one uses JOINs and the second simply uses WHERE. As a matter of information, both have additional WHERE details added to refine the search.

I'm using a fast PC, and there are only 14 products in the database (and not all that much data in the other tables). However, I was getting REALLY slow script execution, and I traced it to the SQL query.

Running the first one takes an average of 7 seconds. Running the second query takes less than 1 second. It's almost instantaneous in fact.

This kinda tells me NOT to use JOINs... and to stick with WHERE for this. But in that case, I am left confused as to where it is appropriate to use JOINs and where it isn't...? I did a couple of hours of Googling and didn't clear the matter up. All the articles I found pointed towards using JOINs.

Obviously at the end of the day I'm going to use the faster method. Plus, after thinking about it, there's a lot more work being done with the JOINs, is there not...?

I'd love to know what some other people think about this, and whether I'm just totally out on my JOIN usage or if other people are using them in the same situations.

::] krycek [::

krycek
04-18-2003, 06:29 AM
By the way, I'm not thick :p As I hinted at, once I thought about the speed difference, it was obvious that the JOINs will take longer, because they are selecting an exponentially larger mass of data than the WHERE.

I did a couple of tests on this.

Query 2 returned 29 out of 316 possible rows.

Query 1 returned 29 out of 340032 possible rows! :eek:

No wonder the JOINs method takes so much longer!

So, I just thought I'd better make it extra-clear that I'm not asking why the JOINs take longer, but rather, why articles etc. recommend using them for things like this (obviously there are time when JOINs are absolutely necessary), and so on and so forth.

::] krycek [::

krycek
04-18-2003, 06:43 AM
I've just been talking to whammy about this, and he's just as confused as me...

I've tried this on:

WinXP Pro, IIS 5, MySQL 4.0
Win2k server, IIS 4, MySQL 4.0
Linux, Apache 1.3, MySQL 3.23.56
Linux, APache 2.0, MySQL 4.0

The times are all approximately the same, with allowance for the differences in hardware etc. Very odd.

Also, the tables have their keys set up, but no relationships defined of course because MySQL does not support that. (Has to be done on a script level). There is not much data - only test data right now. 14 products, 8 colours, that kind of thing, and the links of course.

Either I'm using these JOINs for the wrong thing, or they should be running a heck of a lot quicker... I just realised the times I quoted above were with the extra search criteria. The raw query takes 31 seconds (thru phpMyAdmin) compares to < 1 sec for query 2.

Scary... :eek:

::] krycek [::

raf
04-18-2003, 10:21 AM
Hmm.

I know your not a rookie Krycek, and this was probably just an experiment for you, so i say this in general.

As far as i know, joins shouldn't be that much slower in a realistic situation.
It is true there will be build much more and much larger views then with the were method. If i'm not mistaking, the where form will only create one view (the actual recordset before the DISTINCT). The join form, will create a view for each join + for the joins on these views etc. Especially when you use outer joins (right or left joins) there are created quite big/huge views. And this is probably taking up all that time.

Frankly, if you need such a join to get three variables from the same table ... then there's somenthing completely wrong with your db-design. ;) Just saying this is not a verry realistic/good query. Specially not if this should be run in a client-server webapplication context. If you run this sort a thing for analitacal purposes on a copy of your db, you could have to resort to this sort a query's, but then, the responsetime is somewhat less important (if at all).

For runtime querys, i try to stay below 3 joins, and only an outer join if this is absolutely necessary. When i set up my db-design, i already know which data will be needed/modifyed/inserted on each screen, which processes are critical and need to be run frequently and fast, and i build my db-design accordingly to that.
Using 4 outerjoins in one query is asking for problems.


Joins are a gift from heaven, and they can remidate some poor design, but you need to use them correctly on a good designed db.


You've pointed out a problem. and i hope it was an artificial one. If i would have some time, i'd rewrite some of my actually used querys to see what happens, but ...

In the situation you described, solving it with such an elaborated condition might be the best sollution, but it's a sollution for an unneccessary problem. Knowing whats the best sollution in each situation, and not just always falling back on the same few tricks (+ foreseeing problems and avoiding them !) is just what distinguishes the good from the bad. (i'm just a rookie that fools around :) )

krycek
04-18-2003, 07:45 PM
I agree with most of what you said, raf :)

My db design is fine... no worries there.

I think maybe I used JOINs in the wrong way (I'm still messing with that) however I obviously used them in the wrong place.

You are correct in what you said about what is happening, and indeed also about your advice on usage.

What I'm wondering is WHY I went down that route... which was mainly because articles etc. pointed me in that direction, so WHY did the articles say to use JOINs in this situation, when they are obviously not that great an idea...?

I mean, if the WHERE stuff works fine and is super quick, why use the JOINs...? :confused:

::] krycek [::

raf
04-18-2003, 08:12 PM
IT'S ALL BILL GATES FAULT !! WITH HIS STUPID 'MAKE IT VISIBLE' PHILOSOPHIE !!

seriously. I don't know. It's easier to imagen whats going on if you imagen these join lines. But i absolutely have no idea how the RDBM realy processes these requests (i mean, at a lower level). If ou define the relationships in your db (still not possible with MySQL :( ) you'll be tempted to always use joins. Same with subquerys. Its that easie, you wount look for a where clause that does the same ... (Although subquerys take far more time)

About the tutorials. Well. I've seen quite some tutorials that also use joins through where clauses. I think that the more 'professionally' tainted tutorials use this form moer frequently then the 'building your first etc etc' tutorials. But it's an interesting question.

I hope someone else knows more about this.