...

View Full Version : Union of 2 match queries and ordering by relevance of both



mike22
05-12-2011, 01:03 AM
Hi,

I'm trying to query two unrelated tables (but retrieve related date). I've done this previously as 2 separate queries, but I'm trying to reduce the number of queries to increase performance.

The 2 queries I use, are match queries (in Boolean Mode). I'm not an expert, but I'm trying to get this to perform as well as it can, and have read Match & Boolean Mode for searching over text columns is probably what I should be using. The problem I'm having is that I can order by one score but not the other:



SELECT URL, SUM, Title, 'dummy1', 'dummy2', 'dummy3', 'dummy4',
MATCH(TAG,SUM,Title)
AGAINST ('Landlord') as RelevanceA
FROM rft_searchlawssum
WHERE MATCH (TAG,SUM,Title) AGAINST('Landlord' IN BOOLEAN MODE)
UNION ALL
SELECT g.URL as gURL, p.message AS pmessage, p.subject as psubject, g.idtype as gidtype, g.id as gid, p.pid as ppid, p.tid as ptid,
MATCH(p.message,p.subject) AGAINST ('Landlord') as RelevanceB
FROM mybb_google_seo g LEFT JOIN mybb_posts p ON g.id = p.tid
WHERE MATCH(p.message,p.subject)
AGAINST('Landlord' IN BOOLEAN MODE) AND g.idtype ='4'
ORDER BY RelevanceA, RelavanceB DESC


If I drop the RelavanceB this works, but with the RelavanceB I get:
#1054 - Unknown column 'RelevanceB' in 'order clause'

Any idea why I cant use RelavanceB in my order clause?

Old Pedant
05-12-2011, 01:19 AM
Same reason you can only use the field *NAMES* from the FIRST SELECT in a UNION.

Go on try it.

Try to use pmessage or psubject in your PHP (or whatever) code that calls this query.

Or, for that matter, try it in MySQL, alone.

Just do:


SELECT pmessage
FROM (
SELECT URL, SUM, Title, 'dummy1', 'dummy2', 'dummy3', 'dummy4' ...
UNION ALL
SELECT g.URL as gURL, p.message AS pmessage, p.subject as psubject, ...
)


Won't happen.

Or just execute that UNION ALL query, alone, from the MySQL command line.

Look at the names of the fields you get.

Using AS to alias the names of fields in the second and subsequent SELECTs of a union is a waste of code (well, unless you use them in a HAVING clause...but even then the safer thing to do is alias them the same as the names of the first SELECT).

In other words, what you probably should do (if you insist on using aliases in the second SELECT) is


SELECT URL, SUM, Title, 'dummy1', 'dummy2', 'dummy3', 'dummy4',
MATCH(TAG,SUM,Title)
AGAINST ('Landlord') as RelevanceA
FROM rft_searchlawssum
WHERE MATCH (TAG,SUM,Title) AGAINST('Landlord' IN BOOLEAN MODE)
UNION ALL
SELECT g.URL, p.message AS SUM, p.subject as Title, g.idtype as dummy1, g.id as dummy2, p.pid as dummy3, p.tid as dummy4,
MATCH(p.message,p.subject) AGAINST ('Landlord') as RelevanceA
FROM mybb_google_seo g LEFT JOIN mybb_posts p ON g.id = p.tid
WHERE MATCH(p.message,p.subject)
AGAINST('Landlord' IN BOOLEAN MODE) AND g.idtype ='4'
ORDER BY RelevanceA

Or choose better names in your first SELECT.

[And when you don't give a name to a field in a SELECT, MySQL uses the value as the name. Which is why 'dummy1' gets the name dummy1, etc.]

Old Pedant
05-12-2011, 01:26 AM
And by the way, the field types in the two SELECT lists must agree, in order.

If SUM is a numeric field, as seems likely, and message is some kind of text field, then most DBs would reject the UNION with a type mismatch.

MySQL allows it (I just tested, to verify), but the resulting field type of SUM will be the text type of message, since text can't be converted to a number but of course a number can be converted to a string.

If you *need* to be able to treat the SUM field as a number, then you would be better off UNIONing it with a dummy numeric field from the second SELECT.

mike22
05-12-2011, 01:51 AM
I see what you mean, I only get the 1st field names when running in phpMyAdmin

I matched up the types (Sum is Summary, so its the same as p.message/ post Message, and the same column type)

I'm going to switch the top and bottom quires, since I need the gid, ppid ... aliases.

so it will be:


SELECT g.URL as gURL, p.message AS pmessage, p.subject as psubject, g.idtype as gidtype, g.id as gid, p.pid as ppid, p.tid as ptid,
MATCH(p.message,p.subject) AGAINST ('Landlord') as RelevanceB
FROM mybb_google_seo g LEFT JOIN mybb_posts p ON g.id = p.tid
WHERE MATCH(p.message,p.subject)
AGAINST('Landlord' IN BOOLEAN MODE) AND g.idtype ='4'
UNION ALL
SELECT URL, SUM, Title, 'dummy1', 'dummy2', 'dummy3', 'dummy4',
MATCH(TAG,SUM,Title)
AGAINST ('Landlord') as RelevanceA
FROM rft_searchlawssum
WHERE MATCH (TAG,SUM,Title) AGAINST('Landlord' IN BOOLEAN MODE)
ORDER BY RelevanceB DESC

This returns better column names
But I'm in the same situation, ordering by RelevanceA is not possible.. (with my limited understanding, anyway)

Old Pedant
05-12-2011, 02:41 AM
Ummm...yes, it is.

Because RelevanceB is a field in *BOTH* SELECTS, by virtue of the fact that it *IS* a field in the first SELECT and so the corresponding field in the other SELECT will get that same name, the ORDER BY will apply to the COMBINED UNION.

Now...

The problem you may have is that the relevance in one of those SELECTs may not be comparable to the relevance in the other.

So maybe you'll have to apply a multiplier to one of them to get them into the same range of values.

But you should be able to see that by simply inspecting the results.

mike22
05-12-2011, 03:39 AM
Ah ha, that does make sense.

From your last example/explanation, I suspected it might have already been ordering by both, I just couldnt tell.

For some reason, I'm getting results, but yet the relevance score is 0 ???
(surely if it was 0, the data shouldn't even be pulled in from the Boolean match)

The word 'Landlord' is sometimes in the url, sometimes in title and sometimes in the summary (and sometimes in 2 or all 3). From what I've read, a 0 should only happen if its not present in any, shouldn't it?

It finds the right result, but all of the relevance scores are 0 (so doesn't ever order them 'right', before considering adding a co-efficient to make the scores relative)

Old Pedant
05-12-2011, 06:27 AM
What happens if you run each SELECT separately? Do you see any relevance value numbers then?

mike22
05-12-2011, 08:58 PM
Now I see my issue, since its a new project, it' currently a small database

More than 50% of the current rows contain the word "Landlord". I found this mentioned about fullText searching:


if a search word is found in more than 50% of the
records of the table, then the word is thrown out as a factor in the
weighting

-I've since inserted a few dummy rows, and I do indeed get values back for RelevanceB (for results of the first and second query!). I hadn't investigated the issue enough, Old Pedant you certainly walked through it with me..

I hope this thread helps anyone else with a similar issue. Thank you Old Pedant



EZ Archive Ads Plugin for vBulletin Copyright 2006 Computer Help Forum