SQL to retrieve relevant matches like Google's "Searches related to"
Since someone helped me on this forum, I thought I would give a little something back. My design might not be the greatest, alternative designs would also be appreciated
If you do a Google search and notice the links at the bottom "Searches related to", this design try's to do a similar thing:
Searches related to Design
"searches related to" - relevant search terms:
What my design really does, is return the most common queries that other users were searching (1500 seconds before and after) at the time they searched the same term q
$q = php to get the users query (make sure you have made this safe)
$ip = php to get the users ip (also, make safe)
Log all queries that users make, log the query term (query), log the users ip address (ip) and log the timestamp (ts). I also chuck away duplicates queried (q) from the same person (same ip) within the same hour (ts > ( NOW() -3600 )
Next, when a user searches for terms q, find all the ip addresses that have searched the same q:
"INSERT INTO searchqueries(query, ts, IP)".
" SELECT '".$q."', NOW(), '".$ip."'".
" FROM DUAL".
" WHERE NOT EXISTS".
" (SELECT query, ts, IP FROM searchqueries WHERE".
" (query = '".$q."' AND ts > ( NOW() -3600 ) AND IP = '".$ip."')".
" ) LIMIT 1";
::SQL1:: select ip as q_ip, ts as q_ts from searchqueries where query = ".$q.";
for each of these ips get all of the queries (around the hour at the time the user queried q) for each q_ip (and q_ts)
::SQL2:: select query as relatedqueries from searchqueries where ip = q_ip and (ts > (q_ts - 1500)) and (ts < (q_ts + 1500))
combining the ideas of SQL1and SQL2
::SQL1+2:: select rq.query as relatedqueries, q_ip, q_ts from searchqueries rq, (select ip as q_ip, ts as q_ts from searchqueries where query = "Your Query") as t where ip = q_ip and (ts > (q_ts - 1500)) and (ts < (q_ts + 1500))
now we need to count all of the relatedqueries, order by the most common then return only the top 11
::SQL3:: SELECT relatedqueries, count(relatedqueries) as c FROM ::SQL1+2:: group by query order by c desc, Limit 0, 11
now combine the queries into a another single SQL statement
SELECT relatedqueries, count( relatedqueries ) AS c
SELECT rq.query AS relatedqueries, q_ip, q_ts
FROM searchqueries rq, (
SELECT ip AS q_ip, ts AS q_ts
WHERE query = '".$q."'
) AS t
WHERE ip = q_ip
ts > ( q_ts -8500 )
ts < ( q_ts +8500 )
) AS x
GROUP BY relatedqueries
ORDER BY c DESC
LIMIT 0 , 11
You might want to check the returning matches, and chuck away the one that is an exact match of q (unless you want it)
So we now have 2 SQL statements above, 1 to insert the users query, another to find the relevant matches
You will now have a list (potentially up to 11 matches) of "searches related to", or "searches that other people were searching when also searching this query". This design should get more accurate the more people use it, the beauty is that you don't have to put in your own data (people are making the related queries every time they query more than once within the same couple of hours).
Obviously, you can go into the DB and tweak things / add rows to begin with, but I haven't found much of a need to do this so far, it seems to be working well. Something I might add is a spell checker, and then dont add and any wrongly spelt queries / profanities (but this will be done with server side php code)