View Full Version : How do I make quicker mysql searches?
Gary Williams
04-02-2008, 02:54 PM
Hi All,
In an application, I need to search a simplified UK postcode file of some 30,000 records and, after some data crunching and calculations, I get the results page from the server in 7 seconds (on average). I now need to use the complete postcode file of 1,760,000 records. This slows the response from the server to 15 seconds (on average). How can I speed up this process? I guess I can transfer the application to a server based on a faster processor, but what can I do to help the MySQL Database Server?
Regards
Gary
Stooshie
04-02-2008, 04:28 PM
There are a number of things you can do.
Optimise the database.
Make sure the table you are searching is properly indexed. In particular, make sure there is an index against the postcode column.
Optimise the SQL queries
Select the minimum amount from the database that you can. In other words don't use
SELECT *
FROM table
but use
SELECT thecolumnineed1, thecolumnineed2
FROM table
Also cache the queries. this means passing a parameter in the SQL rather than the variable directly (good from a security point of view as well).
Optimise the connections
Assuming you are using php with mysql use pconnect to create a persistent connection so that the code isn't re-connecting each time you run the query.
Without seeing your queries I can't say very much more.
abduraooft
04-02-2008, 04:38 PM
I'd recommend to readUsing MySQL Full-text Searching (http://devzone.zend.com/node/view/id/1304), and http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
but If you are using shared hosting, you may not be able to change the default minimum-word-length
Also you could use sessions to limit the number of search by a single user within a predefined time
NancyJ
04-02-2008, 05:27 PM
Unless you have *a lot* of columns or are returning a large dataset then specifying the columns you need is unlikely to make a significant impact in your query times. In long running queries, incorrect/absent indexes are the most likely cause of the problem.
Posting your code and table structure would be helpful.
Stooshie
04-02-2008, 06:27 PM
NancyJ, I get your point however if one of the fields is, say, a description field that isn't used then a lot of info is getting pulled back you don't need. On top of which the db does a "select columns from tab" query before it runs a "select *" query.
NancyJ
04-02-2008, 06:53 PM
Given the queries are taking 7-15 seconds, unless these description fields contain entire books or the db server belongs in a museum, excess data transmission is not the problem. We're not talking about knocking off a few milliseconds to super-optimise here, it would have to be 100s of MB of data to take that long.
Either the searched field isn't indexed or its not being used (probably because of a " like '%foo%' query)... or he's grabbing the whole of the table and searching in PHP, from the question, thats a possibility.
Stooshie
04-03-2008, 12:22 AM
Given the queries are taking 7-15 seconds, unless these description fields contain entire books or the db server belongs in a museum, excess data transmission is not the problem.
true!
Gary Williams
04-03-2008, 09:27 AM
Hi All,
Thanks for all the replies. In my past applications, the databases have been relatively small so my lack of refinement with coding searches and setting up database tables has not been an issue, hence my ignorance of indexes, etc.
I have attached a 50 record sample of the postcode database (MySQL) I now need to use (it's a csv file, "postcode.txt"). The empty fields and companynumber field are a hang over from when all 3 companies postcodes were stored in one table. I've now split the table so that each company has its' own table. The values stored in fields AreaLoad, BuildingsAreaCode, ContentsAreaCode, FloodArea, SubsidenceRate, FloodRate are used to draw weighting %'s from a 'rates' table for a subsequent calculation.
The postcode (AB10 1AA) posted on the data entry form is split into the 3 parts, ie, outbound (AB10), the sector (1), and the walk (AA) and used to inform the search of fields postcode, postcodesubsector and postcodewalk.
The SQL code I use for searching for a specific postcode record is:
strSQL = "SELECT * FROM " & strApplicablePostcodeTable & " WHERE companynumber = " & RSCompanies("companynumber") & " AND postcode = '" & strPostcodePart & "' AND postcodewalk = '" & strPostcodeWalkPart & "' AND postcodesubsector = '" & strPostcodeSubsectorPart & "'"
Please put me straight on this and show me what I should have done. I'll then be able to use the corrected table/code as a worked example for the rest of my application.
One day I hope to know enough to actually answer someones posts, instead of always asking questions. Thanks for your patience.
Regards
Gary
Stooshie
04-03-2008, 01:42 PM
Thanks for uploading that. Helps a lot.
One thing to think about is the order that you put the WHERE clauses. I'm not sure how variable the companyNumber is (they are all 1 in the database).
The first clause should be the one that will narrow down the search the most. The next clause should be the one that narrows it down the next most and so on until the last.
Here is a suggested order:
strSQL = "
SELECT *
FROM " & strApplicablePostcodeTable & "
WHERE postcode = '" & strPostcodePart & "'
AND postcodewalk = '" & strPostcodeWalkPart & "'
AND postcodesubsector = '" & strPostcodeSubsectorPart & "'
AND companynumber = " & RSCompanies("companynumber") & "
"
I suspect the postcode field(2 chars + 1 num) has the most variation (ie is narrowed down more when a search is carried out) and the the postcodewalk (2 chars) and so on...
Only just had a quick look. I will try and see if there is anything else.
Gary Williams
04-04-2008, 08:11 AM
Hi Stooshie,
I amended the strSQL code by removing the reference to company number (no longer required) and checked the order. I also re-ordered the fields in the database but unfortunately this hasn't made a noticeable difference in speed of searching.
Would the search be quicker if I combined the 3 postcode fields into one so that instead of searching for "AB10", then "AA" and finally "1", just to search for "AB10 1AA" in one go?
You mentioned that I could "cache the queries". This means passing a parameter in the SQL rather than the variable directly (good from a security point of view as well). How would I do this?
Also, NancyJ mentioned that "the searched field isn't indexed". Again what is this and how do I impliment it? I'll do a google search for indexing now to see what I can find.
Thanks and regards
Gary
Gary Williams
04-04-2008, 08:24 AM
Hi abduraooft,
Thanks for the advice on Full Text searching. I've just read a few articles and one article recommends moving the database to an Oracle system for databases with tables containing millions of records. My postcode table is 1.76M records long.
Regards
Gary
Gary Williams
04-04-2008, 08:29 AM
Hi All,
Just had a brainwave(?). How about splitting the current postcode table into 26 smaller tables, one for eack letter of the alphsbet, ie, table 1 for postcodes AA00 - AZ99, table 2 for BA00 - BZ99, etc.
I can quickly select the correct table from the first letter of the postcode, then search the smaller table.
Using sqlyog, it won't take long to create these smaller tables from the master table.
Is this a reasonable solution?
Regards
Gary
Stooshie
04-04-2008, 02:45 PM
Hi there,
Sorry the above didn't help. If the tables aren't indexed then it probably wouldn't help much. (See below for indexes).
I don't think combining the fields into 1 postcode field would help.
Unless you could ensure that data passed in the query was in the correct postcode format using a regular expression at the point of input. This would mean that you would not have to do a LIKE query (especially if you have to prefix and postfix the % character (e.g. LIKE '%INPUT%')).
So, if you can ensure postcode format then you could combine them.
As far as cahching queries goes, I only know how in PHP. Perhaps google may help? :-)
For the indexing, if you have access to phpMyAdmin with your provider then it's fairly easy. Go to the structure page for the table then click the lightning strike(index) icon for the column you want.
From the command line the query is:
ALTER TABLE mytable ADD INDEX (mycolumn)
Splitting the tables could help. If the "decision process" to decide which table to use doesn't take too long (It shouldn't as your code is just analysing the first letter).
All the above in combination may help quite a bit.
Hope this helps.
ps remember to index all 26 tables if you decide to take that route.
NancyJ
04-04-2008, 06:57 PM
Take what Stooshie said about the where clauses but apply that to your indexes. You want to create an index that contains all those fields in the order indicated by Stooshie ie. The field that with the most variation should be the first field in the index, the second field should be the one with the second most variation... etc.
If you're not indexing your table then each query has to examine every single one of those 1.7 million rows.
...also, just to add, 26 tables is completely unnecessary - just index your tables
Gary Williams
04-09-2008, 02:30 PM
Hi NancyJ & Stooshie,
Only one word for it - BRILLIANT!
My search and display times are down from 15 seconds to 3 Seconds.
This is what I did.
1. Changed the table type from Innob to myISAM (Read that was necessary for indexing).
2. Reordered the fields in the postcode table to be the same as the SQL string, the field with the most variations first, etc.
3. Created 3 new index's, now 4 in total as the recordid field was already set as the Primary key.
PRIMARY 1762919
Postcode 2923
PostcodeWalk 400
PostcodeSubSector 9
Should I have only set one new index with all 3 fields in it?
Anyway, the results are now so fast.
Many, many thanks.
Gary
NancyJ
04-11-2008, 07:08 PM
3 seconds is still pretty long, try merging the 3 indexes into 1 (in the order of most->least variance)
You can also use the keyword 'explain' before your queries to get an explaination of what is going on internally.
Also, InnoDB can be indexed just the same as myIsam, however myIsam is probably the correct table type for this, given your description of its use.
Gary Williams
04-14-2008, 12:16 PM
Hi NancyJ,
I merged the indexes and deleted the 3 original, individual, indexes. The results now display in 2 seconds! This may seem long to you, but this is the round trip time from me hitting the button on the form on the website, over to the server for searching, etc, then back to the website for display. Would you still expect a quicker result?
How do you use 'explain'? Do you mean strSQL = Explain ( Select .....)?
I'll go look for a tutorial on choice of table type.
Thanks
Gary
NancyJ
04-14-2008, 12:54 PM
If 2 seconds includes the download time then you're probably getting a pretty good query time.
To use explain you simple put the word 'explain' in front of the query. eg. EXPLAIN SELECT * FROM TABLE This returns information about the query rather than the result. Do you use phpMyAdmin or similar to interface with your DB?
vBulletin® v3.8.2, Copyright ©2000-2012, Jelsoft Enterprises Ltd.