View Full Version : Any way to improve this slow query?
cyphix
11-12-2008, 05:02 PM
Hey guys...... I have been doing soem query texting with indexes & needed help with a query on if it can be made faster or not?
I have a table with about 650k records in it & I am running a query like this..
SELECT
bank_detail_id,
account_name,
account_number,
bsb_code,
bank
FROM
bank_detail
WHERE
bank='Westpac'
AND
other_id IS NULL
ORDER BY
bank_detail_id DESC
This query generally returns about 50k results depending on the bank name I out in.....
I have tried several different index methods & the best I can seem to get is a query time of about 11.5 seconds..... the indexes I have tried & their results are below:
No indexes ---------------------------------------------- Avg. 14 seconds
Single indexes on: bank, other_id - Avg. 14 seconds
Composite index on bank, other_id - Avg. 14 seconds
Composite index on bank, other_id, bank_detail_id - Avg. 11.5 Seconds
Composite Unique index on bank, other_id - Avg. 13 seconds
Composite Unique index on index on bank, other_id, bank_detail_id - Avg. 11.5 Seconds
I thought an improvement of only max 2.5 when adding indexes was pretty poor... :confused:
My EXPLAIN showed the below results..
Select type: Simple
Type: ref
key_len: 259
ref: const,const
rows: 50680
extra: Using where
Any help would be greatly appreciated! :)
Fumigator
11-12-2008, 07:42 PM
The only thing that jumps out at me is a very very large index (259). Is there a numeric key for each bank that you can store and use instead? That will certainly speed the query up.
Assuming you can't do that, you may want to trim down the size of your bank column (I can't imagine a bank name over 80 characters, let alone 259).
The other thing is, do you need all 50,000 records? Have you tried limiting the results? Also, did you try an index that combines bank + bank_detail_id? As that is the sort order, perhaps an index in that order would speed it up a bit.
cyphix
11-13-2008, 02:12 PM
As for the key_len @ 259..... yes, the bank field was set to varchar 255.... didn't think that mattered as varchar uses the max data space no matter what size you set it to, so I usually always set my varhcar fields @ 255 (don't know how it affects indexes though)..... either way, I changed it to 80 & it had no affect. :(
Yes I wanted to get all the results so I can't limit them... tried that other index you suggested but no change there either.
Also tried setting the banks up in another table & using id's in the bank_detail table & it actually made the query time about 5 second worse using the below query with an index on bank.bank_name..
SELECT
bank_detail_id,
account_name,
account_number,
bsb_code,
bank
FROM
bank_detail
INNER JOIN banks ON bank_detail.bank=banks.id
WHERE
banks.bank_name='Westpac'
AND
bank_detail.other_id IS NULL
ORDER BY
bank_detail.bank_detail_id DESC
Any other suggestions?
mdg583
11-13-2008, 07:15 PM
How about a non-composite index on bank_detail_id? (like a normal index). Since you are ordering by bank_detail_id after the 50000 are fetched.
Actually, I guess that is probably already your primary key...
I can do a MATCH AGAINST query on 2.6 million city names (FULL_TEXT index) in about 1 or 2 seconds.
Daniel Israel
11-13-2008, 07:42 PM
How about a non-composite index on bank_detail_id? (like a normal index). Since you are ordering by bank_detail_id after the 50000 are fetched.
Actually, I guess that is probably already your primary key...
I can do a MATCH AGAINST query on 2.6 million city names (FULL_TEXT index) in about 1 or 2 seconds.
Which begs another question...
if you do this:
SELECT
bank_detail_id,
account_name,
account_number,
bsb_code,
bank
FROM
bank_detail
INNER JOIN banks ON bank_detail.bank=banks.id
how large is the data set?
I know this sounds goofy. But with 2 HUGE tables, I've actually had success with something like this:
SELECT
bank_detail_id,
account_name,
account_number,
bsb_code,
bank
FROM
(SELECT * FROM bank_detail WHERE other_id IS NULL) bd
INNER JOIN (SELECT * FROM banks WHERE bank_name='Westpac') b
ON bd.bank=b.id
ORDER BY
bd.bank_detail_id DESC
I don't know if it's because I was joining on so many fewer records or my indexing just sucked... but it might be worth a try...
cyphix
11-14-2008, 12:33 PM
how large is the data set?
I know this sounds goofy. But with 2 HUGE tables, I've actually had success with something like this:
Get's everything ;) (Over 600k rows)
I know this sounds goofy. But with 2 HUGE tables, I've actually had success with something like this:
Haha no dice...... 22 seconds on that one.
tosbourn
11-14-2008, 01:18 PM
What Engine are you using?
cyphix
11-14-2008, 04:57 PM
MyISAM
oracleguy
11-15-2008, 03:36 AM
Also tried setting the banks up in another table & using id's in the bank_detail table & it actually made the query time about 5 second worse using the below query with an index on bank.bank_name..
SELECT
bank_detail_id,
account_name,
account_number,
bsb_code,
bank
FROM
bank_detail
INNER JOIN banks ON bank_detail.bank=banks.id
WHERE
banks.bank_name='Westpac'
AND
bank_detail.other_id IS NULL
ORDER BY
bank_detail.bank_detail_id DESC
Any other suggestions?
Well when you did that why are you joining the bank table like that? Just use a sub query to get the ID number from the banks table. And make sure you have an index on the ID column in the bank_detail table. That might help the speed issue.
Separating the bank names into a separate table is good database design anyways, that way you don't have a lot of repeated data for no real reason.
cyphix
11-15-2008, 10:23 AM
Ok well I tried this & got about 14 seconds..
SELECT id FROM banks WHERE bank_name='Westpac';
SELECT
bank_detail_id,
account_name,
account_number,
bsb_code,
bank
FROM
bank_detail
WHERE
bank=1
AND
other_id IS NULL
ORDER BY
bank_detail_id DESC
oracleguy
11-15-2008, 07:18 PM
SELECT
bank_detail_id,
account_name,
account_number,
bsb_code,
bank
FROM
bank_detail
WHERE
bank=(SELECT id FROM banks WHERE bank_name='Westpac')
AND
other_id IS NULL
ORDER BY
bank_detail_id DESC
The query would look more like this. Eventually the speed might be limited to the speed of the server you are running on.
cyphix
11-16-2008, 03:49 PM
Ahhh ok...... ran that & got 11.7 seconds. :)
Don't think this query can be improved much more..... thx everyone for their help & ideas however!! :thumbsup:
If you can post your create table statements then we can check to see how your indexes are set up and other things which can help make them more efficient.
bazz
cyphix
11-17-2008, 10:36 AM
Here we go:
CREATE TABLE `bank_detail` (
`bank_detail_id` mediumint(20) unsigned NOT NULL auto_increment,
`account_name` varchar(255) NOT NULL default '',
`account_number` varchar(255) NOT NULL default '',
`bsb_code` varchar(255) NOT NULL default '',
`bank` tinyint(3) unsigned NOT NULL default '0',
`other_id` mediumint(20) unsigned default NULL,
`customer_id` int(15) unsigned NOT NULL default '0'
PRIMARY KEY (`bank_detail_id`),
UNIQUE KEY `NewIndex1` (`bank`,`other_id`,`bank_detail_id`)
);
CREATE TABLE `banks` (
`id` tinyint(3) unsigned NOT NULL auto_increment,
`bank_name` varchar(80) NOT NULL default '',
PRIMARY KEY (`id`),
UNIQUE KEY `NewIndex1` (`bank_name`),
UNIQUE KEY `NewIndex2` (`id`,`bank_name`)
);
ah I thought so; you have an auto increment PK. As I have recently learned, your pk needs not to be either auto_increment nor numerical. It can be anything, so long as it would be unique.
I would consider using account number as the pk OR (if it is possible that two different banks might cause this to be duplicated), I would try a composite PK of bank and account number.
your create table statements could then be:
CREATE TABLE bank_detail
(account_number varchar(32) NOT NULL
, bank tinyint(3) unsigned NOT NULL
, account_name varchar(255) NOT NULL
, bsb_code varchar(255) NOT NULL
, other_id mediumint(20) unsigned default NULL
, customer_id int(15) unsigned NOT NULL
PRIMARY KEY (account_number,bank),
UNIQUE KEY NewIndex1 (account_number,bank,other_id)
);
CREATE TABLE banks
(bank_name varchar(80) NOT NULL
, PRIMARY KEY (bank_name)
, UNIQUE KEY NewIndex1 (bank_name)
);
You also show that you have a customer table. why? can you have two customers with the same bank account? not really. So you need to consider putting the customer data inside bank_detail.
Also, does BSB_code need to be 255? surely they aren't that long?
These changes will make your db run more efficiently and should help shorten your query times.:)
hth
bazz
Fumigator
11-17-2008, 04:52 PM
Imagine a customer that has 50 accounts, or 500 accounts. This happens all the time with big companies. Without a customer table, you're now duplicating the same data 500 times. One big advantage with normalized data is you can get rid of this kind of duplication.
Yeh I know it's duplicated but not in relation to each account and duplication in itself isn't all bad?
For example, if there are 50 bank accounts where the account holder is called 'john' should we put that name into another table?
bazz
Fumigator
11-17-2008, 08:01 PM
To fully normalize, yes, "john" would go in another table. But you're right, it would be no big deal to just store "john" 50 times in the account table.
Now what about john's address-- street, city, state, postal code, and country? And phone numbers-- fax, cell, work, home? How about john's credit info and business relationships? How many times do you want to store john's emplyment history and statement preferences? You're looking at a nightmare of data duplication, and to answer your question, YES, data duplication is BAD, very BAD. You really want to have to update 50 rows every time john moves? To make matters worse, what if your application only changes one of john's addresses, and doesn't look for any other accounts john owns? And how do you even know it's the same john that owns all of those 50 accounts? What if on 3 of those accounts, the data entry operator spells his name wrong, so you've now got 3 of 50 accounts no longer getting updated when john moves?
I've worked on systems that aren't normalized and all of these issues and more are the very reason data normalization is so important.
I agree with you that databases should be normalised - I would suggest to at least 3rd normal form.
My comment was based on each person having just one account. If they have more than one account, then, OK then the address details could be put into a separate table and in that instance I would suggest two tables. One for customer personal details and another for addresses.
CREATE TABLE IF NOT EXISTS customers
( customer_id INT NOT NULL AUTO_INCREMENT
, last_name VARCHAR(50) NOT NULL
, first_name VARCHAR (50) NOT NULL
, username VARCHAR (12) NOT NULL
, password VARCHAR (12) NOT NULL
, phone VARCHAR (24) NOT NULL
, mobile VARCHAR (24) NOT NULL
, fax VARCHAR (24) NOT NULL
, email VARCHAR (64) NOT NULL
, web VARCHAR (64) NOT NULL
, PRIMARY KEY (customer_id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS customer_addresses
( customer_id INT NOT NULL
, address_type CHAR(2) NOT NULL #b=billing s=shipping p=postal
, name_number VARCHAR (32) NOT NULL
, address_1 VARCHAR (32) NOT NULL
, address_2 VARCHAR (32) NOT NULL
, address_3 VARCHAR (32) NOT NULL
, town_city VARCHAR (32) NOT NULL
, county_state VARCHAR (32) NOT NULL
, post_code VARCHAR (32) NOT NULL
, country VARCHAR (32) NOT NULL
, PRIMARY KEY (customer_id,address_type)
, CONSTRAINT customeraddresses_customer_fk
FOREIGN KEY (customer_id)
REFERENCES customers (customer_id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
But, while useful to future readers of this thread is secondary to the issue raised by the OP, which I still think would be imporved for him/her, if the tables were composed as in my earlier post.
bazz
vBulletin® v3.8.2, Copyright ©2000-2009, Jelsoft Enterprises Ltd.