I have a table and query which is causing performance issues. It was fine when small, but now has about 6 million rows in production and it's growing fast. Before I start rewriting stuff or creating an archive I am wondering if there is a quick win possible here with better indexing and would be very thankful if anyone can help I think there is a quick fix because the query is fast without the "AND visible = 'Y'", but I need help if anyone can please
My profiling tool tells me "Very bad, very slow: Join uses key prefix or non-unique key. Key compared with = or <=>)".
The query is:
Code:
SELECT type FROM orderhistory WHERE (order_id = '943581' OR ticket_id = '943581') AND visible = 'Y' limit 1
Table structure:
Code:
CREATE TABLE `orderhistory` (
`uniqueid` int(11) NOT NULL AUTO_INCREMENT,
`order_id` int(16) NOT NULL,
`ticket_id` int(16) NOT NULL,
`type` char(8) NOT NULL,
`date` int(11) unsigned NOT NULL,
`notes` text NOT NULL,
`ontime` char(1) NOT NULL,
`postedby` varchar(96) NOT NULL,
`visible` char(1) NOT NULL DEFAULT 'Y',
`ip` char(16) NOT NULL,
PRIMARY KEY (`uniqueid`),
KEY `postedby` (`postedby`),
KEY `type` (`type`,`ontime`,`date`),
KEY `order_id` (`order_id`,`type`,`uniqueid`),
KEY `lookup order` (`order_id`,`ticket_id`,`visible`,`date`),
KEY `ticket_id` (`ticket_id`),
KEY `visible` (`visible`)
) ENGINE=InnoDB AUTO_INCREMENT=5907756 DEFAULT CHARSET=utf8
EXPLAIN SELECT type FROM orderhistory WHERE (order_id = '943581' OR ticket_id = '943581') AND visible = 'Y' limit 1
__________________
Found a flower or bug and don't know what it is ? agrozoo.net galery
if you don't spot search button at once, there is search form: agrozoo.net galery search
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE orderhistory ref order_id,lookup order,ticket_id,visible visible 3 const 2709231 Using where
The first rule whenever such simple query gets slow, this is my huntch, is to make a key on cols in question, that is only on order,ticket_id,visible which is not current case.
Try that, and see if it helps.
If explain shows that it does not use that new key of yours, you can force use of your key via sql command, google for that, doh I'm not certain that that works.
__________________
Found a flower or bug and don't know what it is ? agrozoo.net galery
if you don't spot search button at once, there is search form: agrozoo.net galery search
explain SELECT type FROM orderhistory FORCE INDEX (`visible_order_id_ticket_id`) WHERE (order_id = '944704' OR ticket_id = '944704') AND visible = 'Y' limit 1
Code:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE orderhistory ref visible_order_id_ticket_id visible_order_id_ticket_id 3 const 1922326 Using where
we now have a faster query (but it's still scanning 1.9 million rows for some reason).
If I removed and visible = 'Y' I get a good quality query:
explain SELECT type FROM orderhistory WHERE (order_id = '944704' OR ticket_id = '944704') limit 1
Code:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE orderhistory index_merge order_id,lookup order,ticket_id lookup order,ticket_id 4,4 NULL 5 Using sort_union(lookup order,ticket_id); Using where
What am I missing? I've googled so hard for the solution here.
I should have paid more attention in database class
order_id = '944704' OR ticket_id = '944704'
order_id = 944704 OR ticket_id = 944704
would do the same, mybe mysql is doing some conversion there
also I would try
`visible` char(1) NOT NULL DEFAULT 'Y',
instead try:
`visible` INT NOT NULL DEFAULT 1//as 1 = 'Y', 0 = 'N'
EDIT: since you mentioned database class, do you have connection pooling ?
If not, opening connection each time can eat substantial amount of that x second of yours.
__________________
Found a flower or bug and don't know what it is ? agrozoo.net galery
if you don't spot search button at once, there is search form: agrozoo.net galery search
Thanks again for the input. I just tried it with INT matching as you suggested but this benchmarks the same as with the apostrophes so I guess mysql is correcting the query before processing.
Also, I just tried your other idea creating another column called visible2 and populated that with INT 1 /0, added index but the query still scans 2 million rows.
I am not sure about connection pooling - that's new to me - I'll read up on it, but I am only connecting to the DB one time per pageview. I'm using Amazon AWS RDS on default settings. Btw, this query is run only occassionally and I then cache it in memcached to prevent repeat visits from hammering the DB. But, 80% are cache misses as they are new customers. It's just a sloooow query, and I need that visible flag.
Guys, clutching at straws here, but I feel like I've exhausted my possibilities (or at least my time) trying to fix this. I don't suppose upgrading from 5.1.57 to 5.5.x is likely to just solve this with a better preprocesor , is it? Or am I still still misunderstanding mysql fundamentals?
I think you have the essence of the problem: MySQL can't decide WHICH index to use when you are looking at an OR condition.
One of the HUGE problems with MySQL is that it can *NOT* use just a PART of an index unless that PART is at the beginning of the index.
So if you had (for example) an index on ( orderi_id, ticket_id, visible2 ) [all 3 fields], MySQL would NOT be able to use that index with your OR where clause *except* for order_id. It wouldn't be able to use ( order_id, visibie2 ) because the "parts" there are not contiguous. It wouldn't be able to use ( ticket_id ) at all, because that "part" isn't at the start of the index.
It's a real deficiency in MySQL, not shared by all other DBs.
So your only real choice is to have two indexes: ( order_id, visible2 ) and ( ticket_id, visible_2 ) and then, as you noted, make two separate queries.
Now..
You *could* do:
Code:
SELECT U.* FROM
( SELECT * FROM orderhistory WHERE order_id = 555444 AND visible2 =1
UNION
SELECT * FROM orderhistory WHERE ticket_id = 555444 AND visible2 =1
) AS U
And that should work, assuming you ndeed have two indexes: (order_id, visible2) and (ticket_id,visible2).
But it won't likely be the most efficient way. Assume, for the sake of argument, that indeed you find a match based on order_id and visible2. With the UNION query, MySQL will *still* have to make the second query based on ticket_id.
So, indeed, the two separate queries is likely the best choice. If you have *ANY* inkling which is more likely to match--ticket_id or order_id--then make that query first, so you only have to make the second less likely query if you don't get a match on the first.
NOTE: You *could* do all this in a STORED PROCEDURE. And if you really have no idea which is most likely to match (ticket_id or order_id) then that would be a fine choice. But, again, if you do have (say) some way of guessing which is more likely in any given circumstance, you'd be better off with the two separate queries, in most-likely order.
__________________
An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.