PDA

View Full Version : Help Optimizing Query


Daniel Israel
05-27-2009, 09:46 PM
Hi, I was hoping someone here would have a good idea to help me optimize a query. I have a database that's keeping track of a users proficiency in a test. The user asks to take a test, then the test is constructed of a random sample (and order) of questions. There are 2 tables, tracking, and instance.

The tracking table keeps track of users test instance and score
The instance table tracks the test instances (questions and responses)

They look like this:

CREATE TABLE `ert_tracking` (
`id` int(11) NOT NULL auto_increment,
`user_id` varchar(50) NOT NULL,
`completed` tinyint(1) NOT NULL default '0',
`data` int(11) NOT NULL,
`instance` varchar(50) NOT NULL,
`created` timestamp NOT NULL default CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `UserIndex` (`user_id`)
);


CREATE TABLE IF NOT EXISTS `ert_test_instance` (
`id` int(11) NOT NULL auto_increment,
`instance` varchar(50) NOT NULL,
`question` int(11) NOT NULL,
`answer` int(11) NOT NULL default '0',
PRIMARY KEY (`id`),
KEY `instance` (`instance`)
);

So what I need to do is to get a list of all the questions that have been presented to a given user. So I use a subquery to get the test instances from that user and query the instance table. It looks like this:

SELECT * FROM ert_test_instance WHERE instance IN (SELECT instance FROM ert_tracking WHERE user_id='<SOME ID>')

This seems to be taking a long time. So I did an explain... the ert_tracking table has like 2000 rows and the row count in the EXPLAIN (for the subquery) is 16, so that's dandy, but the primary query says it has no key to use (although I clearly defined an instance key and reference it) so it's going through all several thousand of the instance rows...

I know I can do a join (which works great), but the next step is to delete all the instance items for a particular user, so a join won't help me on the delete (or will it?)

Any ideas on this one? Thanks.

Old Pedant
05-27-2009, 11:41 PM
Yes, you can use a JOIN when doing a DELETE in MySQL.

http://dev.mysql.com/doc/refman/5.1/en/delete.html

Daniel Israel
05-27-2009, 11:47 PM
Yes, you can use a JOIN when doing a DELETE in MySQL.

http://dev.mysql.com/doc/refman/5.1/en/delete.html


It looks like that would delete from the tracking table as well, which isn't what I want to do.

bazz
05-28-2009, 12:27 AM
MIght you need to add a where clause after the suub query?
where instance = 'value'

I am just awake ( perhaps not) and I think you could index your column for 'instance' in ert_tracking. this might reduce the need for a full table scan?

As separate issue, is user_id unique? it should then be your PK. and if in ert_test_instance, surely 'instance', 'question', 'answer' is unique as a composite key? would do away with the need for auto_increment PK's.

I was trying to write you a query and then thought - you need to read up on foreign keys to be able to join your tables efficiently. which version of mysql are you using? >4.0 I think allows for FK usage.

bazz

Daniel Israel
05-28-2009, 12:42 AM
MIght you need to add a where clause after the suub query?
where instance = 'value'

I am just awake ( perhaps not) and I think you could index your column for 'instance' in ert_tracking. this might reduce the need for a full table scan?


I can try that... It definitely won't HURT... :)

As separate issue, is user_id unique? it should then be your PK. and if in ert_test_instance, surely 'instance', 'question', 'answer' is unique as a composite key? would do away with the need for auto_increment PK's.


No. user_id is not unique in the tracking table (the user might take the same test twice). I could make a composite key, but I *NEVER* query on instance, question, and answer at the same time.

I was trying to write you a query and then thought - you need to read up on foreign keys to be able to join your tables efficiently. which version of mysql are you using? >4.0 I think allows for FK usage.

bazz

I can join just fine, and it works great for the SELECT... but it's the DELETE I'm worried about because I'll only want to delete from the instance table, not the tracking table.

Thanks!

bazz
05-28-2009, 01:26 AM
No. user_id is not unique in the tracking table (the user might take the same test twice). I could make a composite key, but I *NEVER* query on instance, question, and answer at the same time.


I may be wrong because I am still learning but composite keys ~ or what they comprise ~ has nothing to do with your queries. Maybe you would only need to use instance and question, to establish a unique key?

At a guess that instance means the 'test name' then you could use a composite key of instance, user_id and timestamp/date. if they could take the same test twice on the same day, then timestamp would worko instead of date. Otherwise date should suffice.

You can join only on unique/primary keys. (not sure abuot unqiue), so instance would need to be a unique key/pk if it is needed for a join in your delete statement.

bazz

Daniel Israel
05-28-2009, 09:03 PM
I may be wrong because I am still learning but composite keys ~ or what they comprise ~ has nothing to do with your queries. Maybe you would only need to use instance and question, to establish a unique key?

At a guess that instance means the 'test name' then you could use a composite key of instance, user_id and timestamp/date. if they could take the same test twice on the same day, then timestamp would worko instead of date. Otherwise date should suffice.

You can join only on unique/primary keys. (not sure abuot unqiue), so instance would need to be a unique key/pk if it is needed for a join in your delete statement.

bazz

instance is a unique ID (GUID actually) that's generated when the test starts. When the user selects a test, the system randomly selects questions (associated with the test, there are several tests in the system) and gives it an instance ID. The user can take the same test many times, but the questions (and order) will not be the same.

The instance and question would, indeed, be unique, but since I never query on both instance and question, it's not useful. I'm wondering if the instance index is not used because of subquery in the "IN" clause?