...

View Full Version : How to speed up my query?



XmisterIS
12-19-2011, 10:24 AM
I have a couple of tables that I am querying together (a join or inner join? Not sure what it's called).

Anyway, here's the background:

I have a grid of cartesian points, each point has a name, and various other attributes. Note that this grid is changeable.

I have a set of items which sit at various grid points. Note that when the grid chages, the item locations need to change with the grid (i.e. each item is referenced to a grid point itself and not to its location).

There are two tables, created as follows:

Firstly, the grid of points:


create table gridPoints (
pointId int(32) unsigned not null auto_increment primary key, -- index into the table.
gridX double precision,
gridY double precision,
... (various other bits and pieces).
);

Secondly, the grid of items:


create table gridItems (
pointId int(32) unsigned not null, -- The reference to the item's location.
... (various other bits and peices).
);

So that's straightforward.

Now, I have written a bit of PHP which returns the items that are within a certain distance of a user-selected cartesian coordinate.

The MySQL query is trivial, it's just pythagoras:



select gi.*, gp.* from gridItems gi, gridPoints gp where
sqrt(pow(gp.gridX - coordX, 2) + pow(gp.gridY - coordY, 2)) < distance
and gi.pointId = gp.pointId;
-- coordX and coordY are numerical values created by the PHP code.

You'll agree that that is extremely simple,

*HOWEVER*

There is a problem!!! It's very simple ... and very slow. The reason is that gridPoints contains in excess of 3,000,000 (yes, 3 million) rows!! gridItems contains about 10,000 rows. The huge number of rows in gridPoints makes the distance query run very very slowly.

One obvious solution is to change gridItems so that it looks like this:



create table gridItems (
pointId int(32) unsigned not null, -- The reference to the item's location.
gridX double precision,
gridY double precision,
... (various other bits and peices).
);

I.e., so that when an item row is added, it's grid X and Y components are copied from the gridPoints table, that way I avoid having the join on a table with 3,000,000 rows. That will make the query much faster. BUT, it also means that the position data is duplicated, which means that when the grid positions change, I need to change all the grid X and grid Y positions in the gridItems table too ... it's clear that's an accident waiting to happen! (in my experience, duplicating data is rarely a good idea).

So ...is there a method that one of you MySQL gurus knows that I can use to speed up my query as it is, without needing to duplicate data?

Thanks in advance :)

guelphdad
12-19-2011, 01:29 PM
please run a SHOW CREATE TABLE statement on each of your tables and show us the output here. Paste between CODE tags.

While you may think 3 million rows is a lot, it is still relatively small in database terms.

If you are missing useful indexes that would be the first step in improving performance.

XmisterIS
12-19-2011, 02:30 PM
One learns a new command every day!!

Ok here, we go. Note that this is on the real tables, not the example ones as shown in my original post. tw_geolocations is the real gridPoints table and tw_members is the real gridItems table.

easting and northing in the geolocations are WGS84 grid easting and northing coordinates (analagous to x and y).


mysql> show create table tw_geolocations;
+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tw_geolocations | CREATE TABLE `tw_geolocations` (
`row_id` int(32) unsigned NOT NULL,
`outcode` varchar(4) NOT NULL,
`incode` varchar(3) NOT NULL,
`easting` double NOT NULL,
`northing` double NOT NULL,
`district` text NOT NULL,
PRIMARY KEY (`row_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql> show create table tw_members;
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tw_members | CREATE TABLE `tw_members` (
`row_id` int(32) unsigned NOT NULL AUTO_INCREMENT,
`when_created` datetime NOT NULL,
`username` text NOT NULL,
`email` text NOT NULL,
`passwordEncrypted` text NOT NULL,
`actid` text NOT NULL,
`access` enum('pending','waiting','member','admin','su','suspended') DEFAULT NULL,
PRIMARY KEY (`row_id`)
) ENGINE=MyISAM AUTO_INCREMENT=41692 DEFAULT CHARSET=latin1 |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Old Pedant
12-19-2011, 10:18 PM
I have done something very similar to this when selecting zip codes to be within X miles of a home zip code.

The "trick" is to PRE-select the points that have any possibility at all of being in range.

And, of course, that is simple to do.

You just exclude those where the x or y coordinate is more than distance away from your target point.

And now, if the x and y coordinates are indexed, you end up having to do the actual distance calculation on a much smaller set of points.

The rough idea of the code would be this:

SELECT primaryKey
FROM gridPoints
WHERE gridx BETWEEN coordX-distance AND coordX+distance
AND gridy BETWEEN coordY-distance AND coordY+distance

Make that a NESTED SELECT of your main query.

That is:


SELECT list, of, fields
FROM gridItems AS gi,
gridPoints AS gp,
( SELECT primaryKey
FROM gridPoints
WHERE gridx BETWEEN coordX-distance AND coordX+distance
AND gridy BETWEEN coordY-distance AND coordY+distance ) AS PRE
WHERE
gp.primarykey = PRE.primarykey
AND (pow(gp.gridX - coordX, 2) + pow(gp.gridY - coordY, 2)) < distanceSquared
AND gi.pointId = gp.pointId;

Notes:
(1) Don't use SELECT * (even SELECT gi.*, gp.*), please! Select ONLY the fields you actually need in your PHP code.
(2) Calculate the values shown in red in PHP code, ahead of time.
(3) Of course make sure that gridX and gridY are INDEXED in your table.
(4) BETWEEN *can and will* make use of the indexes on gridX and gridY better than other conditions would.
(5) It's a minor gain, but don't do the SQRT calculation. Square the distance, as shown, to avoid the need for that.

Give it a shot. How much it will improve performance will, of course, depend on how many points lie within the square area that the inner SELECT limits you to but are then outside the circular radius, compared to how many points you eliminate with the pre-selecton of the square.

Old Pedant
12-19-2011, 10:21 PM
p.s.: I think that doing DESCRIBE tablename is easier to read than SHOW CREATE tablename



EZ Archive Ads Plugin for vBulletin Copyright 2006 Computer Help Forum