View Full Version : Change query to do math in php first

12-20-2011, 11:52 PM
I'm developing a site on the Joomla platform that uses a component called sobi2 with a radius search plugin for a directory of meetings around the country which is rather large (around 100,000 items).

One thing to mention right from the beginning here is the way this database is designed for this sobi2 component is pretty funky, every field value gets it's own row instead of being in a column which makes thing complicated right from the start.

To give you a quick idea of our hardware set up, we're using a dedicated dual core box with 4gb ram and 2x2tb hdd's for a dedicated web server, and a second dedicated quad core machine with 12gb ram for just the database. I've got the my.cnf file configured pretty well I think, I use mysqltuner to help gauge where I'm at on it and it seems to be pretty good.

We've come to a point where doing a search in this meetings directory causes our cpu to spike. I think one of the biggest issues is the query being done to figure out the radius distance, the first issue I see is it has a couple joins on it's self which is needed due to each field being in a new row instead of just having the longitude and latitude values in their own columns. The second issue I see is it's doing the math within the query which I'm sure is taxing the cpu some.

So with all of that said, I'll just get to the actual query, here's how it is originally:

// $query = "SELECT lat.itemid, IFNULL( ({$this->earthRadius} * SQRT(2 * (1 - COS(RADIANS(lat.data_txt)) * COS({$latitude}) * (SIN(RADIANS(lon.data_txt)) * SIN({$longitude}) + COS(RADIANS(lon.data_txt)) * COS({$longitude})) - SIN(RADIANS(lat.data_txt)) * SIN({$latitude})))), 0 ) AS distance FROM #__sobi2_fields_data AS lat LEFT JOIN #__sobi2_fields_data AS lon ON lat.itemid = lon.itemid WHERE (lat.fieldid = '{$this->latField}'AND lon.fieldid = '{$this->longField}' AND lat.data_txt RLIKE '^[-]{0,1}[0-9]{1,}[0-9\\.]{0,}$' AND lon.data_txt RLIKE '^[-]{0,1}[0-9]{1,}[0-9\\.]{0,}$' ) HAVING distance <= {$radius}";

Now I added a longitude and latitude column to the database and copied the values over to them so I could do away with the joins, then added an index on the new latitude and longitude columns along with the data_txt column. I changed the query to the following where I removed the joins:

$query = "SELECT itemid, IFNULL( ({$this->earthRadius} * SQRT(2 * (1 - COS(RADIANS(latitude)) * COS({$latitude}) * (SIN(RADIANS(longitude)) * SIN({$longitude}) + COS(RADIANS(longitude)) * COS({$longitude})) - SIN(RADIANS(latitude)) * SIN({$latitude})))), 0 ) AS distance FROM #__sobi2_fields_data WHERE (fieldid = '{$this->latField}' AND latitude RLIKE '^[-]{0,1}[0-9]{1,}[0-9\\.]{0,}$' AND longitude RLIKE '^[-]{0,1}[0-9]{1,}[0-9\\.]{0,}$' ) HAVING distance <= {$radius}";

Although this query works, the cpu goes even higher so I'm actually losing efficiency, obviously it's not using my index as I'd hoped.

What would you guys recommend to make this query more efficient?
I'm working on breaking the math out of the query now and running it in php first then just passing the values to the query, I'm assuming that should help some as I've always been told not to do math in mysql. Plus it would be putting the load from the math end on the web server instead of on the database server by doing that as well. Any thoughts on that?

Old Pedant
12-21-2011, 02:02 AM
Look at post #4 in this thread from two days ago:

But first of all, what kinds of distances are you talking about? And what kind of accuracy to you need? The formula you are using there is accurate to about 10 meters over a few hundred miles (maybe even a couple of thousand). If accuracy within a quarter mile over a distance of, say, 100 miles is adequate, you don't need that forumula. The Pythagoriean theorem is perfectly adequate. Though you do have to account the differences in longitudinal distance that depends on the latitude. You do that by *NOT* using latitude and longitude in degrees but by instead precalculating them in miles (or kilometers).

To say the DB design sucks is any understatement, but not much we can do about that except suggest you find a platform that uses a *good* database design.

I'm not clear at all why you think you need to use RLIKE on the latitude and longitude numbers. Given that you say you added those fields to the table, why in the world did you add them as STRINGS???? They should be real (double precision) numeric fields. If you wanted to take into account the possibility of invalid values, you should have just used NULLs to mark invalid latitude or longitude. RLIKE is going to perform like crap.

12-21-2011, 04:30 PM
Thanks for the reply Old Pendant,

The rlike was part of the original query that was part of that plugin, and is already on the todo list to remove, all I really changed in the query to this point was to remove the joins. I believe the reason they had that in there was due to how the code was set up to determine if a person was searching by zipcode or by city and state. I know like statements can prevent indexes from being used, do you think this is the case here?

The search options we're using go up to 100 miles so we just need to be accurate to that amount.

Here's where I have the query now which seems to be performing better, I'm still gauging it over several more searches to see how it goes but it seems to have shaved a couple of seconds off of the search time and is being a little nicer to the cpu:

$query = "SELECT itemid, IFNULL( ({$this->earthRadius} * SQRT(2 * (1 - COS(RADIANS(latitude)) * COS({$latitude}) * (SIN(RADIANS(longitude)) * SIN({$longitude}) + COS(RADIANS(longitude)) * COS({$longitude})) - SIN(RADIANS(latitude)) * SIN({$latitude})))), 0 ) AS distance FROM #__sobi2_fields_data HAVING distance <= {$radius}";

So at this point it no longer uses any joins and the rlike is gone, in fact the entire where clause is gone now.
You would recommend reworking this math though? Would you still keep it within the query or would you try and work the math in php first? I've always heard not to do math in sql if you don't have to but digging around Google it seems a lot of people think doing your math in sql is faster than php. I'm by no means a mysql or database expert so I'm wide open to suggestions on the whole thing.

Old Pedant
12-21-2011, 09:08 PM
Again, go read that other thread. I get the strong impression you did not do so.

If you want real performance, you want to *AVOID* doing the distance calculations for every latitude and longitude.

So if 100 miles is the max distance, then simply precalculate a *SQUARE* (or approximately a square) that is 200 miles on each side. Then *only* choose records where the lat/long is inside that square to finally run the radius test on.

Making a "square" based on longitude is a little tricky, because of course the number of mile in one degree of longitude depends on what the latitude is. But you can approximate that by using the furthest north latitude and, from it, calculating what 100 miles would be in tems of degrees and minutes of longitude is at that latitude. That will make your square more like a trapezoid, where the top is 200 miles long and the bottom is maybe 210 miles long, but that will be more than adequate to limit your actual radius calculations to a small subset of all the points in the table.

So, one more time, go read that other thread.

12-21-2011, 11:06 PM
Right, I read through your other post and I know in there you're doing your calculation in php, which is how I intended changing this query from the beginning, the reason I asked about doing calculations in mysql as opposed to php is that I'm reading in various places that the calculation would be faster in the query than in php, which is opposite of what I've always understood as the case.

I still intend to take the calculations out of the query either way as my main concern at the moment is server resources and this takes that overhead out of mysql and off of our database server and spreads that over to the web server, but I'm still curious on your thoughts on calculation efficiency in one opposed to the other forgetting resource usage for the moment (hypothetical).

Old Pedant
12-21-2011, 11:49 PM

Usually, you *want* to do the actual radius calculations in MySQL. If you don't do that, then you will be getting many many records from the query that you don't want and then simply throwing them away in PHP. And that's what those people who say is is more efficient in MySQL mean.

Any single calculation will be more efficient in PHP than in MySQL. Not by tons, but clearly by some measurable amount.

My code is, I think, giving you about the best of all worlds. You use PHP to calculate the limits of the square (latitude +/- radius by longitude +/- radius...so a square that is radius*2 on each side...roughly). Then you feed that into MySQL and, if latitude and longitude fields are indexed, you get a really efficient SELECT of *ONLY* those points that fall within the square.

*THEN* you can do the radius calculation in MySQL.

But, if you wanted, you could at that point opt to get all the points within the square and do the radius calculation in PHP. Assuming roughly equal distribution of the points within the square and within the limiting radius, you will be *keeping* PI*radius*radius divided by 4*radius*radius proportion of the points. That is, you will be keeping more than 3/4ths of the points and only throwing away roughly 21% of the points.

Whether you do the radius calculation in MySQL or PHP becomes pretty much a wash at this juncture.

As I mentioned, *IF* you would pre-compute your latitude and longitude in miles (or kms) instead of leaving them in degrees, you could simplify the actual radius calculation down to what I showed in that other thread: delta-X squared plus delta-Y squared.

But even if you don't, you will be vastly cutting down on the number of points where you have to do the ugly calculations.

Here's how I actually do this with zip codes:

That's done in ASP code, using an Access database, so it's actually much *less* efficient than PHP with MySQL would be. And you can see it's plenty fast.

The big "trick" that I use--getting the right miles of longitude for the given longitude--is done thus:

I add two fields to the table, latitudeMiles and longitudeMPD (Miles Per Degree).

Then I run an update query on the table:

SET latitudeMiles = 69.1*latitude,
longitudeMPD = 69.1*cos(latitude*3.14159265/180);

69.1 is accurate enough for zip code calculations, at least.

And now I have that longitudeMPD value that allows me to create the "square" around the target point based on miles rather than on degrees.

12-22-2011, 04:31 PM
Excellent excellent explanation.
Thanks again for all the help.