...

View Full Version : Help with a PHP-MySQL Query



muneeba9071
11-12-2012, 07:48 PM
I have a MySQL table with which stores LocationA, LocationB and distance. There are some duplicate entries, for example.



LocationA LocationB Distance
London Birmingham 180 miles
Birmingham London 180 miles.


As you can see, the second record is a duplicate because we already know the distance of London=>Birmingham using the first row. I have over 382,000 rows in total and these 382,000 can be reduced by 50% if the duplicates are removed.

I have tried the following script, but it stops in the middle for some reasons, and I want a MySQL query to do the job:



$results_locations = $wpdb->get_results("SELECT * FROM locations ");
foreach($results_locations as $location) {

$a_to_b = $wpdb->get_row("SELECT * FROM locations WHERE from_location = '{$location->from_location}' && to_location = '{$location->to_location}' ");
$b_to_a = $wpdb->get_row("SELECT * FROM locations WHERE from_location = '{$location->to_location}' && to_location = '{$location->from_location}' ");

if($a_to_b->id && $b_to_a->id) {
$wpdb->query("DELETE FROM locations WHERE id = '{$b_to_a->id}' ");
}

}


I would really appreciate if someone can help writing a MySQL query to fix this issue.

Many Thanks in advance!

mlseim
11-13-2012, 03:28 AM
Always use Google first ...
https://www.google.com/search?q=php+mysql+eliminate+duplicates&btnG=Search

It will allow you to get an answer much quicker than here.

See examples using this:
SELECT DISTINCT


.

muneeba9071
11-13-2012, 12:10 PM
mlseim. If it was a simple duplicate issue, I would have fixed it myself. But you didn't read my query correctly. It's not a straightforward duplicate, but rows with same A-B, B-A destinations that I wand removed.

alemcherry
11-13-2012, 01:22 PM
I guess this is for a one time operation, so you need not waste time looking for the most optimum method. The PHP script is getting timed out probably because there is a maximum timeout of 30 seconds or so. You can simply run it multiple times or set_time_limit(0). Just echo something after each deletion so you know something is happening. Few referesh will do the work, unless you have a huge database.

One simple suggestion, you just need one query inside the loop. First query is redundant, since you already have it.

I am sure there is more efficient way to do it, but since this is a one time job, there is no point in wasting time.

Fou-Lu
11-13-2012, 05:32 PM
I wouldn't suggest PHP have anything to do with removal of "duplicates" (or redundancies in this case).
Lets see. If you self join the table and join on lhs.locationA = rhs.locationB, that should let you find the duplicate rows. Now, if you issue a delete on that would that work? No, that would end up with every recordset anyway.
You should check with the SQL guys for the query to issue here. There should be a way to remove the redundant records using only SQL. The tricky part will be to only fetch one of the records, not both of the records, but this should be completely a SQL based job, not a PHP one.

Now this said, I would suggest that this also isn't really a problem. Assuming you are not seeing diskspace or performance issues (unless you don't index the table properly), then theoretically its not incorrect. I'd see that as multiple paths between two nodes available on a directional graph. This design would work if you had a one-way street for example (thinking city level instead of country level since I doubt there exists a one way highway), but in practice this may be more to your benefit to retain both directions. This way you could flag a single direction as being down for road repairs for example, and therefore remove the relation between point B to point A, but retain the relation between point A and point B.



EZ Archive Ads Plugin for vBulletin Copyright 2006 Computer Help Forum