...

View Full Version : Deleting duplictates from Mysql table



Juniper747
06-18-2011, 05:30 AM
I have a table which lists friend relationships, with the following columns: requested_id, requesting_id, connection_type, request_date

In some cases (due to multiple friend requests between the same two members), I end up with duplicate entries in my table.

I am trying to come up with a mysql query which would DELETE, all of the duplicate rows BUT would leave only the most recent row.
I was thinking something along the lines of:



if($duplicatecounter > 1){
mysql_query("DELETE friends WHERE (($id IN (requesting_id, requested_id)) AND ($sid IN (requesting_id, requested_id)))" LIMIT 1) or die("could not delete the rows");
}


But this does only appears to delete one row, but there may be more than only 1 duplicate. And it does not take into consideration the request_date... Any suggestions?

Old Pedant
06-18-2011, 07:10 AM
If *ALL* of those fields are duplicated, then you won't be able to do this unless there is also another field in the table that gives you a unique key. For example, if the table has an auto_increment field as its primary key, then you are fine.

But if, for example, request_date is actually a DATETIME field that was given a value of NOW() when it was created (so that two requests would have to match TO THE SECOND to be duplicates), then it's not hard.

And no, you should *NOT* involve PHP in this, at all.

Just SQL.

But question: Do you consider that a record with (requesting_id, requested_id) of (77,33) is a duplicate of (33,77)? That is, does a friendship record work both ways? Or only one way?

Incidentally, if request_date really is a date-only value, so the 100% duplicates are likely to occur, then an easy way to fix this would be to go ahead and add an auto_increment field to the table. Yes, MySQL lets you do that ex post facto.

Old Pedant
06-18-2011, 07:20 AM
Here's how you can do it if (77,33) is *NOT* the same as (33,77) *AND* if your request_date is kept to the second:



DELETE FROM friends
USING friends,
( SELECT MAX(request_date) AS maxdate, requesting_id, requested_id
FROM friends GROUP BY requesting_id, requested_id ) AS M
WHERE friends.requesting_id = M.requesting_id
AND friend.requested_id = M.requested_id
AND friend.request_date != M.maxdate

(Tested with MySQL verstion 5.5 and it works.)

It's a lot tougher if (77,33) *is* the same as (33,77). But possible.

Old Pedant
06-18-2011, 08:07 AM
Since I figured it out, thought I might as well show the query you can use if (33,77) and (77,33) *are* to be treated as duplicates. This still only works if request_date is kept to the second.



DELETE FROM friends
USING friends,
( SELECT IF(requesting_id<requested_id,requesting_id,requested_id) AS id1,
IF(requesting_id>requested_id,requesting_id,requested_id) AS id2,
MAX(request_date) AS maxdate
FROM friends
GROUP BY IF(requesting_id<requested_id,requesting_id,requested_id),
IF(requesting_id>requested_id,requesting_id,requested_id)
) AS m
WHERE IF(friends.requesting_id<friends.requested_id,friends.requesting_id,friends.requested_id) = m.id1
AND IF(friends.requesting_id>friends.requested_id,friends.requesting_id,friends.requested_id) = m.id2
AND request_date <> m.maxdate


Isn't that ugly? But still much better than getting PHP involved.

Juniper747
06-18-2011, 03:39 PM
Thanks for your response. To answer your questions, Yes there is an auto-increment field, and NO the request_date (which is based on a NOW() value) does not have to be duplicate. And I only want to delete the duplicates in the situation that (65,22) and (65,22) *or* (33,77) and (77,33) *are* to be treated as duplicates (although I have my INSERT set up so that the latter situation should not occur, I guess it's better to be safe, in case someone fools around with my GET's), and I want to keep only one row which has the later "request_date".

So I assume that the code in your last post would do this, correct? But just be sure, the request_date does NOT have to be duplicate, only the requested_id, requesting_id... Just wanted to confirm before I try it out. Thanks.

Juniper747
06-18-2011, 09:27 PM
Since I figured it out, thought I might as well show the query you can use if (33,77) and (77,33) *are* to be treated as duplicates. This still only works if request_date is kept to the second.



DELETE FROM friends
USING friends,
( SELECT IF(requesting_id<requested_id,requesting_id,requested_id) AS id1,
IF(requesting_id>requested_id,requesting_id,requested_id) AS id2,
MAX(request_date) AS maxdate
FROM friends
GROUP BY IF(requesting_id<requested_id,requesting_id,requested_id),
IF(requesting_id>requested_id,requesting_id,requested_id)
) AS m
WHERE IF(friends.requesting_id<friends.requested_id,friends.requesting_id,friends.requested_id) = m.id1
AND IF(friends.requesting_id>friends.requested_id,friends.requesting_id,friends.requested_id) = m.id2
AND request_date <> m.maxdate


Isn't that ugly? But still much better than getting PHP involved.

Old Pedant,

I tested the code (above) and it works great. Except... it will not delete the duplicate rows unless the request_date is different. How could I modify the query to not be dependent on a duplicate date?

Old Pedant
06-18-2011, 09:42 PM
Since you have an auto_increment field, and since I assume that it is ordered by when the records were added, why not just use it in place of request_date throughout the above query?

Assuming it is called "recnum", just use:


DELETE FROM friends
USING friends,
( SELECT IF(requesting_id<requested_id,requesting_id,requested_id) AS id1,
IF(requesting_id>requested_id,requesting_id,requested_id) AS id2,
MAX(recnum) AS maxrec
FROM friends
GROUP BY IF(requesting_id<requested_id,requesting_id,requested_id),
IF(requesting_id>requested_id,requesting_id,requested_id)
) AS m
WHERE IF(friends.requesting_id<friends.requested_id,friends.requesting_id,friends.requested_id) = m.id1
AND IF(friends.requesting_id>friends.requested_id,friends.requesting_id,friends.requested_id) = m.id2
AND friends.recnum <> m.maxrec

Juniper747
06-18-2011, 10:38 PM
Worked perfectly, thanks!



EZ Archive Ads Plugin for vBulletin Copyright 2006 Computer Help Forum