Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 8 of 8
  1. #1
    New Coder
    Join Date
    Apr 2011
    Posts
    92
    Thanks
    26
    Thanked 0 Times in 0 Posts

    Deleting duplictates from Mysql table

    I have a table which lists friend relationships, with the following columns: requested_id, requesting_id, connection_type, request_date

    In some cases (due to multiple friend requests between the same two members), I end up with duplicate entries in my table.

    I am trying to come up with a mysql query which would DELETE, all of the duplicate rows BUT would leave only the most recent row.
    I was thinking something along the lines of:

    Code:
    if($duplicatecounter > 1){
    	mysql_query("DELETE friends WHERE (($id IN (requesting_id, requested_id)) AND ($sid IN (requesting_id, requested_id)))" LIMIT 1) or die("could not delete the rows");
    }
    But this does only appears to delete one row, but there may be more than only 1 duplicate. And it does not take into consideration the request_date... Any suggestions?

  • #2
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,189
    Thanks
    75
    Thanked 4,341 Times in 4,307 Posts
    If *ALL* of those fields are duplicated, then you won't be able to do this unless there is also another field in the table that gives you a unique key. For example, if the table has an auto_increment field as its primary key, then you are fine.

    But if, for example, request_date is actually a DATETIME field that was given a value of NOW() when it was created (so that two requests would have to match TO THE SECOND to be duplicates), then it's not hard.

    And no, you should *NOT* involve PHP in this, at all.

    Just SQL.

    But question: Do you consider that a record with (requesting_id, requested_id) of (77,33) is a duplicate of (33,77)? That is, does a friendship record work both ways? Or only one way?

    Incidentally, if request_date really is a date-only value, so the 100% duplicates are likely to occur, then an easy way to fix this would be to go ahead and add an auto_increment field to the table. Yes, MySQL lets you do that ex post facto.

  • #3
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,189
    Thanks
    75
    Thanked 4,341 Times in 4,307 Posts
    Here's how you can do it if (77,33) is *NOT* the same as (33,77) *AND* if your request_date is kept to the second:

    Code:
    DELETE FROM friends
    USING friends, 
          ( SELECT MAX(request_date) AS maxdate, requesting_id, requested_id
            FROM friends GROUP BY requesting_id, requested_id ) AS M
    WHERE friends.requesting_id = M.requesting_id
      AND friend.requested_id = M.requested_id
      AND friend.request_date != M.maxdate
    (Tested with MySQL verstion 5.5 and it works.)

    It's a lot tougher if (77,33) *is* the same as (33,77). But possible.
    Last edited by Old Pedant; 06-18-2011 at 06:33 AM.

  • #4
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,189
    Thanks
    75
    Thanked 4,341 Times in 4,307 Posts
    Since I figured it out, thought I might as well show the query you can use if (33,77) and (77,33) *are* to be treated as duplicates. This still only works if request_date is kept to the second.

    Code:
    DELETE FROM friends
    USING friends,
          ( SELECT IF(requesting_id<requested_id,requesting_id,requested_id) AS id1, 
                   IF(requesting_id>requested_id,requesting_id,requested_id) AS id2, 
                   MAX(request_date) AS maxdate
            FROM friends
            GROUP BY IF(requesting_id<requested_id,requesting_id,requested_id), 
                     IF(requesting_id>requested_id,requesting_id,requested_id) 
          ) AS m
    WHERE IF(friends.requesting_id<friends.requested_id,friends.requesting_id,friends.requested_id) = m.id1
      AND IF(friends.requesting_id>friends.requested_id,friends.requesting_id,friends.requested_id) = m.id2
      AND request_date <> m.maxdate
    Isn't that ugly? But still much better than getting PHP involved.

  • #5
    New Coder
    Join Date
    Apr 2011
    Posts
    92
    Thanks
    26
    Thanked 0 Times in 0 Posts
    Thanks for your response. To answer your questions, Yes there is an auto-increment field, and NO the request_date (which is based on a NOW() value) does not have to be duplicate. And I only want to delete the duplicates in the situation that (65,22) and (65,22) *or* (33,77) and (77,33) *are* to be treated as duplicates (although I have my INSERT set up so that the latter situation should not occur, I guess it's better to be safe, in case someone fools around with my GET's), and I want to keep only one row which has the later "request_date".

    So I assume that the code in your last post would do this, correct? But just be sure, the request_date does NOT have to be duplicate, only the requested_id, requesting_id... Just wanted to confirm before I try it out. Thanks.

  • #6
    New Coder
    Join Date
    Apr 2011
    Posts
    92
    Thanks
    26
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Old Pedant View Post
    Since I figured it out, thought I might as well show the query you can use if (33,77) and (77,33) *are* to be treated as duplicates. This still only works if request_date is kept to the second.

    Code:
    DELETE FROM friends
    USING friends,
          ( SELECT IF(requesting_id<requested_id,requesting_id,requested_id) AS id1, 
                   IF(requesting_id>requested_id,requesting_id,requested_id) AS id2, 
                   MAX(request_date) AS maxdate
            FROM friends
            GROUP BY IF(requesting_id<requested_id,requesting_id,requested_id), 
                     IF(requesting_id>requested_id,requesting_id,requested_id) 
          ) AS m
    WHERE IF(friends.requesting_id<friends.requested_id,friends.requesting_id,friends.requested_id) = m.id1
      AND IF(friends.requesting_id>friends.requested_id,friends.requesting_id,friends.requested_id) = m.id2
      AND request_date <> m.maxdate
    Isn't that ugly? But still much better than getting PHP involved.
    Old Pedant,

    I tested the code (above) and it works great. Except... it will not delete the duplicate rows unless the request_date is different. How could I modify the query to not be dependent on a duplicate date?

  • #7
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,189
    Thanks
    75
    Thanked 4,341 Times in 4,307 Posts
    Since you have an auto_increment field, and since I assume that it is ordered by when the records were added, why not just use it in place of request_date throughout the above query?

    Assuming it is called "recnum", just use:
    Code:
    DELETE FROM friends
    USING friends,
          ( SELECT IF(requesting_id<requested_id,requesting_id,requested_id) AS id1, 
                   IF(requesting_id>requested_id,requesting_id,requested_id) AS id2, 
                   MAX(recnum) AS maxrec
            FROM friends
            GROUP BY IF(requesting_id<requested_id,requesting_id,requested_id), 
                     IF(requesting_id>requested_id,requesting_id,requested_id) 
          ) AS m
    WHERE IF(friends.requesting_id<friends.requested_id,friends.requesting_id,friends.requested_id) = m.id1
      AND IF(friends.requesting_id>friends.requested_id,friends.requesting_id,friends.requested_id) = m.id2
      AND friends.recnum <> m.maxrec

  • Users who have thanked Old Pedant for this post:

    Juniper747 (06-18-2011)

  • #8
    New Coder
    Join Date
    Apr 2011
    Posts
    92
    Thanks
    26
    Thanked 0 Times in 0 Posts
    Worked perfectly, thanks!


  •  

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •