Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 6 of 6
  1. #1
    Regular Coder
    Join Date
    Apr 2006
    Posts
    311
    Thanks
    17
    Thanked 0 Times in 0 Posts

    Complicated (sort-of) duplicates query

    Hey All,

    I've been struggling for a couple of days now with a duplicates issue which leaves me... well -- just plain stumped

    In a relationship table "contacts_orders" (id,contact_id,orders_id,date,deleted) it is acceptable and even desirable
    for there to be TWO records with the SAME orders_id, each having a DISTINCT contact_id (1 sellers_agent_id, 1 buyers_agent_id)
    and also for there to be many instances of a single contact_id having been recorded for distinct orders(unique orders_id's.)

    On the occasion where there are duplicates -- that is, the SAME contact_id recorded MORE THAN ONCE for a given orders_id --
    HOW BEST MIGHT I CHECK FOR AND ELIMINATE THESE DUPES??

    IS IT POSSIBLE IN A SINGLE QUERY(sub-query) STATEMENT?
    Something like ...
    PHP Code:
    DELETE FROM contacts_orders GROUP BY orders_idcontact_id HAVING count(*)>

  • #2
    UE Antagonizer Fumigator's Avatar
    Join Date
    Dec 2005
    Location
    Utah, USA, Northwestern hemisphere, Earth, Solar System, Milky Way Galaxy, Alpha Quadrant
    Posts
    7,691
    Thanks
    42
    Thanked 637 Times in 625 Posts
    You can use those two columns as your primary key, or you can simply create a unique index based on those two columns, and avoid duplicates by keeping them from existing in the first place.

    To clean up your table as it is you can't use a GROUP BY on a DELETE query, so just write a little script that selects the offending rows and then deletes them. Is it really that important that you do it all in one query?

  • #3
    Senior Coder TheShaner's Avatar
    Join Date
    Sep 2005
    Location
    Orlando, FL
    Posts
    1,126
    Thanks
    2
    Thanked 40 Times in 40 Posts
    The query below I wrote will show all records with duplicate contact_ids and orders_ids. Then you'll have to select the method to choose which record to delete (like maybe keeping the record with the lowest id number and deleting the others). You won't be able to do it in one query. This will require a nice little script to complete this.

    To select duplicate records:
    Code:
    SELECT tbl1.id, tbl1.contact_id, tbl1.orders_id
    FROM contacts_orders tbl1
    WHERE tbl1.contact_id IN
    (
    SELECT tbl2.contact_id 
    FROM contacts_orders tbl2 
    GROUP BY  tbl2.contact_id,  tbl2.orders_id 
    HAVING Count(*) > 1  
    AND tbl2.orders_id = tbl1.orders_id
    )
    ORDER BY tbl1.contact_id, tbl1.orders_id;
    -Shane

  • #4
    Senior Coder CFMaBiSmAd's Avatar
    Join Date
    Oct 2006
    Location
    Denver, Colorado USA
    Posts
    3,026
    Thanks
    2
    Thanked 315 Times in 307 Posts
    The following is something else to try (tested - seemed like an interesting exercise.)

    Because you cannot DELETE anything in an outer query that is also contained in the inner query, you can use a TEMPORARY table to hold information about the rows that are duplicates.

    This will form the temp table -
    PHP Code:
    $query "CREATE TEMPORARY TABLE new_tbl SELECT contact_id, orders_id
    FROM contacts_orders GROUP BY contact_id, orders_id HAVING COUNT(*) > 1"

    The following will delete one of the duplicate rows each time it is executed. You need to limit the number of rows being deleted to one so that you don't delete all the matching rows.
    PHP Code:
    $query "DELETE FROM contacts_orders
    WHERE (contact_id, orders_id) IN (SELECT contact_id, orders_id FROM new_tbl)
    ORDER BY contact_id, orders_id, date LIMIT 1"

    By executing both of these queries in a loop (I just kept refreshing the page) until the number of affected rows is zero, you can delete all the duplicates.

    You need to modify the ORDER BY clause, as already pointed out by TheShaner, to cause the desired row within each duplicate set to be deleted.

    Edit: I am assuming that you have already modified the code that inserts these so that it checkes for an existing contact_id, orders_id pair and does not insert a duplicate.
    Last edited by CFMaBiSmAd; 03-20-2007 at 06:39 PM. Reason: Emphasized both, in case someone tried to only loop with the second query
    If you are learning PHP, developing PHP code, or debugging PHP code, do yourself a favor and check your web server log for errors and/or turn on full PHP error reporting in php.ini or in a .htaccess file to get PHP to help you.

  • #5
    Super Moderator guelphdad's Avatar
    Join Date
    Mar 2006
    Location
    St. Catharines, Ontario Canada
    Posts
    2,633
    Thanks
    4
    Thanked 148 Times in 139 Posts
    simplest method:

    Code:
    ALTER IGNORE TABLE contacts_orders
    ADD UNIQUE INDEX dropthislater (contact_id,orders_id)
    this will delete any rows where contact_id and orders_id are the same, leaving one of each of those rows behind. You don't get to choose which rows are kept though. If you have two rows the same but an id column with two different values, with this method you don't choose which id value is kept.


    then you can leave that index in place so it doesn't allow that situation to happen again, or drop the index.

  • #6
    Regular Coder
    Join Date
    Apr 2006
    Posts
    311
    Thanks
    17
    Thanked 0 Times in 0 Posts
    WOW!!!
    If thhis is'nt the poster-child for a WELL RESPONDED thread.
    Much to digest. Thanks one and all!

    CFMaBiSmAd,
    While I think your suggestion of a temp_table looks pretty short & sweet (was headed that way - in a really ugly fashion -- myself) your post-script assertion that the offending code should be rectified is rendered moot (along with the need for tmp_table and accompanying php script) -- by Fumigator and guelphdad's suggestion of creating a unique index based on the two id columns.
    To wit, even Fumigator didn't get it entirely right on the first try -- no need to clean up table - and I was going to have ask how to index coulmns in tandem in a follow-up post

    Fumigator,
    guelphdad answered your question for both of us: No, it really is not that important to do it all in one query, just to do it effeciently... like so...
    ALTER IGNORE TABLE contacts_orders
    ADD UNIQUE INDEX dropthislater (contact_id,orders_id)


    I didn't know I could index columns in tandem like that!! COOL!!!
    Thanks guelphdad!
    Last edited by fuzzy1; 03-20-2007 at 07:52 PM. Reason: MY BAD


  •  

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •