Go Back   CodingForums.com > :: Server side development > MySQL

Before you post, read our: Rules & Posting Guidelines

Reply
 
Thread Tools Rate Thread
Enjoy an ad free experience by logging in. Not a member yet? Register.
Old 02-18-2013, 01:29 AM   PM User | #1
mharrison
New Coder

 
Join Date: Dec 2012
Posts: 52
Thanks: 12
Thanked 0 Times in 0 Posts
mharrison is an unknown quantity at this point
Delete Duplicates

I'm running the following query to gather a list of duplicates based on 2 fields....lnktxt and cattxt:

SELECT P1.lnktxt, P1.cattxt, P1.dsptxt
FROM TABLE AS P1,
( SELECT lnktxt, cattxt, dsptxt, COUNT(*) AS howmany
FROM TABLE
GROUP BY lnktxt, cattxt, dsptxt
HAVING howmany > 1 ) AS P2
WHERE P1.lnktxt = P2.lnktxt AND P1.cattxt = P2.cattxt

It is displaying fine, and in PHPMyAdmin when I check the boxes to delete the individual records, it is not releasing the P1.lnktxt and the SQL server is erroring out. Is there a better method to gather and display a list of duplicates?
I don't want them to be automatically deleted, I would like to review the records first and select which one to delete.
mharrison is offline   Reply With Quote
Old 02-18-2013, 04:38 PM   PM User | #2
Old Pedant
Supreme Master coder!

 
Old Pedant's Avatar
 
Join Date: Feb 2009
Posts: 23,168
Thanks: 59
Thanked 3,992 Times in 3,961 Posts
Old Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to all
If you don't have something that uniquely identifies each record, you can't select which one to delete. In other words, if you don't have a PRIMARY KEY in your (very badly named) TABLE then you can't do this easily.

If you do have a primary key, just SELECT it along with the other fields and *probably* PhpMyAdmin will be able to do it.
__________________
An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.
Old Pedant is offline   Reply With Quote
Old 02-18-2013, 06:43 PM   PM User | #3
mharrison
New Coder

 
Join Date: Dec 2012
Posts: 52
Thanks: 12
Thanked 0 Times in 0 Posts
mharrison is an unknown quantity at this point
Quote:
Originally Posted by Old Pedant View Post
If you don't have something that uniquely identifies each record, you can't select which one to delete. In other words, if you don't have a PRIMARY KEY in your (very badly named) TABLE then you can't do this easily.

If you do have a primary key, just SELECT it along with the other fields and *probably* PhpMyAdmin will be able to do it.

Well, I used TABLE just to mask the table name...so yes, I would suppose my ruse would be a badly named table.

The reason I asked the question is I recently switched hosts and the previous hosts PHPMyAdmin allowed me to use the exact query (with the correct table name) with no problem, but the new hosts phpmyadmin does not.

And no, I do not have a primary key defined as I was not the original creator of the database and now that I have 23k+ records, I was never sure of the implications of trying to define one now, and especially because there isn't a field that can be truly unique...a link in the lnktxt field would have the same dsptxt field info, but could have a multiple entries, each with a unique cattxt....I would almost have to create a brand new field and populate it with data just to be able to have a primary key.

Last edited by mharrison; 02-18-2013 at 06:49 PM.. Reason: Additional Info
mharrison is offline   Reply With Quote
Old 02-18-2013, 08:39 PM   PM User | #4
Old Pedant
Supreme Master coder!

 
Old Pedant's Avatar
 
Join Date: Feb 2009
Posts: 23,168
Thanks: 59
Thanked 3,992 Times in 3,961 Posts
Old Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to all
No no...it's easy!

Just do this:
Code:
ALTER TABLE tablename ADD uniqueid INT AUTO_INCREMENT PRIMARY KEY;
And that will not only add the field, make it a primary key, but it will also RIGHT THEN AND THERE populate *ALL* the existing records with the field!

Presto.

(If you want to, after removing the duplicates you could then remove the column, but I wouldn't bother.)
__________________
An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.

Last edited by Old Pedant; 02-18-2013 at 08:47 PM..
Old Pedant is offline   Reply With Quote
Old 02-19-2013, 02:29 AM   PM User | #5
mharrison
New Coder

 
Join Date: Dec 2012
Posts: 52
Thanks: 12
Thanked 0 Times in 0 Posts
mharrison is an unknown quantity at this point
Ok, so now my table has a primary key. When I use any simple query that counts the 2 fields I want to reference for duplicates, it will show me that the record has 2 or more matches, but it does not let me delete one of them.

Example:
SELECT lnktxt, cattxt, COUNT(*) c FROM tablenam GROUP BY lnktxt HAVING c > 1;
mharrison is offline   Reply With Quote
Old 02-19-2013, 03:25 AM   PM User | #6
Old Pedant
Supreme Master coder!

 
Old Pedant's Avatar
 
Join Date: Feb 2009
Posts: 23,168
Thanks: 59
Thanked 3,992 Times in 3,961 Posts
Old Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to all
You *MUST* also SELECT the primary key field!

If you do that, in PHPMyAdmin, it should then be able to delete the records based on that primary key.

I don't use PHPMyAdmin, but I could certainly build a web page that would allow this in under an hour, for example.

Code:
SELECT P1.uniqueid, P1.lnktxt, P1.cattxt, P1.dsptxt
FROM TABLE AS P1,
( SELECT lnktxt, cattxt, dsptxt, COUNT(*) AS howmany
FROM TABLE
GROUP BY lnktxt, cattxt, dsptxt
HAVING howmany > 1 ) AS P2
WHERE P1.lnktxt = P2.lnktxt AND P1.cattxt = P2.cattxt AND P1.dsptxt = P2.dsptxt
ORDER BY lnktxt, cattxt, dsptxt
And if you are going to count two records as identical only if all THREE fields agree (lnktxt, cattxt, dsptxt -- which is the meaning of your GROUP BY and HAVING there), then you really should also include those in the WHERE clause, as shown.

If you don't care about whether dsptxt is duplicated or not, then don't use it in the subquery.

Also, if you don't specify an ORDER BY clause, there is no guarantee that identical (except for uniqueid) records will be even close together in the output.
__________________
An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.
Old Pedant is offline   Reply With Quote
Old 02-19-2013, 04:57 AM   PM User | #7
mharrison
New Coder

 
Join Date: Dec 2012
Posts: 52
Thanks: 12
Thanked 0 Times in 0 Posts
mharrison is an unknown quantity at this point
Even with the modifications to the original query you listed, the problem I am having is not being able to find the records, it's being able to delete them. PHPMyAdmin will show me the records and I can select the records I want to delete, but when it writes a query to delete the records, it errors out because it is still trying to use P1.lnktxt and so on.

Perhaps I should just document the uniqueid's and then write my own query to delete them that way.

My frustration is that with the version of phpmyadmin on my old webhost, I could delete the records by selecting which ones I wanted to delete right from phpmyadmin, but with my new webhost, I cannot as it errors out.
mharrison is offline   Reply With Quote
Old 02-19-2013, 07:35 PM   PM User | #8
Old Pedant
Supreme Master coder!

 
Old Pedant's Avatar
 
Join Date: Feb 2009
Posts: 23,168
Thanks: 59
Thanked 3,992 Times in 3,961 Posts
Old Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to all
Well, like I said. Write your own code to do this. It's trivial. I don't use PHP, but it will be something as simple as this:
Code:
<form action="myDelete.php" method="post">
<table border="1" cellpadding="3">
<tr>
    <th>DELETE</th><th>Lnktxt</th><th>Cattxt</th><th>Dsptxt</th>
</tr>
<?php
... make your db connection ...
$sql = "SELECT P1.uniqueid, P1.lnktxt, P1.cattxt, P1.dsptxt
        FROM TABLE AS P1,
        ( SELECT lnktxt, cattxt, dsptxt, COUNT(*) AS howmany
          FROM TABLE
          GROUP BY lnktxt, cattxt, dsptxt
          HAVING howmany > 1 ) AS P2
        WHERE P1.lnktxt = P2.lnktxt AND P1.cattxt = P2.cattxt AND P1.dsptxt = P2.dsptxt
        ORDER BY lnktxt, cattxt, dsptxt ";

$result = mysql_query($sql);

while ( row = mysql_fetch_assoc($result) )
{
    $id = row["uniqueid"];
    echo '<tr><td><input type="checkbox" name="delete[]" value="' .$id . '" />' . $id . "</td>\n";
    echo "<td>...the other fields...</td></tr>\n";
}
?>
</table>
<input type="submit" value="delete checked rows">
</form>
Now, as I said, I don't use PHP, but I *THINK* your "myDelete.php" page would be as simple as this:
Code:
<?php
...make db connection ...
$sql = "DELETE FROM TABLE 
        WHERE uniqueid IN (" . implode(",",$_POST["delete"]) . ")";
mysql_query($sql);
?>
__________________
An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.
Old Pedant is offline   Reply With Quote
Old 02-22-2013, 06:27 PM   PM User | #9
mharrison
New Coder

 
Join Date: Dec 2012
Posts: 52
Thanks: 12
Thanked 0 Times in 0 Posts
mharrison is an unknown quantity at this point
The php code only errors out, but I'm going to approach this from a different angle.

I have 3 columns I want to deal with here....
lnktxt
cattxt
datetxt

I want to find all records that are the same in lnktxt that also have the same value in cattxt. I then want to delete the record whose datetxt record is older (ie... delete 2012-11-12 rather than 2013-01-25)

I can find the records but I am not sure how I would write the delete statement to be able to distinguish the date criteria, if it can even be done.
mharrison is offline   Reply With Quote
Old 02-22-2013, 07:07 PM   PM User | #10
Old Pedant
Supreme Master coder!

 
Old Pedant's Avatar
 
Join Date: Feb 2009
Posts: 23,168
Thanks: 59
Thanked 3,992 Times in 3,961 Posts
Old Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to all
** SIGH **

Why didn't you mention the DATETXT field *long* ago??

Okay, now we only have one problem: Why is that field called dateTXT? That would seem to indicate to me that you are (foolishly) storing the date as a text (e.g., VARCHAR) value, rather than as a TIMESTAMP or DATETIME value.

Is that so?

If it is, then were you at least consistent in the format of the dates you stored in it?

Can you select, say, 10 records and show what they all have in that field?

On the other hand, if that field really is a TIMESTAMP or DATETIME field, and only the name is misleading, then we are ready to go.
__________________
An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.
Old Pedant is offline   Reply With Quote
Old 02-22-2013, 07:20 PM   PM User | #11
mharrison
New Coder

 
Join Date: Dec 2012
Posts: 52
Thanks: 12
Thanked 0 Times in 0 Posts
mharrison is an unknown quantity at this point
It is a DATE value field...I didn't name it and haven't bother to change it yet due to other concerns with getting the website fully functional...

Last edited by mharrison; 02-22-2013 at 07:23 PM..
mharrison is offline   Reply With Quote
Old 02-22-2013, 09:07 PM   PM User | #12
Old Pedant
Supreme Master coder!

 
Old Pedant's Avatar
 
Join Date: Feb 2009
Posts: 23,168
Thanks: 59
Thanked 3,992 Times in 3,961 Posts
Old Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to all
Perfect then.

And trivial:
Code:
DELETE t1 
FROM tablename AS t1, tablename AS t2
WHERE t1.lnktxt = t2.lnktxt
  AND t1.cattxt = t2.cattxt
  AND t1.datetxt < t2.datetxt
ALSO: If you happen to have 3 (or more!) records with the same lnktxt and cattxt values, that will delete *all* of them except the one with the latest datetxt field.
__________________
An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.
Old Pedant is offline   Reply With Quote
Users who have thanked Old Pedant for this post:
mharrison (02-22-2013)
Old 02-22-2013, 09:31 PM   PM User | #13
mharrison
New Coder

 
Join Date: Dec 2012
Posts: 52
Thanks: 12
Thanked 0 Times in 0 Posts
mharrison is an unknown quantity at this point
Quote:
Originally Posted by Old Pedant View Post
Perfect then.

And trivial:
Code:
DELETE t1 
FROM tablename AS t1, tablename AS t2
WHERE t1.lnktxt = t2.lnktxt
  AND t1.cattxt = t2.cattxt
  AND t1.datetxt < t2.datetxt
ALSO: If you happen to have 3 (or more!) records with the same lnktxt and cattxt values, that will delete *all* of them except the one with the latest datetxt field.
That works wonderfully! One final question on this mission....is there any way to modify it so if it runs across an instance where all 3 fields are the same, it can drop all but one?
mharrison is offline   Reply With Quote
Old 02-22-2013, 10:11 PM   PM User | #14
Old Pedant
Supreme Master coder!

 
Old Pedant's Avatar
 
Join Date: Feb 2009
Posts: 23,168
Thanks: 59
Thanked 3,992 Times in 3,961 Posts
Old Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to all
Yes, if you added that AUTO_INCREMENT field.

Say you named it "uniqueid".

Then:
Code:
DELETE t1 
FROM tablename AS t1, tablename AS t2
WHERE t1.lnktxt = t2.lnktxt
  AND t1.cattxt = t2.cattxt
  AND t1.datetxt = t2.datetxt
  AND t1.uniqueid < t2.uniqueid
But run the other query first, so that you get rid of the ones that *DO* differ by date. And only then run this one as a sort of cleanup.
__________________
An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.
Old Pedant is offline   Reply With Quote
Users who have thanked Old Pedant for this post:
mharrison (02-22-2013)
Reply

Bookmarks

Jump To Top of Thread


Thread Tools
Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +1. The time now is 01:38 AM.


Advertisement
Log in to turn off these ads.