PDA

View Full Version : Deleting Duplicate Rows


rookiecoder
08-11-2009, 10:38 AM
Hi
I have the following table structure

CREATE TABLE IF NOT EXISTS `certificate` (
`id` int(11) NOT NULL auto_increment,
`seafarer_id` int(11) default NULL,
`name` varchar(64) default NULL,
`doc_id` int(11) default NULL,
`filetype` varchar(64) default NULL,
PRIMARY KEY (`id`)
)


Now, I have several rows where doc_id is same , and out of those rows I want to delete the ones whose filetype is null . What would be the best way to do so ?
e.g out of the five rows below, only row 1 should get deleted

(1, 10, 'doc1', 100, NULL)
(2, 10, 'doc10', 100, 'exe')
(3, 11, 'doc2', 200, NULL)
(4, 10, 'doc3', 300, NULL)
(5, 12, 'doc12', 110, 'pdf)


Thanks a lot !

bazz
08-11-2009, 10:56 AM
Until someone comes along with a better way, I would try it like this


select doc_id
from table
where filetype not null

That should return the doc_ids where you have 'exe', 'pdf' etc

build an array of those values and then


delete
from table
where doc_id IN (doc_id_array)

substitute doc_id_array with your php code for the array built from the first query.

That should remove the records where doc_id has a null value
NOT tested, just a suggestion

Let me know how you get on and backup the data before deleting anything.

bazz

rookiecoder
08-11-2009, 11:55 AM
Thanks bazz, but I don't think this solution will work for me. I don't want to delete rows where doc_id is null . (Infact in no record is it null)

I only want those records to be deleted who have a matching a doc_id and have their filetype equal to null.

Also, I am using phpadmin (as this is a one time solution). So I don't need to use PHP scripts

rookiecoder
08-11-2009, 01:46 PM
Alright, after few hit and trials found that the following statements do the trick


SELECT certificate.id
FROM `certificate` , (

SELECT id, doc_id, count( * )
FROM certificate
GROUP BY doc_id
HAVING count( * ) >1
) AS cert
WHERE certificate.doc_id = cert.doc_id
AND certificate.filetype IS NULL



Got the list , verified a few and had these rows deleted.

bazz
08-11-2009, 08:11 PM
Thanks bazz, but I don't think this solution will work for me. I don't want to delete rows where doc_id is null . (Infact in no record is it null)


You siad you wanted to delete rows where doc_id is duplicated and filetype is null.


I only want those records to be deleted who have a matching a doc_id and have their filetype equal to null.


My first query gets the list of doc_ids where they have an exe file or pdf.

my second query deletes on the following basis:
1. that you have a doc_id with a filetype that is not null
2. that doc_id is duplicated

if you wanted it to delete where doc_id dupe is null add this line.



delete
from table
where doc_id IN (doc_id_array)
and doc_id IS NULL



Also, I am using phpadmin (as this is a one time solution). So I don't need to use PHP scripts


Oops I misread that quote and thought it said you had to use php.
If you got sorted then I am pleased, however, none of that script (your post) deletes your duplicate rows.

bazz

Old Pedant
08-12-2009, 12:13 AM
Doesn't this do it?

DELETE FROM table
WHERE id IN (
SELECT t1.id
FROM table AS t1, table AS t2
WHERE t1.docid = t2.docid
AND t1.id <> t2.id
AND t1.filetype IS NULL
AND t2.filetype IS NOT NULL )

We join the table to itself, trying to find two records where the docid is the same but the id (PK) is different. This guarantees that we have two distinct records. Then we further qualify it that the record we are actually selecting must have a filetype that is NULL while the *other* record's filetype is *NOT* NULL. Presto.

No?

Wouldn't work without the id being the PK, but as it is...