...

View Full Version : Finding Duplicate records



phantom007
02-27-2009, 05:49 AM
Hi

I have a table called tbl_members with 5,000 records

I have 3 colums,

memberID
member_name
email_address

I want to group all members who have similar email address. The select clause should also include memberID and member_name.

Can you help me setting up a query for this please?

Thanx

Old Pedant
02-27-2009, 09:53 PM
Define "similar"???

Identical? Same domain? Same except for upper/lower case???

What if you have 3 records with identical email address but 3 different memberID's and/or member_name values?

Please be more specific in your request. Thanks.

phantom007
02-28-2009, 03:31 AM
Similar means which two or more members have same email address

Old Pedant
02-28-2009, 06:17 AM
You didn't answer the rest of the questions.

What do you want to happen, for example, if you have two *IDENTICAL* records??? (Same email, same member_id, same membername.)

Or two out of three of those are identical?

Or or or or ...

I'll take a stab at this on the assumption that member_id is an autoincrementing field and so will never have duplicates. Apparently, you didn't make email an indexed field that disallows duplicates, and so you've got people signing up more than once, yes?

Okay...


SELECT M1.email_address, M1.memberID, M1.member_name, M2.memberID, M2.member_name
FROM Members AS M1, Members AS M2
WHERE M1.email_address = M2.email_address
AND M1.memberID < M2.memberID
ORDER BY M1.email_address


Now, if a member appears three times, that will end up showing three records. Example:


abc@xyz.com :: 173 :: A.Jones :: 192 :: Adam Jones
abc@xyz.com :: 173 :: A.Jones :: 244 :: A.P.Jones
abc@xyz.com :: 192 :: Adam Jones :: 244 :: A.P.Jones

That's probably less than ideal, but how many triple registrations do you expect to find?

Clearly whatever we do we'd need to return 2 records in a situation like that, one for each of the two "duplicates". So I don't think the extra one is too onerous.

It's much more of a pain in the rear to only get 2 records from that situation, but we can do it if it's important.

Alternatively, we *could* get all the duplicates in a single record *providing* you give me an UPPER LIMIT on the number of duplicates you want to handle in such a single record. (That is, if you tell me there are no cases with more than 4 memberid's to a single email address and it turns out there is actually a case with 5, then my one-record query would return...my head hurts...umm...I guess only two. But couldn't be just 1 because no space in the record if I only hold space for 4.)

So now you see why I wanted you to spell out the specifications a lot more than you did.

Old Pedant
02-28-2009, 06:22 AM
It occurs to me that I *could* write a query that would return records like this:


'PRIMARY' :: email-address :: id :: name
'DUPE' :: email-address :: id :: name
'DUPE' :: email-address :: id :: name

if that's preferable.

Another pain in the patootie query.

Again, though, no specs, no point in writing the code. Give me specs, I write code.

phantom007
02-28-2009, 06:40 AM
Hi

I think your query solved my problem.


Thanx for all your help :)

God Bless!



EZ Archive Ads Plugin for vBulletin Copyright 2006 Computer Help Forum