...

View Full Version : how to get just the duplicates based on a field



jasonpc1
08-14-2011, 01:24 PM
I am wanting to get all results where there is a duplicate 'Pid'

Pid is the Product ID.

CREATE TABLE IF NOT EXISTS `products` (
`Pid` int(11) NOT NULL,
`productTitle` text NOT NULL,
) ENGINE=MyISAM DEFAULT CHARSET=utf8;


INSERT INTO `products` (`Pid`, `productTitle`) VALUES
(3350, 'Product A'),
(3351, 'Product B'),
(3352, 'Product C'),
(3353, 'Product D'),
(3354, 'Product E'),
(3354, 'Product F'),
(3355, 'Product G'),
(3355, 'Product H'),
(3356, 'Product I'),
(3356, 'Product J');

from this table the results should be.


(3354, 'Product E')
(3354, 'Product F')
(3355, 'Product G')
(3355, 'Product H')
(3356, 'Product I')
(3356, 'Product J')


what formular should i use for this to work.


EDIT:

I have tried the following but this only returns one of the duplicates not all of them.

SELECT * FROM `products` GROUP BY `Pid` HAVING count(*) > 1

sunfighter
08-14-2011, 07:12 PM
Here, work out your own formatting of output.


<?php
require ('./inc/DB_connect.php'); //<<== Connect to DB

$query = "SELECT productTitle, Pid, COUNT(Pid) FROM products GROUP BY Pid";
$result = mysql_query($query) or die(mysql_error());
// Print out result
while($row = mysql_fetch_array($result))
{
if($row['COUNT(Pid)'] > 1)
{
$myID = $row["Pid"];
$query = "SELECT Pid, productTitle FROM products WHERE Pid = $myID";
$results = mysql_query($query);
while($row = mysql_fetch_array($results))
{
echo "Product Code ". $row['Pid'] ." has these products". $row['productTitle'] ." items.";
echo "<br />";
}
echo '----------------------<br>';
}
}
?>

guelphdad
08-15-2011, 04:34 PM
sunfighter, any time you are writing a mysql query to run inside a loop of results from another mysql query you are not taking advantage of mysql's capabilities. Your solution will also not scale well.

What you need is something like this:


SELECT
a.pid,
a.producttitle
FROM
products as A
INNER JOIN (select pid from products group by pid having count(*) > 1) DT
ON
a.pid=DT.pid


note that DT is just an alias for the derived table in the second query.

sunfighter
08-16-2011, 10:16 PM
@guelphdad Thank you I did not know that you could inner join the same table. This works good with only one problem, I can't group the output together. Here is the code I used:

<?php
require ('./inc/DB_connect.php'); //<<== Connect to DB

$query = "
SELECT
a.Pid,
a.productTitle
FROM
products as A
INNER JOIN (select Pid from products group by Pid having count(*) > 1) DT
ON
a.Pid=DT.Pid
";

$result = mysql_query($query) or die(mysql_error());

while($row = mysql_fetch_array($result))
{
echo "Product Code ".$row["Pid"]." has these products ".$row["productTitle"];
echo "<br />";
}
?>
This looks good until I added (3355, 'Lord Nelson') to the end of the table.

The echo writes out in the order the items(rows) are stored. The items do not group. I played a little with it but haven't found a solution. What needs to be done?

PS - what did you mean by
will also not scale well.? Are you talking about deeper searches into the db?

Old Pedant
08-16-2011, 10:46 PM
The echo writes out in the order the items(rows) are stored. The items do not group
Well, of course not. You don't have any ORDER BY on your outer SELECT.

There is *NO NATURAL ORDER BY* to a SELECT. Usually, a database engine will return the records in primary key order (if there is a primary key, which there isn't, on this table) else it will return them in the order they were INSERTed. But there is NO GUARANTEE of either of those!

You must add an ORDER BY if you want them in a particular order.


SELECT
a.Pid,
a.productTitle
FROM
products as A
INNER JOIN (select Pid from products group by Pid having count(*) > 1) DT
ON
a.Pid=DT.Pid
ORDER BY a.Pid;


*************


what did you mean by "will also not scale well."

Really simple: If you had a million records returned by your first query, then your loop in the PHP code would have to do a million repeats of SELECT Pid, productTitle FROM products WHERE Pid = $myID";

In short, instead of doing one query to MySQL, your code would have to do 1,000,0001 queries.

And each query takes *TIME*. More time than you might think. MySQL is in a *separate process* from PHP. That means that the operating system has to suspend PHP, send a message from the PHP process to the MySQL process, start up the MySQL process, get the results from MySQL, suspend MySQL, send the results as a message to PHP, and resume PHP. 1,000,001 times.

Granted, on a dual (or more) CPU system, much of that can be done at the same time or nearly the same time, but it's still much slower than doing only one "round trip" from PHP to MySQL.

[And we won't even talk about the case where MySQL isn't on the same machine as PHP, which is often the case in shared hosting environments and in really busy commercial systems. There the data has to travel from PHP to MySQL across a physical wire between machines and then back again. There, the two things *can't* happen at the same time.]

guelphdad
08-17-2011, 01:25 AM
I do not think you understand what GROUP BY does. I think you mean ORDER BY in a sense a bunch of similar items appear in a row.

GROUP BY collapses a number of like rows into one row, so if you GROUP BY colour on a table of cars with colours and applied a count to it, you could find out how many there are in each GROUP of cars. 6 red and 7 yellow for instance. It doesn't output all the red cars in a row, that is an ORDER BY clause that does that.

Old Pedant
08-17-2011, 02:30 AM
Yeah, I should have said that. Using GROUP BY in place of the ORDER BY that I showed wouldn't help.

sunfighter
08-17-2011, 04:47 PM
Thanks guys.
I thought I had done that(ORDER BY), by guess I didn't.

guelphdad
08-17-2011, 08:52 PM
Sunfighter, In your first response in the thread, you are writing a mysql query. Then you are stepping through the results. While you are stepping through the results you are, inefficiently, running a new query. Instead you need a single query as I've written above.

sunfighter
08-18-2011, 04:50 PM
guelphdad, I have always known that a query inside of a loop was not a good idea, but knew of no other way of getting this type of information. Doing an inner join on the same table was an eye opener. Never ran into that before. I now have to get my brain to use this.

In studying my and your code, I added some echo statements and saw that mine ran through the table more then one time. The first query runs through the entire table to obtain the COUNT and then again to get the information. So it could run through everything a second time. Not practical time wise as you and Old Pedant said.

Your join runs through the table once. Big improvement. I shall change my method of programming.

Old Pedant
08-18-2011, 08:31 PM
The funniest part of all this is that Sunfighter has come way out ahead by learning something new and the original poster, Jason, hasn't bothered to collect his answer, apparently.

guelphdad
08-19-2011, 02:46 AM
sunfighter, you're only following in footsteps where I and others went before you. Glad to see people have an open mind.

Old Pedant
08-19-2011, 03:58 AM
Well... I dunno. Looks to me like he is following in our paw prints, not footsteps.

I mean I'm just judging by your picture and my age.

<grin/>

guelphdad
08-19-2011, 04:03 PM
Compared to a Giant Tortoise you are but a mere youth!



EZ Archive Ads Plugin for vBulletin Copyright 2006 Computer Help Forum