PDA

View Full Version : Resolved Number of Repeated Records in each column.


ubh
11-28-2009, 11:41 AM
Got a table with about 800 records and 30 some odd columns.

I want to go through each column, recognize the range of different values and how many times each value is repeated under each column through out the entire table.

Anyone know what I am talking about here?

ubh
11-28-2009, 01:11 PM
Well this is what I have come up with but it for what ever reason is not listing out compleate duplicates.


$results = mysql_query("SHOW COLUMNS FROM `$sessionName`");
while ($rows = mysql_fetch_assoc($results)) {

if($rows["Field"] == "id" || $rows["Field"] == "classId"){}else{
echo "||||".$rows["Field"]."||||\n";
$dups = mysql_query("SELECT DISTINCT ".$rows['Field'].",COUNT(".$rows['Field'].") AS cat_num FROM `$sessionName` GROUP BY ".$rows['Field']);
if($dups == ""){}else{
while($row = mysql_fetch_array( $dups )) {
if($row["cat_num"] > 1){
echo $rows['classId']." | ". $rows['Field'].":".$row[$rows['Field']]." is repeated by ".$row["cat_num"]."\n";
}
}
}
}
}


My output is telling me a few columns have no duplicates when I know better from viewing the table directly.

Anyone see something missing here?

Old Pedant
11-28-2009, 06:50 PM
Actually, with MySQL (and SQL Server) you can simply do
SELECT COUNT(DISTINCT field1) AS f1Count, COUNT(DISTINCT field2) AS f2Count, ... FROM table
Not sure how it will perform, but has to be better than doing it one field at a time.

Or is that not what you are after?

ubh
11-28-2009, 07:56 PM
The only difference is that with mine, I don't explicitly announce each columns name but rather am running a loop over each column and providing the columns name to the query of question. I noticed that you changed the location of the DISTINCT to be with in the COUNT, but this outputs nothing, just blank. I also noticed that you didn't use the group by feature in your query leaving the output blank as well.

Anyways my issue still remains.

Take column 21, this has about 200 records and the values for each records vareys up to three different values. (show, hide, none).. So again there are 200 hundred records for column 21 and each records could be one of the three values above.

When I run through my query my output it telling me that this column, has no duplicates... huh?

It find other columns duplicates just fine.. but its hit and miss with others.
Here is my revised code now:


$results = mysql_query("SHOW COLUMNS FROM `$sessionName`");
while ($rows = mysql_fetch_assoc($results)) {

if($rows["Field"] == "id" || $rows["Field"] == "classId"){}else{
echo "||||".$rows["Field"]."||||\n";
$dups = mysql_query("SELECT DISTINCT ".$rows['Field'].",COUNT(".$rows['Field'].") AS cat_num FROM `$sessionName` GROUP BY ".$rows['Field']);
if($dups == ""){}else{
while($row = mysql_fetch_array( $dups )) {
if($row["cat_num"] > 0){
echo $row['classId']." | ". $rows['Field'].":".$row[$rows['Field']]." is repeated by ".$row["cat_num"]."\n";
}
}
}
}
}


The query of question:
$dups = mysql_query("SELECT DISTINCT ".$rows['Field'].",COUNT(".$rows['Field'].") AS cat_num FROM `$sessionName` GROUP BY ".$rows['Field']);
if($dups == ""){}else{
while($row = mysql_fetch_array( $dups )) {
if($row["cat_num"] > 0){
echo $row['classId']." | ". $rows['Field'].":".$row[$rows['Field']]." is repeated by ".$row["cat_num"]."\n";
}
}
}

Maybe its a performance issue, and if so, why does it find repeated values for columns after say this troubled column number 21.

Old Pedant
11-28-2009, 08:44 PM
Still not sure what output you are after. Do you just want
fieldA fileldB fieldC
7 3 14

A count of the different values in each field?

Or do you really want to see all of the different values?

If you just want the counts, then COUNT(DISTINCT fieldX) should give it to you. With no GROUP BY.

Go read the MySQL docs on COUNT(DISTINCT).

Oh, w.t.h., here:
http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html#function_count

ubh
11-28-2009, 08:58 PM
http://www.devinrolsen.com/wp-content/themes/typebased/images/smallTable.jpg

The output of this MINI table should be as follows:

In Col1 the "1" value is repeated 2 times.
In Col1 the "3" value is repeated 2 times.
In Col1 the "4" value is repeated 3 times.
In Col1 the "5" value is repeated 1 times.

In Col2 the "display" value is repeated 3 times.
In Col2 the "none" value is repeated 4 times.
In Col2 the "hide" value is repeated 1 times.

In Col3 the "first" value is repeated 3 times.
In Col3 the "last" value is repeated 2 times.
In Col3 the "middle" value is repeated 3 times.


That is what I want to get out of my table. Now imagine that you get this output to work on say col1 and col3 but col2 says there is no repeating values to list.... that's my issue.

ubh
11-29-2009, 06:37 AM
Yes! I finally got it to behave!

Here was the final approach:
First I am using the SHOW COLUMNS to build an array of all the tables column.
Then pass through a while loop and feeding the table column names to my SELECT DISTINCT query.

My issue was the fact that some column names had a darn hyphen in their names + not completely enclosing the SELECT DISTINCT correctly with ` as well as including another AS for the value return in the output...

Here is my code to find DISTINCT values and the amount of times they repeat in each of your tables columns.

$results = mysql_query("SHOW COLUMNS FROM TABLE");
while ($rows = mysql_fetch_assoc($results)) {
echo "||".$rows["Field"]."||\n";
$dups = mysql_query("SELECT DISTINCT `".$rows['Field']."` AS value, COUNT(`".$rows['Field']."`) AS count FROM TABLE GROUP BY `".$rows['Field']."`")or die(mysql_error());
if($dups == ""){}else{
while($row = mysql_fetch_array( $dups )) {
if($row["count"] > 0){
echo "In ".$rows["Field"]." the ".$row["value"]." value is repeated by ".$row["count"]." times \n";
}
}
}

}

Old Pedant
11-29-2009, 08:45 PM
So now that you did this, I just have to ask a simple question: "Why?" <grin/>

ubh
11-29-2009, 09:49 PM
Why this method?
Cause I needed really three things the count, the name of the repeated record and the name of the row thus having to use two AS in the query.

Why all this trouble?
I am working on a CSS engine, currently it only runs reports right now. But thanks to the help and efforts such as your self I am able to look into very large CSS files and run very interesting reports to help optimize web sites even further.

First test involved the Magento Commerce solutions very own css files.. Here is a report of its boxes.css (the largest css file).

http://pastebin.com/m42164770

With these found pieces of data I can proceed onto writing further functions that will help sum up these repeated styles and make theirs or anyone else sloppy CSS more tighter and loading time friendly.

Make sense now?

Old Pedant
11-29-2009, 10:13 PM
WOW indeed.