...

View Full Version : php mysql SELECT DISTINCT not returning most common entry



Bob Fo
12-17-2011, 09:12 PM
I have a table that keeps track of paypal transactions, and the field contains the users username, date, id, and item purchased. I am trying to pull out only the latest distinct purchasers (usernames only) and I have one customer who represents the bulk of the latest purchases, but they are missing from the results.

I have tried group by and select distinct without any luck. Here is the query I am currently using:


$q = "SELECT DISTINCT(username) FROM paypal ORDER BY date DESC LIMIT 20";
$r = mysqli_query($connection,$q) or trigger_error("Query: $q\n<br> /> MySQL Error: " . mysqli_error());

This query does pull out 20 distinct users, but it misses the most common user who should be represented in the last 20 purchases. Any suggestions on what I may be doing wrong?

Thanks in advance.

Fou-Lu
12-18-2011, 12:41 AM
According to this, nothing is wrong. You've pulled out the most recent 20 distinct users from the database. Pull a standard query with no distinct records ordered by date, and visually count the 20 most recent distinct usernames to ensure that it does exist within the most recent entries.
Assuming you have no iteration issues, given the code above this is not a PHP issue, its a SQL one. Moving from PHP to MySQL forum.

Old Pedant
12-18-2011, 04:42 AM
When you us DISTINCT(), you are giving MySQL free rein to pick *ANY ONE* of the records with the same username. So then, when you order by *another* field, you have no idea *WHICH* records MySQL chose when getting those DISTINCT values.

Try this, instead:


SELECT username, MAX(`date`) AS lastPurchase
FROM paypal
GROUP BY username
ORDER BY lastPurchase DESC
LIMIT 20

Incidentally, since DATE is a keyword (builtin function) in MySQL, it's always advisable to put `...` around the field name. As shown.

In general, I try to avoid using DISTINCT because it is so easy for it to hide a multitude of query sins.

Old Pedant
12-18-2011, 04:47 AM
To clarify, suppose you had these records in your DB:


username date
albert Jan 1, 2009
bob Feb 1, 2009
carl Mar 1, 2009
albert Apr 1, 2009

And then you did

SELECT DISTINCT(username) FROM paypal ORDER BY `date` DESC LIMIT 2

When it picks the DISTINCT records, MySQL might will use *just* the FIRST THREE records shown there (why not? The "albert" from Apr 1,2009, is clearly a duplicate so why use it?). And so then when it does ORDER BY `date` DESC naturally the "albert" record from Jan 1,2009, will be the earliest date and so last in the ORDER BY and so the LIMIT will chop it off.

DO NOT ASSUME that MySQL can hear you thinking. MySQL is a worse prognosticator than even most humans.

Bob Fo
12-18-2011, 05:47 PM
Worked like a charm, thank you very much for both the help and the excellent information. Very much appreciated. :)

Bob Fo
01-12-2012, 08:23 PM
I have run into another issue that is similar to the one above, so I figured I would just post it in this same topic. Here is the query:


SELECT
username,
MAX(date_posted) as posted_date,
status,
username,
status_id
FROM status
WHERE username != 'Bob'
GROUP BY username
ORDER BY posted_date DESC
LIMIT 2

What is interesting about the results is that it is returning the correct date order, but the other information related to the date is incorrect, such as the status, status_id, etc.

Here is what it's returning:

username - date - status - username - id

tom 1326397523 status update tom 107
Stell 1326391677 Blab bla Stell 255

The correct date for id 107 is 1324739967

Thanks in advance for any suggestions/comments. :)

Old Pedant
01-13-2012, 08:02 PM
Same thing, basically, as using DISTINCT in the way you first were.

When you use GROUP BY and do *NOT* specify *ALL* the field from the SELECT (except the aggregate function fields...in this case the MAX field), you are giving MySQL free rein to pick ANY of the possible values.

All other databases that I know of won't even permit you to use GROUP BY in this sloppy way; they insist the you include of of the non-aggregate SELECTed fields.

But MySQL is sloppy and allows you to be sloppy. But in the process, you have to let MySQL make the choice of which of several field values to choose. And the likelihood it will choose what you want and expect is minimal.

The correct way to have coded that is

SELECT
username,
MAX(date_posted) as posted_date,
status,
status_id
FROM status
WHERE username != 'Bob'
GROUP BY username, status, status_id
ORDER BY posted_date DESC
LIMIT 2

But now don't be surprised if you don't get the results you want for other reasons.

The correct answer to what you are after here is almost surely a JOIN.


SELECT s.username, s.date_posted, s.status, s.status_id
FROM status AS s,
( SELECT username, MAX(date_posted) as max_posted
FROM status GROUP BY username ) AS m
WHERE s.username = m.username
AND s.date_posted = m.max_posted
ORDER BY s.posted_date DESC
LIMIT 2



EZ Archive Ads Plugin for vBulletin Copyright 2006 Computer Help Forum