Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 7 of 7
  1. #1
    New Coder
    Join Date
    Dec 2011
    Posts
    11
    Thanks
    9
    Thanked 1 Time in 1 Post

    php mysql SELECT DISTINCT not returning most common entry

    I have a table that keeps track of paypal transactions, and the field contains the users username, date, id, and item purchased. I am trying to pull out only the latest distinct purchasers (usernames only) and I have one customer who represents the bulk of the latest purchases, but they are missing from the results.

    I have tried group by and select distinct without any luck. Here is the query I am currently using:

    Code:
    $q = "SELECT DISTINCT(username) FROM paypal ORDER BY date DESC LIMIT 20";
    $r = mysqli_query($connection,$q) or trigger_error("Query: $q\n<br> /> MySQL Error: " . mysqli_error());
    This query does pull out 20 distinct users, but it misses the most common user who should be represented in the last 20 purchases. Any suggestions on what I may be doing wrong?

    Thanks in advance.
    Last edited by Bob Fo; 12-17-2011 at 09:16 PM.

  • #2
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,987
    Thanks
    4
    Thanked 2,660 Times in 2,629 Posts
    According to this, nothing is wrong. You've pulled out the most recent 20 distinct users from the database. Pull a standard query with no distinct records ordered by date, and visually count the 20 most recent distinct usernames to ensure that it does exist within the most recent entries.
    Assuming you have no iteration issues, given the code above this is not a PHP issue, its a SQL one. Moving from PHP to MySQL forum.

  • #3
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,133
    Thanks
    75
    Thanked 4,338 Times in 4,304 Posts
    When you us DISTINCT(), you are giving MySQL free rein to pick *ANY ONE* of the records with the same username. So then, when you order by *another* field, you have no idea *WHICH* records MySQL chose when getting those DISTINCT values.

    Try this, instead:
    Code:
    SELECT username, MAX(`date`) AS lastPurchase
    FROM paypal
    GROUP BY username
    ORDER BY lastPurchase DESC
    LIMIT 20
    Incidentally, since DATE is a keyword (builtin function) in MySQL, it's always advisable to put `...` around the field name. As shown.

    In general, I try to avoid using DISTINCT because it is so easy for it to hide a multitude of query sins.
    Last edited by Old Pedant; 12-18-2011 at 04:47 AM.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • Users who have thanked Old Pedant for this post:

    Bob Fo (12-18-2011)

  • #4
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,133
    Thanks
    75
    Thanked 4,338 Times in 4,304 Posts
    To clarify, suppose you had these records in your DB:
    Code:
    username     date
    albert       Jan 1, 2009
    bob          Feb 1, 2009
    carl          Mar 1, 2009
    albert       Apr 1, 2009
    And then you did
    Code:
    SELECT DISTINCT(username) FROM paypal ORDER BY `date` DESC LIMIT 2
    When it picks the DISTINCT records, MySQL might will use *just* the FIRST THREE records shown there (why not? The "albert" from Apr 1,2009, is clearly a duplicate so why use it?). And so then when it does ORDER BY `date` DESC naturally the "albert" record from Jan 1,2009, will be the earliest date and so last in the ORDER BY and so the LIMIT will chop it off.

    DO NOT ASSUME that MySQL can hear you thinking. MySQL is a worse prognosticator than even most humans.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • Users who have thanked Old Pedant for this post:

    Bob Fo (12-18-2011)

  • #5
    New Coder
    Join Date
    Dec 2011
    Posts
    11
    Thanks
    9
    Thanked 1 Time in 1 Post
    Worked like a charm, thank you very much for both the help and the excellent information. Very much appreciated.

  • #6
    New Coder
    Join Date
    Dec 2011
    Posts
    11
    Thanks
    9
    Thanked 1 Time in 1 Post
    I have run into another issue that is similar to the one above, so I figured I would just post it in this same topic. Here is the query:

    Code:
    SELECT 
           username,
    MAX(date_posted) as posted_date,
            status, 
            username, 
            status_id
            FROM status
            WHERE username != 'Bob'
    GROUP BY username
            ORDER BY posted_date DESC
            LIMIT 2
    What is interesting about the results is that it is returning the correct date order, but the other information related to the date is incorrect, such as the status, status_id, etc.

    Here is what it's returning:

    username - date - status - username - id

    tom 1326397523 status update tom 107
    Stell 1326391677 Blab bla Stell 255

    The correct date for id 107 is 1324739967

    Thanks in advance for any suggestions/comments.

  • #7
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,133
    Thanks
    75
    Thanked 4,338 Times in 4,304 Posts
    Same thing, basically, as using DISTINCT in the way you first were.

    When you use GROUP BY and do *NOT* specify *ALL* the field from the SELECT (except the aggregate function fields...in this case the MAX field), you are giving MySQL free rein to pick ANY of the possible values.

    All other databases that I know of won't even permit you to use GROUP BY in this sloppy way; they insist the you include of of the non-aggregate SELECTed fields.

    But MySQL is sloppy and allows you to be sloppy. But in the process, you have to let MySQL make the choice of which of several field values to choose. And the likelihood it will choose what you want and expect is minimal.

    The correct way to have coded that is
    Code:
    SELECT 
           username,
           MAX(date_posted) as posted_date,
            status, 
            status_id
    FROM status
    WHERE username != 'Bob'
    GROUP BY username, status, status_id
    ORDER BY posted_date DESC
    LIMIT 2
    But now don't be surprised if you don't get the results you want for other reasons.

    The correct answer to what you are after here is almost surely a JOIN.
    Code:
    SELECT s.username, s.date_posted, s.status, s.status_id
    FROM status AS s,
         ( SELECT username, MAX(date_posted) as max_posted
           FROM status GROUP BY username ) AS m
    WHERE s.username = m.username
      AND s.date_posted = m.max_posted
    ORDER BY s.posted_date DESC
    LIMIT 2
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • Users who have thanked Old Pedant for this post:

    Bob Fo (01-13-2012)


  •  

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •