View Full Version : Count User Earnings, Sort By Top Earner?
Sussex_Chris
10-25-2009, 06:31 PM
Does anybody know the best SQL for this process that I am trying to do:
Basically, I have a table with numbers in it, E.g.
ID - user - earnings
But the earnings column is not a total, there are multiple earnings for each user so I need to count the earnings for each user and then insert the results into a new table and then sort the table by top earner descending.
I was thinking of something like this but it doesn't seem to work:
DROP TABLE `TopEarner`;
CREATE TABLE TopEarner (memberid INT, earnings INT);
INSERT INTO TopEarner(memberid, earnings) SELECT userid, SUM(userpayout) FROM completed GROUP BY userid ORDER BY SUM(userpayout) DESC;
I think you need to query the table twice, by joining on itself.
first thought and not tested.
select
t1.userid
, t2.userpayout
from completed as t1
inner
join (select sum(userpayout)
from completed as t2
where t1.userid = t2.userid
)
group by t1.userid
order by t2.userpayout
you might not need the group by.
or maybe this would be better
select
t1.userid
, pay.userpayout
from completed as t1
LEFT OUTER
JOIN ( SELECT userid
sum(userpayout)
FROM completed as t2
GROUP
BY userid ) AS pay
ON pay.userid = t1.userid
order by pay.user_id
hth
bazz
Sussex_Chris
10-25-2009, 08:17 PM
Thanks Bazz, That code spits out an error though, any ideas how to fix this one:
#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '.userpayout from completed as t1 inner join (select sum(userpayout) from complet' at line 3
yeh I forgot the comma which I have now added to both queries.
let me know how each 'works', when you have tried them.
bazz
Sussex_Chris
10-25-2009, 08:35 PM
Thanks again Bazz but I'm still getting errors unfortunately.
SQL (1) Error:
#1248 - Every derived table must have its own alias
SQL (2) Error:
#1054 - Unknown column 'pay.userpayout' in 'field list'
I added a comma between userid and sum(userpayout) on that one that I am guessing was missed out.
The whole command would be run on a cron job every 15 minutes, would it be easier to run a foreach userid add up the earnings, put in the table and after all have been entered into the DB sort by xx column DESC?
should be do-able in the one query.
try my latest version of the second query in my first post.
bazz
Old Pedant
10-25-2009, 08:50 PM
*WHY* do you want to create a NEW TABLE????
That's almost always a really BAD idea!
It means that your new table becomes obsolete as soon as even one new record is added to the new table!
Instead, just create a VIEW of the existing table!
CREATE VIEW totalUserPayouts
AS
SELECT userid, SUM(userpayout) AS totalpayout
FROM completed
GROUP BY userid
And then anytime you want the result you can just do
SELECT * FROM totalUserPayouts ORDER BY totalpayout DESC;
Old Pedant
10-25-2009, 08:54 PM
For the life of me, I can't see why you would ever need to join the table to itself, as Bazz suggested. I can't fathom how this answer:
select
t1.userid
, pay.userpayout
from completed as t1
LEFT OUTER
JOIN ( SELECT userid
sum(userpayout)
FROM completed as t2
GROUP
BY userid ) AS pay
ON pay.userid = t1.userid
order by pay.user_id
gives a result different than just
SELECT userid, SUM(userpayout) AS totalpayout
FROM completed
GROUP BY userid
ORDER BY SUM(userpayout) DESC;
In particular, I can't see any point at all in the LEFT JOIN, as you are guaranteed that all the columns in the left table will always have a match in the right pseudo-table.
I'm sorry, but I just can't make sense of the reason for needing/wanting a JOIN, at all.
Old Pedant
10-25-2009, 08:58 PM
You never said WHY your original attempt failed.
Is is possible that it failed just because you used an ORDER BY while doing the INSERT INTO??? That's clearly not necessary or even beneficial, because there is *NO* natural order to a DB table.
Even after doing:
INSERT INTO TopEarner(memberid, earnings)
SELECT userid, SUM(userpayout)
FROM completed
GROUP BY userid
You would still *always* have to do
SELECT memberid, earnings
FROM TopEarner
ORDER BY earnings DESC
anyway, to get the results in earnings order. No matter how you insert the data into the TopEarner table.
Sussex_Chris
10-26-2009, 12:38 AM
Ahh, the problem was the INT. Originally I had no idea what numbers it was producing because it was not allowing the decimal place. I changed to VARCHAR and the following query is working fine:
DROP TABLE `TopEarner`;
CREATE TABLE TopEarner (memberid VARCHAR(255), earnings VARCHAR(255));
INSERT INTO TopEarner(memberid, earnings) SELECT userid, SUM(userpayout) AS totalpayout FROM completed GROUP BY userid ORDER BY SUM(userpayout) DESC;
Thanks a lot for your help guys :)
Old Pedant
10-26-2009, 09:24 PM
Sorry, but I think you are digging yourself a deeper whole.
Your new TopEarner table will be *useless*.
Let's say that you end up with records like this:
Bob :: 103718.17
Ann :: 47221.11
Mark :: 9271.22
If, at some point, you tried to do
SELECT * FROM TopEarner ORDER BY earnings DESC;
You would end up with those records in *THIS* order:
Mark :: 9271.22
Ann :: 47221.11
Bob :: 103718.17
!!!!!
That's because when you order *strings* (which is what VARCHAR is), the comparison is done *character by character*. So the frist characters there--"9" and "4" and "1" are compared and, in descending order, you get the results I show.
You should NEVER NEVER NEVER store numbers (or dates or times) as strings.
That field should either be something like DECIMAL(12,2) or it should maybe be FLOAT or REAL, but *clearly* it should not be VARCHAR!!!
Old Pedant
10-26-2009, 09:25 PM
Oh, and of course all I just wrote would apply to memberid as a string, as well.
Sussex_Chris
10-26-2009, 11:25 PM
Hmm, I have had a cron job running since making it and the results have been correct every time so far. Instead of select * from xxx order by I am leaving without order by and the first query is sorting the data, I will not be using the table for anything else so am I correct in saying that it will always be correct this way?
Example of what the query is producing:
Username Earnings
xxx $249.75
xxx $97.54
xxx $28.525
xxx $15.65
xxx $10.475
xxx $9.045
xxx $8.85
xxx $7.85
xxx $6.425
xxx $5.5
Old Pedant
10-26-2009, 11:45 PM
Relational database do not have any "natural" sort order.
If you are just doing "SELECT * FROM TopEarner" then *probably* you will continue to get results in the same order that the records were added to the DB. But you should *never* count on it. There's nothing in the "contract" for a DB that says what order you will get when you omit the ORDER BY.
Now, you *could* solve the problem by adding an AUTO_NUMBER field to the table. And then using ORDER BY that autonumber field.
But I am truly mystified. You say you are refreshing this table with a CRON job? So if you are indeed rebuilding this table all the time, why *HAVE* the table?? Why not just always do
SELECT userid, SUM(userpayout) AS totalpayout
FROM completed
GROUP BY userid
ORDER BY SUM(userpayout) DESC
LIMIT 10
(where the LIMIT is completely optional, but will return only the given number of records...usually the reason for doing a SUM and ORDER BY like this).
Do you have millions of records? Is that why you don't want to issue this query each time the results are needed? Or maybe this is for a web-based game with the points changing rapidly???
Ehhh...whatever. I guess if it works for you, it works. You probably don't care about DB design principles, et al., you just want it to work.
Coyote6
10-26-2009, 11:58 PM
I would agree with OP here. I do not see the need for the extra table. I think you are better off just executing the select statement. You are just creating more work for yourself to upkeep by not selecting from the table containing the actual data or a view created from that table.
Sussex_Chris
10-27-2009, 12:17 AM
The results will be displayed openly to the public where it has the potential to being loaded 1000+ times per day. The table that it is pulling the results from currently has 15000 records from the first 20 days of the site being live. I was thinking that this query being ran 1000+ times per day would cause a lot more stress than being ran 4 times per hour by a cron job and then the only query is to echo out the results from the smaller table.
I will be changing the cell format to DECIMAL now though, thanks for your input. It would also be interesting to hear your guys optinions of the above though. :)
Old Pedant
10-27-2009, 01:17 AM
Depends on how up-to-the-minute you need/want the display to be.
I suppose if 4 times a day is enough, then you are probably doing the right thing. Though I doubt seriously that you'd notice 1000 hits to the underlying SELECT per day. Now, 1000 per hour? Ehhh...yeah, I'd think about it.
But, truly, if you are going to update 4 times per day and then display those results, why do it by creating a new DB table???? That just means the your PHP or whatever pages have to *still* make a query of the DB every time.
Why not, instead, generate an HTML file that you simply include in the display page(s) via server side includes???
Coyote6
10-27-2009, 01:21 AM
What type of server are you running the MySQL on? Just out of curiosity... I think the new Mac Servers would handle it fine but you may be right if running on older systems... that's just my opinion though. I just don't like not having up to date results.
Sussex_Chris
10-27-2009, 01:37 AM
I was planning on 4 times per hour rather than 4 times per day although I could run it even more often and it would still be a much less load and makes my site less vulnerable to anyone that may attempt an attack.
My server details:
» Intel Dual Core 2.6GHz E7300
» 2GB DDR2 SDRAM
» MySQL Client API version 5.0.81
» PHP Version 5.2.8
That is a good idea regarding creating the pages, I did not think of doing that.
Coyote6
10-27-2009, 02:15 AM
Hmm kind of a toss up... I would think that would handle the select query method but I would recommend running more ram if the system will take it. Best of luck either way. OP's include file is a pretty slick solution too.
Old Pedant
10-27-2009, 02:35 AM
If you want to get even slicker, you can do what I do on one site.
With hundreds of thousands of records, we still want to do some summations/counts that go all the way back to the beginning of time.
Okay, so once a week I kill and recreate a table that has all the sums/counts from day 1 to that point. And now, to get the instantaneous total, I only have to sum/count records that were added since the "backup" point and add them to the cached values.
And of course even this could be improved. No reason to kill/recreate when I could, instead, simply update the table to reflect that weeks entries. I'm seriously looking at getting rid of all the data that is more than a month old by putting it in archive and just depending on the running totals.
vBulletin® v3.8.2, Copyright ©2000-2012, Jelsoft Enterprises Ltd.