View Full Version : MySql SUM and GROUP query help
Arnaud
10-04-2009, 11:05 PM
Hi folks,
I need help building a php mysql query in one single table, but I can't figure out how to make it work...
I have one table with the following fields:
ID, TIMESTAMP, TEAMID, GOAL, ASSIST, PENALTY
What I need:
- get results from only the last (oldest) TIMESTAMP in the table
- get the 2 biggest SUMs of ((GOAL + ASSIST) - PENALTY) grouped by TEAMID and the TEAMID values for that last timestamp
Here is how the table could look:
ID, TIMESTAMP, TEAMID, GOAL, ASSIST, PENALTY
1, 1252691100, 15, 1, 1, 1
2, 1252691100, 15, 0, 2, 1
3, 1252691100, 24, 0, 0, 0
4, 1252691100, 24, 1, 2, 3
5, 1252691100, 24, 0, 0, 1
6, 1252691100, 11, 0, 0, 2
7, 1252691100, 11, 0, 0, 1
8, 1252691100, 11, 0, 1, 1
9, 1252691100, 11, 0, 0, 2
10, 1228004512, 15, 0, 0, 1
11, ...
My goal in the above example would be to return:
For the biggest timestamp (1252691100) the 2 TeamIDs that have the biggest total (id 15 and 24):
Team: 15 / Total (SUM): 2
Team: 24 / Total (SUM): -1
I came to the following, that obviously does not return what I want ;)
select *,SUM((GOAL + ASSIST) - PENALTY) as TOTAL from TEAMHST group by TEAMID order by TIMESTAMP desc limit 0,1
Thanks in advance for any help on that!
Old Pedant
10-05-2009, 01:37 AM
Just out of curiosity, WHY would you use a TIMESTAMP field for this instead of an ordinary--and human readable--DATETIME field???
But anyway...
Should be easy enough:
SELECT timestamp, teamid, SUM( goal + assist - penalty ) AS total
FROM TEAMHST
GROUP BY timestamp, teamid
ORDER BY timestamp DESC, total DESC, teamid ASC
LIMIT 2
There are other more complex ways, but not worth it unless this table is huge (thousands of records) and has the right indexes set on it.
CAUTION: If you have 2 or more teams tie for second place, you won't know it. If you have 3 or more tie for first place, you won't know it. If you need to know about such ties, the query will be somewhat more complex!
Old Pedant
10-05-2009, 01:41 AM
And what does PHP have to do with this?
Arnaud
10-05-2009, 08:52 AM
Thank you, will try that.
My query will be in a PHP script, that's the only reason why I mentioned it...
I just want to return the n number of best total for the last entered timestamp. I just want to do a kind of top 5 or top 3, say top 3 so if I have:
- Team1 : Total 155
- Team2 : Total 90
- Team3 : Total 200
- Team4 : Total 25
That should return Team 1 / 2 / 3 and the respective totals.
But in case I have:
- Team1 : Total 155
- Team2 : Total 90
- Team3 : Total 200
- Team4 : Total 90
It should return all 4 teams since Team 2 and 4 are 3rd and equal.
TIA
Arnaud
10-05-2009, 05:38 PM
OK good! This seems to work as expected.
Here is the query again with the modifications so that it is complete:
SELECT TIMESTAMP, TEAMID, SUM( BONUS + LINEBONUS + GOAL + ASSIST - PENALTY ) AS TOTAL
FROM LNA_TEAMHST
GROUP BY TIMESTAMP, TEAMID
ORDER BY TIMESTAMP DESC, TOTAL DESC, TEAMID ASC
LIMIT 3
Now could someone please push me to the right direction if I want to achieve:
1. I would like the top 3 teams to show, even if 2 or more have the same total
(top 3 would show 5 teams if some have equal totals)
2. I know it was not a very clever way to build the DB (and I can't change its design now) but I need to get:
- the team name that corresponds to the TEAMID in another table (LNA_TEAM)
- the user name that corresponds to the USERID found in LNA_TEAM in another table (LNA_USER)
(one user can have multiple teams, that's why...)
...so I guess I need to use multiple inner joins, though I could do multiple db queries too. What do you suggest?
And how to put all that together??
Thanks in advance !
Arnaud
10-05-2009, 05:43 PM
There are other more complex ways, but not worth it unless this table is huge (thousands of records) and has the right indexes set on it.
FYI, the table holds around 9'000 records right now and will hold between 50'000 and 75'000 records by the end of the year.
And this could be much bigger if I get more and more users on it, which is the case since a few months, so queries should be built so that it could work with 10x even 50x more records...
Number of records for each timestamp is between 700 and 1000 right now, but this could grow too.
Here is the table structure in case it helps...
--
-- Table structure for table `LNA_TEAMHST`
--
CREATE TABLE `LNA_TEAMHST` (
`ID_LNA_TEAMHST` int(7) NOT NULL auto_increment,
`USERID` int(11) NOT NULL default '0',
`TEAMID` int(11) NOT NULL default '0',
`TIMESTAMP` varchar(14) default NULL,
`PLAYERID` int(5) NOT NULL default '0',
`GOAL` decimal(4,0) default '0',
`ASSIST` decimal(4,0) default '0',
`PENALTY` decimal(4,0) default '0',
`BONUS` decimal(4,0) default '0',
`LINEBONUS` decimal(4,0) default '0',
`LINE` varchar(4) NOT NULL default '',
`OUT` char(3) NOT NULL default '',
PRIMARY KEY (`ID_LNA_TEAMHST`)
) ENGINE=MyISAM AUTO_INCREMENT=9639 DEFAULT CHARSET=latin1 AUTO_INCREMENT=9639 ;
Old Pedant
10-05-2009, 09:43 PM
I don't understand where USERID comes into this. Unless you mean there is only one USERID per TEAMID in the LNA_TEAM table.
I'll make that assumption.
You have one major goof in there:
`TIMESTAMP` varchar(14) default NULL,
Ummm...you should *NEVER* use VARCHAR for date/time values (or for numbers).
It's also a bad idea, from a performance standpoint, to use DECIMAL(4,0). Just use INT, instead. And there is no need to give a size to an INT data type.
And how can you give a field that is a *NUMBER* of any kind a default value that is a *STRING*??? (All your default '0')
*I* think you should have designed that table thus:
CREATE TABLE LNA_TEAMHST (
ID_LNA_TEAMHST int NOT NULL auto_increment,
`USERID` int NOT NULL default 0,
TEAMID int NOT NULL default 0,
`TIMESTAMP` date NOT NULL, --or maybe datetime
PLAYERID int NOT NULL,
GOAL int default 0,
ASSIST int default 0,
PENALTY int default 0,
BONUS int default 0,
LINEBONUS int default 0,
LINE varchar(4) NOT NULL default '',
`OUT` char(3) NOT NULL default '',
PRIMARY KEY (ID_LNA_TEAMHST)
} etc.
As I noted, if your so-called TIMESTAMP field is a DATE (or DATETIME) field, then it becomes human readable. Much more sensible. But no matter what, it should never be a VARCHAR field!
Old Pedant
10-05-2009, 09:46 PM
2. I know it was not a very clever way to build the DB (and I can't change its design now) but I need to get:
- the team name that corresponds to the TEAMID in another table (LNA_TEAM)
- the user name that corresponds to the USERID found in LNA_TEAM in another table (LNA_USER)
No, you are wrong. It is *THE* clever and correct way to build the DB. Period. This part you have exactly correct. (Well, if any given USERID is found in one and only one LNA_TEAM table, then LNA_USER isn't needed...but even then it's not wrong. And if the same USERID *can* appear in more than one LNA_TEAM, then LNA_USER is clearly needed.)
Old Pedant
10-05-2009, 09:48 PM
Question: Do you want this report--top 3 teams with ties--for *ALL* values of TIMESTAMP?? Or for only one given TIMESTAMP, the most recent???
In your first post, it seemed you wanted the top 3 only for the most recent.
Old Pedant
10-05-2009, 10:32 PM
I *think* this does it:
SELECT S.TIMESTAMP, S.TEAMID, S.total, T.TEAMNAME, U.USERNAME
FROM LNA_TEAM AS T, LNA_USER AS U,
( SELECT TIMESTAMP, TEAMID, SUM( BONUS + LINEBONUS + GOAL + ASSIST - PENALTY ) AS total
GROUP BY TIMESTAMP, TEAMID ) AS S,
( SELECT TIMESTAMP, SUM( BONUS + LINEBONUS + GOAL + ASSIST - PENALTY ) AS total
GROUP BY TIMESTAMP
ORDER BY TIMESTAMP DESC, TOTAL DESC LIMIT 3 ) AS best
WHERE S.TIMESTAMP = best.TIMESTAMP AND S.total = best.total
AND T.TEAMID = S.TEAMID
AND U.USERID = T.USERID
ORDER BY S.total DESC
It is critically important for performance that the TIMESTAMP field in your table be indexed. It would also surely be a good idea to make TEAMID indexed.
Arnaud
10-05-2009, 10:42 PM
Edit: didn't see your above post before posting that. Can you confirm the above code?
- actually, this would not work since you kinda merged LNA_TEAM and LNA_TEAMHST, please read below. Thx!
All right. Thanks for all that.
As you could figure out I am not an expert... but I take good note of your remarks. Though, as said previously, I don't think it would be a good idea to change the design now, since it's live and running.
You wrote: And how can you give a field that is a *NUMBER* of any kind a default value that is a *STRING*??? (All your default '0')
- what should it be? NULL? - or - as I understand: INT and default '0' ?
You wrote: you should *NEVER* use VARCHAR for date/time values (or for numbers).
- can I keep it like that for now or will that impact what I am trying to achieve?
You wrote: I don't understand where USERID comes into this. Unless you mean there is only one USERID per TEAMID in the LNA_TEAM table.
- My mistake. There are 3 tables to take into account.
1. LNA_TEAMHST that contains both USERID and TEAMID -- and the TIMESTAMP and POINTS (goal, assist, bonus, etc.)
2. LNA_USER that contains the USER NAME
3. LNA_TEAM that contains the TEAM NAME
To one USERID can correspond up to five TEAMIDs. Each TEAMID and USERID are unique of course. Which also means that I could be displaying two teams from the same user in the top 3.
And yes, I want to report only the top 3 teams total scores for the LAST TIMESTAMP
So, to make it clear, I want to display:
- TEAMNAME
- USERNAME
- TOTAL --> SUM( BONUS + LINEBONUS + GOAL + ASSIST - PENALTY )
Top 3 for the last timestamp (having the highest total first, of course)
Thanks again for your help.
Old Pedant
10-05-2009, 10:59 PM
this would not work since you kinda merged LNA_TEAM and LNA_TEAMHST,
Huh???
Where??
What are you referring to???
***********
The problem with leaving TIMESTAMP as varchar:
It depends a *LOT* on ALL your values of TIMESTAMP having the same number of digits!
For example, if you had TIMESTAMP values like this:
9981331130000
9992711840000
10031125160000
then the last time stamp there would be the *smallest* of the three! That's because when strings are compared, they are done *character by character* and the first mis-match determines the sense of the compare. Since "1" is less than "9", that last one is smaller than the other two...even though *AS A NUMBER* it is larger.
I have no idea what value that TIMESTAMP column of yours is supposed to be holding. Is it a MySQL TIMESTAMP value???
Even if you don't change ANYTHING else, I would strongly urge you to change the datatype of the TIMESTAMP field. You *CAN* do that without losing any data, if you do it in steps.
Are those TIMESTAMP values actually dates? Or are they dates *and* times??
Old Pedant
10-05-2009, 11:01 PM
So, to make it clear, I want to display:
- TEAMNAME
- USERNAME
- TOTAL --> SUM( BONUS + LINEBONUS + GOAL + ASSIST - PENALTY )
Top 3 for the last timestamp (having the highest total first, of course)
So far as I can see, that is what my query should be doing.
Now, is it really efficient?? No, probably not.
Probably we could improve it a lot if we first got the latest TIMESTAMP value and then didn't even look at records with another TIMESTAMP. Do that either via two queries from PHP or by using a MySQL Stored Procedure.
Arnaud
10-06-2009, 02:18 PM
OK I understand.
Yes the TIMESTAMP field is a php generated TIMESTAMP using the TIME function: TIME - Returns the current time measured in the number of seconds since the Unix Epoch (January 1 1970 00:00:00 GMT).
Records (and TIMESTAMPS) in the LNA_TEAMHST (read LNA TEAM HISTORY) table are only there for a couple of months and correspond to dates from September 2009 to April 2010, and will never hold more than that (same months next year, and so on) so I don't think that
9981331130000
9992711840000
10031125160000
could ever happen...
But I will backup my DB and change the datatype as suggested.
Huh???
Where??
What are you referring to???
I do not see any reference to LNA_TEAMHST in your example:
SELECT S.TIMESTAMP, S.TEAMID, S.total, T.TEAMNAME, U.USERNAME
FROM LNA_TEAM AS T, LNA_USER AS U,
( SELECT TIMESTAMP, TEAMID, SUM( BONUS + LINEBONUS + GOAL + ASSIST - PENALTY ) AS total
GROUP BY TIMESTAMP, TEAMID ) AS S,
( SELECT TIMESTAMP, SUM( BONUS + LINEBONUS + GOAL + ASSIST - PENALTY ) AS total
GROUP BY TIMESTAMP
ORDER BY TIMESTAMP DESC, TOTAL DESC LIMIT 3 ) AS best
WHERE S.TIMESTAMP = best.TIMESTAMP AND S.total = best.total
AND T.TEAMID = S.TEAMID
AND U.USERID = T.USERID
ORDER BY S.total DESC
So I wonder how you could retrieve TIMESTAMP and BONUS, GOAL, etc. that are in LNA_TEAMHST without having that in the query???
Or then I am really missing something in your query :confused:
As I wrote, I have 3 tables:
1. LNA_TEAMHST that contains both USERID and TEAMID -- and the TIMESTAMP and POINTS (goal, assist, bonus, etc.)
2. LNA_USER that contains the USER NAME
3. LNA_TEAM that contains the TEAM NAME
TIA,
Arnaud
Old Pedant
10-06-2009, 08:56 PM
OHWADDADORKIYAM!
LOL! No wonder you thought I was nuts!
SELECT S.TIMESTAMP, S.TEAMID, S.total, T.TEAMNAME, U.USERNAME
FROM LNA_TEAM AS T, LNA_USER AS U,
( SELECT TIMESTAMP, TEAMID, SUM( BONUS + LINEBONUS + GOAL + ASSIST - PENALTY ) AS total
FROM LNA_TEAMHST
GROUP BY TIMESTAMP, TEAMID ) AS S,
( SELECT TIMESTAMP, SUM( BONUS + LINEBONUS + GOAL + ASSIST - PENALTY ) AS total
FROM LNA_TEAMHST
GROUP BY TIMESTAMP
ORDER BY TIMESTAMP DESC, TOTAL DESC LIMIT 3 ) AS best
WHERE S.TIMESTAMP = best.TIMESTAMP AND S.total = best.total
AND T.TEAMID = S.TEAMID
AND U.USERID = T.USERID
ORDER BY S.total DESC
Honest, I had that in there at one point in time and then did too much CUT after pasting.
Sorry!
Arnaud
10-07-2009, 08:42 PM
Ok, I get to the point but still it's not returning any result...
SELECT S.TIMESTAMP, S.TEAMID, S.total, T.TEAMNAME, U.USERNAME
FROM LNA_TEAM AS T, LNA_USER AS U,
( SELECT TIMESTAMP, TEAMID, SUM( BONUS + LINEBONUS + GOAL + ASSIST - PENALTY ) AS total
FROM LNA_TEAMHST
GROUP BY TIMESTAMP, TEAMID ) AS S,
( SELECT TIMESTAMP, SUM( BONUS + LINEBONUS + GOAL + ASSIST - PENALTY ) AS total
FROM LNA_TEAMHST
GROUP BY TIMESTAMP
ORDER BY TIMESTAMP DESC, TOTAL DESC LIMIT 3 ) AS best
WHERE S.TIMESTAMP = best.TIMESTAMP AND S.total = best.total
AND T.TEAMID = S.TEAMID
AND U.USERID = T.USERID
ORDER BY S.total DESC
I think the query is a bit too complicated because LNA_TEAMHST holds both TEAMID and USERID.
Btw. USERID in LNA_USER is ID_LNA_USER (TEAMID is TEAMID)
and TEAMID in LNA_TEAM is ID_LNA_TEAM (USERID is USERID)
in LNA_TEAMHST they are both called USERID and TEAMID
I have tried different combinations based on your code but still can't get it to work...
Hope you can help and thanks anyway for the time spent!
Arnaud
10-07-2009, 08:57 PM
This is what I came up with. But it does not seem to return anything...
Corrections are in red, hope they make sense...
SELECT S.TIMESTAMP, S.TEAMID, S.total, T.TEAMNAME, U.PSEUDO
FROM LNA_TEAM AS T, LNA_USER AS U,
( SELECT TIMESTAMP, TEAMID, SUM( BONUS + LINEBONUS + GOAL + ASSIST - PENALTY ) AS total
FROM LNA_TEAMHST
GROUP BY TIMESTAMP, TEAMID ) AS S,
( SELECT TIMESTAMP, SUM( BONUS + LINEBONUS + GOAL + ASSIST - PENALTY ) AS total
FROM LNA_TEAMHST
GROUP BY TIMESTAMP
ORDER BY TIMESTAMP DESC, total DESC LIMIT 3 ) AS best
WHERE S.TIMESTAMP = best.TIMESTAMP AND S.total = best.total
AND T.ID_LNA_TEAM = S.TEAMID
AND U.ID_LNA_USER = T.USERID
ORDER BY S.total DESC
Old Pedant
10-08-2009, 01:27 AM
Well, try debugging it a bit at a time.
Using a DB tool, of course, not in a PHP or whatever page.
Step 1:
SELECT TIMESTAMP, SUM( BONUS + LINEBONUS + GOAL + ASSIST - PENALTY ) AS total
FROM LNA_TEAMHST
GROUP BY TIMESTAMP
ORDER BY TIMESTAMP DESC, TOTAL DESC LIMIT 3
That should get you the 3 highest totals for the latest timestamp.
If not, then stop there and let's figure out why not.
*************
Step 2:
SELECT TIMESTAMP, TEAMID, SUM( BONUS + LINEBONUS + GOAL + ASSIST - PENALTY ) AS total
FROM LNA_TEAMHST
GROUP BY TIMESTAMP, TEAMID
That should get you all totals for all teams on all days (timestamps). No ordering. Could be jumbled up mess.
But just checking the logic to make sure that's what you are getting.
********
Step 3:
SELECT S.TIMESTAMP, S.TEAMID, S.total
FROM
( SELECT TIMESTAMP, TEAMID, SUM( BONUS + LINEBONUS + GOAL + ASSIST - PENALTY ) AS total
FROM LNA_TEAMHST
GROUP BY TIMESTAMP, TEAMID ) AS S,
( SELECT TIMESTAMP, SUM( BONUS + LINEBONUS + GOAL + ASSIST - PENALTY ) AS total
FROM LNA_TEAMHST
GROUP BY TIMESTAMP
ORDER BY TIMESTAMP DESC, TOTAL DESC LIMIT 3 ) AS best
WHERE S.TIMESTAMP = best.TIMESTAMP AND S.total = best.total
Now you should get the top 3 teams for the latest date. And if you have any ties that would mean you needed to get more than 3 teams, then they should show up as well.
************
If we get that far and it all works, then there just a bug in joining to the other two tables. Should be simple, then, to figure it out.
************
Again, for better performance, I'd like to see this converted to a stored procedure. But one thing at a time.
Arnaud
10-10-2009, 12:17 PM
Sorry for not getting back to this earlier.
Step 1: doesn't return what I want
Step 2: seems to be returning what you describe (to be checked)
Step 3: of course still doesn't work
Now let me try to figure out why...
Arnaud
10-10-2009, 12:55 PM
Yeah actually, everything works now. Only one GROUP BY clause was missing:
SELECT S.TIMESTAMP, S.TEAMID, S.total, T.TEAMNAME, U.PSEUDO
FROM LNA_TEAM AS T, LNA_USER AS U,
( SELECT TIMESTAMP, TEAMID, SUM( BONUS + LINEBONUS + GOAL + ASSIST - PENALTY ) AS total
FROM LNA_TEAMHST
GROUP BY TIMESTAMP, TEAMID ) AS S,
( SELECT TIMESTAMP, SUM( BONUS + LINEBONUS + GOAL + ASSIST - PENALTY ) AS total
FROM LNA_TEAMHST
GROUP BY TIMESTAMP, TEAMID
ORDER BY TIMESTAMP DESC, total DESC LIMIT 3 ) AS best
WHERE S.TIMESTAMP = best.TIMESTAMP AND S.total = best.total
AND T.ID_LNA_TEAM = S.TEAMID
AND U.ID_LNA_USER = T.USERID
ORDER BY S.total DESC
It works fine now! Thanks for everything, I learned a lot from that one :thumbsup:
Old Pedant
10-10-2009, 08:46 PM
Oh, man, I missed that!
Well, I *did* say that if step 1 gave wrong answer you should stop there and figure out why. <grin style="sickly"/>
So obvious when I think 13 seconds about it! DOH.
Good catch.
Arnaud
10-12-2009, 09:07 PM
Mmmh... I am back on that one!
I realize that when 2 teams have equal totals, they happen to show both, as expected, but twice each.
For example:
1. Team: blue / User: mr blue / Total: 90
2. Team: yellow / User: mr yellow Total: 80
3. Team: green / User: mr green / Total: 55
4. Team: green / User: mr green / Total: 55
5. Team: red / User: mr red / Total: 55
6. Team: red / User: mr red / Total: 55
7. ...
8. ...
...
Here is the query
SELECT S.TIMESTAMP, S.TEAMID, S.total, T.TEAMNAME, U.PSEUDO
FROM LNA_TEAM AS T, LNA_USER AS U,
( SELECT TIMESTAMP, TEAMID, SUM( BONUS + LINEBONUS + GOAL + ASSIST - PENALTY ) AS total
FROM LNA_TEAMHST
GROUP BY TIMESTAMP, TEAMID ) AS S,
( SELECT TIMESTAMP, SUM( BONUS + LINEBONUS + GOAL + ASSIST - PENALTY ) AS total
FROM LNA_TEAMHST
GROUP BY TIMESTAMP, TEAMID
ORDER BY TIMESTAMP DESC, total DESC LIMIT 10 ) AS best
WHERE S.TIMESTAMP = best.TIMESTAMP AND S.total = best.total
AND T.ID_LNA_TEAM = S.TEAMID
AND U.ID_LNA_USER = T.USERID
ORDER BY S.total DESC
Any idea what causes that?
Old Pedant
10-13-2009, 07:25 AM
Offhand, I don't see it, but there's an easy fix:
SELECT DISTINCT ...
will eliminate the dupes.
********
Oh, think I just saw why. Probably could fix it another way, but the DISTINCT is surely simplest.
Arnaud
10-13-2009, 03:18 PM
Ok, it works. Thanks!
vBulletin® v3.8.2, Copyright ©2000-2012, Jelsoft Enterprises Ltd.