01-04-2012, 11:00 PM
I wrote a query to roll up some information. I normally work with MSSQL and this is exactly how I'd do it there. But Something is off with the group by statement or maybe the join. Here is the query:
select month(InteractionDate), sum(Charged+Tip+PrintPurchase) as Collected, month(PurchaseDate), sum(Cost) as Spent
from admin_Interactions i
LEFT JOIN admin_Expenses e on month(i.Interactiondate) = month(e.PurchaseDate)
where i.Deleted=0 and e.Deleted=0
group by month(InteractionDate), month(PurchaseDate)
The problem is that the join is producing multiple rows of data so the sum is adding it all up and all the numbers are too big. So I understand the issue, and I understand date fields are not ideally the best to join on. But in this case it is the only link between the two tables. It should work. I want to know all the income for January compared to all the expenses for January. January is common. Is there something happening in the month() function that I'm not understanding? How can I get what I'm after?
Thanks for the help.
01-05-2012, 12:01 AM
First of all, you don't HAVE a LEFT JOIN, given that query.
That post was originally posted (by me) in a SQL Server forum, as a matter of fact.
When you put e.Deleted=0 into the WHERE part of that query, you effectively converted your LEFT JOIN to an INNER JOIN.
But even aside from that, I completely disagree that this would work in SQL Server any differently than it does in MySQL.
The results you are getting are 100% expected, in *ANY* relational database.
Consider these records in the two tables (showing only Charged for simplicity):
So use MONTH(...) instead and you have:
Join them on Month(...) and you get these complete records:
Month(iD) Charged Month(pD) Cost
1 100 1 50
1 200 1 50
1 100 1 75
1 200 1 75
And so your sums will indeed be double what you expected.
And the more records there are in the given month, the more this is compounded.
This is called the CARTESIAN PRODUCT of all matching records.
You can *NOT* code this query this way and make it work in any relational DB.
01-05-2012, 12:04 AM
That's actually quite helpful to see it spelled out like that. Thanks. Ok...so...what recommendations exist. Does a cross reference table make sense in this case?
01-05-2012, 12:09 AM
I'm pretty sure this is what you are after:
SELECT * FROM
( SELECT MONTH(InteractionDate) AS collectionMonth, SUM(Charged+Tip+PrintPurchase) as collectionTotal
WHERE deleted = 0
GROUP BY MONTH(InteractionDate)
) AS collections,
( SELECT MONTH(purchaseDate) AS purchaseMonth, SUM(cost) AS purchaseTotal
WHERE deleted = 0
GROUP BY MONTH(purchaseDate)
) AS purchases
WHERE collections.collectionMonth = purchases.purchaseMonth
ORDER BY collections.collectionMonth
You *could* change that to a LEFT JOIN if you really think there might be some month where you have collections without having any purchases. (Or a right join if vice versa.)
01-05-2012, 12:25 AM
Ah yes. Derived tables. I always have a hard time recognizing when is right to use them. I'll try this out and report back tomorrow. We actually do have some months with no expenses and sadly some months without income. Just starting our little business and trying to get the peices in place for the future. Thanks for the help.
01-05-2012, 01:11 AM
Unfortunately, MySQL doesn't suppor FULL OUTER JOIN, which would be what you would need if you have some months with no income, some with not expenses.
So there's no easy way for you to make sure each month shows up in your result with just what you have there.
There are two possible ways to fix this: (1) Have a table of months and make it the left most table and join the two inner SELECTs to it, (2) create a pseudo-table of months.
The separate table would probably be a good idea, but we can hack at it with another pseudo table.
But as long as we are improving thints...What happens when you want to get the YEARs separately??? As written, that prior code will lump January 2012 with January 2011 and January 2013.
SELECT allMonths.theYear, allMonths.theMonth, collections.collectionTotal, purchases.purchaseTotal
( SELECT YEAR(InteractionDate) AS theYear, MONTH(InteractionDate) AS theMonth FROM admin_Interactions
SELECT YEAR(purchaseDate), MONTH(purchaseDate) FROM admin_Expenses
) AS allMonths
( SELECT YEAR(interactionDate) AS Y, MONTH(InteractionDate) AS M, SUM(Charged+Tip+PrintPurchase) as collectionTotal
WHERE deleted = 0
GROUP BY Year(interactionDate), MONTH(InteractionDate)
) AS collections ON collection.M = allMonths.theMonth AND collections.Y = allMonths.theYear
( SELECT YEAR(purchaseDate) AS Y, MONTH(purchaseDate) AS M, SUM(cost) AS purchaseTotal
WHERE deleted = 0
GROUP BY YEAR(purchaseDate), MONTH(purchaseDate)
) AS purchases ON purchases.M = allMonths.theMonth AND purchases.Y = allMonths.theYear
ORDER BY allMonths.theYear, allMonths.theMonth
When you UNION, you automatically get DISTINCT values (unless you use UNION ALL) so the first subquery there will indeed get you all months--one time--that are in either of the two tables.
By this point, the query is getting ugly enough that I would probably rewrite it to join 3 VIEWs together. Each view simply being one of the inner queries.
01-05-2012, 04:12 PM
You're a rockstar. This stuff is awesome for the level that I am at. I agree, the query is getting ugly and clunky. I like the idea of making each join it's own view. I haven't had a chance to do these ideas but I totally see what you're doing. Thanks again.