PDA

View Full Version : SQLQuery for Trailing Days

agg.jjk
12-25-2011, 04:19 AM
I am having trouble writing a sql statement. Here is the scenario:

- My table contains the member_id and calendar_date that members of my social community logged in as follows:

(
Login_Key int NOT NULL AUTO_INCREMENT, -- unique identifier for the login event
Member_ID int NOT NULL, -- the member who logged in
Login_Date DATETIME NOT NULL -- the date they logged in on
)

I am trying to write a SQL query that will allow me to create a trend chart of the following:

- How many members who logged in during the past 7-14 days also logged in during the past 1-7 days
- For example (given today is Jan 14), if a member logged into the community during Jan 1 - 7, they would get counted if they also logged in during Jan 8 - 14.

Since this is a trend of percent of returning users over time, I believe I need to run a 7 trailing days calculation in SQL and also be able to calculate the denominator (all logins within the period) as well as the numerator (all who return during the next 7 days).

Here is some sample data that would be in the source table (DDL specified above):
1 123 12/27/11
2 123 12/27/11 (logged in twice on this date, could have multiple logins on a day but they we do not count more than the first)
3 123 1/12/12
4 123 1/13/12

In case this question is difficult to follow, here is an example of what the result set might look like (if it is slightly different, I can make some adjustments, group by, etc): * Record_No included for the discussion below *

Week_Ending_Date Member_ID Logged_In_Prior_Week Logged_In_Current_Week Record_No
1/10/12 123 YES NO 1 (prior week = 12/27/11 - 1/2/12; current week = 1/3/12 - 1/10/12)
1/11/12 123 NO NO 2 (prior week = 12/28/11 - 1/3/12; current week = 1/4/12 - 1/11/12)
1/12/12 123 NO YES 3 (prior week = 12/29/11 - 1/4/12; current week = 1/5/12 - 1/12/12)
1/13/12 123 NO YES 4 (prior week = 12/30/11 - 1/5/12; current week = 1/6/12 - 1/13/12)

I will count the records that meet my criteria (record 1) and then divide (Logged_In_Current_Week/Logged_In_Prior_Week) and trend it across all of the days (even though Member_ID 123 did not contribute to 1/1/12, 1/11/12, and 1/13/12, other members likely would.

This is not as simple as looking for the week portion of the date and grouping by that as each day the trend report would change (as opposed to changing every seven days). I think I need a trailing 7-day calculation or a moving window. I also need to keep this flexible as someone will invariably want the same calculation for a 30-day look back (with prior period = 60-30 days ago and current period = 30-1 days ago).

I have a DATE dimension table that has a record for each day if that helps. If this is not possible with a query, then I will have to write a cursor to load a table and read from that but it seems possible with a query.

Thank you for contributing some thoughts.

Old Pedant
12-25-2011, 10:23 PM
First, a minor comment: Your LOGIN_KEY field is a waste of time and space. By definition, it is that table's PRIMARY KEY (since an AUTO_INCREMENT field must be the primary key) and yet no other table uses it as a FOREIGN KEY, so it's actually just clogging up your database.

A better way to have defined that table would probably have been

(
Member_ID int NOT NULL, -
)
You'd probably get better performance from that.

Anyway...

I assume you want to do this trend analysis on a per MEMBER_ID basis, given the example you showed?

Yes, having that separate table with all dates would be very handy. The query without it would be pretty complex, but with it it's not hard at all.

I'll assume your all-dates table looks something like this:

CREATE TABLE allDates (
theDate DATETIME );
(and of course it could have other fields, but we don't need them for this query). And I'll also assume the existence of an "allMembers" table.

So...let's tackle it:

SELECT A.theDate,
M.member_ID,
IF(COUNT(H2.*)>0,'yes','no') AS priorweek,
IF(COUNT(H1.*)>0,'yes','no') AS currentweek
FROM allDates AS A
INNER JOIN allMembers AS M
ON DATE(H1.Login_Date) BETWEEN DATE_SUB(A.theDate,INTERVAL 6 DAY) AND A.theDate
AND H1.member_ID = M.member_id
ON DATE(H2.Login_Date) BETWEEN DATE_SUB(A.theDate,INTERVAL 13 DAY) AND DATE_SUB(A.theDate,INTERVAL 7 DAY)
AND H2.member_ID = M.member_id
GROUP BY M.member_id, A.theDate
ORDER BY M.member_id, A.theDate

If you want to limit it to a single member_id, just add

WHERE M.member_id = \$memid
just before the GROUP BY.

Old Pedant
12-25-2011, 10:23 PM
Above is untested, off the top of my head. It should work, but if not let me know.

agg.jjk
12-26-2011, 11:19 PM
Thanks. I agree with the ID column... point well taken.

There is no "ON" clause for the first inner join. Is this intentional?

I am going to test it out.

Thanks again.

Old Pedant
12-27-2011, 02:15 AM
> There is no "ON" clause for the first inner join. Is this intentional?

Yes. And it's possible that MySQL may choke on that. If so, give it a dummy ON clause, such as ON 1 = 1.

We *want* all possible days and all possible members, so we want the full cartesian product--all rows from both tables with no limitations.

If it's not obvious, you can limit the dates test by adding

WHERE a.theDate BETWEEN '2012-1-1' AND '2012-1-31'

or whatever date range you want.