CodingForums.com

CodingForums.com (http://www.codingforums.com/index.php)
-   MySQL (http://www.codingforums.com/forumdisplay.php?f=7)
-   -   something for brain training (http://www.codingforums.com/showthread.php?t=251841)

BubikolRamios 02-17-2012 05:35 AM

something for brain training
 
Google visits my site and I log things:

data
Code:

page1  date1
page2  date2
page3  date3
page4  date4
page1  date5
page2  date6
page1  date7

Need to drill out average date difference
for any page visit.
so if date 1,5,7 would be 30.01.2001,10.01.2001,01.01.2001 respectively
the result for page1 would be 15 (approx)

how to do that, if possible at all ?

Old Pedant 02-17-2012 06:52 AM

It's possible, but it won't be an efficient query. I had to do something like this once upon a time.

Depending on how often you will do this and how much data there is to process, you might be best off doing it in a stored procedure and building a temporary table that holds the pairs.

How much data is there? How often will you run the query?

BubikolRamios 02-17-2012 07:03 AM

like 100.000 records now.
once or twice, just to see how google responds to my
<changefreq>
suggestions.

Old Pedant 02-17-2012 07:26 AM

Here is one way to do it.

You don't HAVE to create the views to make it work--you can do it with nested queries--but it's easier to see how it works by using the views.
COMMENT: First, here is the data I am using. Like my table name?

Code:

mysql> select * from bubikol;
+------+------------+
| page | ondate    |
+------+------------+
|    1 | 2001-01-30 |
|    2 | 2010-04-15 |
|    3 | 2011-10-10 |
|    1 | 2001-01-10 |
|    2 | 2010-02-15 |
|    3 | 2011-10-08 |
|    1 | 2001-01-01 |
+------+------------+

COMMENT: This first view just organizes the data in a reason able way.
Code:

create view bview as
select b1.page, b2.ondate AS startdate, b1.ondate AS enddate
from bubikol as b1, bubikol as b2
where b1.page = b2.page
and b1.ondate > b2.ondate;

COMMENT: Here's a "dump" so you can see what it is doing:

mysql> select * from bview;
+------+------------+------------+
| page | startdate  | enddate    |
+------+------------+------------+
|    1 | 2001-01-10 | 2001-01-30 |
|    2 | 2010-02-15 | 2010-04-15 |
|    3 | 2011-10-08 | 2011-10-10 |
|    1 | 2001-01-01 | 2001-01-30 |
|    1 | 2001-01-01 | 2001-01-10 |
+------+------------+------------+

COMMENT: Then this second view throws out the rows from the first view that we don't want.
Code:

create view bview2 as
select page, startdate, min(enddate) as enddate
from bview
group by page, startdate;

COMMENT: Again, a "dump" so you can see what it did:

mysql> select * from bview2;
+------+------------+------------+
| page | startdate  | enddate    |
+------+------------+------------+
|    1 | 2001-01-01 | 2001-01-10 |
|    1 | 2001-01-10 | 2001-01-30 |
|    2 | 2010-02-15 | 2010-04-15 |
|    3 | 2011-10-08 | 2011-10-10 |
+------+------------+------------+

COMMENT: See how it worked? It chooses the enddate closest to the startdate in each case.


***************************************
COMMENT: This next dump is just so you can see how to get the number of days in each case.
We don't actually use this query, per se, in the answer.

Code:

select page, startdate, enddate, datediff(enddate,startdate) as daysdiff
from bview2
order by page, startdate;

+------+------------+------------+----------+
| page | startdate  | enddate    | daysdiff |
+------+------------+------------+----------+
|    1 | 2001-01-01 | 2001-01-10 |        9 |
|    1 | 2001-01-10 | 2001-01-30 |      20 |
|    2 | 2010-02-15 | 2010-04-15 |      59 |
|    3 | 2011-10-08 | 2011-10-10 |        2 |
+------+------------+------------+----------+

*******************************

COMMENT: And the final result:

Code:

select page, AVG(DATEDIFF(enddate,startdate)) as averagedays
from bview2
group by page
order by page;

+------+-------------+
| page | averagedays |
+------+-------------+
|    1 |    14.5000 |
|    2 |    59.0000 |
|    3 |      2.0000 |
+------+-------------+

Okay?

Old Pedant 02-17-2012 07:57 AM

If you care, here the whole thing is as a single query, no views:
Code:

SELECT page, AVG(DATEDIFF(enddate,startdate)) as averagedays
FROM (
      SELECT page, startdate, min(enddate) as enddate
      FROM (
              SELECT b1.page, b2.ondate AS startdate, b1.ondate AS enddate
              FROM bubikol as b1, bubikol as b2
              WHERE b1.page = b2.page AND b1.ondate > b2.ondate
            ) AS phase1
      GROUP BY page, startdate
    ) AS phase2
GROUP BY page
ORDER BY page;

If it's not obvious, it is the "phase2" (the bview2) that will eat up performance time.

And I tried adding indexes on both fields in my "bubikol" table and found that only the index on page makes any difference at all (according to MySQL's EXPLAIN feature) and even then not that much.

Old Pedant 02-17-2012 08:08 AM

NOTE: Adding an index on page field helps a little, according to EXPLAIN. Adding one on the ondate field does nothing.

BubikolRamios 02-17-2012 08:39 AM

aha, I did not thought of min(enddate)
as
Quote:

and b1.ondate > b2.ondate;
joins to multiple dates.


All times are GMT +1. The time now is 05:10 PM.

Powered by vBulletin®
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.