View Full Version : Multiple tables and last 30 days
IronCode
07-16-2003, 03:36 PM
What i'm trying to do is query two tables for the last 30 days and i'm not quite sure how to do that. Here's what i've got but it don't work (i get an error),
SELECT a.count,b.count as Rcount FROM $Table_links a, $Table_referer b WHERE
a.TO_DAYS(NOW()) - TO_DAYS(date) <= 30 && b.TO_DAYS(NOW()) - TO_DAYS(date) <= 30
Any help would be great, thanks in advance for you help.
What's all that ?!
First, if you use count, you need count(*) as rcount. And you can't execute two count like that + count doesn't have a tablename in front of it (it's a count on the view that is created when the two tables are joined.)
If you use count without a group by clause, then the query returs 1 value --> the number of rows.
so
select count(*) as rcount from tablea
will give you the number of rows in tablea
Joining:
SELECT a.*, b,* FROM a INNER JOIN b ON a.var1=b.var2
a en b are tablenames. var1 and var2 are the variabelnames from both table, where the values from var1 match those of var2 (like when b = table with products, that has var2 as the primary key, and a = table with orders that has var1 as foreign key to table b (var 1 contains product ID's.
About the condition, that doesn't look right. You need something like
WHERE ((TO_DAYS(NOW()) - TO_DAYS(a.date)) <= 30) AND ((TO_DAYS(NOW()) - TO_DAYS(b.date))<= 30)
where date is in both tables the name of a date-column
IronCode
07-16-2003, 04:29 PM
lol the count is a colum name in the table, NOT the count() statement for mysql. That works fine with the other queries i have and the only issue with that query is getting the last 30 days working. Now keep in mind that i'm not saying that it's a query that was designed "correctly" and if you can think of a better approach :) But I'm not using a count() statement.
The only link between the two tables would be the last 30 days. But i can't get that part to work. I tried your snippet and that didn't work either.
So will i have to come up with anothe Joining column in order for this to work then?
I'd strongly advice against using names like count or date (date is disallowed under SQL 99)
But on to the problem.
I don't think you completely understand what a join is. It's a way to build records with values in it from two or more table.
For instance. Say i have a table : order, with the fillowing columns : orderID, userID, productID, clientprofile
the records could look like
1 | 1 | 25 | 23
2 | 9 | 23 | 25
So they only contain ID's from other table.
The productstable could look like
ProductID|Description|Price
1 | Book blabla | 25.23
23 |Film | 36.56
25 |CD|14.23
With a join like
select products.price, order.orderID from order inner join products on order.productID=products.price
You can build a records like
Price|OrderID
14.23|2
36.56|1
But you can't lay a relation on something like "the record need to be younger then 30days."
Are you sure you don't need a UNION ?
(SELECT a.count FROM a WHERE (TO_DAYS(NOW()) - TO_DAYS(a.date)) <= 30) UNION (SELECT b.count FROM b WHERE (TO_DAYS(NOW()) - TO_DAYS(b.date))<= 30)
Sou you'll get one recordset with all records from both tables that are younger then 30 days.
IronCode
07-16-2003, 08:34 PM
I've really just started the project and am in a position to redo the data table completely if needed. I'd rather attack an issue now then wait tell i'm so deep that redoing the tables would cause twice the work.
I will change the count and date tables along with any other that would match a SQL statement. I do appreciate the advise.
That would explain a few things with the arrays that are being returned.
As for the union i have n idea what a union is lol, what i'm doing is basically tracking users info on a click tracking app that i'm writing. Each item has it's own table to hold the info (table_ip, table_resolution, table_os, ect) There is nothing htat really links the two tegother at the moment other then the dates at which they where entered.
I would really appreciate some advise on how to approach this. Since now after reading your post i'm concerned that i'm setting myself up for some really bad performance and some serius bugs.
Thanks,
Toby
OK. Lets take this from the top. Starting with
what i'm doing is basically tracking users info on a click tracking app that i'm writing
What's that? Are you registering the ip, os and screenresolution of your clients?
Each item has it's own table to hold the info (table_ip, table_resolution, table_os, ect) There is nothing htat really links the two tegother at the moment other then the dates at which they where entered.
I don't get that. What's an item here? Why do you need a table for all seperate things?
Instead of seperate tables, can't you use 1 table? Like
sessionID|ip|resolution|os|startsession|endsession|userID
This sessionID would be an autonum variabel (not the PHP sessionID). If you have this sessionID, then you can insert that in other tables. Say you have a table like pages.
pageID|pagetitle|url|modifiedate|createdate
Then you could build a pageviews-table like
pageviewID|pageID|sessionID|datetimeview
If you then need a count of the number of page that were voiewe the last 30 days, you only need to select on this lest table, using the datetimeview column as condition, and join with the page and the sessiontable (and from the sessiontable to the userstable) to get all the info you need.
But maybe i'm completely off. So fill us in about the two questions at the top.
IronCode
07-17-2003, 01:20 AM
Your on the right track but lookin in the wrong direction. What i'm writing is just a click tracking program. If you go to a site and surf there site a little the program will take into account your IP, resolution, color depth, referer, and a number of other items including the links you clicked on and the date of your visit.
The "item" reference was to the data that gets collected. Each "item" (Ip address, resolution, referer, ect..) is considered an "item"
So for example here is the IP address table,
IP_ID|ips|date|count
IP_ID = unique auto-increment
ips = user's IP address
date = date info was taken
count = How many times the visitor has clicked something
I have a total of eight tables (browser,color depth,ip address, javascript(enabled/disabled),links,os,referers,resolution) containing info. The more i think about putting it in a single table then more i like it but also wander of the performance after a period of time.
I thought being the newb i am that keeping the data in different tables would be easier for the MySQL server since the tables that are being queried would be smaller and easier to find the info apposed to a large single table containing thousands of entries and having to sort through them.
For example if i wanted to query for jsut the browser totals all i would have to do is query that table and group by browser. The table would be smaller since the links table will be the largest due to there being more links being clicked then there are browsers in total.
So if yopu go to my site and surf around i will take all the above mentioned data insert it into the appropraite tables and as you surf i'm collecting the links that your clicking on also. You may have clicked 15 different links but your browser and resolution has not changed suring the entire "session"
Again the thinking behind the multiple tables was to be able to keep the data completely free of other data making for smaller faster quaries, but then again i'm a newb so help me out with the idea here and point me in the right direction because i'd rather not find out later that i have to restructure everything and recode all the quaries lol
Thanks again
I think you only need 2 tables. ('that is factstables' --> see below)
When you build a datamodel (preparation for datadesign, you indeed need to think in 'levels'. The session is one level, and like you said yourself, the clients 'propertys' wount change during the session. So your clientstable should look like sessionID(autonum)|browser|color depth|ip address| javascript(enabled=1/disabled=0)|os|referer(s) (single)|resolution|sessionstart(datetime or timestamp)|sessionend(idem)|Totallinks?
The pageview is another level.
It would probably be a good idea to have a browsertable (browserID|label --> 50 records ?) and only include the browserID inside the clientstable. Same for the OS (5 records?). Maybe even for the color depth (5? Maybe with some synonys) and resolution (5?). --> that way, you can have only numerical variabels, that can all be indexed (except for the datetimes). This will speed up the db considerably. And the joins will take some off gain off (but probably not all since they are fairly small), but it's easier to maintain and it will enable you to group them when they are enterd the table.
Your links-table would look like
clickID|sessionID|pageID|requesttime (datetime or timestamp)
(all numerical and indexed, except datetime)
You then have a pagetable like
pageID|adress|creationdate|modifiedate|totalhits|timetotalhits
Now, the clienttable, will probably only have a few 10K records. Which is no problem. Even a table with a few 100K records woun't be any problem. Because it's probably usefull info, but you'll rarely need to know instantly how many of your clients used Internet Explore 5.5 up until that precise second. So it would be no problem that this admin or so need to wait 1 second before he gets the results. If you have a very bussy site, or if you do wanna show this sort a statistic frequently, then it would be better to dailly run a query that updates the browsertable with the counts (using "select distinct count(*) as browsercount from clients group by browser" This generates a table with 1 record for each broser + total number of sessions that used it. you could then update the browsertable with this info (even in one query)
So when a new session start, you only insert in the big table (called a 'factstable'). And to get stats, you read them from the small tables (called 'dimensions'). You could update these dimensiontables daily (overnight) or each hour or every time there are less then 50 users on your site or ... You see the power and logic behind relational databases ?--> you record both the details and periodically compute the aggregated figures you frequently need to report. (so the stats aren't 'real time' in this scenario. If you don't have a lot of users, or more precise, if you don't need to deliver these stats frequently and a delay of one second is acceptable, or if the number of hits in your linkstable in reasonable, you can compute them on the fly. But even then, it would be better to keep precomputed values in your pagetable for older session (see below) and just add the two together)
The linkstable is another factstable.
You could do the same for each page there.
Now, i assume that you're not verry interested in the records that are older then a certain date. So you could archive the records after a few months --> insert them into an archive table (possibly even another db) and delete them from the factstable. (and if you compute stats on the fly, increment the number of archived links in the pagetable)
You could then run all sorts of fancy statistics like 'is there a significant difference in OS between page1 visitors and page 2 visitors ?'
To do so, you could then create a big table with all the data (so for each record from the previous linkstable, you could fill in all the session-info and page-info. You could even replace the browserID's etc with the browser-label. This is called a 'flatfile' --> no more dimensions and relations, all the info on 1 record.
Now it will be fairly slow to run query's on that table (because of the number of records), but who cares? You run this in another db or maybe even on another machine? And the pre-computed results, can then be used for your online application. Or you could allow users to run that sort of intensive query's, but the responsetimes will be somewat higher (few seconds probably cause a flatfile is quite quick) and best of all, other users woun't be hindered.
You see? If you analyse which info (aggregation, detail, dimensions) you need to deliver how fast and how frequently to what sort of user (an admin can wait a few second, but a client not etc), you can build your db-design and data-storage and data-analyses around it. And by following relational db-logic (like the 'starscheme-design' above), you may be quite sure that your db will be able to meet future and unforeseable demands.
Need more info, just write back.
IronCode
07-17-2003, 09:00 PM
You know that makes perfectly good since. I'm going to take your advise as it is much more flexable and actualy easier to work with.
I'll post if i need anymore advise :)
Thanks man.
vBulletin® v3.8.2, Copyright ©2000-2009, Jelsoft Enterprises Ltd.