PDA

View Full Version : evaluate my design please!


ConfusedOfLife
07-24-2003, 11:27 AM
Hi

Actually this is a kind of semi mySql-PHP question! I'm writing a news manager, that for each day you can enter multiple news. So, when I wana show the news to the client, I see what date he chose, and then I show the news of that date for him. Well, it's pretty normal till now, but here's the part that you should judge:

When I wana show the news to the client, I also have a link called "Show All News", by pressing that button, the client can see all the news in the db, let's say 10000000 records! So, for doing this, I use forward and backwar buttons and I don't show everything in one page, it means I show it page by page (like phpMyAdmin), so, what I do is that I read all the contents of the db and then put it in my array, and then show some indexes (from 5 to 10 for example) to the client. It means I don't use the mySql LIMIT keyword for doing that, because I need to know the total number of records for showing to the client and also for calculating the records of the last page (in case my client wants to go to the last page). I wana know if it's an efficient way, I mean not using the LIMIT clause, but reading everything, and showing just some indexes. When I wanted to write it, I thought that I have to get all the news from the db, because I need mysql_num_rows anyway, so, I just used one query. Is it going to be so slow when their records grow?

raf
07-24-2003, 02:16 PM
Is this a genuine question ?
You have 10000000 entrys (really ?) and you wana show them all, and you are concerned about the number of entrys on the last page ?
I frankly don't see why you shouldn't use LIMIT. If you use "select count(*) as totrecords from table", you know the number of entrys, so it would be peanuts to compute the number of entrys on the last page.
Selecting the complete table of 10000000 records just to get the count is crazy. Storing all that in an array ?? Why?? You don't really think someone will ever browse through that huge set ? And yes, the query will be extremely slow and generate massive networktraffic and goes straight against basic databasedriven applications logic.

It's better to run 100 querys that get 10 records (i'm just assuming someone is as crazy to look through 100 pages), then to run 1 query to get 1000000 records.

Besides, i can't believe it wouldn't be smarter to archive news-entrys older then X months (for instance one table for each year) and then just search on the tables you need. Noone will run through the complete collection, so if they search for a speciffic entry, it would be better to have them indexed (trhematically, or with keywords, or store each entry under a catagory or whatever) or run a full text search on them.

Anyway, i suppose your a smart guy who knows what he's doing, but i'd recommend condcidering using count(*) to get the number of entrys, and LIMIT for the recordset paging.

ConfusedOfLife
07-25-2003, 12:59 AM
i suppose your a smart guy who knows what he's doing


Thank you! But I'm sorry that I have to disappointe you, don't really count on that!! :D

Ok, Ok, by saying 1000000 records I really didn't mean that, I just meant the worst condition, and now I see what you mean. Lemme clear somethings up, first of all I have all those other way of searching: date search, key word search and the latest news display. This one is only for the archive part that I wana show everything, and of course I don't think that anybody ever wants to see all of that, but then I have to have that option in my app. You're right, I was just curious to see how it looks like from a more experienced's view (I know you don't like the term professional!), and I'll change it, thanks :thumbsup:

Also about the dates part, the part that gives the opportunity to the client to choose his special date, I'm using 3 combo boxes, Year, Month and Date, and these comboxes do not simply show days from 1 till 31 or month from 1 till 12, but rather I get all the dates from the db, and then by some nested PHP loops I get the total years, available, month for each year, and available days for each Year-Month, it means I'm using JS to have my 3 dynamic combo boxes, in other words, if you see a date in my 3 select boxes, it means that it definitely exist. What's your idea about this one? Is it worth it to make that effect or it's better that I do it in a more conventional way?!

raf
07-25-2003, 09:38 AM
A very good function, but wrongly implemented.
Very good, cause in a databasedriven webapp, you should never get the message "sorry, no xxxxx found for the values you selected" --> the values shouldn't be one of the selected values then.
Wrongly implemented, cause this way is not universal (11 % or so doesn't have javascript enabled) + don't do clientside what you can do serverside + your selecting and transporting way to much data (like 3 variabels from all records ...) + processing will be noticably slower on a large table

You need to do this in your sql statement. Like
SELECT distinct day, month, year FROM table GROUP BY day, month, year ORDER BY year desc, month desc, day, desc

Now, this will still be problematic, cause this could turn up a lott of records + it's probably more userfriedly to first display a year box (dynamically build with a 'select distinct year') and then display a month box (with "select dinstinct month from table where year='". $_POST['year'] ) and then the day box (same statement buth with an additional condition). So no clientsided javascript on a big array, but 3 seperated server sided querys.

Performance will be just as good --> maybe a slightly bigger delay between the year - month - day dropdown, but the initial page (year) will load way faster.

Now, since these are archived records --> whenever i can (= when the data is 'frozen') I create a precomputed dataset. So in your case, i'd make a table with that
INSERT INTO date (day,month,year) SELECT distinct day, month, year FROM table GROUP BY day, month, year ORDER BY year desc, month desc, day, desc

And then your querys to generate the dropdown only need to run on this table with all occuring combinations.

ConfusedOfLife
07-25-2003, 11:00 AM
Now, since these are archived records --> whenever i can (= when the data is 'frozen') I create a precomputed dataset. So in your case, i'd make a table with that
INSERT INTO date (day,month,year) SELECT distinct day, month, year FROM table GROUP BY day, month, year ORDER BY year desc, month desc, day, desc

And then your querys to generate the dropdown only need to run on this table with all occuring combinations.


Thank you raf, your ideas, suggestions or whatever you call it are really valuable to me. Speaking about those 11 % of ppl was very cool, because most of the things that I write in here is in Farsi (my language) and most of the times I don't consider lots of ppl, and most of the times I forget that I'm coding server side, it means I shouldn't play with JS functions too much and I shouldn't let them fool me by their beauty!

Ok, by saying making that precomputed table of frozen data (I'm really learning new terms in here), do you mean that whenever you want to make such a selection, you first make that table, and then you do the select thing in that? It means that inserting into a table in that way is faster than querying the original table, right? I think that we should reset your precomputed table whenever we wana make a new query, because maybe the data in our original table was changed/deleted, or are you saying that whenever it happens in the original table, we should change it in the precomputed table too? This way it's gonna be kind of harder I think.

PS: I know that it's not related to here, but do you use classes in your apps? Because makign these queries are very easy, but I'm trying to learn some OO concepts, and I use my collect class for collecting elements that this class itself inherits from the db class, it means basically I try not to use the db class directly and only use it inside the collect class. So, for making each new query, I have to write a new function for the collect class, that's pretty tough, I'm thinking about forgetting this OO thing and go right back to my linear programming!

raf
07-25-2003, 09:27 PM
Basically there are three possible situations
A. the original table is frozen --> no records deleted - inserted.
Here you can create the agregated table once and it doesn't need updating.
B. the original table is operational --> records deleted, created, updated. Here you can create the egragated table periodically. Every day, overnight, each hour, or each time the original table was changed.
Normally, the data in the agregated table wount be completely up to date. (Like for instance the bank-statements in my on-line banking tool --> the saldi get updated overnight).
If it really need to be accurate to the last second (called "real time data") then you need some sort of "on update cascade" or updating mechanism. How, will depend on the sort of data
C. The original table is periodically updated --> in most cases, records are periodically appended, like in a table where newsenry are archived monthly. In this case, you rebuild the table right after the original table is updated --> in your case, it will probably be by appending the new occuring combinations to the aggeration table.


Now, in your case, i'd think that this would be the best sollution:
create an aggregation table like
date | year | month | day | count | archived
if you insert a new news entry, you check if the current dat is already registerd in the aggregation table. If so --> update the countvariable (set count = count + 1). If not, insert new record with current date. If you delete a record, you need to update (set count = count - 1).
So when you build the dropdowns, you select on this table and only on the records with count > 0.

Running a search on this table will be a lot faster, and it doesn't matter if the entry is on your opperational table or an archive table or whatever. With the archive variable, you can check where the entrys for that day are stored.


About classes : I hardly use them. Building OO fraimwork require a lott of analyzing and quite some extra innitial coding before the reusability starts paying off. But i'm not a big expert on it. I've seen some projects though that got stuck in the analyszing fase or where coders weren't disciplined enough to document there code (or to look at the documents) and the pottential went to waste.

It's certainly a good way to approach a coding-project, but i'm not sure the practical implementations are always that great.

ConfusedOfLife
07-25-2003, 09:57 PM
Thank you raf, your comments are really invaluable. About the classes, you are right, it only takes lots of time and then it brings lots of stress! I had some nightmares during the past month if I'm not breaking any OO rule, you know what I mean! It's just all so beautiful, but then the question comes that if it's really that valuable :rolleyes:

raf
07-25-2003, 11:15 PM
It's just all so beautiful, but then the question comes that if it's really that valuable Nice one.

If it's a big app and if you have a large team working on it and it needs to be modular and easy to build further on it,then i suppose it becomes the bestoption (if your business can be translated into objectcs ...). In any case, when you think about bigger apps, you almost automaticallys start to think in term of objects, variabels and methods. But it takes a lott of business knowledge (= knowing the content) to set up a good OO framework ...