luigicannavaro
09-17-2007, 02:38 PM
Hi friends,
I was (finally) reading the paper by the creators of Google:
http://infolab.stanford.edu/pub/papers/google.pdf
Could you explain to me how it is possible to index each web page, generating a list of keywords and tying those keywords to each page?
For example:
Take the word "market". Imagine there are 1000 pages on the web that contain the word "market".
In my "ignorance" I would create a database with two columns,
"keywords" and "URL". But if the word "market" is found in 1000 pages, would I create 1000 records with keyword = "market" plus its "URL"?
Surely not.
How would you "link" the keyword "market" to each page where it is found?
Thank you for "setal" your time!
Luigi
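The structure Luigi is asking about is usually called an inverted index (the Google paper linked above builds one): instead of storing, for each page, the words it contains, you store, for each word, the collection of pages that contain it. Each word then appears once, with its list of pages attached. Below is a minimal in-memory sketch, assuming the pages have already been fetched as plain text; the URLs and page texts are made-up examples. The paper's index also records per-word "hits" (positions and formatting), which this sketch leaves out.

```python
import re
from collections import defaultdict

def build_inverted_index(pages):
    """Map each lower-cased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        # Crude tokenizer: runs of letters/digits/underscore, lower-cased.
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index

# Hypothetical pages, for illustration only.
pages = {
    "www.example-a.com": "The market opened higher today.",
    "www.example-b.com": "A farmers' market in the old town.",
    "www.example-c.com": "Football results and fixtures.",
}

index = build_inverted_index(pages)
print(sorted(index["market"]))  # ['www.example-a.com', 'www.example-b.com']
```

With 1000 pages containing "market", `index["market"]` simply holds 1000 URLs; the word itself is stored only once.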
Daemonspyre
09-17-2007, 07:23 PM
2 tables would be the way to do this.
Table 1 would be the listing of URLs.
Table 2 would be listings of keywords with a field that relates back to Table 1.
1000 rows is nothing to any decent RDBMS.
And setal == bristles or thick hairs... Not sure I get the pun... :\
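A rough sketch of the two-table idea using Python's built-in `sqlite3` module. The table and column names (`Sites.url`, `Keywords.site_id`) and the 1000 example URLs are hypothetical. It loads 1000 pages that all contain "market" and counts them. The index on `keyword` is what keeps "which pages contain X?" a fast lookup even when the table grows far beyond 1000 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Sites (
    id  INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE
);
CREATE TABLE Keywords (
    id      INTEGER PRIMARY KEY,
    keyword TEXT NOT NULL,
    site_id INTEGER NOT NULL REFERENCES Sites(id)  -- relates back to Sites
);
-- Lets the database find all rows for one keyword without a full scan.
CREATE INDEX idx_keywords_keyword ON Keywords(keyword);
""")

# Hypothetical data: 1000 pages, each containing the word "market".
cur.executemany("INSERT INTO Sites (id, url) VALUES (?, ?)",
                ((n, f"http://www.example.com/page{n}") for n in range(1, 1001)))
cur.executemany("INSERT INTO Keywords (keyword, site_id) VALUES (?, ?)",
                (("market", n) for n in range(1, 1001)))
conn.commit()

count = cur.execute("SELECT COUNT(*) FROM Keywords WHERE keyword = ?",
                    ("market",)).fetchone()[0]
print(count)  # 1000
```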
luigicannavaro
09-18-2007, 08:18 AM
Yes, 1000 is a modest example!
But how would these tables be laid out? Like this?
table A
keywords gototableB
Arsenal www.arsenal.com, www.football.com....
Girl www.nicegirl.com...
Liverpol www.arsenal.com, www.football.com....
Nice www.nicegirl.com, www.turism.com
Roma www.turism.com
Sao Paulo www.arsenal.com, www.football.com....
Spice www.nicegirl.com, www.turism.com
table B
www.arsenal.com
www.nicegirl.com
www.football.com
www.nicegirl.com
www.football.com
www.turism.com
Surely not like this, right?
thanks
Luigi
Daemonspyre
09-18-2007, 01:36 PM
You are close, but you are not thinking like a database. Remember that databases are relational and need to be 'normalized': you should not replicate data unless you have to.
There are a couple of ways of doing it, and I will list them below:
Version 1 -- Similar to your idea, but with a more 'database' point of view.
Table 1 - Sites
--------------------------
Key Site
1 www.arsenal.com
2 www.nicegirl.com
3 www.football.com
4 www.turism.com
Table 2 - Keywords
--------------------------
Key Keyword SiteID
1 Arsenal 1
2 Arsenal 3
3 Girl 2
4 Liverpol 1
5 Liverpol 3
6 Nice 2
7 Nice 4
8 Roma 4
9 Sao Paulo 1
10 Sao Paulo 3
11 Spice 2
12 Spice 4
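Version 1 can be sketched with Python's `sqlite3` module as follows. The lowercase column names `id`, `site`, `keyword`, `site_id` are my own stand-ins for Key, Site, Keyword, SiteID, and Girl is attached to www.nicegirl.com (id 2), matching Luigi's original list. To get the URLs for a keyword, join each matching keyword row back to its site.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Sites (id INTEGER PRIMARY KEY, site TEXT NOT NULL);
CREATE TABLE Keywords (
    id      INTEGER PRIMARY KEY,
    keyword TEXT NOT NULL,
    site_id INTEGER NOT NULL REFERENCES Sites(id)
);
""")
cur.executemany("INSERT INTO Sites VALUES (?, ?)", [
    (1, "www.arsenal.com"), (2, "www.nicegirl.com"),
    (3, "www.football.com"), (4, "www.turism.com"),
])
cur.executemany("INSERT INTO Keywords (keyword, site_id) VALUES (?, ?)", [
    ("Arsenal", 1), ("Arsenal", 3), ("Girl", 2),
    ("Liverpool", 1), ("Liverpool", 3), ("Nice", 2), ("Nice", 4),
    ("Roma", 4), ("Sao Paulo", 1), ("Sao Paulo", 3),
    ("Spice", 2), ("Spice", 4),
])

def sites_for_keyword(cur, keyword):
    """Return the URLs of all sites tagged with `keyword`, in site order."""
    rows = cur.execute(
        """SELECT s.site
           FROM Keywords k JOIN Sites s ON s.id = k.site_id
           WHERE k.keyword = ?
           ORDER BY s.id""",
        (keyword,),
    )
    return [site for (site,) in rows]

print(sites_for_keyword(cur, "Arsenal"))  # ['www.arsenal.com', 'www.football.com']
```

The trade-off is visible in the data: the text "Arsenal" is stored once per site, which is the repetition the three-table version removes.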
Version 2 -- Similar again, but with 3 tables. This is a little more dynamic and a lot more 'normalized'.
Table 1 - Sites
--------------------------
Key Site
1 www.arsenal.com
2 www.nicegirl.com
3 www.football.com
4 www.turism.com
Table 2 - Keywords
--------------------------
Key Keyword
1 Arsenal
2 Girl
3 Liverpol
4 Nice
5 Roma
6 Sao Paulo
7 Spice
Table 3 - siteKeywords
--------------------------
Key KeyID SiteID
1 1 1
2 1 3
3 2 2
4 3 1
5 3 3
6 4 2
7 4 4
8 5 4
9 6 1
10 6 3
11 7 2
12 7 4
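A possible `sqlite3` sketch of Version 2, assuming the same four sites and seven keywords, with Girl (keyword 2) linked to www.nicegirl.com (site 2) as in Luigi's list. The names `SiteKeywords`, `key_id`, and `site_id` are my stand-ins for siteKeywords, KeyID, and SiteID. Each keyword's text is stored exactly once, and the junction table holds only pairs of ids, so lookups go through two joins in either direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Sites    (id INTEGER PRIMARY KEY, site    TEXT NOT NULL UNIQUE);
CREATE TABLE Keywords (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL UNIQUE);
-- Junction table: one row per (keyword, site) pair.
CREATE TABLE SiteKeywords (
    id      INTEGER PRIMARY KEY,
    key_id  INTEGER NOT NULL REFERENCES Keywords(id),
    site_id INTEGER NOT NULL REFERENCES Sites(id),
    UNIQUE (key_id, site_id)
);
""")
cur.executemany("INSERT INTO Sites VALUES (?, ?)", [
    (1, "www.arsenal.com"), (2, "www.nicegirl.com"),
    (3, "www.football.com"), (4, "www.turism.com"),
])
cur.executemany("INSERT INTO Keywords VALUES (?, ?)", [
    (1, "Arsenal"), (2, "Girl"), (3, "Liverpool"), (4, "Nice"),
    (5, "Roma"), (6, "Sao Paulo"), (7, "Spice"),
])
cur.executemany("INSERT INTO SiteKeywords (key_id, site_id) VALUES (?, ?)", [
    (1, 1), (1, 3), (2, 2), (3, 1), (3, 3), (4, 2),
    (4, 4), (5, 4), (6, 1), (6, 3), (7, 2), (7, 4),
])

def sites_for_keyword(cur, keyword):
    """Keyword text -> list of site URLs, via the junction table."""
    rows = cur.execute("""
        SELECT s.site
        FROM Keywords k
        JOIN SiteKeywords sk ON sk.key_id = k.id
        JOIN Sites s        ON s.id = sk.site_id
        WHERE k.keyword = ?
        ORDER BY s.id""", (keyword,))
    return [site for (site,) in rows]

def keywords_for_site(cur, site):
    """Site URL -> alphabetical list of its keywords."""
    rows = cur.execute("""
        SELECT k.keyword
        FROM Sites s
        JOIN SiteKeywords sk ON sk.site_id = s.id
        JOIN Keywords k      ON k.id = sk.key_id
        WHERE s.site = ?
        ORDER BY k.keyword""", (site,))
    return [kw for (kw,) in rows]

print(sites_for_keyword(cur, "Nice"))             # ['www.nicegirl.com', 'www.turism.com']
print(keywords_for_site(cur, "www.turism.com"))   # ['Nice', 'Roma', 'Spice']
```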
Version 3 -- Totally different, a lot harder programmatically (to code), and almost totally based on scripts (not recommended for the faint of heart)
Table 1 - Sites
--------------------------
Key Site KeyID
1 www.arsenal.com 1,3,6
2 www.nicegirl.com 2,4,7
3 www.football.com 1,3,6
4 www.turism.com 4,5,7
Table 2 - Keywords
--------------------------
Key Keyword
1 Arsenal
2 Girl
3 Liverpol
4 Nice
5 Roma
6 Sao Paulo
7 Spice
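Why Version 3 is "harder programmatically" shows up as soon as you query it. When the keyword ids live in one comma-separated text column, the database cannot use an index to answer "does this list contain 4?", so the script has to fetch every site row and split its list itself. The following sketch is written under that assumption. `key_ids` is my stand-in for the KeyID column, and the lists are written out with every keyword from Luigi's original list (so www.nicegirl.com carries 2,4,7 and www.turism.com carries 4,5,7).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Sites    (id INTEGER PRIMARY KEY, site TEXT NOT NULL, key_ids TEXT NOT NULL);
CREATE TABLE Keywords (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL);
""")
cur.executemany("INSERT INTO Sites VALUES (?, ?, ?)", [
    (1, "www.arsenal.com", "1,3,6"),
    (2, "www.nicegirl.com", "2,4,7"),
    (3, "www.football.com", "1,3,6"),
    (4, "www.turism.com", "4,5,7"),
])
cur.executemany("INSERT INTO Keywords VALUES (?, ?)", [
    (1, "Arsenal"), (2, "Girl"), (3, "Liverpool"), (4, "Nice"),
    (5, "Roma"), (6, "Sao Paulo"), (7, "Spice"),
])

def sites_for_keyword(cur, keyword):
    """Find sites whose comma-separated key_ids list contains the keyword's id.

    Every Sites row is read and its list split in Python: a full scan per lookup.
    """
    row = cur.execute("SELECT id FROM Keywords WHERE keyword = ?", (keyword,)).fetchone()
    if row is None:
        return []
    key_id = row[0]
    result = []
    for site, key_ids in cur.execute("SELECT site, key_ids FROM Sites ORDER BY id"):
        if key_id in (int(x) for x in key_ids.split(",")):
            result.append(site)
    return result

print(sites_for_keyword(cur, "Nice"))  # ['www.nicegirl.com', 'www.turism.com']
```

Storage is compact (one row per site), but every lookup, update, or removal of a single keyword means rewriting or re-parsing these strings in application code.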
There are a few ideas for you. Honestly, if you are not very comfortable with databases and programming, or if you are a novice, I would go with option 2. Option 3 is for the experienced, and option 1 is for beginners.
HTH!
luigicannavaro
09-18-2007, 02:27 PM
Thank you.
I am tired of "theory"; I want to see "practice" and "real code", not "infinitesimal calculus". Solution 1 is trivial. Surely there must be a big difference in speed and storage between solutions 2 and 3. I think I have some similar code for option 3 in my shares!
All the best.
Luigi
luigicannavaro
10-18-2007, 03:25 PM
Daemonspyre,
I have decided to reopen this thread because someone asked a question and I am in doubt about the answer.
Taking your model for Table 1 and Table 2:
Table 1 - Sites
--------------------------
Key Site
1 www.arsenal.com
2 www.nicegirl.com
3 www.football.com
4 www.turism.com
Table 2 - Keywords
--------------------------
Key Keyword SiteID
1 Arsenal 1
2 Arsenal 3
3 Girl 2
4 Liverpol 1
5 Liverpol 3
6 Nice 2
7 Nice 4
8 Roma 4
9 Sao Paulo 1
10 Sao Paulo 3
11 Spice 2
12 Spice 4
Now, my question:
If I have a third table with these values:
Table 3 - Source Table
=================
1 Arsenal Liverpool
2 Roma Sao Paulo
3 Nice girl
How would you search for the values of the SOURCE TABLE (Table 3) in the keywords (Table 2) so as to find, in one go, for example "Arsenal Liverpool", when in Table 2 these values are in different records? Good question?
Best
Luigi
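One way to answer this in a single query, sketched with Python's `sqlite3` module, is to select every keyword row that matches any word of the source row, group the matches by site, and keep only the sites that matched all of them (`GROUP BY ... HAVING COUNT(DISTINCT ...) = n`). Table and column names are my own stand-ins for the thread's. Assumptions: matching is exact apart from case, so Table 2's spelling "Liverpol" would not match "Liverpool" (the sketch spells it "Liverpool"). The caller also passes a list of keywords, because a plain `.split()` works for "Arsenal Liverpool" but would break the two-word keyword "Sao Paulo" into two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Sites (id INTEGER PRIMARY KEY, site TEXT NOT NULL);
CREATE TABLE Keywords (
    id      INTEGER PRIMARY KEY,
    keyword TEXT NOT NULL,
    site_id INTEGER NOT NULL REFERENCES Sites(id)
);
""")
cur.executemany("INSERT INTO Sites VALUES (?, ?)", [
    (1, "www.arsenal.com"), (2, "www.nicegirl.com"),
    (3, "www.football.com"), (4, "www.turism.com"),
])
cur.executemany("INSERT INTO Keywords (keyword, site_id) VALUES (?, ?)", [
    ("Arsenal", 1), ("Arsenal", 3), ("Girl", 2),
    ("Liverpool", 1), ("Liverpool", 3), ("Nice", 2), ("Nice", 4),
    ("Roma", 4), ("Sao Paulo", 1), ("Sao Paulo", 3),
    ("Spice", 2), ("Spice", 4),
])

def sites_matching_all(cur, keywords):
    """Return the sites that have *every* keyword in `keywords` (case-insensitive)."""
    wanted = {k.lower() for k in keywords}  # de-duplicate so the count is right
    if not wanted:
        return []
    placeholders = ",".join("?" * len(wanted))  # e.g. "?,?" for two keywords
    sql = f"""
        SELECT s.site
        FROM Keywords k JOIN Sites s ON s.id = k.site_id
        WHERE lower(k.keyword) IN ({placeholders})
        GROUP BY s.id, s.site
        HAVING COUNT(DISTINCT lower(k.keyword)) = ?
        ORDER BY s.id
    """
    return [site for (site,) in cur.execute(sql, (*wanted, len(wanted)))]

print(sites_matching_all(cur, "Arsenal Liverpool".split()))  # ['www.arsenal.com', 'www.football.com']
print(sites_matching_all(cur, "Nice girl".split()))          # ['www.nicegirl.com']
print(sites_matching_all(cur, ["Roma", "Sao Paulo"]))        # []
```

For "Roma Sao Paulo" no single site has both keywords, so the result is empty. Dropping the `HAVING` clause would instead return sites matching any of the keywords.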