View Full Version : The Anatomy of Web Search



luigicannavaro
09-17-2007, 02:38 PM
Hi friends,

I was reading (finally) the paper by the creators of Google:
http://infolab.stanford.edu/pub/papers/google.pdf

Could you explain to me how it is possible to index each webpage, generate a list of keywords, and tie these keywords to each webpage?

For example:

The word: "Market". Let's imagine there are 1000 pages on the web where the word "market" appears.

In my "ignorance" I would create a database with two columns:
"keywords" and "URL". But if the word "Market" is found in 1000 pages, would I create 1000 records, each with keyword = "market" plus its "URL"?

Surely not.

How would you "link" the keyword "market" with each page where this keyword is found?

Thank you for "setal" your time!

Luigi

Daemonspyre
09-17-2007, 07:23 PM
2 tables would be the way to do this.

Table 1 would be the listing of URLs.

Table 2 would be listings of keywords with a field that relates back to Table 1.

1000 rows is nothing to any decent RDBMS.

And setal == bristles or thick hairs... Not sure I get the pun... :\

luigicannavaro
09-18-2007, 08:18 AM
Yes, 1000 is a modest example!

But what would the "configuration" of this table be? Like this?

table A

keywords     gototableB

Arsenal      www.arsenal.com, www.football.com....
Girl         www.nicegirl.com...
Liverpol     www.arsenal.com, www.football.com....
Nice         www.nicegirl.com, www.turism.com
Roma         www.turism.com
Sao Paulo    www.arsenal.com, www.football.com....
Spice        www.nicegirl.com, www.turism.com


table B

www.arsenal.com
www.nicegirl.com
www.football.com
www.nicegirl.com
www.football.com
www.turism.com

Certainly not in this way. Right?

thanks

Luigi

Daemonspyre
09-18-2007, 01:36 PM
You are close, but you are not thinking like a database. Remember that databases are relational and need to be 'normalized': you should not store the same data twice where you do not have to.

There are a couple ways of doing it, and I will list them below:

Version 1 -- Similar to your idea, but with a more 'database' point of view.



Table 1 - Sites
--------------------------
Key   Site
1     www.arsenal.com
2     www.nicegirl.com
3     www.football.com
4     www.turism.com


Table 2 - Keywords
--------------------------
Key   Keyword     SiteID
1     Arsenal     1
2     Arsenal     3
3     Girl        2
4     Liverpol    1
5     Liverpol    3
6     Nice        2
7     Nice        4
8     Roma        4
9     Sao Paulo   1
10    Sao Paulo   3
11    Spice       2
12    Spice       4
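
Version 1 could be sketched roughly like this with Python's built-in sqlite3 module. This is only an illustration: the table names (sites, keywords), the column names, and the index are assumptions of mine, not part of the layout above, and "Girl" is attached to www.nicegirl.com as in the listing from the earlier post.

```python
import sqlite3

# In-memory database, just for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (
        id   INTEGER PRIMARY KEY,
        site TEXT NOT NULL UNIQUE
    );
    CREATE TABLE keywords (
        id      INTEGER PRIMARY KEY,
        keyword TEXT NOT NULL,
        site_id INTEGER NOT NULL REFERENCES sites(id)
    );
    -- Assumed index so that keyword lookups do not scan the whole table.
    CREATE INDEX idx_keywords_keyword ON keywords(keyword);
""")

conn.executemany("INSERT INTO sites (id, site) VALUES (?, ?)", [
    (1, "www.arsenal.com"), (2, "www.nicegirl.com"),
    (3, "www.football.com"), (4, "www.turism.com"),
])

# One row per (keyword, site) pair: the keyword text is repeated.
conn.executemany("INSERT INTO keywords (keyword, site_id) VALUES (?, ?)", [
    ("Arsenal", 1), ("Arsenal", 3), ("Girl", 2),
    ("Liverpol", 1), ("Liverpol", 3), ("Nice", 2), ("Nice", 4),
    ("Roma", 4), ("Sao Paulo", 1), ("Sao Paulo", 3),
    ("Spice", 2), ("Spice", 4),
])


def sites_for_keyword(conn, keyword):
    """Return the sites that carry the given keyword, ordered by site id."""
    rows = conn.execute(
        "SELECT s.site FROM keywords k "
        "JOIN sites s ON s.id = k.site_id "
        "WHERE k.keyword = ? ORDER BY s.id",
        (keyword,),
    )
    return [row[0] for row in rows]


print(sites_for_keyword(conn, "Arsenal"))
# ['www.arsenal.com', 'www.football.com']
```

So yes, a keyword found on 1000 pages means 1000 rows in the keywords table, and the JOIN is what "links" each row back to its site.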


Version 2 -- Similar again, but with 3 tables. This is a little more dynamic and a lot more 'normalized'.



Table 1 - Sites
--------------------------
Key   Site
1     www.arsenal.com
2     www.nicegirl.com
3     www.football.com
4     www.turism.com


Table 2 - Keywords
--------------------------
Key   Keyword
1     Arsenal
2     Girl
3     Liverpol
4     Nice
5     Roma
6     Sao Paulo
7     Spice

Table 3 - siteKeywords
--------------------------
Key   KeyID   SiteID
1     1       1
2     1       3
3     2       2
4     3       1
5     3       3
6     4       2
7     4       4
8     5       4
9     6       1
10    6       3
11    7       2
12    7       4
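
A rough sketch of Version 2, again with Python's sqlite3 module. The table and column names are my own assumptions. I also use a composite primary key on the link table instead of a separate Key column, which is a design choice and not something the layout above requires; it stops the same keyword/site pair from being stored twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (
        id   INTEGER PRIMARY KEY,
        site TEXT NOT NULL UNIQUE
    );
    CREATE TABLE keywords (
        id      INTEGER PRIMARY KEY,
        keyword TEXT NOT NULL UNIQUE
    );
    -- Link table: one row per (keyword, site) pair, ids only.
    CREATE TABLE site_keywords (
        keyword_id INTEGER NOT NULL REFERENCES keywords(id),
        site_id    INTEGER NOT NULL REFERENCES sites(id),
        PRIMARY KEY (keyword_id, site_id)
    );
""")

conn.executemany("INSERT INTO sites (id, site) VALUES (?, ?)", [
    (1, "www.arsenal.com"), (2, "www.nicegirl.com"),
    (3, "www.football.com"), (4, "www.turism.com"),
])
conn.executemany("INSERT INTO keywords (id, keyword) VALUES (?, ?)", [
    (1, "Arsenal"), (2, "Girl"), (3, "Liverpol"), (4, "Nice"),
    (5, "Roma"), (6, "Sao Paulo"), (7, "Spice"),
])
conn.executemany(
    "INSERT INTO site_keywords (keyword_id, site_id) VALUES (?, ?)", [
        (1, 1), (1, 3), (2, 2), (3, 1), (3, 3), (4, 2),
        (4, 4), (5, 4), (6, 1), (6, 3), (7, 2), (7, 4),
    ])


def sites_for_keyword(conn, keyword):
    """Resolve keyword -> keyword id -> site ids -> site names."""
    rows = conn.execute(
        "SELECT s.site FROM keywords k "
        "JOIN site_keywords sk ON sk.keyword_id = k.id "
        "JOIN sites s ON s.id = sk.site_id "
        "WHERE k.keyword = ? ORDER BY s.id",
        (keyword,),
    )
    return [row[0] for row in rows]


print(sites_for_keyword(conn, "Nice"))
# ['www.nicegirl.com', 'www.turism.com']
```

Each keyword string is stored once; the 12 relationships are just pairs of integers.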


Version 3 -- Totally different, a lot harder programmatically (to code), and almost totally based on scripts (not recommended for the faint of heart)



Table 1 - Sites
--------------------------
Key   Site               KeyID
1     www.arsenal.com    1,3,6
2     www.nicegirl.com   2,4,7
3     www.football.com   1,3,6
4     www.turism.com     4,5,7


Table 2 - Keywords
--------------------------
Key   Keyword
1     Arsenal
2     Girl
3     Liverpol
4     Nice
5     Roma
6     Sao Paulo
7     Spice
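
To give an idea of why Version 3 leans on scripts, here is a minimal Python sketch. The rows are held in plain dictionaries rather than a real database, all names (SITES, KEYWORDS, sites_for_keyword) are hypothetical, and the key lists are derived from the keyword-to-site listing earlier in the thread.

```python
# Sites table: key -> (site, comma-separated list of keyword keys).
SITES = {
    1: ("www.arsenal.com", "1,3,6"),
    2: ("www.nicegirl.com", "2,4,7"),
    3: ("www.football.com", "1,3,6"),
    4: ("www.turism.com", "4,5,7"),
}

# Keywords table: key -> keyword.
KEYWORDS = {
    1: "Arsenal", 2: "Girl", 3: "Liverpol", 4: "Nice",
    5: "Roma", 6: "Sao Paulo", 7: "Spice",
}


def sites_for_keyword(keyword):
    """Find every site whose key list contains the keyword's key."""
    key_id = next((k for k, word in KEYWORDS.items() if word == keyword), None)
    if key_id is None:
        return []

    matches = []
    # Every site row has to be read and its list parsed by the script;
    # the database cannot index into a comma-separated string.
    for site, key_list in SITES.values():
        ids = {int(part) for part in key_list.split(",") if part}
        if key_id in ids:
            matches.append(site)
    return matches


print(sites_for_keyword("Roma"))
# ['www.turism.com']
```

The storage is compact, but each lookup has to walk all the site rows, and adding or removing a keyword means rewriting a string.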


There are a few ideas for you. Honestly, if you are not very comfortable with databases and programming, or if you are a novice, I would go with option 2. Option 3 is for the experienced, and option 1 is for beginners.

HTH!

luigicannavaro
09-18-2007, 02:27 PM
Thank you.

I am tired of "theory"; I want to see "practice" and "real code", not "infinitesimal calculus". Solution 1 is trivial. Surely there must be a big difference in speed and storage between solutions 2 and 3. I think I have similar code for doing option 3 in my shares!


Best for you.

Luigi

luigicannavaro
10-18-2007, 03:25 PM
Daemonspyre,

I have decided to reopen this thread because someone asked a question - in true as ASK - and I am in doubt.

Taking into account your model for TABLE1 AND TABLE2



Table 1 - Sites
--------------------------
Key   Site
1     www.arsenal.com
2     www.nicegirl.com
3     www.football.com
4     www.turism.com


Table 2 - Keywords
--------------------------
Key   Keyword     SiteID
1     Arsenal     1
2     Arsenal     3
3     Girl        2
4     Liverpol    1
5     Liverpol    3
6     Nice        2
7     Nice        4
8     Roma        4
9     Sao Paulo   1
10    Sao Paulo   3
11    Spice       2
12    Spice       4

Now, my question:

If I have a 3rd table with these values:


Table 3 - Source Table
=================
1 Arsenal Liverpool
2 Roma Sao Paulo
3 Nice girl

How would you search for the values of the SOURCE TABLE (table 3) in keywords (table 2) so that, for example, "Arsenal Liverpool" is found at once, even though in table 2 these values are in different records? Good question?

Best

Luigi
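
One possible way to approach the question above, sketched with Python's sqlite3 module against the two-table layout. This is only a sketch under several assumptions: the table and column names are mine, each source phrase has already been split into a list of keywords (so "Sao Paulo" stays one keyword), the keywords are spelled as stored in table 2 ("Liverpol"), and matching ignores case. The idea is to select the rows for any of the keywords, group them by site, and keep only the sites where every keyword was found.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (id INTEGER PRIMARY KEY, site TEXT NOT NULL UNIQUE);
    CREATE TABLE keywords (
        id      INTEGER PRIMARY KEY,
        keyword TEXT NOT NULL,
        site_id INTEGER NOT NULL REFERENCES sites(id)
    );
""")
conn.executemany("INSERT INTO sites (id, site) VALUES (?, ?)", [
    (1, "www.arsenal.com"), (2, "www.nicegirl.com"),
    (3, "www.football.com"), (4, "www.turism.com"),
])
conn.executemany("INSERT INTO keywords (keyword, site_id) VALUES (?, ?)", [
    ("Arsenal", 1), ("Arsenal", 3), ("Girl", 2),
    ("Liverpol", 1), ("Liverpol", 3), ("Nice", 2), ("Nice", 4),
    ("Roma", 4), ("Sao Paulo", 1), ("Sao Paulo", 3),
    ("Spice", 2), ("Spice", 4),
])


def sites_matching_all(conn, words):
    """Return the sites that carry every keyword in `words`."""
    # Deduplicate case-insensitively so the count below is meaningful.
    words = sorted({w.lower() for w in words})
    if not words:
        return []
    placeholders = ",".join("?" for _ in words)
    sql = (
        "SELECT s.site FROM keywords k "
        "JOIN sites s ON s.id = k.site_id "
        f"WHERE k.keyword COLLATE NOCASE IN ({placeholders}) "
        "GROUP BY s.id "
        "HAVING COUNT(DISTINCT k.keyword) = ? "
        "ORDER BY s.id"
    )
    return [row[0] for row in conn.execute(sql, (*words, len(words)))]


print(sites_matching_all(conn, ["Arsenal", "Liverpol"]))
# ['www.arsenal.com', 'www.football.com']
print(sites_matching_all(conn, ["Nice", "girl"]))
# ['www.nicegirl.com']
```

With this sample data, ["Roma", "Sao Paulo"] returns an empty list, because no single site carries both keywords.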


