  1. #1
    Regular Coder luigicannavaro's Avatar
    Join Date
    Aug 2007
    Posts
    150
    Thanks
    11
    Thanked 0 Times in 0 Posts

    Question The Anatomy of Web Search

    Hi friends,

    I was (finally) reading the paper by the creators of Google:
    http://infolab.stanford.edu/pub/papers/google.pdf

    Could you explain how it is possible to index each webpage, generate a list of keywords, and tie those keywords to each webpage?

    For example:

    The word: "Market". Let's imagine there are 1000 pages on the web that contain the word "market".

    In my "ignorance" I would create a database table with two columns:
    "keywords" and "URL". But if the word "Market" is found in 1000 pages, would I create 1000 records, each with keyword = "market" plus its "URL"?

    Certainly not.

    How would you "link" the keyword "market" to each page where it is found?

    Thank you for "setal" your time!

    Luigi

  • #2
    Regular Coder
    Join Date
    Mar 2007
    Posts
    505
    Thanks
    1
    Thanked 19 Times in 19 Posts
    2 tables would be the way to do this.

    Table 1 would be the listing of URLs.

    Table 2 would be listings of keywords with a field that relates back to Table 1.

    1000 rows is nothing to any decent RDBMS.

    And setal == bristles or thick hairs... Not sure I get the pun... :\
    To say my fate is not tied to your fate is like saying, 'Your end of the boat is sinking.' -- Hugh Downs
    Please, if you found my post helpful, pay it forward. Go and help someone else today.

  • #3
    Regular Coder luigicannavaro's Avatar
    Join Date
    Aug 2007
    Posts
    150
    Thanks
    11
    Thanked 0 Times in 0 Posts
    Yes, 1000 is a modest example!

    But how would this table be laid out? Like this?

    table A

    keywords gototableB

    Arsenal www.arsenal.com, www.football.com....
    Girl www.nicegirl.com...
    Liverpol www.arsenal.com, www.football.com....
    Nice www.nicegirl.com, www.turism.com
    Roma www.turism.com
    Sao Paulo www.arsenal.com, www.football.com....
    Spice www.nicegirl.com, www.turism.com


    table B

    www.arsenal.com
    www.nicegirl.com
    www.football.com
    www.nicegirl.com
    www.football.com
    www.turism.com

    Certainly not in this way. Right?

    thanks

    Luigi
    Last edited by luigicannavaro; 09-18-2007 at 08:21 AM.

  • #4
    Regular Coder
    Join Date
    Mar 2007
    Posts
    505
    Thanks
    1
    Thanked 19 Times in 19 Posts
    You are close, but you are not thinking like a database. Relational databases work best when the data is 'normalized': each fact is stored once, and nothing is duplicated unless it has to be.

    There are a few ways of doing it, listed below:

    Version 1 -- Similar to your idea, but with a more 'database' point of view.

    Code:
    Table 1 - Sites
    --------------------------
    Key     Site
    1	www.arsenal.com
    2	www.nicegirl.com
    3	www.football.com
    4	www.turism.com
    
    
    Table 2 - Keywords
    --------------------------
    Key     Keyword		SiteID
    1	Arsenal		1
    2	Arsenal		3
    3	Girl		2
    4	Liverpol	1
    5	Liverpol	3
    6	Nice		2
    7	Nice		4
    8	Roma		4
    9	Sao Paulo	1
    10	Sao Paulo	3
    11	Spice		2
    12	Spice		4
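
A minimal sketch of Version 1 using Python's built-in sqlite3 module, in case it helps to see the two tables in running code. The table and column names (sites, keywords, site_id) and the helper sites_for are my own choices for illustration, not something from the thread, and only a few of the rows above are loaded.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (id INTEGER PRIMARY KEY, site TEXT NOT NULL UNIQUE);
    CREATE TABLE keywords (
        id INTEGER PRIMARY KEY,
        keyword TEXT NOT NULL,
        site_id INTEGER NOT NULL REFERENCES sites(id)
    );
    -- An index on keyword keeps lookups fast as the row count grows.
    CREATE INDEX idx_keywords_keyword ON keywords(keyword);
""")
conn.executemany("INSERT INTO sites (id, site) VALUES (?, ?)", [
    (1, "www.arsenal.com"), (2, "www.nicegirl.com"),
    (3, "www.football.com"), (4, "www.turism.com"),
])
# One row per (keyword, site) pair, exactly as in Table 2 above.
conn.executemany("INSERT INTO keywords (keyword, site_id) VALUES (?, ?)", [
    ("Arsenal", 1), ("Arsenal", 3), ("Nice", 2), ("Nice", 4), ("Roma", 4),
])

def sites_for(keyword):
    """Return every site that has a row for this keyword."""
    rows = conn.execute(
        "SELECT s.site FROM keywords k JOIN sites s ON s.id = k.site_id "
        "WHERE k.keyword = ? ORDER BY s.id", (keyword,))
    return [r[0] for r in rows]

print(sites_for("Arsenal"))  # ['www.arsenal.com', 'www.football.com']
```

So yes, a keyword found on 1000 pages does mean 1000 rows in the keywords table; the index on keyword is what keeps the lookup cheap.
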
    Version 2 -- Similar again, but with 3 tables. This is a little more dynamic and a lot more 'normalized'.

    Code:
    Table 1 - Sites
    --------------------------
    Key     Site
    1	www.arsenal.com
    2	www.nicegirl.com
    3	www.football.com
    4	www.turism.com
    
    
    Table 2 - Keywords
    --------------------------
    Key     Keyword
    1	Arsenal
    2	Girl
    3	Liverpol
    4	Nice
    5	Roma
    6	Sao Paulo
    7	Spice
    
    Table 3 - siteKeywords
    --------------------------
    Key     KeyID	SiteID
    1	1	1
    2	1	3
    3	2	2
    4	3	1
    5	3	3
    6	4	2
    7	4	4
    8	5	4
    9	6	1
    10	6	3
    11	7	2
    12	7	4
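
Version 2 could be sketched along these lines, again with Python's sqlite3 module. The names site_keywords, index_page and sites_for are hypothetical, and the composite primary key on the link table is my assumption about how one might prevent duplicate pairs; the thread does not specify it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (id INTEGER PRIMARY KEY, site TEXT NOT NULL UNIQUE);
    CREATE TABLE keywords (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL UNIQUE);
    -- Link table: one row per (keyword, site) pair, stored as two integers.
    CREATE TABLE site_keywords (
        keyword_id INTEGER NOT NULL REFERENCES keywords(id),
        site_id INTEGER NOT NULL REFERENCES sites(id),
        PRIMARY KEY (keyword_id, site_id)
    );
""")

def index_page(site, words):
    """Store a site and its keywords; each keyword's text is stored only once."""
    conn.execute("INSERT OR IGNORE INTO sites (site) VALUES (?)", (site,))
    site_id = conn.execute(
        "SELECT id FROM sites WHERE site = ?", (site,)).fetchone()[0]
    for word in words:
        conn.execute("INSERT OR IGNORE INTO keywords (keyword) VALUES (?)", (word,))
        keyword_id = conn.execute(
            "SELECT id FROM keywords WHERE keyword = ?", (word,)).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO site_keywords VALUES (?, ?)",
            (keyword_id, site_id))

def sites_for(keyword):
    """Follow keyword -> link table -> site with two joins."""
    rows = conn.execute(
        "SELECT s.site FROM keywords k "
        "JOIN site_keywords sk ON sk.keyword_id = k.id "
        "JOIN sites s ON s.id = sk.site_id "
        "WHERE k.keyword = ? ORDER BY s.site", (keyword,))
    return [r[0] for r in rows]

index_page("www.arsenal.com", ["Arsenal", "Sao Paulo"])
index_page("www.nicegirl.com", ["Girl", "Nice", "Spice"])
index_page("www.turism.com", ["Nice", "Roma", "Spice"])
print(sites_for("Nice"))  # ['www.nicegirl.com', 'www.turism.com']
```

Compared with Version 1, the keyword text appears once no matter how many sites use it, and the per-pair rows hold only two integers.
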
    Version 3 -- Totally different, a lot harder to code, and almost entirely dependent on scripts (not recommended for the faint of heart)

    Code:
    Table 1 - Sites
    --------------------------
    Key     Site			KeyID
    1	www.arsenal.com		1,3,6
    2	www.nicegirl.com	2,4,7
    3	www.football.com	1,3,6
    4	www.turism.com		4,5,7
    
    
    Table 2 - Keywords
    --------------------------
    Key     Keyword
    1	Arsenal
    2	Girl
    3	Liverpol
    4	Nice
    5	Roma
    6	Sao Paulo
    7	Spice
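
To give an idea of the script work Version 3 implies, here is a rough Python sketch that splits each comma-separated KeyID string and builds a keyword-to-sites lookup in memory. The dictionaries stand in for the two tables, the KeyID strings are my own reading of which keywords belong to which site, and build_index is a hypothetical helper name.

```python
# Stand-ins for the two tables: site -> comma-separated KeyIDs, and KeyID -> keyword.
site_rows = {
    "www.arsenal.com": "1,3,6",
    "www.nicegirl.com": "2,4,7",
    "www.football.com": "1,3,6",
    "www.turism.com": "4,5,7",
}
keyword_rows = {
    1: "Arsenal", 2: "Girl", 3: "Liverpol", 4: "Nice",
    5: "Roma", 6: "Sao Paulo", 7: "Spice",
}

def build_index(site_rows, keyword_rows):
    """Split every KeyID string and map each keyword to the sites listing it."""
    index = {}
    for site, key_ids in site_rows.items():
        for key_id in key_ids.split(","):
            word = keyword_rows[int(key_id)]
            index.setdefault(word, []).append(site)
    return index

index = build_index(site_rows, keyword_rows)
print(index["Nice"])  # ['www.nicegirl.com', 'www.turism.com']
```

The database cannot look inside the KeyID string on its own, so every lookup by keyword means reading and splitting every site row in code. That is the extra scripting this version asks for.
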
    There are a few ideas for you. Honestly, if you are not very comfortable with databases and programming, or if you are a novice, I would go with option 2. Option 3 is for the experienced, and option 1 is for beginners.

    HTH!

  • #5
    Regular Coder luigicannavaro's Avatar
    Join Date
    Aug 2007
    Posts
    150
    Thanks
    11
    Thanked 0 Times in 0 Posts
    Thank you.

    I am tired of "theory"; I want to see "practice" and "real code", not "infinitesimal calculus". Solution 1 is trivial. There must certainly be a big difference in speed and storage between solutions 2 and 3. I think I have similar code for option 3 in my shares!


    Best for you.

    Luigi

  • #6
    Regular Coder luigicannavaro's Avatar
    Join Date
    Aug 2007
    Posts
    150
    Thanks
    11
    Thanked 0 Times in 0 Posts
    Daemonspyre,

    I have decided to reopen this thread because someone asked a question (actually on ASK) and I am not sure of the answer.

    Taking your model for Table 1 and Table 2:

    Table 1 - Sites
    --------------------------
    Key     Site
    1	www.arsenal.com
    2	www.nicegirl.com
    3	www.football.com
    4	www.turism.com


    Table 2 - Keywords
    --------------------------
    Key     Keyword		SiteID
    1	Arsenal		1
    2	Arsenal		3
    3	Girl		2
    4	Liverpol	1
    5	Liverpol	3
    6	Nice		2
    7	Nice		4
    8	Roma		4
    9	Sao Paulo	1
    10	Sao Paulo	3
    11	Spice		2
    12	Spice		4

    Now, my question:

    If I have a 3rd table with these values:

    Table 3 - Source Table
    --------------------------
    Key     Phrase
    1	Arsenal Liverpool
    2	Roma Sao Paulo
    3	Nice girl

    How would you search the keywords (Table 2) for the values of the source table (Table 3), so that a phrase such as "Arsenal Liverpool" is found in one go even though its words are stored in different records of Table 2? Good question?

    Best

    Luigi
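
One possible approach, sketched with Python's sqlite3 module: split the phrase into words, select the rows matching any of them, then group by site and keep only the sites where every word was found. This assumes each keyword is a single word (a two-word keyword such as "Sao Paulo" would need its own tokenizing rule), compares case-insensitively, and matches exact spellings, so I have stored "Liverpool" spelled the way the source table has it. The function name sites_matching_all is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (id INTEGER PRIMARY KEY, site TEXT NOT NULL UNIQUE);
    CREATE TABLE keywords (
        id INTEGER PRIMARY KEY,
        keyword TEXT NOT NULL,
        site_id INTEGER NOT NULL REFERENCES sites(id)
    );
""")
conn.executemany("INSERT INTO sites (id, site) VALUES (?, ?)", [
    (1, "www.arsenal.com"), (2, "www.nicegirl.com"),
    (3, "www.football.com"), (4, "www.turism.com"),
])
conn.executemany("INSERT INTO keywords (keyword, site_id) VALUES (?, ?)", [
    ("Arsenal", 1), ("Arsenal", 3), ("Girl", 2), ("Liverpool", 1),
    ("Liverpool", 3), ("Nice", 2), ("Nice", 4), ("Roma", 4),
])

def sites_matching_all(phrase):
    """Return the sites that have a keyword row for every word in the phrase."""
    words = sorted({w.lower() for w in phrase.split()})
    if not words:
        return []
    placeholders = ",".join("?" * len(words))
    sql = (
        "SELECT s.site FROM keywords k JOIN sites s ON s.id = k.site_id "
        f"WHERE lower(k.keyword) IN ({placeholders}) "
        "GROUP BY s.id "
        # A site qualifies only if the number of distinct matched words
        # equals the number of distinct words in the phrase.
        "HAVING COUNT(DISTINCT lower(k.keyword)) = ? "
        "ORDER BY s.id"
    )
    return [r[0] for r in conn.execute(sql, (*words, len(words)))]

print(sites_matching_all("Arsenal Liverpool"))  # ['www.arsenal.com', 'www.football.com']
print(sites_matching_all("Nice girl"))          # ['www.nicegirl.com']
```

The words can sit in different records because the GROUP BY collects all matching records per site before the HAVING clause counts them.
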

