  1. #1
    New Coder
    Join Date
    Jan 2007
    Posts
    57
    Thanks
    11
    Thanked 0 Times in 0 Posts

    Keyword density script

    Hi,

    I was wondering if you could point me in the right direction. I'd like to create a keyword density checking script like this one: http://www.seochat.com/seo-tools/keyword-density/

    Are there any ready-made scripts or modules that would help with this task?

    The main problems I see are how to tell actual text apart from HTML tags, and then how to calculate density for 2-3 word phrases, as in the example above.
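
    The phrase part can be sketched as counting n-grams over the list of words. This is only a minimal Python sketch: phrase_density is a made-up name, and "density" is assumed here to mean a phrase's share of all phrases of the same length, which may not match how the linked tool calculates it.

```python
import re
from collections import Counter


def phrase_density(text, n=1, top=10):
    """Return (phrase, count, percent) for the most common n-word phrases.

    Assumption: percent is the phrase count divided by the total number
    of n-word phrases in the text. Other tools may use other denominators.
    """
    # Lowercase and split into simple word tokens (letters, digits, apostrophes).
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Slide a window of n words across the token list.
    phrases = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    total = len(phrases)
    if total == 0:
        return []
    counts = Counter(phrases)
    return [(p, c, 100.0 * c / total) for p, c in counts.most_common(top)]


sample = "keyword density tools report keyword density for each keyword"
for n in (1, 2, 3):
    print(n, phrase_density(sample, n, top=3))
```

    The input is expected to be plain text with the markup already stripped out.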

    All help is greatly appreciated! Thank you!

  • #2
    New Coder
    Join Date
    May 2009
    Posts
    55
    Thanks
    1
    Thanked 4 Times in 4 Posts
    I don't know of any ready-made ones, but this is a pretty simple program to write. I would use Python with the re (regular expression) and urllib2 modules. urllib2 can fetch the page, and re can be used to separate the HTML tags from the text.
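
    Here is a minimal sketch of the tags-versus-text step in Python. Note that it swaps in the standard-library html.parser module for the regular expressions mentioned above, since a parser copes better with messy markup. TextExtractor and visible_text are made-up names for illustration, and skipping script and style contents is an assumption about what counts as "actual text".

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def visible_text(html):
    """Return the text content of an HTML string, joined with spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


page = "<html><body><p>Hello <b>world</b></p><script>var x = 1;</script></body></html>"
print(visible_text(page))
```

    The page itself is not fetched here. In Python 3 that would go through urllib.request, which replaced urllib2.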

  • #3
    Super Moderator
    Join Date
    May 2005
    Location
    Southern tip of Silicon Valley
    Posts
    2,871
    Thanks
    2
    Thanked 164 Times in 159 Posts
    No need to use Python when Perl has plenty of modules designed for HTML retrieval and parsing. Using a regex to parse HTML is very fragile and can easily break. It's better to use an HTML parser.

    The most commonly used modules for this type of task are:
    LWP
    http://search.cpan.org/~gaas/libwww-...826/lib/LWP.pm

    LWP::UserAgent
    http://search.cpan.org/~gaas/libwww-...P/UserAgent.pm

    LWP::Simple
    http://search.cpan.org/~gaas/libwww-.../LWP/Simple.pm

    HTML::Parser
    http://search.cpan.org/search%3fmodule=HTML::Parser

    HTML::HeadParser
    http://search.cpan.org/search%3fmodule=HTML::HeadParser

  • #4
    Senior Coder
    Join Date
    Mar 2006
    Posts
    1,274
    Thanks
    2
    Thanked 39 Times in 38 Posts
    There is also HTML::Strip, although I have never used it and I'm not sure if it's an actual parser.

  • #5
    New Coder
    Join Date
    Jan 2007
    Posts
    57
    Thanks
    11
    Thanked 0 Times in 0 Posts
    Thanks guys - that's a nice start. I'll look into them and report back on my progress.

