  1. #1
    New to the CF scene
    Join Date
    Aug 2009
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Web Crawler Project

    Hi! I am a college student in Computer Science, and our thesis is about web crawlers. We want to extend ordinary web crawlers into hidden-web crawlers. However, we are having a hard time finding open-source crawlers that actually work.

    We found many open-source crawlers for the Java platform on SourceForge.net. However, when we tried to run them from their source code, voilà: they would not run. Since Java was a little too complicated for us, we resorted to looking for a working crawler on any platform. We have only a month left to develop the crawlers and need as much assistance as possible.

    If any of you have tips on how to make a web crawler from scratch, or know of a working open-source web crawler in Java, VB, C++ or C#, it would be greatly appreciated.

  2. #2
    Senior Coder TheShaner
    Join Date
    Sep 2005
    Location
    Orlando, FL
    Posts
    1,126
    Thanks
    2
    Thanked 40 Times in 40 Posts
    Not to be mean, but what you state above does not add up at all. I say this because I graduated with a bachelor's degree in CS 5 years ago. I realize courses and difficulty vary greatly among universities and countries, but even taking that into consideration, no CS student should ever be working on a "thesis" (which in the States is typically reserved for graduate studies) on "developing" (creating, not copying) a web crawler, yet finding Java too complicated. Period. Sorry. Even if the crawler you got from SourceForge does not work, you should at least have a decent template to work from and get it working, because, after all, you're supposed to be developing it and you're supposed to be CS students. I suggest finding an easier "thesis" or searching a bit more efficiently for your web crawler to "develop".

    Go to Wikipedia's article on web crawlers, scroll to the bottom, and you'll see a nice, big list of open-source web crawlers. The list even includes Grub, the bot used for Wikia Search.
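
    As for building one from scratch, the core of a crawler is small enough to sketch. What follows is only a minimal outline under stated assumptions, not a finished design: it is written in Python purely for brevity, every name in it (LinkParser, extract_links, fetch_over_http, crawl) is made up for illustration and comes from none of the projects mentioned, and it deliberately skips robots.txt handling, rate limiting, and anything specific to the hidden web. It keeps a queue of URLs to visit, remembers which ones it has already seen, and pulls new links out of each downloaded page.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) URLs found in html, with #fragments removed."""
    parser = LinkParser()
    parser.feed(html)
    found = []
    for href in parser.links:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).scheme in ("http", "https"):
            found.append(absolute)
    return found


def fetch_over_http(url):
    """Default fetcher (an assumption): download a page, decode leniently."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seed, fetch=fetch_over_http, max_pages=50):
    """Breadth-first crawl from seed; returns visited URLs in visit order."""
    queue = deque([seed])   # URLs waiting to be downloaded
    seen = {seed}           # URLs already queued, to avoid loops
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        visited.append(url)
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

    Passing the fetch function in as a parameter means the loop can be tried against a dictionary of fake pages before it is pointed at a real site. A real crawler should also honor robots.txt and pause between requests.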

    -Shane

