Thread: Web Crawler Project
08-06-2009, 11:54 AM #1
- Join Date: Aug 2009
Hi! I am currently a Computer Science college student, and our thesis is about web crawlers. We want to develop ordinary web crawlers into Hidden Web crawlers. However, we are having a hard time finding open-source crawlers that actually "work".
We found many open-source crawlers on SourceForge.net for the Java platform. However, when we tried to run them from their source code, voilà! They wouldn't run. We resorted to looking for any working crawler on any platform, since Java was a little too complicated for us. We only have a month left to develop the crawlers and need as much assistance as possible.
If any of you have tips on how to build a web crawler from scratch, or know of a working open-source web crawler in Java, VB, C++, or C#, it would be greatly appreciated.
08-06-2009, 06:38 PM #2
- Join Date: Sep 2005
- Orlando, FL
Not to be mean, but what you state above does not add up at all. I say this because I graduated with a bachelor's degree in CS five years ago. I realize courses and difficulty vary greatly among universities and countries, but even taking that into account, no CS student should ever be working on a "thesis" (which in the States is typically reserved for graduate studies) about "developing" (creating, not copying) a web crawler while finding Java too complicated. Period. Sorry. Even if the crawler you got from SourceForge does not work, it should at least give you a decent template to work from and get running, because, after all, you're supposed to be developing it and you're supposed to be CS students. I suggest finding an easier "thesis" or searching more effectively for a web crawler to "develop".
Go to Wikipedia's article on web crawlers, scroll to the bottom, and you'll see a nice, big list of open-source web crawlers. The list even includes Grub, the crawler used for Wikia Search.
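For anyone starting from scratch, the core loop of an ordinary (surface web) crawler is small. It keeps a frontier queue of URLs to visit and a set of URLs already seen, then repeatedly fetches a page, extracts its links, and enqueues the new ones. The following is only a minimal sketch of that breadth-first loop, assuming Java 11+ for `java.net.http.HttpClient`. The class name `SimpleCrawler`, its methods, the page limit, and the User-Agent string are all hypothetical, chosen for illustration. Link extraction uses a regex, which is a deliberate simplification. A real crawler should use an HTML parser, honor robots.txt, and rate-limit its requests.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.*;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical minimal breadth-first crawler (illustrative sketch, not a production tool).
public class SimpleCrawler {
    // Naive href extractor: captures the attribute value up to a quote or '#' (drops fragments).
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    private static final HttpClient CLIENT = HttpClient.newBuilder()
        .followRedirects(HttpClient.Redirect.NORMAL)
        .connectTimeout(Duration.ofSeconds(10))
        .build();

    private final Function<String, String> fetcher; // URL -> HTML body, or null on failure
    private final int maxPages;                     // hypothetical cap on pages visited

    public SimpleCrawler(Function<String, String> fetcher, int maxPages) {
        this.fetcher = fetcher;
        this.maxPages = maxPages;
    }

    /** Breadth-first crawl from the seed; returns URLs in the order they were visited. */
    public List<String> crawl(String seed) {
        Deque<String> frontier = new ArrayDeque<>(); // URLs waiting to be fetched
        Set<String> seen = new HashSet<>();          // URLs already enqueued, to avoid revisits
        List<String> visited = new ArrayList<>();
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            String html = fetcher.apply(url);
            if (html == null) continue;              // fetch failed or not HTML: skip it
            visited.add(url);
            for (String link : extractLinks(url, html)) {
                if (seen.add(link)) frontier.add(link); // enqueue only unseen URLs
            }
        }
        return visited;
    }

    /** Resolves each href against the page URL and keeps only http(s) links. */
    static List<String> extractLinks(String baseUrl, String html) {
        List<String> links = new ArrayList<>();
        URI base = URI.create(baseUrl);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                URI resolved = base.resolve(m.group(1).trim()).normalize();
                String scheme = resolved.getScheme();
                if ("http".equals(scheme) || "https".equals(scheme)) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // Malformed href: ignore it.
            }
        }
        return links;
    }

    /** Fetches a URL over HTTP; returns the body only for 200 responses with HTML content. */
    static String httpFetch(String url) {
        try {
            HttpRequest req = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(10))
                .header("User-Agent", "SimpleCrawler/0.1 (student project)")
                .GET()
                .build();
            HttpResponse<String> resp = CLIENT.send(req, HttpResponse.BodyHandlers.ofString());
            String type = resp.headers().firstValue("Content-Type").orElse("");
            return (resp.statusCode() == 200 && type.startsWith("text/html")) ? resp.body() : null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } catch (IOException | IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String seed = args.length > 0 ? args[0] : "https://example.com/";
        new SimpleCrawler(SimpleCrawler::httpFetch, 20).crawl(seed).forEach(System.out::println);
    }
}
```

Passing the fetcher in as a `Function` lets you test the crawl logic against an in-memory map of pages before pointing it at real sites. A Hidden Web crawler would extend this same loop by detecting search forms and submitting queries to them, which this sketch does not attempt.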