...

set crawl depth on crawler



cosmicsea
03-25-2010, 10:15 AM
Hi, I'm working on a crawler "just for fun", and right now I have it crawling all img tags. What I'm trying to do is set a crawl depth so that the crawler goes beyond the first page that is set. Can someone help me with this? Thanks.



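<?php
// Connect with mysqli (the old mysql_* functions were removed in PHP 7).
$db = mysqli_connect("localhost", "name", "pass", "data") or die(mysqli_connect_error());

$url = "http://domain.com";
$data = file_get_contents($url);
if ($data === false) {
    die("Could not fetch $url");
}

// Collect every <img ...> tag on the page (case-insensitive).
preg_match_all("/<img[^>]*>/i", $data, $match);

// A prepared statement keeps quotes inside the tags from breaking the query.
$stmt = mysqli_prepare($db, "INSERT INTO links (url) VALUES (?)") or die(mysqli_error($db));
foreach ($match[0] as $tag) {
    mysqli_stmt_bind_param($stmt, "s", $tag);
    mysqli_stmt_execute($stmt) or die(mysqli_stmt_error($stmt));
}

echo '<pre>';
print_r($match);
echo '</pre>';
?>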

mic2100
03-25-2010, 04:05 PM
Hi,

Basically, if you want a script to crawl and locate all the images on a site, you might want to build another script that goes through the site and collects all the links first. Then you change the code you have so it goes to each of those links and collects the image data. I made something similar for crawling sites and collecting other info. The best way to do it is, once you have the other script set up, to make sure it stores any links it finds in the database. You may need to run it a few times, and make sure you include text in your regexp that stops it collecting external links, or you will end up indexing 100,000s of pages. Then you only need to run a MySQL query, loop through each of the pages, and collect the data you require (images in your case).

I might be able to post some code for this once I get home in a few hours. But generally it's as bad to build as it sounds.
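
For reference, here is a minimal sketch of the approach described above, not a finished crawler. It keeps the queue of links in memory instead of in the database, and it assumes that "depth" means the number of link hops from the start page. The function names extractInternalLinks and crawl are made up for this example, and the $fetch parameter is an assumption added so that the page loader can be swapped out. Page-relative links such as "page2.html" are skipped to keep it short.

```php
<?php
// Hypothetical helper: return the absolute same-host links found in $html.
function extractInternalLinks(string $html, string $baseUrl): array
{
    $host = parse_url($baseUrl, PHP_URL_HOST);
    $scheme = parse_url($baseUrl, PHP_URL_SCHEME) ?: 'http';
    $links = [];
    preg_match_all('/<a\s[^>]*href=["\']([^"\'#]+)/i', $html, $m);
    foreach ($m[1] as $href) {
        if (strpos($href, '//') === 0) {
            $href = "$scheme:$href";            // protocol-relative link
        } elseif ($href[0] === '/') {
            $href = "$scheme://$host$href";     // root-relative link
        } elseif (!preg_match('#^https?://#i', $href)) {
            continue;                           // mailto:, javascript:, page-relative paths
        }
        // Keep only links on the same host so the crawl stays on one site.
        if (parse_url($href, PHP_URL_HOST) === $host) {
            $links[$href] = true;
        }
    }
    return array_keys($links);
}

// Hypothetical crawler: breadth-first, stops following links at $maxDepth.
// $fetch is any callable that takes a URL and returns the HTML or false.
function crawl(string $startUrl, int $maxDepth, callable $fetch): array
{
    $queue = [[$startUrl, 0]];      // pairs of [url, depth]
    $seen = [$startUrl => true];    // avoids visiting a page twice
    $images = [];
    while ($queue) {
        [$url, $depth] = array_shift($queue);
        $html = $fetch($url);
        if ($html === false) {
            continue;               // page could not be loaded
        }
        preg_match_all('/<img[^>]*>/i', $html, $m);
        $images[$url] = $m[0];
        if ($depth >= $maxDepth) {
            continue;               // collect images here, but go no deeper
        }
        foreach (extractInternalLinks($html, $url) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue[] = [$link, $depth + 1];
            }
        }
    }
    return $images;
}
```

Something like crawl("http://domain.com", 2, 'file_get_contents') would then return the img tags per page for the start page and everything within two links of it. A depth of 0 crawls only the start page, which matches what the original script does.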

cosmicsea
03-25-2010, 07:58 PM
Yeah, that sounds like a good idea. I wouldn't mind seeing some code from you too, if you want. Thanks for your reply.

MattF
03-25-2010, 08:15 PM
There are some spider scripts out there already. Have you not thought of possibly using one of those, or are your needs fairly specific?

This is one example:

http://www.sphider.eu/

cosmicsea
03-25-2010, 08:43 PM
Well, I'm not looking for anything too specific. I just got bored and wanted to create an image crawler, but I want it to crawl more than one page. I have seen Sphider before but never tried it. I will download it, have a look, and see what I can do with it. As long as I can get it to index images, I'm happy. I want to make an image search engine just for fun and for learning purposes. Thanks for mentioning it. :thumbsup:


