  • #1
    Regular Coder

    set crawl depth on crawler

    Hi, I'm working on making a crawler "just for fun", and right now I have it crawling all img tags. What I'm trying to do is set a crawl depth so that the crawler will crawl more than just the first page it is given. Can someone help me with this? Thanks.


    PHP Code:
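    <?php
    // Connect with mysqli; the old mysql_* functions were removed in PHP 7.
    $db = new mysqli("localhost", "name", "pass", "data");
    if ($db->connect_error) {
        die($db->connect_error);
    }

    $url  = "http://domain.com";
    $data = file_get_contents($url);

    // Collect every <img ...> tag on the page into $match[0].
    preg_match_all('/<img[^>]*>/i', $data, $match);

    // A prepared statement stops quotes inside a tag from breaking the query
    // (and closes the SQL injection hole of building the INSERT as a string).
    $stmt = $db->prepare("INSERT INTO links (url) VALUES (?)");
    foreach ($match[0] as $tag) {
        $stmt->bind_param("s", $tag);
        $stmt->execute() or die($stmt->error);
    }

    echo '<pre>';
    print_r($match);
    echo '</pre>';
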
    ?>
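One way to get the crawl depth asked about above is a breadth-first crawl that records how many link hops each page is from the start URL and stops following links once that number reaches a limit. This is a minimal sketch under assumptions, not code from the thread: the function name `crawl()`, the `$fetch` callable (pass `'file_get_contents'` for real use, or a fake for testing), and the rule of staying on the start URL's host are illustrative choices. Links and images are pulled out with regular expressions in the same spirit as the original script, so only quoted attributes and absolute, protocol-relative, or root-relative links are handled.

```php
<?php
// Minimal sketch (hypothetical helper, not from the thread): breadth-first crawl
// that stops following links once a page is $maxDepth hops from $startUrl.
// $fetch returns a page's HTML or false, e.g. 'file_get_contents'.
function crawl(string $startUrl, int $maxDepth, callable $fetch): array
{
    $scheme = parse_url($startUrl, PHP_URL_SCHEME) ?: 'http';
    $host   = parse_url($startUrl, PHP_URL_HOST);

    $images  = [];                    // page URL => raw <img src> values found on it
    $visited = [$startUrl => true];   // every URL ever queued, so none is fetched twice
    $queue   = [[$startUrl, 0]];      // FIFO of [url, depth] pairs

    while ($queue) {
        [$url, $depth] = array_shift($queue);
        $html = $fetch($url);
        if (!is_string($html)) {
            continue;                 // fetch failed: skip this page
        }

        // Regex extraction like the original script (assumes quoted attributes).
        preg_match_all('/<img\b[^>]*\ssrc\s*=\s*["\']([^"\']*)["\']/i', $html, $m);
        $images[$url] = $m[1];

        if ($depth >= $maxDepth) {
            continue;                 // depth limit reached: don't follow this page's links
        }

        preg_match_all('/<a\b[^>]*\shref\s*=\s*["\']([^"\']*)["\']/i', $html, $m);
        foreach ($m[1] as $href) {
            if (strpos($href, '//') === 0) {
                $href = "$scheme:$href";          // protocol-relative link
            } elseif (strpos($href, '/') === 0) {
                $href = "$scheme://$host$href";   // root-relative link
            }
            if (parse_url($href, PHP_URL_HOST) !== $host) {
                continue;             // external or unresolved relative link
            }
            $href = explode('#', $href, 2)[0];    // drop any #fragment
            if (!isset($visited[$href])) {
                $visited[$href] = true;
                $queue[] = [$href, $depth + 1];
            }
        }
    }
    return $images;
}
```

For example, `crawl('http://domain.com', 2, 'file_get_contents')` would visit the start page, the pages it links to, and the pages those link to, returning each page's image sources so they can be saved to the links table. Breadth-first order matters here: it guarantees every page is first reached by its shortest link path, so its recorded depth is correct.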

  • #2
    Regular Coder mic2100
    Hi,

    Basically, if you want a script to crawl and locate all the images on a site, you might want to build another script that goes through the site and collects all the links first. Then you change the code you have so that it goes to each of those links and collects the image data. I made something similar to this for crawling sites and collecting other info. The best way to do it: once you have the other script set up, make sure it stores any links it finds in the database. You may need to run it a few times, and make sure you include text in your regexp to prevent it from collecting external links, or you will end up indexing hundreds of thousands of pages. Then you only need to run a MySQL query, loop through each of the stored pages, and collect the data you require (images, in your case).

    I might be able to post some code for this once I get home in a few hours, but generally it's as bad to build as it sounds.
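The "keep external links out" step described in the post above can also be done with PHP's `parse_url()` instead of a hand-written regular expression. This is a minimal sketch, not mic2100's code: the function name `is_internal_link()` and the rule that `www.domain.com` counts as the same site as `domain.com` are assumptions for illustration.

```php
<?php
// Hypothetical helper (not mic2100's code): decide whether a link stays on the
// site being crawled, using parse_url() instead of a hand-written regexp.
function is_internal_link(string $link, string $siteHost): bool
{
    $host = parse_url($link, PHP_URL_HOST);
    if ($host === false) {
        return false;   // seriously malformed URL
    }
    if ($host === null) {
        // No host part: relative links such as "/about" or "page.html" stay on the
        // same site, but mailto:, javascript: and bare #fragments are not pages.
        return !preg_match('/^(mailto:|javascript:|#)/i', $link);
    }
    // Compare hosts case-insensitively and treat "www.domain.com" as "domain.com".
    $normalize = function (string $h): string {
        return preg_replace('/^www\./', '', strtolower($h));
    };
    return $normalize($host) === $normalize($siteHost);
}
```

In a two-pass setup like the one described, the link-collecting pass would run each extracted link through this check before inserting it into the database, which is what keeps the crawl from spreading to other sites.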

  • #3
    Regular Coder
    Replying to mic2100 (post #2):
    Yeah, that sounds like a good idea. I also wouldn't mind seeing some code from you, if you want to share it. Thanks for your reply.

  • #4
    Senior Coder
    There are some spider scripts out there already. Have you not thought of possibly using one of those, or are your needs fairly specific?

    This is one example:

    http://www.sphider.eu/

  • Users who have thanked MattF for this post:

    cosmicsea (03-25-2010)

  • #5
    Regular Coder
    Replying to MattF (post #4):
    Well, I'm not looking for anything too specific. I just got bored and wanted to create an image crawler, but I want it to crawl more than one page. I have seen Sphider before but never tried it; I'll download it, have a look, and see what I can do with it. As long as I can get it to index images, I'm happy. I want to make an image search engine just for fun and for learning purposes. Thanks for mentioning it.

