  1. #1
    New to the CF scene
    Join Date
    May 2009
    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Fetch URLs from HTML page

    Hello everybody,

    I need your help to fetch URLs from an HTML page. I have the HTML of a page in a string variable, and I need to fetch all the URLs of images, CSS, JavaScript, etc. A URL can be absolute, such as "http://www.abc.com/images/myimage.jpg", or relative, such as "images/myimage.jpg", "myimages/myimage.jpg", or "style/style.css".

    Is there a way to do this with PHP?

    Thanks.

  2. #2
    Senior Coder timgolding's Avatar
    Join Date
    Aug 2006
    Location
    Southampton
    Posts
    1,519
    Thanks
    114
    Thanked 110 Times in 109 Posts
    You need an XML parser or the like.
    You can not say you know how to do something, until you can teach it to someone else.

  3. #3
    Supreme Master coder! _Aerospace_Eng_'s Avatar
    Join Date
    Dec 2004
    Location
    In a place far, far away...
    Posts
    19,291
    Thanks
    2
    Thanked 1,043 Times in 1,019 Posts
    If you have PHP 5.1 or later, you can use this:
    PHP Code:
    <?php
    $html = file_get_contents('filename.html');

    $dom = new DOMDocument();
    // @ suppresses the warnings loadHTML() raises on malformed markup
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // get href from <a> elements
    $hrefs = $xpath->evaluate("/html/body//a");
    for ($i = 0; $i < $hrefs->length; $i++)
    {
        $href = $hrefs->item($i);
        $url = $href->getAttribute('href');
        echo $url.'<br>';
    }

    // get stylesheet href
    $links = $xpath->evaluate("/html//link");
    for ($i = 0; $i < $links->length; $i++)
    {
        $link = $links->item($i);
        $url = $link->getAttribute('href');
        echo $url.'<br>';
    }

    // get script src
    $scripts = $xpath->evaluate("/html//script");
    for ($i = 0; $i < $scripts->length; $i++)
    {
        $script = $scripts->item($i);
        $url = $script->getAttribute('src');
        echo $url.'<br>';
    }

    // get all img src
    $imgs = $xpath->evaluate("/html/body//img");
    for ($i = 0; $i < $imgs->length; $i++)
    {
        $img = $imgs->item($i);
        $url = $img->getAttribute('src');
        echo $url.'<br>';
    }
    ?>
    There may be an easier way but this way should work.
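
One possibly easier variant, offered as a sketch rather than a definitive solution: the four loops can be folded into a single XPath union query that selects the attributes directly. The function name extractUrls and the sample markup are hypothetical, and the type declarations assume PHP 7 or later. Because the query selects attribute nodes, elements that lack the attribute (for example an inline <script> with no src) are skipped instead of producing empty strings.

```php
<?php
// Hypothetical helper (not part of the original post): returns the values of
// href/src attributes found on <a>, <link>, <script> and <img> elements.
function extractUrls(string $html): array
{
    $dom = new DOMDocument();

    // Buffer libxml parse warnings instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // The | operator unions the four attribute selections into one node list.
    $nodes = $xpath->query('//a/@href | //link/@href | //script/@src | //img/@src');

    $urls = [];
    foreach ($nodes as $attr) {
        $urls[] = $attr->nodeValue; // attribute value exactly as written in the page
    }
    return $urls;
}

// Sample input using the kinds of URLs mentioned in the question.
$sample = '<html><head><link rel="stylesheet" href="style/style.css"></head>'
        . '<body><img src="images/myimage.jpg">'
        . '<a href="http://www.abc.com/images/myimage.jpg">image</a></body></html>';

foreach (extractUrls($sample) as $url) {
    echo $url, "\n";
}
```

Relative URLs such as "images/myimage.jpg" come back unchanged; turning them into absolute URLs would need the page's base address, which this sketch does not attempt.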
    Last edited by _Aerospace_Eng_; May 28th, 2009 at 03:52 PM.


 
