  1. #1
    Regular Coder
    Join Date
    Jan 2011
    Posts
    120
    Thanks
    6
    Thanked 2 Times in 2 Posts

    PHP HTML scraper

    1) Project Details: I have a text file that contains every link in my database. There are 5 versions of each link. I need an HTML scraper that goes through each link and checks whether the video is still embedded.

    2) Payment method/details: PayPal

    PM me with any questions. Thanks!

  • #2
    Supreme Master coder! _Aerospace_Eng_'s Avatar
    Join Date
    Dec 2004
    Location
    In a place far, far away...
    Posts
    19,291
    Thanks
    2
    Thanked 1,043 Times in 1,019 Posts
    So you have a list of URLs where the videos might be embedded, and you need to check each one for the embed code? Did you give your users a certain code to use for embedding? Can you post a sample of your text file? Feel free to remove the URLs and replace them with domain.com or something.
    ||||If you are getting paid to do a job, don't ask for help on it!||||
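
    One way to make "check for the embed code" concrete is to look for the tags that video players are usually embedded with. The sketch below is only an illustration: the function name `hasEmbedTag` and the default tag list are assumptions, not anything taken from the poster's site, and it assumes PHP 7 or later. If users were given one specific embed snippet, searching for a fragment of that snippet would be more reliable than this generic check.

```php
<?php
// Hypothetical helper (the name and the tag list are assumptions for illustration).
// Returns true when the HTML contains an opening tag that commonly wraps a video player.
function hasEmbedTag(string $html, array $tags = ['embed', 'object', 'iframe', 'video']): bool
{
    foreach ($tags as $tag) {
        // Look for "<tag" followed by a word boundary, ignoring case,
        // so "<IFRAME src=...>" matches but "<videos>" does not.
        if (preg_match('/<' . preg_quote($tag, '/') . '\b/i', $html) === 1) {
            return true;
        }
    }
    return false;
}
```

    For example, `hasEmbedTag('<p>Video removed</p>')` returns false, while a page containing `<iframe src="http://domain.com/player"></iframe>` returns true.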

  • #3
    Regular Coder
    Join Date
    Jan 2011
    Posts
    120
    Thanks
    6
    Thanked 2 Times in 2 Posts
    I use this code to print all links from my database to a text file.

    PHP Code:
    <?php
    require_once('mysql_connect.php');    // connect to the database

    $movie_list = 'List of all movie links<hr /><br /><table>';
    $sql = mysql_query("SELECT movie_id, title, version1, version1_source, version2, version2_source, version3, version3_source, version4, version4_source, version5, version5_source FROM movies ORDER BY movie_id");
    while ($row = mysql_fetch_array($sql)) {
        $movie_id = $row['movie_id'];
        $title = $row['title'];
        $version1 = $row['version1'];
        $version1_source = $row['version1_source'];
        $version2 = $row['version2'];
        $version2_source = $row['version2_source'];
        $version3 = $row['version3'];
        $version3_source = $row['version3_source'];
        $version4 = $row['version4'];
        $version4_source = $row['version4_source'];
        $version5 = $row['version5'];
        $version5_source = $row['version5_source'];
        // append one table row per movie
        $movie_list .= '
    <tr>
    <td>' . $movie_id . '</td><td>' . $title . '</td><td>' . $version1 . '</td><td>' . $version1_source . '</td><td>' . $version2 . '</td><td>' . $version2_source . '</td><td>' . $version3 . '</td><td>' . $version3_source . '</td><td>' . $version4 . '</td><td>' . $version4_source . '</td><td>' . $version5 . '</td><td>' . $version5_source . '</td></tr>';
    }

    $movie_list .= '</table>';

    ?>

    <html>
    <?php echo $movie_list; ?>

    </html>
    Then here is the beginning of what I've got so far to check each of the links for the video player:

    PHP Code:
    <html>
    <table>
    <?php
    $page = "http://xxxxxxxxxxx.php";

    # INITIATE CURL.
    $curl = curl_init();

    # CURL SETTINGS.
    curl_setopt($curl, CURLOPT_URL, $page);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 0);

    # GRAB THE FILE.
    $html = curl_exec($curl);

    # CLOSE CURL.
    curl_close($curl);

    # PULL THE 12 CELLS OUT OF EACH TABLE ROW OF THE DOWNLOADED PAGE.
    preg_match_all(
        '/<tr>\s*<td>(.*?)<\/td><td>(.*?)<\/td><td>(.*?)<\/td><td>(.*?)<\/td><td>(.*?)<\/td><td>(.*?)<\/td><td>(.*?)<\/td><td>(.*?)<\/td><td>(.*?)<\/td><td>(.*?)<\/td><td>(.*?)<\/td><td>(.*?)<\/td><\/tr>/s',
        $html,         // search the downloaded HTML, not the URL
        $posts,        // will contain one entry per table row
        PREG_SET_ORDER // formats data into an array of rows
    );

    foreach ($posts as $post) {
        $movie_id = $post[1];
        $title = $post[2];
        $version1 = $post[3];
        $version1_source = $post[4];
        $version2 = $post[5];
        $version2_source = $post[6];
        $version3 = $post[7];
        $version3_source = $post[8];
        $version4 = $post[9];
        $version4_source = $post[10];
        $version5 = $post[11];
        $version5_source = $post[12];

        // do something with data
    }
    ?>
    </table>
    </html>
    I want it to check each link in the database to see whether the embedded video player is still present on the linked page and, if it is not, echo out the link, version, id, etc. Sorry if I'm doing a bad job of explaining what I need. I don't understand cURL well at all.
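
    The check described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the helper names `fetchPage` and `findMissingPlayers`, the `$marker` string (a fragment of whatever embed code the pages are expected to contain), and the timeout values. It assumes PHP 7.1 or later with the cURL extension, and rows shaped like the ones returned by the query in the first script.

```php
<?php
// Hypothetical helper: download a page with cURL.
// Returns the response body, or null if the request failed or returned an HTTP error.
function fetchPage(string $url): ?string
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true); // follow redirects to the final page
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);   // assumed limits, in seconds
    curl_setopt($curl, CURLOPT_TIMEOUT, 30);
    $body   = curl_exec($curl);
    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    return ($body === false || $status >= 400) ? null : $body;
}

// Hypothetical helper: check version1..version5 of one database row.
// $fetch is any callable that takes a URL and returns HTML or null (e.g. 'fetchPage').
// Returns one entry for every version whose page no longer contains $marker.
function findMissingPlayers(array $row, string $marker, callable $fetch): array
{
    $missing = [];
    for ($n = 1; $n <= 5; $n++) {
        $url = $row["version$n"] ?? '';
        if ($url === '') {
            continue; // this version is not set for the movie
        }
        $html = $fetch($url);
        // A failed download and a page without the marker both count as "player missing".
        if ($html === null || stripos($html, $marker) === false) {
            $missing[] = [
                'movie_id' => $row['movie_id'],
                'title'    => $row['title'],
                'version'  => $n,
                'url'      => $url,
            ];
        }
    }
    return $missing;
}
```

    Inside the `while` loop of the first script, this might be called as `findMissingPlayers($row, 'domain.com/player', 'fetchPage')`, where `domain.com/player` is a placeholder for the real embed fragment, and each returned entry echoed out. Working from the database rows directly would avoid having to parse the HTML table with a regular expression.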

