PHP HTML scraper

10-18-2011, 04:22 AM
1) Project Details: I have a text file that contains every link in my database. There are 5 versions of each link. I need an HTML scraper that goes through each link and checks whether the video is still embedded.

2) Payment method/details: PayPal

PM me with any questions, Thanks!

10-18-2011, 04:48 AM
So you have a list of URLs where the videos might be embedded, and you need to check each one for the embed code? Did you give your users a specific code to use for embedding? Can you post a sample of your text file? Feel free to remove the URLs and replace them with domain.com or something.

10-18-2011, 05:13 AM
I use this code to print all links from my database to a text file.

<?php
require_once('mysql_connect.php'); // connect to the database

// Build one HTML table row per movie
$movie_list = 'List of all movie links<hr /><br /><table>';
$sql = mysql_query("SELECT movie_id, title, version1, version1_source, version2, version2_source, version3, version3_source, version4, version4_source, version5, version5_source FROM movies ORDER BY movie_id");
while ($row = mysql_fetch_array($sql)) {
    $movie_id = $row['movie_id'];
    $title = $row['title'];
    $version1 = $row['version1'];
    $version1_source = $row['version1_source'];
    $version2 = $row['version2'];
    $version2_source = $row['version2_source'];
    $version3 = $row['version3'];
    $version3_source = $row['version3_source'];
    $version4 = $row['version4'];
    $version4_source = $row['version4_source'];
    $version5 = $row['version5'];
    $version5_source = $row['version5_source'];
    $movie_list .= '
<tr><td>' . $movie_id . '</td><td>' . $title . '</td><td>' . $version1 . '</td><td>' . $version1_source . '</td><td>' . $version2 . '</td><td>' . $version2_source . '</td><td>' . $version3 . '</td><td>' . $version3_source . '</td><td>' . $version4 . '</td><td>' . $version4_source . '</td><td>' . $version5 . '</td><td>' . $version5_source . '</td></tr>';
}

$movie_list .= '</table>';
?>

<?php echo $movie_list; ?>


Then the beginning of what I've got already to check each of the links for the video player:


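$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $page); // $page must hold the URL of the list page
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); // return the page instead of printing it
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 0); // 0 = wait indefinitely

$html = curl_exec($curl);

echo $page;

preg_match_all(
    '#' . str_repeat('<td>(.*?)</td>', 12) . '#s', // assumed pattern: the 12 cells of one table row
    $html,
    $posts, // will contain the blog posts
    PREG_SET_ORDER // formats data into an array of posts
);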

foreach ($posts as $post) {
    $movie_id = $post[1];
    $title = $post[2];
    $version1 = $post[3];
    $version1_source = $post[4];
    $version2 = $post[5];
    $version2_source = $post[6];
    $version3 = $post[7];
    $version3_source = $post[8];
    $version4 = $post[9];
    $version4_source = $post[10];
    $version5 = $post[11];
    $version5_source = $post[12];

    // do something with data
}

I want it to check each link in the database to see whether the embedded video player is still on the other side of the link, and if it is not, echo out the link, version, id, etc. Sorry if I'm doing a bad job of explaining what I need. I don't understand cURL well at all.
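
The check described above could be sketched roughly as follows. This is only a sketch under stated assumptions, not a finished scraper. It assumes that the version1..version5 columns hold the page URLs, and that a fixed substring (the hypothetical EMBED_MARKER, here '<embed') appears in the page only while the player is still embedded. The names fetch_html and find_missing_embeds are made up for illustration, and the nullable return type needs PHP 7.1 or later.

```php
<?php
// Hypothetical marker: text that is present only while the player is embedded.
// Replace it with whatever your embed code actually contains.
const EMBED_MARKER = '<embed';

// Fetch a page with cURL. Returns the HTML, or null on any failure.
function fetch_html(string $url): ?string
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);   // stop trying to connect after 10 seconds
    curl_setopt($curl, CURLOPT_TIMEOUT, 30);          // abort the whole request after 30 seconds
    $html = curl_exec($curl);
    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    return ($html !== false && $status === 200) ? $html : null;
}

// $rows:  one array per movie with movie_id, title, version1..version5.
// $fetch: callable taking a URL and returning HTML or null (swappable for a stub).
// Returns one entry per link whose page is unreachable or lacks the marker.
function find_missing_embeds(array $rows, callable $fetch, string $marker = EMBED_MARKER): array
{
    $missing = [];
    foreach ($rows as $row) {
        for ($v = 1; $v <= 5; $v++) {
            $url = $row["version$v"] ?? '';
            if ($url === '') {
                continue; // no link stored for this version
            }
            $html = $fetch($url);
            if ($html === null || stripos($html, $marker) === false) {
                $missing[] = [
                    'movie_id' => $row['movie_id'],
                    'title'    => $row['title'],
                    'version'  => $v,
                    'url'      => $url,
                ];
            }
        }
    }
    return $missing;
}

// Usage, with $rows filled from the database query:
// foreach (find_missing_embeds($rows, 'fetch_html') as $m) {
//     echo $m['movie_id'] . ' | ' . $m['title'] . ' | version ' . $m['version'] . ' | ' . $m['url'] . "<br />\n";
// }
```

Passing the fetcher in as a callable keeps the checking logic separate from the network call, so it can be tried against canned HTML before it is pointed at the real links. A plain substring test is the simplest possible check. If the player markup varies between sites, a per-source marker would probably be needed.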
