
View Full Version : checking if RSS files still exist



Bobafart
10-26-2009, 04:17 AM
Hi, I have a MySQL table that stores links to RSS/XML files around the web.

I am trying to make a script that I can run every now and then to determine which RSS files are still alive and which are dead links.

I am having trouble making this script since my IF/ELSE logic is failing.

I am currently using the following code:



$sql="SELECT id,sitetitle,sitelink,rssfile,rssdescription FROM myTable ORDER BY id DESC LIMIT 500";
$result=mysql_query($sql);
if($result){
    while($row=mysql_fetch_array($result)){
        echo '<p>RSS Name: <a href="'.$row[sitelink].'"><span style="font-size:1.4em;">'.$row[sitetitle].'</span></a>\'s id is: '.$row[id].'</p>';
        echo '<p>- <a href="'.$row[rssfile].'"><span style="font-size:1em;">'.$row[rssfile].'</span></a></p>';
        if(file_exists($row[rssfile])){
        }else{
            echo '<p>RSS FILE DOES NOT EXIST</p>';
        }
        echo '<p>'.$row[rssdescription].'</p>';
        echo '<p><hr 80%></p>';
    }
}

When I run this script, "RSS FILE DOES NOT EXIST" is output for every RSS file (even the ones that do exist).

How can I create a script that checks whether other people's RSS files are alive or not?

oesxyl
10-26-2009, 05:29 AM
A few steps:
- send a HEAD request to the URL of the RSS file (I suppose that is $row['rssfile'])
- check if the answer is 200 OK; if not, the link is dead
- if the link is alive, check if your local file is older than the remote file; if it is, download
The last step is to avoid using bandwidth without reason, for both you and the RSS provider.
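
The first two steps can be sketched with PHP's cURL extension roughly like this. It is only a minimal sketch, not code from this thread: feed_http_status() and feed_is_alive() are hypothetical names, the 10 second timeout is an arbitrary assumption, and redirects are not followed, so a moved feed would be reported as not alive.

```php
<?php
// Hypothetical helper: returns the HTTP status code for $url, or 0 when the
// request itself fails (DNS error, timeout, refused connection).
function feed_http_status($url, $timeout = 10)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // send HEAD instead of GET
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // do not echo anything
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);    // assumed limit, in seconds
    $ok = curl_exec($ch);
    // The status code is only meaningful when the transfer succeeded.
    return ($ok === false) ? 0 : (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}

// Hypothetical helper: only a 200 answer counts as "alive" here.
function feed_is_alive($code)
{
    return $code === 200;
}
```

Inside the while loop this could be used as feed_is_alive(feed_http_status($row['rssfile'])) in place of the file_exists() test.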

You still have the habit of not quoting the indexes in arrays, :)

best regards

Bobafart
10-26-2009, 12:24 PM
Hi oesxyl

Thanks for posting. Hope all is well.

I have never made a HEAD request before. I googled it and don't really get it. Can you please show me how?

oesxyl
10-26-2009, 03:23 PM
> Hi oesxyl
>
> Thanks for posting. Hope all is well.

You are always welcome, :)

> I have never made a HEAD request before. I googled it and don't really get it. Can you please show me how?

It is something like this:

http://www.smart-it-consulting.com/article.htm?node=133&page=36

best regards

Bobafart
11-01-2009, 11:02 PM
Why can't I just use if(file_exists($rssurl)) where $rssurl is the URL of the RSS file?

Why send a HEAD request?

Bobafart
11-01-2009, 11:08 PM
I am trying to use PHP's cURL to see if the RSS files exist or not.

$rssurl is the var I use which contains the URL to each RSS file (in a while loop after querying the db):


$ch = curl_init();

// set url
curl_setopt($ch, CURLOPT_URL, $rssurl);

//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

// $output contains the output string
$output = curl_exec($ch);

// close curl resource to free up system resources
curl_close($ch);

Now how do I test to see if the file URL exists or not using a cURL function?

Bobafart
11-01-2009, 11:11 PM
what if I did this as my check?



// check to see if RSS File exists
if(curl_exec($ch) === false){

echo 'error: file does not exist';

}


Is this considered a good way to address the problem?
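
A note on this check: curl_exec() returns false only when the transfer itself fails, so a server that answers 404 Not Found still counts as a success and the test above would pass. The handle also has to be used before curl_close() is called. One way to keep the same kind of test is CURLOPT_FAILONERROR, which makes cURL treat HTTP codes of 400 and above as failures. The following is a minimal sketch under those assumptions; rss_url_responds() is a hypothetical name and the timeouts are arbitrary.

```php
<?php
// Hypothetical helper, not from the thread: true when $url answers without
// a transport error and without an HTTP error status (400 or above).
function rss_url_responds($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the feed out of the page output
    curl_setopt($ch, CURLOPT_FAILONERROR, true);    // 4xx/5xx now make curl_exec() return false
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);    // assumed limits, in seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    return curl_exec($ch) !== false;
}
```

This still downloads the whole feed when it exists, so it answers "does it exist" but does not save any bandwidth.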

oesxyl
11-02-2009, 07:08 AM
> Why can't I just use if(file_exists($rssurl)) where $rssurl is the URL of the RSS file?
>
> Why send a HEAD request?

As far as I know, file_exists doesn't work with a URL, only with files and directories on your computer.
A HEAD request is only a few bytes: it asks the server whether the resource exists without fetching it. After the server's response you know what to do, whether to get the RSS or not.
It saves bandwidth for you and the RSS provider, :)
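
For a quick check without the cURL extension, a HEAD request can also be sent through PHP's own http stream wrapper with get_headers(). This is a minimal sketch, not code from this thread: status_from_line() and head_status() are hypothetical names, the timeout is an assumption, it should need allow_url_fopen to be enabled, and stream_context_set_default() changes the default context for the rest of the script.

```php
<?php
// Hypothetical helper: pull the numeric status out of a status line such as
// "HTTP/1.1 200 OK". Returns 0 when the line does not look like one.
function status_from_line($line)
{
    return preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m) ? (int) $m[1] : 0;
}

// Hypothetical helper: HEAD request through the http stream wrapper.
// Returns the status of the first response, or 0 when the request fails.
function head_status($url)
{
    // get_headers() uses the default stream context, so switch it to HEAD.
    stream_context_set_default(array(
        'http' => array('method' => 'HEAD', 'timeout' => 10),
    ));
    $headers = @get_headers($url);
    return ($headers === false) ? 0 : status_from_line($headers[0]);
}
```

If the server redirects, the first status line is the redirect itself (for example 301), so a moved feed would not be reported as 200 by this sketch.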

best regards

oesxyl
11-02-2009, 10:58 AM
I don't use the PHP cURL extension, but I will try to write something and come back when I finish the code.

best regards

oesxyl
11-04-2009, 01:09 AM
<?php

/*
* Curl settings
*/
// user agent, put yours here (the fallback is used when no browser sent one, e.g. from cron)
$useragent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : 'feed-checker';

/*
* Feed settings
*/
// feed url
$url = '........';
// absolute path to the file where we store the feed
$oldfeed = '.....';

/*
* do a HEAD request
*/
$ch = curl_init();
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_USERAGENT,$useragent);
curl_setopt($ch,CURLOPT_HEADER,true);
// CURLOPT_NOBODY sends HEAD and tells curl not to wait for a body
curl_setopt($ch,CURLOPT_NOBODY,true);
$headreq = curl_exec($ch);
// ask curl for the status code instead of parsing the status line
$code = curl_getinfo($ch,CURLINFO_HTTP_CODE);
curl_close($ch);

/*
* must be 200 OK and Last-Modified must be newer than mtime of $oldfeed
*/
if($headreq !== false && $code == 200){
    // Last-Modified is optional; when it is missing we download anyway
    $timestamp = false;
    if(preg_match('/^Last-Modified:\s*(.+?)\s*$/mi',$headreq,$m)){
        $timestamp = strtotime($m[1]);
    }
    if(!file_exists($oldfeed) || $timestamp === false || filemtime($oldfeed) < $timestamp){
        /*
        * Download the feed in a given file
        */
        $fh = fopen($oldfeed,'w+');
        if($fh === false){
            print "Cannot write to ".$oldfeed."\n";
        }else{
            $ch = curl_init();
            curl_setopt($ch,CURLOPT_URL,$url);
            curl_setopt($ch,CURLOPT_USERAGENT,$useragent);
            curl_setopt($ch,CURLOPT_FILE,$fh);
            curl_exec($ch);
            curl_close($ch);
            fclose($fh);
            print "Feed downloaded\n";
        }
    }else{
        print "Feed is already updated\n";
    }
}else{
    // transport error (code 0) or any answer other than 200
    print "Feed not available, HTTP status: ".$code."\n";
}

?>


best regards
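
To tie this back to the original question of flagging dead links for many feeds, the per-feed check can be wrapped in a loop. The following is a minimal sketch under stated assumptions: find_dead_feeds() is a hypothetical name, $rows is assumed to map each feed id to its URL (for example built from the id and rssfile columns), and $statusOf is any callable that returns an HTTP status code for a URL, with 0 meaning the request failed.

```php
<?php
// Hypothetical helper: returns id => status for every feed whose status is
// not 200. The status check is passed in, so any HEAD implementation fits.
function find_dead_feeds(array $rows, $statusOf)
{
    $dead = array();
    foreach ($rows as $id => $url) {
        $code = call_user_func($statusOf, $url);
        if ($code !== 200) {
            $dead[$id] = $code; // remember why the feed was flagged
        }
    }
    return $dead;
}
```

Passing the checker in as a parameter keeps the loop independent of cURL, so it can be tried with a stub function before pointing it at real feeds.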


