RSS newsfeed and caching the content



bcarl314
09-23-2003, 03:31 PM
I'm signing up for various newsfeeds, and they keep telling me that I should cache the content and wait a couple of hours before requesting the feed from their server again.

I'm currently using the code I posted in this (http://www.codingforums.com/showthread.php?s=&threadid=26012) thread.

Is there a way to cache the content? Or do I need to write a function that:
1) Checks when the last request was made, and
2) If the last request was more than 2 hours ago, fetches the RSS feed and updates the last-request time; otherwise, reads an XML file saved from the last successful request?

I'd really like to find some way to cache the content rather than write a new function to handle the above. Taking the Larry Wall approach to this one.
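For reference, the two steps in the question can be sketched as a small read-through cache. This is a minimal sketch, not code from the thread: the function name `get_cached_feed`, the cache path, and the `$fetch` callback are hypothetical, and the cache file's `filemtime()` stands in for the "last request" timestamp.

```php
<?php
// Hypothetical read-through cache for a feed.
// $fetch is any callable that returns the feed body as a string, or false on failure.
function get_cached_feed($cache_file, $max_age, callable $fetch)
{
    // PHP caches stat() results; clear them so filemtime() is current.
    clearstatcache(true, $cache_file);

    // Step 1: when was the last successful request? Use the cache file's mtime.
    if (is_file($cache_file) && (time() - filemtime($cache_file)) < $max_age) {
        return file_get_contents($cache_file); // still fresh: no remote request
    }

    // Step 2: cache missing or older than $max_age, so fetch a new copy.
    $body = $fetch();
    if ($body !== false && $body !== '') {
        file_put_contents($cache_file, $body, LOCK_EX); // also resets the mtime
        return $body;
    }

    // Fetch failed: fall back to the copy from the last successful request.
    return is_file($cache_file) ? file_get_contents($cache_file) : false;
}
```

For example, `get_cached_feed('/path/to/cache/news.xml', 2 * 60 * 60, function () { return @file_get_contents('http://example.com/news.rss'); })` would contact the remote server at most once every two hours while fetches succeed (the path and URL are placeholders). Passing the fetch step in as a callback keeps the caching logic independent of how the feed is downloaded.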

Íkii
09-23-2003, 04:38 PM
I run my RSS pulls from a database of feeds, each with its own retry duration. Cron runs the script every 30 minutes, and a simple filemtime() call on each saved file decides which feeds' retry periods have expired and need updating.
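The cron-side check described here might look like the following minimal sketch. The array of feeds stands in for the database table; the function name `feeds_due` and the field names `file` and `retry` (in seconds) are assumptions, not the author's schema.

```php
<?php
// Hypothetical cron-side check: which feeds are due for a refresh?
// Each feed record (standing in for a database row) holds the path of its
// saved output file and a retry duration in seconds.
function feeds_due(array $feeds, $now = null)
{
    $now = ($now === null) ? time() : $now;
    $due = array();
    foreach ($feeds as $feed) {
        clearstatcache(true, $feed['file']);
        // A feed that has never been saved is always due.
        if (!file_exists($feed['file'])) {
            $due[] = $feed;
            continue;
        }
        // filemtime() gives the time of the last save for this feed.
        if ($now - filemtime($feed['file']) >= $feed['retry']) {
            $due[] = $feed;
        }
    }
    return $due;
}
```

With cron running every 30 minutes, a feed whose retry is two hours would be refreshed on roughly every fourth run, since the check simply skips it until its saved file is old enough.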

I save the parsed XML as HTML blocks on my server so I don't have to process it twice. (I'm not overly au fait with styling XML through stylesheets, so I opted for plain HTML.)
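Turning parsed items into a saved HTML block can be sketched as below. This is a minimal sketch, assuming items shaped like the arrays the parser in the author's code produces (`title`, `link` and `description` keys); the function name `items_to_html` is hypothetical, and the `htmlspecialchars()` escaping is an addition so that text from a remote feed cannot inject markup into your page.

```php
<?php
// Hypothetical renderer: one HTML block from an array of parsed items.
// $limit caps how many titled items are rendered (null = all).
function items_to_html(array $items, $limit = null)
{
    $html = '';
    $count = 0;
    foreach ($items as $item) {
        if ($limit !== null && $count >= $limit) {
            break;
        }
        $title = isset($item['title']) ? trim($item['title']) : '';
        if ($title === '') {
            continue; // skip items without a title
        }
        $link = isset($item['link']) ? $item['link'] : '#';
        $desc = isset($item['description']) ? $item['description'] : '';
        // Escape everything taken from the feed before embedding it in HTML.
        $html .= '<a href="' . htmlspecialchars($link, ENT_QUOTES) . '" class="rss_heading">'
               . htmlspecialchars($title, ENT_QUOTES) . '</a><br />'
               . '<span class="rss_description">' . htmlspecialchars($desc, ENT_QUOTES)
               . '</span><br /><br />';
        $count++;
    }
    return $html;
}
```

The returned string can be written to a file once per refresh and pulled into pages with a plain include, which is the "process once, serve many times" idea described above.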

A simplified version of the file I use is...


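<?php
class rss_compile
{
    var $ct = "";       // current tag
    var $f = 0;         // on/off flag: 1 while inside an <item> block
    var $ry = array();  // output array: one entry per item
    var $dx = -1;       // array index of the current item
    var $file;          // feed location passed to set_params()
    var $sc;            // "yes"/"no" option (stored, not used below)
    var $xp;            // the XML parser

    function strt($pa, $n, $a="")
    {
        // start-tag handler: record current tag - if 'item' set flag and increment index
        $this->ct = $n;
        if ($n == "ITEM")
        {
            $this->f = 1;
            $this->dx++;
        }
    }
    function dat($pa, $d)
    {
        // character-data handler: get data from line if within 'item' block
        if ($this->f == 1)
        {
            $d = trim(strip_tags($d));
            $key = strtolower($this->ct);
            // initialise the field first so .= does not raise an undefined-index notice
            if (!isset($this->ry[$this->dx][$key]))
            {
                $this->ry[$this->dx][$key] = "";
            }
            $this->ry[$this->dx][$key] .= $d;
        }
    }
    function nd($pa, $n, $a="")
    {
        // end-tag handler: reset flag when exiting 'item' block
        if ($n == "ITEM")
        {
            $this->f = 0;
        }
    }
    function set_params($file, $sc)
    {
        $this->file = $file;
        $this->sc = ($sc == "yes") ? "yes" : "no";
        $this->xp = xml_parser_create();
        xml_set_object($this->xp, $this);
        xml_set_element_handler($this->xp, "strt", "nd");
        xml_set_character_data_handler($this->xp, "dat");
        // case folding upper-cases tag names, hence the "ITEM" comparisons above
        xml_parser_set_option($this->xp, XML_OPTION_CASE_FOLDING, TRUE);
        xml_parser_set_option($this->xp, XML_OPTION_SKIP_WHITE, TRUE);
        // opening a remote URL with fopen() requires allow_url_fopen to be enabled
        if (!($fp = fopen($this->file, "r")))
        {
            die("Could not read $this->file");
        }
        while ($xr = fread($fp, 4096))
        {
            if (!xml_parse($this->xp, $xr, feof($fp)))
            {
                // log the parse error to the author's `glitches` table
                // (note: the mysql_* functions were removed in PHP 7)
                $log_error = mysql_query("INSERT INTO `glitches` (glitch_id, glitch_ref, glitch_notes) VALUES('','RSS PARSING ERROR - ".$this->file."','".xml_error_string(xml_get_error_code($this->xp))."')");
            }
        }
        fclose($fp);
        xml_parser_free($this->xp);
        return $this->ry;
    }
}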

// the editable stuff
//
// each entry: array(remote file, save name, header text)

$rss_data = array();
$rss_data[] = array('http://.................news.rss', 'nme.inc', 'News from NME');
$rss_data[] = array('http://.................more.rdf', 'melodymaker.inc', 'News from Melody Maker');

$save_folder = '/home/site/www/folder/rss_files/';

foreach ($rss_data AS $rf)
{
    $file_ref = $save_folder.$rf[1];
    // add a filemtime test on $file_ref here if needed

    $n = new rss_compile;
    $oup = $n->set_params($rf[0], "yes");
    $cx = count($oup);

    // some output html

    $rss_string = '<span class="rss_title">'.$rf[2].'</span><br /><br />';

    // first five items (stops early if the feed has fewer)
    for ($x = 0; $x < 5 && $x < $cx; $x++)
    {
        if (isset($oup[$x]['title']) && trim($oup[$x]['title']) !== "")
        {
            // more html - note image and span classes - amend to suit
            $rss_string .= '<a href="'.$oup[$x]["link"].'" target="teck_window" class="rss_heading">'.$oup[$x]['title'].' &nbsp; <img src="images/read_story.jpg" width="24" height="14" alt="Read Story" border="0" /></a><br /><span class="rss_description">'.$oup[$x]['description'].'</span><br /><br />';
        }
    }

    // mini news - top 5 stories max

    if (!file_exists($save_folder.'mini_'.$rf[1]))
    {
        touch($save_folder.'mini_'.$rf[1]);
        chmod($save_folder.'mini_'.$rf[1], 0777);
    }
    $fo = fopen($save_folder.'mini_'.$rf[1], "w");
    fwrite($fo, $rss_string);
    fclose($fo);

    // remaining items are appended to the same string for the full feed
    for ($x = 5; $x < $cx; $x++)
    {
        if (isset($oup[$x]['title']) && trim($oup[$x]['title']) !== "")
        {
            // more html - same as before
            $rss_string .= '<a href="'.$oup[$x]["link"].'" target="teck_window" class="rss_heading">'.$oup[$x]['title'].' &nbsp; <img src="images/read_story.jpg" width="24" height="14" alt="Read Story" border="0" /></a><br /><span class="rss_description">'.$oup[$x]['description'].'</span><br /><br />';
        }
    }

    // full feed saving - all stories

    if (!file_exists($save_folder.$rf[1]))
    {
        touch($save_folder.$rf[1]);
        chmod($save_folder.$rf[1], 0777);
    }
    $ff = fopen($save_folder.$rf[1], "w");
    fwrite($ff, $rss_string);
    fclose($ff);
}
?>

You could either just run it from cron every hour or so, or add a few features so it runs based on each saved file's last-modified time. The feed list shown here is stored in a simple array, which I'm sure you can amend to suit your needs.
Note: it saves two files per feed, one with the top five stories (prefixed 'mini_') and one with all stories.
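The two-file save can be sketched separately as below. This is a minimal sketch, not the author's code: `save_feed_files` is a hypothetical helper that takes already-rendered HTML snippets, one per story. Writing to a temporary file and then calling `rename()` is an addition; on POSIX filesystems the rename replaces the target in one step, so a page including the file never sees it half-written. The original's `touch()` and `chmod(0777)` calls aren't needed here because `file_put_contents()` creates the file itself.

```php
<?php
// Hypothetical helper: write the 'mini_' file (first $mini_count stories)
// and the full file (all stories) for one feed.
function save_feed_files($folder, $name, array $blocks, $mini_count = 5)
{
    $outputs = array(
        'mini_' . $name => implode('', array_slice($blocks, 0, $mini_count)),
        $name           => implode('', $blocks),
    );
    foreach ($outputs as $file => $html) {
        $target = rtrim($folder, '/') . '/' . $file;
        $tmp = $target . '.tmp';
        // Write the new content aside, then swap it into place.
        if (file_put_contents($tmp, $html) === false || !rename($tmp, $target)) {
            return false;
        }
    }
    return true;
}
```

For the first feed in the list, `save_feed_files($save_folder, 'nme.inc', $blocks)` would produce `mini_nme.inc` and `nme.inc` in the same folder the original script writes to.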

lifedeuce
06-21-2004, 10:14 PM
Okii, how could I adapt your code to my needs? I want a program (a robot, I guess) that pulls content from URLs I input and then embeds it on my site. Is this possible? Can you help?
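The thread leaves this unanswered, but with the script above, one way to pull feeds from URLs you enter is to append them to `$rss_data` before the `foreach` loop runs. Note that the URLs must point at RSS/RDF feeds, since the class only collects data inside `<item>` elements; ordinary web pages won't produce any output. A minimal sketch, assuming the URLs come from a form you control: `add_feed` is a hypothetical helper, and restricting save names to plain `.inc` file names is an assumption chosen to match the existing entries and to keep writes inside `$save_folder`.

```php
<?php
// Hypothetical helper: validate a user-supplied feed and add it to $rss_data
// in the same (remote file, save name, header text) shape the script uses.
function add_feed(array &$rss_data, $url, $save_name, $header)
{
    // Accept only absolute http(s) URLs.
    if (filter_var($url, FILTER_VALIDATE_URL) === false
        || !preg_match('#^https?://#i', $url)) {
        return false;
    }
    // The save name becomes a file name in $save_folder, so allow only a
    // plain name (no slashes, no "..") ending in .inc.
    if (!preg_match('/^[A-Za-z0-9_-]+\.inc$/', $save_name)) {
        return false;
    }
    $rss_data[] = array($url, $save_name, $header);
    return true;
}
```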


