
View Full Version : save a page



herloman
07-21-2008, 09:08 PM
There's a website that updates statistics approximately every hour, but they don't archive their pages so I have to physically be at my computer every time they update. I'm looking for a script that I could put on my website that would automatically visit the given URL every hour and save the page in a folder on my server.

p4plus2
07-21-2008, 09:17 PM
Well, a link to the site you need info from would help. Otherwise we can't see the content you're trying to get... HTML, XML... etc.

herloman
07-21-2008, 09:31 PM
I'd like a script that would be able to save any page. For example, I'd like to archive yahoo.com every hour. The only additional thing the script needs is to save images. Just all the text from the site and the images.

scoop_987
07-21-2008, 09:49 PM
I would suggest a cron job and a script something like this:

[code]
<?php

//The site to index:
$site = "http://yahoo.com";

//Sets the epoch time of the server
$time_epoch = time();

//Creates a new file for writing:
$fp = fopen("archives/yahoo".$time_epoch, "w");

//Writes the page to the given file:
fwrite($fp, $site);

//Close file
fclose($fp);

?>
[/code]

Since the majority of websites link their images to their own servers, you shouldn't need to download the images, but I do suggest making that script a little more complex and adding a CSS download step so that the page looks correct.

herloman
07-21-2008, 10:01 PM
All this does is make a text file with "http://yahoo.com" in it.
It seems like we're on the right track though. All I need is code that physically takes all the data from the page (ex. yahoo.com) and makes an html file.

p4plus2
07-21-2008, 10:04 PM
After you have the file on the server, use preg_match() to search for img tags, and stylesheets and scripts if needed... then save those.
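A minimal sketch of that idea, run on a hypothetical `$html` string standing in for the saved page (a regex is a rough heuristic here, not a real HTML parser, so take it as a starting point only):

```php
<?php
// Hypothetical markup standing in for the downloaded page.
$html = '<html><head><link rel="stylesheet" href="main.css"></head>'
      . '<body><img src="logo.png"><img src="banner.jpg"></body></html>';

// Pull every src attribute out of the img tags.
preg_match_all('/<img[^>]+src="([^"]+)"/i', $html, $imgs);

// Pull every href out of the link tags (stylesheets).
preg_match_all('/<link[^>]+href="([^"]+)"/i', $html, $css);

print_r($imgs[1]); // captured img src values
print_r($css[1]);  // captured stylesheet hrefs
```

Each of those URLs could then be fetched and saved the same way as the page itself.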

herloman
07-21-2008, 10:12 PM
Right, but the code doesn't get all that info. It just puts the text: "http://yahoo.com" in a text file.

p4plus2
07-21-2008, 10:14 PM
You need to get the file contents first... there are a few ways you can try; I am not sure if any would really work though.

herloman
07-21-2008, 10:17 PM
I did get the file contents. It says http://yahoo.com.

The code is wrong in the first place. There needs to be a function to get page data, but I don't know what the function is.

herloman
07-21-2008, 10:21 PM
like an echo function, for an entire page with specified URL

scoop_987
07-21-2008, 10:24 PM
Right, OK, that's a good start I suppose; at least the code "semi" worked. Erm... try this to get the HTML file:



[code]
<?php

//The site to index:
$site = "http://yahoo.com";

//Sets the epoch time of the server
$time_epoch = time();

//Creates a new file for writing:
$fp = fopen("archives/yahoo".$time_epoch.".html", "w");

//Writes the page to the given file:
fwrite($fp, file_get_contents($site));

//Close file
fclose($fp);

//No Promises
?>
[/code]
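One caveat with this approach: file_get_contents() returns false when the fetch fails (site down, or allow_url_fopen disabled in php.ini), and writing that straight to disk would clobber the archive with an empty file. A hedged sketch of the same logic with a failure check, wrapped in a hypothetical `archive_page()` helper:

```php
<?php
// Save a fetched page only when the fetch actually succeeded.
// Returns true on success, false when nothing could be fetched.
function archive_page($site, $dest) {
    $page = @file_get_contents($site);   // false on failure; @ hides the warning
    if ($page === false) {
        return false;                    // don't overwrite the archive with nothing
    }
    file_put_contents($dest, $page);
    return true;
}

// Usage, with the thread's example URL and naming scheme:
// archive_page("http://yahoo.com", "archives/yahoo".time().".html");
```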

p4plus2
07-21-2008, 11:34 PM
The date() function may be better than time(), as time() returns a Unix timestamp... not really human-readable...

scoop_987
07-22-2008, 07:10 PM
Doesn't really matter... the Unix epoch is my preferred choice; I know it's not everyone's.

p4plus2
07-22-2008, 10:00 PM
I am saying that for archiving purposes it may be better to have a human-legible date; then if you explode the file name you can echo the date directly, without conversions.

scoop_987
07-23-2008, 08:14 PM
Well... it's my preference, though anyone can criticize it... But it's not hard to modify this:

[code]
$time_epoch = time();
[/code]
to
[code]
$time_epoch = "-".date("m-d-y");
[/code]

Month-Day-Year; just change the format string to suit.
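A quick sketch of what that filename ends up looking like. Since the thread is about hourly saves, a date-only name would make two runs on the same day overwrite each other, so this assumes the hour is appended as well:

```php
<?php
// Human-readable stamp; the trailing hour keeps hourly runs distinct,
// otherwise a daily date would collide with itself.
$stamp = date("m-d-y_H");
$file  = "archives/yahoo-".$stamp.".html";
echo $file, "\n";
```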



EZ Archive Ads Plugin for vBulletin Copyright 2006 Computer Help Forum