Thread: save a page

  1. #1
    New to the CF scene
    Join Date
    Jul 2008
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts

    save a page

    There's a website that updates statistics approximately every hour, but they don't archive their pages so I have to physically be at my computer every time they update. I'm looking for a script that I could put on my website that would automatically visit the given URL every hour and save the page in a folder on my server.

  • #2
    Regular Coder
    Join Date
    Mar 2008
    Posts
    103
    Thanks
    1
    Thanked 8 Times in 8 Posts
    A link to the site you need data from would help. Otherwise we can't see the content you are trying to get (HTML, XML, etc.).

  • #3
    New to the CF scene
    Join Date
    Jul 2008
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I'd like a script that would be able to save any page. For example, I'd like to archive yahoo.com every hour. The only additional thing the script needs is to save images. Just all the text from the site and the images.

  • #4
    New Coder
    Join Date
    Jul 2008
    Posts
    91
    Thanks
    4
    Thanked 9 Times in 9 Posts
    I would suggest a cron job and a script something like:

    PHP Code:
    <?PHP

    //The site to index:
    $site = "http://yahoo.com";

    //Sets the epoch time of the server
    $time_epoch = time();

    //Creates a new file for writing:
    $fp = fopen("archives/yahoo".$time_epoch, "w");

    //Writes the page to the given file:
    fwrite($fp, $site);

    //Close file
    fclose($fp);

    ?>

    Since the majority of websites link their images to their own servers, you shouldn't need to download the images. What I do suggest is making the script a little more complex and adding a CSS download step so that the saved page looks correct.

  • #5
    New to the CF scene
    Join Date
    Jul 2008
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts
    All this does is make a text file with "http://yahoo.com" in it.
    It seems like we're on the right track though. All I need is code that physically takes all the data from the page (ex. yahoo.com) and makes an html file.

  • #6
    Regular Coder
    Join Date
    Mar 2008
    Posts
    103
    Thanks
    1
    Thanked 8 Times in 8 Posts
    After you have the file on the server, use preg_match() to search for img tags, stylesheets, and scripts if needed, then save those as well.
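
    As a rough sketch of that idea, the function below pulls asset URLs out of an HTML string. It uses preg_match_all() rather than preg_match(), because preg_match() stops after the first match. The function name find_asset_urls() and the sample markup are made up for illustration, and regular expressions will miss unusual markup (unquoted attributes, for example), so treat this as a starting point only.

```php
<?php
// Hypothetical helper: collect the URLs of images, scripts and linked
// stylesheets referenced by an HTML string.
function find_asset_urls(string $html): array
{
    $patterns = [
        // src="..." on <img> and <script> tags
        '/<(?:img|script)\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']/i',
        // href="..." on <link> tags (stylesheets, icons)
        '/<link\b[^>]*?\bhref\s*=\s*["\']([^"\']+)["\']/i',
    ];

    $urls = [];
    foreach ($patterns as $pattern) {
        // $matches[1] holds the captured URL of every match.
        if (preg_match_all($pattern, $html, $matches)) {
            $urls = array_merge($urls, $matches[1]);
        }
    }

    // Drop duplicates and reindex so the result is a plain list.
    return array_values(array_unique($urls));
}

// Made-up sample markup, not taken from any real page.
$sample = '<link rel="stylesheet" href="/css/site.css">'
        . '<img src="logo.png" alt=""><script src="app.js"></script>';
print_r(find_asset_urls($sample));
```

    The URLs come back exactly as written in the page, so relative ones (like logo.png above) still have to be resolved against the site's address before they can be downloaded.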

  • #7
    New to the CF scene
    Join Date
    Jul 2008
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Right, but the code doesn't get all that info. It just puts the text: "http://yahoo.com" in a text file.

  • #8
    Regular Coder
    Join Date
    Mar 2008
    Posts
    103
    Thanks
    1
    Thanked 8 Times in 8 Posts
    You need to get the file contents first. There are a few ways you can try; I am not sure if any would really work, though.

  • #9
    New to the CF scene
    Join Date
    Jul 2008
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I did get the file contents. It says http://yahoo.com.

    The code is wrong in the first place. There needs to be a function to get page data, but I don't know what the function is.

  • #10
    New to the CF scene
    Join Date
    Jul 2008
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts
    like an echo function, for an entire page with specified URL

  • #11
    New Coder
    Join Date
    Jul 2008
    Posts
    91
    Thanks
    4
    Thanked 9 Times in 9 Posts
    Right, OK, that's a good start I suppose; at least the code "semi" worked. Try this to get the HTML file:

    PHP Code:
    <?PHP

    //The site to index:
    $site = "http://yahoo.com";

    //Sets the epoch time of the server
    $time_epoch = time();

    //Creates a new file for writing:
    $fp = fopen("archives/yahoo".$time_epoch.".html", "w");

    //Writes the page to the given file:
    fwrite($fp, file_get_contents($site));

    //Close file
    fclose($fp);

    //No Promises
    ?>
    Last edited by scoop_987; 07-21-2008 at 10:53 PM. Reason: forgot the extension
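
    For an unattended hourly run it may help to check for failures, since file_get_contents() returns false when the fetch fails and the script above would then write an empty file. The sketch below is one possible variant, not a definitive version: the function name archive_page() and its parameters are hypothetical, and fetching an http:// URL this way assumes allow_url_fopen is enabled on the server.

```php
<?php
// Hypothetical helper: fetch $source and save it under $dir with a
// timestamped name. Returns the path written, or false on failure.
function archive_page(string $source, string $dir, string $prefix)
{
    // file_get_contents() returns false if the fetch fails; the @ hides
    // the warning so the failure can be handled here instead.
    $html = @file_get_contents($source);
    if ($html === false) {
        return false;
    }

    // Create the archive folder on first use.
    if (!is_dir($dir) && !mkdir($dir, 0755, true)) {
        return false;
    }

    $path = $dir . "/" . $prefix . time() . ".html";

    // file_put_contents() combines fopen(), fwrite() and fclose().
    if (file_put_contents($path, $html) === false) {
        return false;
    }
    return $path;
}
```

    The cron script would then call something like archive_page("http://yahoo.com", "archives", "yahoo") and log or e-mail a message whenever the result is false.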

  • #12
    Regular Coder
    Join Date
    Mar 2008
    Posts
    103
    Thanks
    1
    Thanked 8 Times in 8 Posts
    The date() function may be better than time(), as time() returns a Unix timestamp, which is not really human-readable.

  • #13
    New Coder
    Join Date
    Jul 2008
    Posts
    91
    Thanks
    4
    Thanked 9 Times in 9 Posts
    Doesn't really matter. The Unix epoch is my preferred choice; I know it's not everyone's.

  • #14
    Regular Coder
    Join Date
    Mar 2008
    Posts
    103
    Thanks
    1
    Thanked 8 Times in 8 Posts
    I am saying that for archiving purposes it may be better to have a human-readable date; then if you explode() the file name you can simply echo the date without conversions.

  • #15
    New Coder
    Join Date
    Jul 2008
    Posts
    91
    Thanks
    4
    Thanked 9 Times in 9 Posts
    Well, it's my preference, though anyone can criticize it. But it's not hard to modify this:
    PHP Code:
    $time_epoch = time();
    To
    PHP Code:
    $time_epoch = "-".date("m-d-y");
    Month-Day-Year, just change to suit.
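
    One caveat for an hourly archive: a name built from date("m-d-y") alone is the same for every run on a given day, so each save would overwrite the previous one. A possible sketch that includes the hour, and splits the name back apart with explode() as suggested earlier in the thread, is shown here. The function names and the "prefix_date_hour.html" layout are assumptions made for this example.

```php
<?php
// Hypothetical helpers; the prefix must not contain an underscore,
// because "_" is used as the separator.

// Build a name such as "yahoo_2008-07-21_22.html" from a Unix timestamp.
function archive_name(string $prefix, int $timestamp): string
{
    // Y-m-d sorts chronologically; H is the hour on a 24-hour clock.
    return $prefix . "_" . date("Y-m-d_H", $timestamp) . ".html";
}

// Split a name built by archive_name() back into its parts.
function parse_archive_name(string $name): array
{
    // basename() with a suffix argument strips the ".html" extension.
    list($prefix, $day, $hour) = explode("_", basename($name, ".html"));
    return ["prefix" => $prefix, "day" => $day, "hour" => $hour];
}

$name = archive_name("yahoo", time());
$parts = parse_archive_name($name);
echo $name, " was saved on ", $parts["day"], " at ", $parts["hour"], ":00\n";
```

    The date and hour shown depend on the server's configured time zone.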

