Checking to see if a site is updated



Bobafart
01-07-2008, 01:51 AM
Is there a way in PHP to check whether a specific website document has been updated? (Non-RSS, of course.)

For example: run a script that fget()'s the URL and then checks whether www.foo.com/index.html (or /index.asp, /index.php, etc.) has been updated since the last check?

kbluhm
01-07-2008, 02:21 AM
Get the page's source code, then sha1()/md5() it and save it.

Next time you check, do the same. If the sha1/md5 has changed, the source-code has changed... so the site has been modified or updated.

Bobafart
01-18-2008, 04:22 PM
What do you mean by "Get the page's source code"?

Do you mean using an fget() on the URL?

i.e. fget(http://www.cnn.com)?

kbluhm
01-18-2008, 04:27 PM
By "get the page's source code"... I mean get the page's source code:


// Fetch the raw page source, then reduce it to a fingerprint you can store.
$source = file_get_contents( $url );
$hash = md5( $source );

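The hash is only useful if it survives between runs, so here is a minimal sketch of the "save it, then compare next time" step. The function name source_has_changed(), the state file path, and the URL in the usage comment are all made up for illustration; a database row would work just as well as a file.

```php
<?php
// Hypothetical helper: returns true when $source differs from the
// fingerprint stored in $stateFile, and records the new fingerprint.
function source_has_changed( string $source, string $stateFile ): bool
{
    $newHash = sha1( $source );
    $oldHash = is_file( $stateFile ) ? trim( file_get_contents( $stateFile ) ) : null;
    file_put_contents( $stateFile, $newHash );
    // The first run has nothing to compare against, so it reports no change.
    return $oldHash !== null && $oldHash !== $newHash;
}

// Usage (URL and path are placeholders):
// $source = file_get_contents( 'http://www.foo.com/index.html' );
// if ( $source !== false && source_has_changed( $source, '/tmp/foo.sha1' ) ) {
//     echo "Page changed\n";
// }
```

Note that file_get_contents() returns false on failure, so it is worth checking for that before hashing; otherwise a failed fetch would look like a changed page.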

If the site writes anything to the page in real time, it could wreak havoc on your plans: the current time, the number of users online, and so on would change the hash on every request. You may want to grab the source and, using a RegExp, extract a portion of the code you know will only be modified with major updates. That, however, will severely decrease the script's ability to just plug in any old site and work as desired, since the pattern has to be written for each site.
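A rough sketch of the RegExp idea, assuming the part you care about sits inside a <div id="content"> element. That markup, the pattern, and the function name stable_fingerprint() are assumptions for the example; every real site needs its own pattern, and a non-greedy match like this one will cut off early if the element contains nested divs.

```php
<?php
// Hypothetical helper: hash only the part of the page captured by group 1
// of $pattern. Returns null if the pattern does not match.
function stable_fingerprint( string $source, string $pattern ): ?string
{
    if ( preg_match( $pattern, $source, $m ) !== 1 ) {
        return null;
    }
    return md5( $m[1] );
}

// Assumed markup: the content lives in <div id="content">...</div>.
$pattern = '#<div id="content">(.*?)</div>#s';
$a = '<p>Time: 10:00</p><div id="content">News</div>';
$b = '<p>Time: 10:05</p><div id="content">News</div>';

// The clock differs between the two pages, but the hashed portion does not.
var_dump( stable_fingerprint( $a, $pattern ) === stable_fingerprint( $b, $pattern ) ); // bool(true)
```

A null return is worth treating as its own case: it usually means the site changed its layout and the pattern needs updating.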

What you could also use is stream_get_meta_data():


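// Open the URL as a stream; for http:// URLs, 'wrapper_data' holds the response headers.
$fp = fopen( $url, 'r' );
if ( $fp === false ) {
    die( "Could not open $url" );
}
$data = stream_get_meta_data( $fp );
print_r( $data['wrapper_data'] );
fclose( $fp );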

That will give you something like:


Array
(
[0] => HTTP/1.1 200 OK
[1] => Date: Fri, 18 Jan 2008 15:31:42 GMT
[2] => Server: Apache/2.2.0 (Fedora)
[3] => Last-Modified: Sat, 23 Dec 2006 20:05:22 GMT
[4] => ETag: "9f251c-795f-15998480"
[5] => Accept-Ranges: bytes
[6] => Content-Length: 31071
[7] => Connection: close
[8] => Content-Type: text/html
)

You could then use the Last-Modified header to check when the requested URL was... last modified.
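As a sketch of that last step, something like the following could pull the Last-Modified value out of the header array and turn it into a timestamp you can compare against your previous check. The function name last_modified_timestamp() and the $lastCheck value are invented for the example. Keep in mind that many dynamically generated pages do not send Last-Modified at all, in which case this returns null and you are back to hashing the source.

```php
<?php
// Hypothetical helper: find the Last-Modified header in an array of raw
// header lines (the shape of $data['wrapper_data']) and return it as a
// Unix timestamp, or null if the header is absent or unparsable.
function last_modified_timestamp( array $headers ): ?int
{
    foreach ( $headers as $line ) {
        // Header names are case-insensitive, hence stripos().
        if ( stripos( $line, 'Last-Modified:' ) === 0 ) {
            $ts = strtotime( trim( substr( $line, strlen( 'Last-Modified:' ) ) ) );
            return $ts === false ? null : $ts;
        }
    }
    return null;
}

$headers = array(
    'HTTP/1.1 200 OK',
    'Last-Modified: Sat, 23 Dec 2006 20:05:22 GMT',
);
$lastCheck = strtotime( '2006-01-01 00:00:00 GMT' ); // placeholder for a stored value

var_dump( last_modified_timestamp( $headers ) > $lastCheck ); // bool(true)
```

If the request was redirected, the array can contain more than one set of headers; this sketch simply returns the first Last-Modified it finds.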

Actually, if you visit http://www.php.net/stream_get_meta_data, the current top-most comment (by ed at readinged dot com) is probably exactly what you're looking for.


