koyama
12-17-2008, 05:19 AM
I am having trouble porting a PHP script. It all comes down to some strange behavior of file_get_contents(<url>) together with 404 pages.
Test script:
<?php
// retrieve content from 404 error page
var_dump(file_get_contents('http://www.google.com/xyz'));
?>
Results:
1. My local server (PHP Version 5.2.6-pl7-gentoo):
Does not seem to care that it is a 404 error page. It reads the URL as if it were a completely normal page.
Output:
string(7200) "<html><head><meta http-equiv=content-type content="text/html;
charset=ISO-8859-1"><title>404 - Page Not Found</title>..."
2. Web server (PHP Version 5.2.8-0.dotdeb.1):
Refuses to retrieve contents of error page.
Output:
Warning: file_get_contents(http://www.google.com/xyz) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.0 404 Not Found in /home/www/ab/index.php on line 3
bool(false)
What is the reason for the difference?
I have done plenty of searches, but I haven't been able to find anyone else having "success" like me in retrieving content from error pages using file_get_contents(). So which server is working correctly? And where does the PHP documentation say what should happen with file_get_contents() in the case of 404 error pages?
I am aware that there is the more powerful cURL library for doing similar things, but at this point I would just like to know the reason for the above difference.