PDA

View Full Version : Page timing out (returning over 3000 records)


markhartnady
02-04-2004, 04:21 PM
It's 5:30pm. Everyone is leaving the office. But I cant cos this page keeps crashing Apache.

I know nothing about caching or output buffering but this may be the solution to the problem.

I am trying to run a script, which returns about 3,000 records from a MySQL db each containing a URL. On each URL, I am running a function. This takes about 2-4 seconds per URL. So in essence the whole page should take about (3X3000sec = +- 2 hours). This is fine, as I can leave it running overnight, but it keeps timing out.

After about 4 minutes, Apache returns an error and the page stops running.

Heres the code:


<?php
error_reporting(0);
//$fp = fsockopen($url, 80, $errno, $errstr, 30);

include "ez_sql.php";
include "Smarty.class.php";

function format_url ($url) {
$scheme = substr($url, 0, 4);
if ($scheme <> "http") { //cater for https also
$t_url = "http://" . $url;
return $t_url;
} else { return $url; }
}

function is_deadlink ($url) {
$t_url = format_url($url);
$fp = fopen($t_url, "r");
if (!$fp) { return true; }
else { return false; }
}

$sql_sl = "SELECT t1.id_sl_folder as folder_id, t2.id_sl_link as link_id,
trim( t3.url ) AS myurl,
t3.active, t3.system_active, count(t2.id_sl_link) as count_link_id
FROM tb_sl_folder AS t1
LEFT JOIN tb_sl_folder_link AS t2 ON t1.id_sl_folder = t2.id_sl_folder
LEFT JOIN tb_sl_link AS t3 ON t3.id_sl_link = t2.id_sl_link
WHERE t3.url <> ''
GROUP BY t3.url
ORDER BY t3.url";

$result = mysql_query($sql_sl);
header('X-pmaPing: Pong');
ini_set("max_execution_time", "3000");
ob_start();

?>

<HTML>
<head>
<link href="../css/pmapsadmin.css" rel="stylesheet" type="text/css">
</head>
<body bgcolor="#FFFFFF">
<font class="dfont10"><b>

<?
$sing_links = 0;
$totallinks = 0;
while ($row = mysql_fetch_array($result))
{
$totallinks += $row["count_link_id"];
$sing_links += 1;
}
echo "Total no. links: " . $totallinks . "<BR>";
echo "Total no. unrepeated links: " . $sing_links . "<BR>";
echo "Please be patient, this query can take up to 15 minutes.<BR><BR>";
ob_end_flush();
ob_start();
$result = mysql_query($sql_sl);
while ($url_array = mysql_fetch_array($result))
{
$currsl = $url_array["folder_id"];
$id_sl_link = $url_array["link_id"];
echo "Edit Link: <a target=_BLANK href=\"smartlinks_admin.php?currsl=$currsl&op=EditLink&id_sl_link=$id_sl_link\">" . $url_array["link_id"] . "</a>";
echo " URL: <a target=_BLANK href=\"" . format_url($url_array["myurl"]) . "\">" . $url_array["myurl"] . "</a>&nbsp;";
if (is_deadlink(format_url($url_array["myurl"]))) {
$status = "<font color=red>DEAD</font>";
} else { $status = "<font color=green>ACTIVE</font>"; }
echo "Link is: $status<br>";
}

?>
</b></font>
</body>
</HTML>
<?
ob_end_flush();
?>

mordred
02-04-2004, 06:18 PM
This may be the answer to your dilemma:


function is_deadlink ($url) {

$t_url = format_url($url);

$fp = fopen($t_url, "r");

if (!$fp) { return true; }

else { return false; }

}


You are not closing the opened file, so I guess the allocated resources continue to exist and Apache rans out of memory eventually. Whenever you do something with filesystem calls, you must explicitly close the file:


function is_deadlink ($url) {

$t_url = format_url($url);

$fp = fopen($t_url, "r");

if (!$fp) { fclose($fp); return true; }

else { fclose($fp); return false; }

}


Also, I would disbale execution timeout completely by setting


ini_set("max_execution_time", 0);


What I really can't understand but has a certain irony in it, is that you obviously have an error in your script, but you turned off error reporting completely. That's not exactly helpful for debugging... make that rather


error_reporting(E_ALL);

markhartnady
02-05-2004, 08:30 AM
Hmmm, thanks for the help, but Apache still produces an error, and the page times out.

Silly of me to overlook closing the file. Error reporting was off because even though the is_deadlink function works, it produces a PHP error if the URL doesnt open properly.

I think you are right about the is_deadlink function though... The problem lies therein for sure. Do you know of any other (faster, more reliable way) to check for a deadlink?

Thanks again.

mordred
02-05-2004, 11:32 AM
What was the error message of Apache? If it isn't displayed, it's most surely in the erro.log file.

I don't know how fopen() works internally, but you can try to use a socket and send a HEAD request to the desired host, that only fetches HTTP metadata which you can check if the page exists or not, and not the content body.

Quick hack to illustrate the usage:


$host = "www.codingforums.com";
$path = "/";
$fp = fsockopen($host, 80, $errno, $errstr, 30);

if (!$fp) {
echo "$errstr ($errno)<br>\n";
} else {
fputs ($fp, "HEAD $path HTTP/1.0\r\nHost: $host\r\n\r\n");

$str = '';
while (!feof($fp)) {
$str .= fgets ($fp, 128);
}

if (preg_match('~HTTP/1.[01] 200 OK~', $str)) {
echo 'ok';
} else {
echo 'not ok';
}
}
fclose ($fp);


If you think the amount of your links to check is to much, you could try to process the recordset in chunks. Check the first 100 links, send a redirect to another page that checks the next 100 links, etc.

raf
02-05-2004, 12:43 PM
This is quite a resource eating script.

First of, i'm not sure if you should do this over the webserver.
And if you do, then you should make some performance improvements. Like for instance, you now execute the same query twice ! You should never do that and certainly not if the recordset contains 3Krecords.

And the first time that you run the query and iterate through the recordset, you are only doing that to get a count, which is just a waste of resources since you can use a select count(*) for that (which will only return one variable-value pair). There is absolutely no need for your first select.

And what should be the result of this script? A list of 3000 links with there status? Why? Shouldn't a list with only the dead links do?

To test everything out, you best also add a LIMIT 20 to the end of your select (which you should always do when testing, if there is a chance that a very large recordset would be returned).

You could also set up a 2 page mechanisme where in page 1 you do the select with a LIMIT 100 that checks for the page and that then simply sets a flag in your db (something that is way more important in my opinion, then a red link) This can easely be done by building a commadelimited string of all dead links id's, and then use this inside an 'In (" . $coll . ")' clause of an updatestatement.
After the update redirect to page 2 which simply redirects you to page 1.
The loop should stop if mysql_num_rows() < '0'

This way, it also isn't such a big problem if the page errors after 2k records.

Just post back if you would try it like this and need some help.

markhartnady
02-05-2004, 04:06 PM
How ironic! I just checked your reply now, and I had already implemented everything you suggested! Thats just bizarre. Great minds think alike. Thanks for the help tho.

raf
02-05-2004, 05:55 PM
Hmm. Bizare.Great minds think alikeMaybe it's just following standard procedures and mindroads your bound to pick up after a while ...

raf
02-05-2004, 06:22 PM
Hmm. Bizare.Great minds think alikeMaybe it's just following standard procedures and mindroads your bound to pick up after a while ...