
How can I restructure to avoid script timeout?



tomws
05-08-2008, 05:54 PM
I'm adding some functionality to an existing invoicing system, which is already running close to the timeout length. Testing the addition by itself, I'm timing out, so I would appreciate some input on how I can restructure my logic to make all of this play nicely.

Here's an overview of the existing system.

1. build-invoice-data.php pulls data from the db and assembles an XML file of invoices. The latest invoice run built a 1.2M file. User proceeds (to preview page).

2. preview-invoices.php reads XML file, generates a tabular preview, and prepares some temp tables in the db. Nothing notable here. User proceeds (signals OK to commit invoice run to db).

3. commit-changes.php merges temp tables into live tables and generates aggregate pdf of all invoices for printing. Here's the start of the problem. I'm feeding the XML file into Apache FOP for the PDF generation and it takes a while to do its job, but this step is completing so far.

The addition would be crammed into commit-changes.php and would involve extracting individual invoices from the XML and passing them through FOP to produce a PDF to insert into the db (blob). This would be done so that actual invoices can be attached to the invoice number, allowing quicker access than digging through a file of papers. This is the part that times out even by itself. It's currently a foreach that loops through all the XML invoice nodes, builds mini XMLs, and then feeds them to FOP. My last test was able to generate 20 before the 60-second timer expired. :(
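For reference, the per-invoice loop is roughly like this (a sketch only; `extract_invoice_docs`, the `invoice`/`invoices` element names, and the FOP command line are my assumptions, not the actual code):

```php
<?php
// Split an invoices XML string into one standalone XML document per invoice.
// The element names here are assumptions about the real schema.
function extract_invoice_docs($xml)
{
    $src = new DOMDocument();
    $src->loadXML($xml);

    $docs = array();
    foreach ($src->getElementsByTagName('invoice') as $node) {
        $mini = new DOMDocument('1.0', 'UTF-8');
        // Wrap the single invoice in the same root element the XSL expects,
        // so the existing template works unchanged.
        $root = $mini->createElement('invoices');
        $root->appendChild($mini->importNode($node, true));
        $mini->appendChild($root);
        $docs[] = $mini->saveXML();
    }
    return $docs;
}

// Each mini XML would then be written to a temp file and fed to FOP, e.g.:
//   exec("fop -xml invoice-42.xml -xsl invoice.xsl -pdf invoice-42.pdf");
// before inserting the resulting PDF into the blob column.
```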

To eliminate some suggestions in advance, I can't regenerate a duplicate invoice purely from the db after the fact because each invoice run includes state information that changes immediately afterward. Also, I can't easily modify the function of the PDF generation because the XSL template is built for the current XML structure and I would prefer to stay away from my working template.

Am I stuck with just increasing the script timeout value, or does someone more experienced recognize a time saver somewhere? Threads? Ajaxify?

In case it helps with suggestions: this app is on a Win2k server with PHP5, Apache2, and FOP 0.93.

oesxyl
05-09-2008, 01:31 AM
Do you want to redesign some or all of the parts, or just apply a temporary fix?

regards

tomws
05-09-2008, 02:57 AM
Not looking for a one-time patch, if that's what you're asking. This part of the system is (ideally) run quarterly, so it's repetitive, but infrequent.

I'm open to suggestions - even if it means rewriting or rearranging some of the earlier portions. That doesn't mean I'll do it, but I'd like some options anyway. :D

CFMaBiSmAd
05-09-2008, 03:43 AM
If this is on a dedicated server, disabling the time limit would be the easiest - set_time_limit(0); or equivalent max_execution_time setting in php.ini or a .htaccess file. Doing this on a server that is not yours will get you into trouble with the hosting company if the script takes up too much of the available processor cycles.
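For example, at the top of the long-running script:

```php
<?php
// Remove the 60-second cap for this request only; php.ini stays untouched.
// Fine on an in-house box, but on shared hosting this can get you in
// trouble if the script hogs the processor.
set_time_limit(0);

// Equivalent per-directory setting in .htaccess (Apache with mod_php):
//   php_value max_execution_time 0
```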

But, a general purpose fix for any batch process is to "mark" the data/records... that need to be processed and then run a timed task (a page in a browser that refreshes itself or a cron/scheduled task) that processes a pre-determined number of the data/records... on each invocation and repeat until all the data/records... have been processed.
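A minimal sketch of that pattern (the function and the items it handles are made up for illustration; in the real app "pending" would be a flagged column in the db and the handler would run FOP):

```php
<?php
// Generic chunked batch: handle at most $batchSize pending items per
// invocation and return what is left, so the caller (a self-refreshing
// page or a cron/scheduled task) can repeat until nothing remains.
function process_batch(array $pending, $batchSize, $handler)
{
    $chunk = array_slice($pending, 0, $batchSize);
    foreach ($chunk as $item) {
        $handler($item);            // e.g. generate one PDF, insert the blob
    }
    return array_slice($pending, $batchSize);   // empty array == job done
}
```

Each invocation stays well under the timeout because it only ever touches `$batchSize` items, no matter how large the whole run is.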

tomws
05-09-2008, 01:20 PM
If this is on a dedicated server, disabling the time limit would be the easiest - set_time_limit(0); or equivalent max_execution_time setting in php.ini or a .htaccess file. Doing this on a server that is not yours will get you into trouble with the hosting company if the script takes up too much of the available processor cycles.

Yes, I should have mentioned in the first post, but it slipped my mind. This is one of our own in-house servers. Furthermore, this is the only web app on the server used by anyone but me. Otherwise, the box is just network file storage.

I didn't know about the set_time_limit function. Good information. I was only thinking of the php.ini, which is a less desirable means of changing the timeout.


But, a general purpose fix for any batch process is to "mark" the data/records... that need to be processed and then run a timed task (a page in a browser that refreshes itself or a cron/scheduled task) that processes a pre-determined number of the data/records... on each invocation and repeat until all the data/records... have been processed.

This sounds promising. Let me re-state what you've said in my own words and in a way that would apply to my specific situation. Tell me if I understand correctly. For reference, handler page below would be commit-changes.php in the first post.

- Add a flag node into the XML (back on build-invoice-data.php) that is only used by the handler page.
- Handler page set to refresh at, say, 60 seconds. (Meta tag refresh?)
- Handler page loops through XML while there are unprocessed nodes && node count processed this pass <= 10, for example.
- Go to new end page (where the user downloads the original batch PDF).
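The bookkeeping for those steps might look like this (a sketch under my own assumptions; the `done` parameter, batch size of 10, and page names are hypothetical):

```php
<?php
// Decide what the handler page should do on this pass: process the next
// batch and refresh, or send the user on to the download page.
function next_pass($done, $total, $batchSize = 10)
{
    if ($done >= $total) {
        return null;                         // all invoices processed
    }
    return min($done + $batchSize, $total);  // "done" count after this pass
}

// In commit-changes.php this would drive something like:
//   $done = isset($_GET['done']) ? (int)$_GET['done'] : 0;
//   $next = next_pass($done, $totalInvoices);
//   if ($next === null) {
//       header('Location: download-batch-pdf.php');
//   } else {
//       // ...process invoices $done..$next-1 through FOP...
//       echo '<meta http-equiv="refresh" content="1;url=?done=' . $next . '">';
//   }
```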

oesxyl
05-10-2008, 02:30 AM
1. build-invoice-data.php pulls data from the db and assembles an XML file of invoices. The latest invoice run built a 1.2M file. User proceeds (to preview page).

2. preview-invoices.php reads XML file, generates a tabular preview, and prepares some temp tables in the db. Nothing notable here. User proceeds (signals OK to commit invoice run to db).

3. commit-changes.php merges temp tables into live tables and generates aggregate pdf of all invoices for printing. Here's the start of the problem. I'm feeding the XML file into Apache FOP for the PDF generation and it takes a while to do its job, but this step is completing so far.

The addition would be crammed into commit-changes.php and include extracting individual invoices from the XML and passing them through FOP for a PDF to insert in to the db (blob). This would be done so that actual invoices can be attached to the invoice number allowing quicker access than digging through a file of papers. This is the part that times out even by itself. It's currently a foreach that loops through all the XML invoice nodes, builds mini XMLs, and then feeds them to FOP. My last test was able to generate 20 before the 60-second timer expired. :(

The biggest problem here, IMO, is at 1): the size of the XML file. It has to be loaded again at each step where it's used, and that is slow. Processing time also grows faster for a bigger file, so the solution, IMO, is to use the same logic CFMaBiSmAd suggested: split the work into a given number of invoices at each step.
The disadvantage is that this is a bit more complicated than the current solution, but this way you don't depend on the number of invoices processed.

Here is how, in my opinion, your solution for step 3) could be done, using Ajax and XQuery/XPath:


- Add a flag node into the XML (back on build-invoice-data.php) that is only used by the handler page.
You don't need a flag, in my opinion; you can use XQuery with an XPath expression to select only a chunk, starting at a given position in the tree and taking a given number of nodes.


- Handler page set to refresh at, say, 60 seconds. (Meta tag refresh?)
JavaScript Ajax request; you can use the Ajax response and setInterval to check progress and request the next processing step.


- Handler page loops through XML while there are unprocessed nodes && node count processed this pass <= 10, for example.
Increment the position parameter in the XPath expression.


- Go to new end page (where the user downloads the original batch PDF).
Test whether the XPath position parameter is less than last(); if so, repeat the loop. Because you use Ajax, you don't need to reload the page, only update the content from the Ajax response.

The same idea, using XQuery/XPath and Ajax, can be applied to all the steps.
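The chunk selection oesxyl describes can be sketched with PHP's built-in DOMXPath (XPath 1.0); the `invoices`/`invoice` element names are assumptions about the real schema:

```php
<?php
// Select invoices $offset+1 .. $offset+$count from the document, so each
// pass only touches one slice of the big XML file.
function invoice_chunk(DOMDocument $doc, $offset, $count)
{
    $xp = new DOMXPath($doc);
    $expr = sprintf(
        '/invoices/invoice[position() > %d and position() <= %d]',
        $offset,
        $offset + $count
    );
    return $xp->query($expr);   // DOMNodeList with at most $count nodes
}
```

Each Ajax-triggered pass would bump `$offset` by `$count` until the query returns an empty list, which is the "parameter reached last()" condition above.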

regards

tomws
05-10-2008, 01:21 PM
Thanks. Another good possibility there, I think. That would give me an opportunity to learn how to integrate xpath/xquery, too.



EZ Archive Ads Plugin for vBulletin Copyright 2006 Computer Help Forum