  1. #1 — tomws (Senior Coder, Arkansas; joined Nov 2007)

    How can I restructure to avoid script timeout?

    I'm adding some functionality to an existing invoicing system, which is already running close to the timeout limit. Testing the addition by itself, I'm timing out, so I would appreciate some input on how I can restructure my logic to make all of this play nicely.

    Here's an overview of the existing system.

    1. build-invoice-data.php pulls data from the db and assembles an XML file of invoices. The latest invoice run built a 1.2M file. User proceeds (to preview page).

    2. preview-invoices.php reads XML file, generates a tabular preview, and prepares some temp tables in the db. Nothing notable here. User proceeds (signals OK to commit invoice run to db).

    3. commit-changes.php merges temp tables into live tables and generates aggregate pdf of all invoices for printing. Here's the start of the problem. I'm feeding the XML file into Apache FOP for the PDF generation and it takes a while to do its job, but this step is completing so far.

    The addition would be crammed into commit-changes.php and include extracting individual invoices from the XML and passing them through FOP for a PDF to insert into the db (blob). This would be done so that actual invoices can be attached to the invoice number, allowing quicker access than digging through a file of papers. This is the part that times out even by itself. It's currently a foreach that loops through all the XML invoice nodes, builds mini XMLs, and then feeds them to FOP. My last test was able to generate 20 before the 60-second timer expired.

    To eliminate some suggestions in advance, I can't regenerate a duplicate invoice purely from the db after the fact because each invoice run includes state information that changes immediately afterward. Also, I can't easily modify the function of the PDF generation because the XSL template is built for the current XML structure and I would prefer to stay away from my working template.

    Am I stuck with just increasing the script timeout value, or does someone more experienced recognize a time saver somewhere? Threads? Ajaxify?

    If it helps suggestions, this app is on a Win2k server, PHP5, Apache2, fop 0.93.
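    For reference, a minimal sketch of the splitting step of that foreach (the FOP call, not the DOM work, is almost certainly where the 60 seconds go). The `<invoice>` node name, the wrapper root, and the FOP command line below are assumptions for illustration, not the actual structure the app uses:

    ```php
    <?php
    // Split a master invoice XML into one mini XML document per invoice.
    function splitInvoices($masterXml) {
        $doc = new DOMDocument();
        $doc->loadXML($masterXml);
        $minis = array();
        foreach ($doc->getElementsByTagName('invoice') as $node) {
            $mini = new DOMDocument('1.0', 'UTF-8');
            // Wrap the single invoice in a fresh root so the existing XSL
            // template sees the same structure it expects for the full file.
            $root = $mini->createElement('invoices');
            $root->appendChild($mini->importNode($node, true));
            $mini->appendChild($root);
            $minis[] = $mini->saveXML();
        }
        return $minis;
    }
    // Each mini XML would then go to a temp file and through FOP, e.g.
    // (hypothetical file names):
    // exec('fop -xml invoice-temp.xml -xsl invoice.xsl -pdf invoice-temp.pdf');
    ```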

  • #2 — oesxyl (Master Coder; joined Dec 2007)
    Do you want to redesign some or all of it, or just patch it for the time being?

    regards

  • #3 — tomws (Senior Coder)
    Not looking for a one-time patch, if that's what you're asking. This part of the system is (ideally) run quarterly, so it's repetitive, but infrequent.

    I'm open to suggestions - even if it means rewriting or rearranging some of the earlier portions. That doesn't mean I'll do it, but I'd like some options anyway.

  • #4 — CFMaBiSmAd (Senior Coder, Denver, Colorado USA; joined Oct 2006)
    If this is on a dedicated server, disabling the time limit would be the easiest - set_time_limit(0); or equivalent max_execution_time setting in php.ini or a .htaccess file. Doing this on a server that is not yours will get you into trouble with the hosting company if the script takes up too much of the available processor cycles.

    But, a general purpose fix for any batch process is to "mark" the data/records... that need to be processed and then run a timed task (a page in a browser that refreshes itself or a cron/scheduled task) that processes a pre-determined number of the data/records... on each invocation and repeat until all the data/records... have been processed.
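    The mark-and-batch pattern above can be sketched roughly as follows, using an in-memory SQLite table as a stand-in for the real invoice tables; the table and column names here are made up for illustration:

    ```php
    <?php
    // One invocation of the timed task: grab up to $limit unprocessed
    // records, process them, and mark them done.
    function processBatch(PDO $db, $limit) {
        $rows = $db->query("SELECT id FROM invoices WHERE processed = 0 LIMIT $limit")
                   ->fetchAll(PDO::FETCH_COLUMN);
        foreach ($rows as $id) {
            // ... generate the PDF for invoice $id and store the blob here ...
            $db->exec("UPDATE invoices SET processed = 1 WHERE id = $id");
        }
        return count($rows); // 0 means the whole run is finished
    }
    ```

    The refreshing page (or cron task) just calls this repeatedly until it returns 0.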
    If you are learning PHP, developing PHP code, or debugging PHP code, do yourself a favor and check your web server log for errors and/or turn on full PHP error reporting in php.ini or in a .htaccess file to get PHP to help you.

  • Users who have thanked CFMaBiSmAd for this post:

    tomws (05-09-2008)

  • #5 — tomws (Senior Coder)
    Quote Originally Posted by CFMaBiSmAd View Post
    If this is on a dedicated server, disabling the time limit would be the easiest - set_time_limit(0); or equivalent max_execution_time setting in php.ini or a .htaccess file. Doing this on a server that is not yours will get you into trouble with the hosting company if the script takes up too much of the available processor cycles.
    Yes, I should have mentioned in the first post, but it slipped my mind. This is one of our own in-house servers. Furthermore, this is the only web app on the server used by anyone but me. Otherwise, the box is just network file storage.

    I didn't know about the set_time_limit function. Good information. I was only thinking of the php.ini, which is a less desirable means of changing the timeout.

    Quote Originally Posted by CFMaBiSmAd View Post
    But, a general purpose fix for any batch process is to "mark" the data/records... that need to be processed and then run a timed task (a page in a browser that refreshes itself or a cron/scheduled task) that processes a pre-determined number of the data/records... on each invocation and repeat until all the data/records... have been processed.
    This sounds promising. Let me re-state what you've said in my own words and in a way that would apply to my specific situation. Tell me if I understand correctly. For reference, handler page below would be commit-changes.php in the first post.

    - Add a flag node into the XML (back on build-invoice-data.php) that is only used by the handler page.
    - Handler page set to refresh at, say, 60 seconds. (Meta tag refresh?)
    - Handler page loops through XML while there are unprocessed nodes && node count processed this pass <= 10, for example.
    - Go to new end page (where the user downloads the original batch PDF).
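    One pass of the handler page described above could be sketched like this; the `done` attribute is a hypothetical stand-in for the flag node, and the FOP step is elided:

    ```php
    <?php
    // Process at most $perPass unprocessed invoice nodes, flipping the flag
    // on each one handled. A return of 0 means go to the end page.
    function processPass(DOMDocument $doc, $perPass = 10) {
        $xp = new DOMXPath($doc);
        $pending = $xp->query('//invoice[@done="0"]');
        $count = 0;
        foreach ($pending as $node) {
            if ($count >= $perPass) break;
            // ... build the mini XML for this node and run it through FOP ...
            $node->setAttribute('done', '1');
            $count++;
        }
        return $count;
    }
    ```

    The page would save the modified XML back to disk after each pass, then emit the meta refresh (or redirect to the end page when the pass returns 0).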

  • #6 — oesxyl (Master Coder)
    Quote Originally Posted by tomws View Post
    1. build-invoice-data.php pulls data from the db and assembles an XML file of invoices. The latest invoice run built a 1.2M file. User proceeds (to preview page).

    2. preview-invoices.php reads XML file, generates a tabular preview, and prepares some temp tables in the db. Nothing notable here. User proceeds (signals OK to commit invoice run to db).

    3. commit-changes.php merges temp tables into live tables and generates aggregate pdf of all invoices for printing. Here's the start of the problem. I'm feeding the XML file into Apache FOP for the PDF generation and it takes a while to do its job, but this step is completing so far.

    The addition would be crammed into commit-changes.php and include extracting individual invoices from the XML and passing them through FOP for a PDF to insert into the db (blob). This would be done so that actual invoices can be attached to the invoice number, allowing quicker access than digging through a file of papers. This is the part that times out even by itself. It's currently a foreach that loops through all the XML invoice nodes, builds mini XMLs, and then feeds them to FOP. My last test was able to generate 20 before the 60-second timer expired.
    The biggest problem here, IMO, is at 1): the size of the XML file. It has to be reloaded at every step where it's used, and that is slow. Processing time also grows faster than linearly with file size. So the solution, IMO, is to use the same logic CFMaBiSmAd suggests: split the work into a given number of invoices at each step.
    The disadvantage is that this is a little more complicated than the current solution, but this way you don't depend on the number of invoices processed.

    Here is how your solution for step 3) could be done, in my opinion, using ajax and XQuery/XPath:
    Quote Originally Posted by tomws View Post
    - Add a flag node into the XML (back on build-invoice-data.php) that is only used by the handler page.
    You don't need a flag, in my opinion; you can use XQuery with an XPath expression to select only a chunk: a given starting position in the tree plus a given number of nodes.

    - Handler page set to refresh at, say, 60 seconds. (Meta tag refresh?)
    JavaScript and an ajax request; you can use the ajax response and setInterval to check progress and request the next processing step.

    - Handler page loops through XML while there are unprocessed nodes && node count processed this pass <= 10, for example.
    Increment the position parameter in the XPath expression.

    - Go to new end page (where the user downloads the original batch PDF).
    Test whether the XPath position parameter is less than last(); if so, repeat the loop. And because you use ajax, you don't need to reload the page, only update the content from the ajax response.

    The same XQuery/XPath-and-ajax idea can be applied to all of the steps.
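    The position-based chunk selection could be sketched like this on the PHP side (the client's setInterval loop would just bump the start position each ajax request); the `<invoice>` node name is an assumption:

    ```php
    <?php
    // Select a window of invoice nodes by position, with no flag needed:
    // positions $start through $start + $size - 1, 1-based as in XPath.
    function invoiceChunk(DOMDocument $doc, $start, $size) {
        $xp = new DOMXPath($doc);
        $end = $start + $size;
        // (//invoice)[...] numbers the nodes across the whole document,
        // regardless of which parent each invoice sits under.
        return $xp->query("(//invoice)[position() >= $start and position() < $end]");
    }
    ```

    An empty result signals that the last chunk has been processed.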

    regards

  • Users who have thanked oesxyl for this post:

    tomws (05-10-2008)

  • #7 — tomws (Senior Coder)
    Thanks. Another good possibility there, I think. That would give me an opportunity to learn how to integrate xpath/xquery, too.

