
View Full Version : Parsing Data from a 50 meg text file.



kairog
12-12-2008, 09:49 AM
Hi, I have a problem pulling data from a very large file, about 50 megs.

I use file_get_contents() for this, but the script stops working due to a memory problem.

Is there any solution to this?

I hope you could help. Thanks in advance :)

mic2100
12-12-2008, 11:34 AM
Hi,

I have had similar problems trying to open large files.

I found a bit of code that helped me:


ini_set("memory_limit","128M");


It changes the memory limit just for the duration of the script.

:)

barkermn01
12-12-2008, 12:07 PM
Hi,

I have had similar problems trying to open large files.

I found a bit of code that helped me:


ini_set("memory_limit","128M");


It changes the memory limit just for the duration of the script.

:)

This will only work if PHP is not in safe mode and your host allows ini_set(). Because that command lets you temporarily change a lot of the php.ini configuration, a lot of hosts don't allow it; with it, you could even get PHP to run a virus from a mod_ file.
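If you do go the ini_set() route, it is worth checking that the override actually took before relying on it. A minimal sketch (not from either post above; the 128M value is just the example used earlier):

<?php
// ini_set() returns false when the host refuses the change (e.g. safe mode
// or a blocked directive), so test the return value before depending on it.
if (ini_set("memory_limit", "128M") === false) {
    die("This host does not allow raising memory_limit; read the file in chunks instead.");
}
echo "memory_limit is now " . ini_get("memory_limit") . "\n";
?>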

mic2100
12-12-2008, 05:47 PM
This will only work if PHP is not in safe mode and your host allows ini_set(). Because that command lets you temporarily change a lot of the php.ini configuration, a lot of hosts don't allow it; with it, you could even get PHP to run a virus from a mod_ file.

Yeah, sorry, I forgot to mention that the only servers I had used this on were ones I had complete control of.

mlseim
12-12-2008, 06:04 PM
Try using Perl instead of PHP.

The Perl script would be uploaded into your cgi-bin directory, and of course it would be written in Perl. Do a Google search for some script examples.

I don't know what your memory limit is for Perl, but I know it's much larger than PHP's.

oesxyl
12-12-2008, 07:52 PM
Hi, I have a problem pulling data from a very large file, about 50 megs.

I use file_get_contents() for this, but the script stops working due to a memory problem.

Is there any solution to this?

I hope you could help. Thanks in advance :)
For a big file it is better, no matter what language you use (PHP or Perl), not to hold all the data in memory.
Process it as a stream; more exactly, repeat a cycle of read data, process, write results until you have processed the whole file.

best regards

oesxyl
12-12-2008, 07:55 PM
Try using Perl instead of PHP.

The Perl script would be uploaded into your cgi-bin directory, and of course it would be written in Perl. Do a Google search for some script examples.

I don't know what your memory limit is for Perl, but I know it's much larger than PHP's.
No matter what language is used, Perl or PHP, it is not safe to store temporary data in cgi-bin.

best regards

kucerar
12-12-2008, 08:04 PM
Hi, I have a problem pulling data from a very large file, about 50 megs.

I use file_get_contents()

Can't you read it a line at a time?
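Something like this, for example, so only the current line sits in memory (just a sketch; "data.txt" stands in for your file):

<?php
$handle = fopen("data.txt", "r"); // the 50 meg file, assumed to be local
if ($handle) {
    // fgets() returns one line per call, so memory use stays small
    while (($line = fgets($handle)) !== false) {
        // parse/process $line here
    }
    fclose($handle);
}
?>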

good luck.

mlseim
12-12-2008, 08:20 PM
No matter what language is used, Perl or PHP, it is not safe to store temporary data in cgi-bin.

not safe as in the data might be "sensitive" or "private"?
or not safe as it might crash the server?

oesxyl
12-12-2008, 09:09 PM
not safe as in the data might be "sensitive" or "private"?
or not safe as it might crash the server?
I'd rather not rely on my English to explain that :) :

http://www.verysimple.com/blog/2006/03/30/securing-your-cgi-bin/

best regards

CFMaBiSmAd
12-12-2008, 09:19 PM
The correct way of handling a large amount of data in a file, especially if the size of the file is expected to continue to grow, is to "page" through that file in smaller, manageable blocks.

However, parsed/tokenized/interpreted scripting languages like PHP/Perl are on the order of 100 times slower at searching, parsing, and processing than the compiled code of a database engine. Once the amount of data your application uses exceeds a few thousand rows, it is time to put that data into a proper database.
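For instance, the file could be imported once and then searched with SQL. A rough sketch using PDO, assuming MySQL, a table named records with a single line column, and placeholder credentials (all of those names are just examples, not anything from this thread):

<?php
// Connect and prepare one INSERT statement that gets reused for every line.
$pdo = new PDO("mysql:host=localhost;dbname=test", "user", "pass");
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$stmt = $pdo->prepare("INSERT INTO records (line) VALUES (?)");

$handle = fopen("data.txt", "r");
if ($handle) {
    $pdo->beginTransaction(); // one transaction keeps the import fast
    while (($line = fgets($handle)) !== false) {
        $stmt->execute(array(rtrim($line, "\r\n")));
    }
    $pdo->commit();
    fclose($handle);
}
?>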

mlseim
12-13-2008, 01:47 AM
oesxyl,
Thanks for that ... a good explanation that I was not aware of.

kairog
12-14-2008, 02:26 AM
Thank you for all your helpful posts.

kairog
12-14-2008, 02:31 AM
For a big file it is better, no matter what language you use (PHP or Perl), not to hold all the data in memory.
Process it as a stream; more exactly, repeat a cycle of read data, process, write results until you have processed the whole file.

best regards

Hi oesxyl, I've been coding PHP, but this one has been a tough job for me... looping through the file. Do you have any sample code you can refer me to?

Thanks in advance. :)

oesxyl
12-14-2008, 04:26 AM
Hi oesxyl, I've been coding PHP, but this one has been a tough job for me... looping through the file. Do you have any sample code you can refer me to?

Thanks in advance. :)
Something like this:


<?php
$chunksize = 8192; // probably bigger
// open the file and read it in fixed-size chunks instead of loading it all at once
$handle = fopen("http://www.example.com/", "r");
if ($handle) {
    while (!feof($handle)) {
        $contents = fread($handle, $chunksize);
        // process $contents here
    }
    fclose($handle);
}
?>

If you need to, inside the while loop you can store some data in variables for later use, write the results of processing into a file (temporary or not), or insert them into a database.

best regards

kairog
12-15-2008, 01:33 AM
Something like this:


<?php
$chunksize = 8192; // probably bigger
// open the file and read it in fixed-size chunks instead of loading it all at once
$handle = fopen("http://www.example.com/", "r");
if ($handle) {
    while (!feof($handle)) {
        $contents = fread($handle, $chunksize);
        // process $contents here
    }
    fclose($handle);
}
?>

If you need to, inside the while loop you can store some data in variables for later use, write the results of processing into a file (temporary or not), or insert them into a database.

best regards

Thank you for this script, mate.

By the way, since the file is very big (50 megs), how can I handle the maximum execution time?

Is there a way to override the execution time inside the loop, for example resetting the timer to another 60 seconds so the script will continue to run?

Or is there any better idea?

oesxyl
12-15-2008, 02:14 AM
Thank you for this script, mate.

By the way, since the file is very big (50 megs), how can I handle the maximum execution time?

Is there a way to override the execution time inside the loop, for example resetting the timer to another 60 seconds so the script will continue to run?

Or is there any better idea?
You can modify the server setting for maximum execution time, but I think that is a bad idea since it will affect all scripts.
Other ways:
1. You can use a server tool like 'split' (on Linux), or write a script to split the file into smaller pieces and process each one.

2. You can estimate/calculate how much of the file you can process within the maximum execution time, save the position in the file, redirect, and restore the position afterwards; there is a rough sketch after the two links below. This has another restriction: the maximum number of redirections.
To save/restore the file position you can use ftell() and fseek():

http://www.php.net/manual/en/function.ftell.php
http://www.php.net/manual/en/function.fseek.php
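A rough sketch of option 2, the way I would do it (untested; the file name, chunk count, and query parameter are only placeholders):

<?php
$chunksize = 8192;
$maxchunks = 1000; // tune so one request finishes well under max_execution_time
$offset    = isset($_GET['offset']) ? (int)$_GET['offset'] : 0;

$handle = fopen("data.txt", "r"); // must be a local file, see the fseek() note below
if ($handle) {
    fseek($handle, $offset); // restore the position saved by the previous request
    for ($i = 0; $i < $maxchunks && !feof($handle); $i++) {
        $contents = fread($handle, $chunksize);
        // process $contents here
    }
    $done   = feof($handle);
    $offset = ftell($handle); // save the position for the next request
    fclose($handle);

    if (!$done) {
        // redirect to ourselves so the execution timer starts over
        header("Location: " . $_SERVER['PHP_SELF'] . "?offset=" . $offset);
        exit;
    }
}
?>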

Whichever solution you use, optimize the script.
It's hard to suggest something specific; maybe if you give more details we can find a better solution.

If you intend to use fseek(), you must upload the file to the server first if it is remote:

Note: May not be used on file pointers returned by fopen() if they use the "http://" or "ftp://" formats. fseek() also gives undefined results for append-only streams (opened with the "a" flag).
best regards


