Help: Splitting large string
H-street
11-14-2003, 11:26 PM
OK, I am having a little dilemma:
I am having a problem processing a large (2.4 million character) string.
The problem is that if I do
(@array) = split(//,$large_string); I get an out-of-memory error on a 4 GB machine.
I have found no way of running the
split(//,$string)
in any sort of while loop that would let me process the array elements while the split is still in progress.
Essentially, I am looking for help processing the characters as they are split off instead of doing it in two steps, or even a way of processing each character in the string without splitting it at all.
Thanks in advance
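One way to visit each character without building an array is to index into the string with substr. The following is only a minimal sketch: the helper name each_char and the short stand-in string are assumptions for illustration, not something from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: calls $callback once per character of $$str_ref,
# without ever building a per-character array.
sub each_char {
    my ($str_ref, $callback) = @_;
    my $len = length $$str_ref;
    for (my $i = 0; $i < $len; $i++) {
        $callback->(substr($$str_ref, $i, 1));
    }
    return;
}

# Usage: count each base in a short stand-in string.
my $large_string = 'ACGTACGGTA';
my %count;
each_char(\$large_string, sub { $count{ $_[0] }++ });
print "$_: $count{$_}\n" for sort keys %count;
```

Passing a reference avoids copying the large string into the sub, and only one single-character scalar exists at any moment.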
YUPAPA
11-14-2003, 11:40 PM
How do you want to split them?
H-street
11-14-2003, 11:54 PM
No sooner do I ask the question than I find an answer:
while ($string =~ /(.)/g) {
    my $character = $1;
    ### Process Character
}
Yupapa,
I wanted to split the string into an array with one element per character (the string contains no whitespace, only alphanumeric characters).
Jeff Mott
11-15-2003, 12:02 AM
You could make the program more memory efficient by not having the entire string in memory at the same time. Write the data to a temporary file, then you can work with small chunks at a time.
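A rough sketch of the temporary-file idea, using the core File::Temp module and the built-in read. The helper name process_via_tempfile and the chunk size are assumptions for illustration. Note that this only lowers memory use if the original scalar is released (for example with undef) once it has been written out.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: spill $$data_ref to a temp file, then hand the
# contents back to $callback in pieces of at most $chunk_size bytes.
sub process_via_tempfile {
    my ($data_ref, $chunk_size, $callback) = @_;

    my ($fh, $filename) = tempfile(UNLINK => 1);   # removed at exit
    binmode $fh;
    print {$fh} $$data_ref or die "write failed: $!";
    seek($fh, 0, 0) or die "seek failed: $!";      # rewind for reading

    my $buffer;
    while (1) {
        my $got = read($fh, $buffer, $chunk_size);
        die "read failed: $!" unless defined $got;
        last if $got == 0;                         # end of file
        $callback->($buffer);
    }
    close $fh;
    return;
}

# Usage with a short stand-in string and a tiny chunk size.
my $data = 'ACGT' x 5;
process_via_tempfile(\$data, 8, sub { print length($_[0]), "\n" });
```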
H-street
11-15-2003, 12:06 AM
Jeff, the problem is that when I get the character data it comes in as one string,
something to this effect:
my $module = Module->new();
my $data = $module->get_data();
The $data is 2.4 million bases. I now have to manipulate that data one character at a time (the module is not written by me, or I could possibly change that).
To write it to a file I would have to split it anyway, even if only into smaller chunks (anyone know how to do that? Say, split it into 600K-character chunks while still maintaining all elements).
The problem is that the data comes in as one large piece. If this ends up being a problem I might end up having to write my own module to do what I want to do (which would be a huge pain in the...)
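For the chunking question, one possible approach is to step through the string with substr and an offset, which keeps every character and its order. This is a sketch only: the helper name chunk_string is an assumption, and the sizes are shrunk so the result is easy to see.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of pieces of at most $size
# characters each; joining them gives back the original string.
sub chunk_string {
    my ($str_ref, $size) = @_;
    die "chunk size must be positive" unless $size > 0;
    my @chunks;
    my $len = length $$str_ref;
    for (my $pos = 0; $pos < $len; $pos += $size) {
        # substr returns a shorter final piece when fewer than
        # $size characters remain.
        push @chunks, substr($$str_ref, $pos, $size);
    }
    return @chunks;
}

my $data   = 'ACGTACGTAC';             # stand-in for the 2.4M-base string
my @pieces = chunk_string(\$data, 4);  # real use might pass 600_000
print join('|', @pieces), "\n";        # prints "ACGT|ACGT|AC"
```

The returned list holds a second copy of the data, so for very large inputs it may be preferable to process each piece inside the loop instead of collecting them.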
H-street
11-15-2003, 12:09 AM
And after putting the above code into effect, it doesn't look like that is going to work either.
As the script is running my memory usage is climbing, just not as fast as it was with the split(//,$data).
Sigh. Oh well, back to square one.
YUPAPA
11-15-2003, 12:17 AM
HIHI~
You can use sysread and read the data in blocks (binary), so it is not read as a single record. You can choose how big the block will be, though you WILL have to deal with words that span 2 blocks~ :)
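A sketch of block-wise reading with sysread, including one way to cope with words that straddle two blocks. It assumes the data is already in a file and that "words" are separated by whitespace. The helper name each_word_in_blocks is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: reads $path in blocks of $block_size bytes with
# sysread and calls $callback once per whitespace-separated word,
# carrying any partial word at the end of a block over to the next one.
sub each_word_in_blocks {
    my ($path, $block_size, $callback) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    binmode $fh;

    my $carry = '';
    while (1) {
        my $got = sysread($fh, my $block, $block_size);
        die "sysread failed: $!" unless defined $got;
        last if $got == 0;                  # end of file

        $block = $carry . $block;
        # Hold back a trailing run of non-space characters: it may be
        # the first half of a word that continues in the next block.
        $carry = $block =~ s/(\S+)\z// ? $1 : '';
        $callback->($_) for split ' ', $block;
    }
    $callback->($carry) if length $carry;   # last word in the file
    close $fh;
    return;
}
```

A call might look like each_word_in_blocks('bases.txt', 65536, sub { ... }), where bases.txt is a placeholder file name. When the work is purely per character, as in this thread, nothing can span a block boundary and the carry-over step can be dropped.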
H-street
11-15-2003, 12:44 AM
YUPAPA,
thanks.. that looks like it will work..