...

View Full Version : Call perl script in loop without recompiling



Jolle
08-21-2009, 02:01 PM
Hi all,

I have two perl scripts :

The first takes 2 files as an input, processes them and puts the output in a third file.

The second takes one directory as an input, and searches all subdirectories for input files, defines the output file and then calls the first script to process the whole thing.

Now, I loop through all subdirectories with this loop :


foreach (@subdirs){
foreach (@dirfiles){
m/(^file1.+)/ and $opt1 = $1;
m/(^file2.+)/ and $opt2 = $1;
}
system ("perl -S script2.pl",$opt1,$opt2);
}
This works, but obviously script2.pl will be recompiled again every time it's called. That's rather clumsy, so I want to optimize this. I cannot combine both scripts in one file, as I need the other script seperately too. Any idea about how to call that script in such a way it is compiled only once, and then takes different arguments each time in the loop?

I've been looking at backticks, eval{}, do{}, require{}, but I didn't find much info on when the code is compiled, or how to pass different arguments each time. They should end up in the @ARGV of script2.pl.


Anybody an idea? Thank you in advance

FishMonger
08-21-2009, 02:36 PM
Convert script2.pl into a module and then load it into your other script via a 'use' or 'require' statement.

FishMonger
08-21-2009, 02:56 PM
If you want to show us both scripts, we should be able to show you how to do it more efficiently.

Jolle
08-21-2009, 03:00 PM
Convert script2.pl into a module and then load it into your other script via a 'use' or 'require' statement.
That would be an option if I didn't need script2.pl on its own as well regularly. If I make it a module, I have to write a third script to run the only subroutine in the module script2.pm. That's quite a way around.

There is some trick I've seen before, but I can't find it anymore.

Jolle
08-21-2009, 03:06 PM
The two scripts. JMTools is a module with some subs I use quite often, like GetNewFile (user interface when a file is not valid or exists already), GetDir (changes to that dir and globs the second argument. Implements GetNewFile is dir does not work, that's why it returns dir as well)

Script 1 :

#!/usr/bin/perl -w
use strict;
our $directory;
BEGIN {
use File::Spec::Functions qw(rel2abs);
use File::Basename qw (dirname);
my $path = rel2abs ($0);
$directory = dirname ($path)
}
use lib $directory;
use JMTools qw(:DEFAULT :DIRS);
#-----------------------------------------------------------
# Initialization
#-----------------------------------------------------------
my $dir;
my @subdirs;
my $help = qq/
BatchExtract help
------------------
BatchExtract uses GenExtract to extract the genes of all sets found in
the specified directory. Each set should be in a seperate directory, with
the species name as directory name. Each directory should contain at least
a genomeInfo.txt file, a genomeFASTA.txt file and a proteomeFASTA.txt file.
After processing, the gene files and protein files are saved in seperate
directories, specified in JMTools.

use : perl BatchExtract.pl [<dirname>]
If <dirname> is not specified, the default GENOME_DIR from JMTools is used.
/;

$dir = shift @ARGV or $dir=$GENOME_DIR;
($dir,@subdirs) = GetDir($dir,"*") unless ($dir eq"-h") and die $help;

#-----------------------------------------------------------
# MAIN
#-----------------------------------------------------------
foreach (@subdirs){
my $g;
my $i;
my $o = $GENE_DIR.$_."genes.txt"; #adapt if necessary
(-d $_) or print "$_ is not a known directory\n" && next;
chdir $_ or print "Could not change to $_\n" && next;
opendir DIR, "." or print "Couldn't open $_\n" && next;
my @dirfiles = glob("*.txt");
closedir DIR;
foreach (@dirfiles){
if($_ =~ /(.+genomeFASTA.txt)/){$g=$1};
if($_ =~ /(.+genomeInfo.txt)/ ){$i=$1};
}
my $param = "g ".$g." i ".$i." o ".$o;
print $param."\n";
system("perl -S GenExtract.pl $param");
chdir "..";
}
1;

Script 2 :

#!/usr/bin/perl -w
use strict;
our $directory;
BEGIN {
use File::Spec::Functions qw(rel2abs);
use File::Basename qw (dirname);
my $path = rel2abs ($0);
$directory = dirname ($path)
}
use lib $directory;
use JMTools;
sub CreateHeader($\%);
sub GetEnd($\%);
sub AddToFile($$$);
#-----------------------------------------------------------
# Initialization
#-----------------------------------------------------------
my $help =qq/
GenExtract extracts the genes from genome files originating from
MicrobesOnline.org. It needs as input the genomeFASTA file and the info file.
Determine following parameters :
g filename_genome.ext
i filename_infosheet.ext
o filename_out.ext
ex: GenExtract.pl g genome.txt i info.txt o genes.txt\n/;

###### READING ARGV

(@ARGV % 2) == 0 or die $help;
my %opt = @ARGV;
defined $opt{i} &&
defined $opt{g} &&
defined $opt{o} or die $help;

# echo set parameters
print qq/\n Following files were specified :
genome file in: $opt{g}
info file in : $opt{i}
genes file out: $opt{o}\n
Big files might take a minute...\n\n/;

#-----------------------------------------------------------
# MAIN
#-----------------------------------------------------------
# Read in gene info file
while(1){ # Open INI or get new file
$opt{i}=GetNewFile($opt{i},0)
unless (open INI,"<$opt{i}") && last;
} # end while (1)

# Creating keys for gene info
my $labels = <INI>;
chomp $labels;
my @Labels = split /\t/,$labels;
shift(@Labels) =~ /locusid/i or die "info file not valid\n"; #locusID is not a key for the data-hash

# Extracting gene info from file
my %ginfo; # hash/key=locusID with hashes/key=@Labels to store gene info
my %pos; # hash/key=start position, value=ID for scrolling through genome seq
my $lastpos = 0; # keeps the previous read-in position
while (my $data=<INI>){
chomp $data;
my @Data = split /\t/,$data;
my $locusID = shift @Data;
for (0..$#Labels){ # assign ginfo for each label or define not available
$ginfo{$locusID}{$Labels[$_]}=$Data[$_]||"NA";
} # end for 0..Labels
if ($ginfo{$locusID}{start} < $lastpos-1000) {last};
# assign start pos seen in + sense for both + and - strands
if ($ginfo{$locusID}{strand} eq "-"){
$lastpos = $ginfo{$locusID}{stop};
$pos{$lastpos} = $locusID;
} else{
$lastpos = $ginfo{$locusID}{start};
$pos{$lastpos}= $locusID;
} # end if
} # end while INI
close INI;

###### GENOME GENE EXTRACTION

while(1){ # open ING or get new file
$opt{g}=GetNewFile($opt{g},0)
unless (open ING,"<$opt{g}") && last;
} # end while (1)

# reading species name from genome-file
my $species=<ING>;
if($species =~ /([a-z]{4,}.*)\sNCBI\s.*ID\s(\d{4,})/i){
$species="ID:".$2."\tname:".$1;

} else {
$species="Species not known";
} # end else
# reading in the genome at once
my $genome ="";
while (<ING>){
chomp $_ unless ($_ =~/>/) && last;
$genome .= $_;
}
# opening the output file
if (-e $opt{o}){
$opt{o}=GetNewFile($opt{o},1);
} #end if
while(1){ # open OUT or get new file
$opt{o}=GetNewFile($opt{o},0)
unless (open OUT,">$opt{o}") && last;
} # end while (1)

# creating output file and header
print OUT ">".$species.
"\tMicrobesonline.com genome splitted in genes.\n";
close OUT;

# Extracting genes with one run through ING

for my $start (sort { $a <=> $b } keys %pos){
my $id = $pos{$start};
my $idheader = CreateHeader($id,%ginfo);
my $end = GetEnd($id,%ginfo);

# construct gene
my $gene = substr($genome,$start-1,$end-$start+1);
# cut noncoding part from input
if ($ginfo{$id}{strand} eq "-"){
$gene = reverse($gene);
$gene =~ tr\ATGC\TACG\;

}
# ADDED 6/8 - reverse and transliterate gene if on - strand

AddToFile($idheader,$gene,$opt{o});
} # end for my start over keys %pos
#

close ING;
print qq/Done!\n /;

#-----------------------------------------------------------
# SUBS
#-----------------------------------------------------------

# creates the header for each gene, based on the info from infofile
sub CreateHeader ($\%) {
my($id,$info)=@_;
my %info = %{$info};
my $header = ">VIMSS:".$id."\tname:".$info{$id}{name}.
"\taccession:".$info{$id}{accession}."\tGI:".$info{$id}{GI}.
"\tstart:".$info{$id}{start}."\tstop:".$info{$id}{stop}."\tstrand:".$info{$id}{strand}.
"\tdesc:".$info{$id}{desc}."\tCOG:".$info{$id}{COG}.
"\tCOGFun:".$info{$id}{COGFun}."\tCOGDesc:".$info{$id}{COGDesc}.
"\tGOLIST:".$info{$id}{GO};
return $header;
} # end sub CreateHeader

# returns the first and the last position of the gene in the right order
sub GetEnd ($\%){
my ($id,$info) = @_;
my %info = %{$info};
my $first = $info{$id}{start};
my $last = $info{$id}{stop};
return $last if ($info{$id}{strand} ne "-");
return $first;
} # end sub GetEnd

sub AddToFile ($$$){
my ($header,$gene,$file)=@_;
open ADD,">>$file" or die $!;
print ADD $header."\n";
while (length($gene)>60){
print ADD substr($gene,0,60)."\n";
$gene = substr($gene,60);
}
print ADD $gene,"\n";
close ADD;

}


1;

FishMonger
08-21-2009, 11:53 PM
You should extend the GenExtract.pl script to accommodate the option of processing a single or multiple files.

getopt, getopts - Process single-character switches with switch clustering
http://search.cpan.org/~nwclark/perl-5.8.9/lib/Getopt/Std.pm

Getopt::Long - Extended processing of command line options
http://search.cpan.org/~jv/Getopt-Long-2.38/lib/Getopt/Long.pm


Why are you using prototypes?

Are you under the false impression that they enforce or check the parameter type?

Jolle
08-22-2009, 12:12 AM
You should extend the GenExtract.pl script to accommodate the option of processing a single or multiple files.
strictly spoken, you're right. And actually, I should rewrite a bit more, the code is rather "verbose" :o But it's one of the first scripts I did for my thesis, it's actually my project for the basic course in Perl I had last year. I don't want to put too much time in it, I was just curious if it would be possible. I can do it with the system thing too, this is not intended to be used by many people.


getopt, getopts - Process single-character switches with switch clustering
http://search.cpan.org/~nwclark/perl-5.8.9/lib/Getopt/Std.pm

Getopt::Long - Extended processing of command line options
http://search.cpan.org/~jv/Getopt-Long-2.38/lib/Getopt/Long.pm
Now there's a tip I'm definitely going to use! thanks a lot!

FishMonger
08-22-2009, 12:20 AM
Here's something else you should read.

"Far More Than Everything You've Ever Wanted to Know about Prototypes in Perl"
http://www.perl.com/language/misc/fmproto.html

Jolle
08-22-2009, 01:19 AM
Here's something else you should read.

"Far More Than Everything You've Ever Wanted to Know about Prototypes in Perl"
http://www.perl.com/language/misc/fmproto.html
hehe. I've had that discussion before actually. It's true, that prototyping in Perl doesn't serve much. But I find it easier to make sure I pass the right things to the right sub. And indeed, only reference prototyping is useful for that, all the rest is bollocks. It might complicate the code a bit, but that part of prototyping really makes debugging a bit easier.

Each to his own, I guess.

Jolle
08-22-2009, 12:01 PM
On a side note : I've been sleeping over it, and I believe Tom is leaving out quite some information when he talks about prototyping. it DOES make a huge difference, both in what you get in the routine and what you can do with it. Depending on what you want to do, you even NEED to use reference prototyping in subs :

1) using reference prototyping, you pass a reference. With huge arrays, this can substantially lower the memory use as you don't make an extra copy of the array like without reference prototyping.

2) Another property of this, is that you can change the arrays in the subcall directly. Sometimes useful, sometimes dangerous, but different from without prototyping.

3) if you pass two arrays or an array and a hash into a subroutine without prototyping, you'll have to reconstruct them in your sub. Not easy if you don't know how long each is exactly.

Next program illustrates why prototyping is sometimes even a very good idea :

#!/usr/bin/perl -w
use strict;
sub withref (\@\@);
my @array1 = (1,2,3,4,5);
my @array2 = (12,11,10,9,8,7,6,5);
print "before noref array 1 :",@array1," - array 2 :",@array2,"\n";
noref (@array1,@array2);
print "after noref array 1 :",@array1," - array 2 :",@array2,"\n";
withref (@array1,@array2);
print "after withref array 1 :",@array1," - array 2 :",@array2,"\n";

sub noref {
my $counter = 1;
while ( @_){
my $myref = shift @_;
print "$counter : Next argument in sub noref : ",$myref,"\n";
$myref = "NOREF";
$counter++;
}
return;
}

sub withref (\@\@) {
my $counter = 1;
while ( @_){
my $myref = shift @_;
print "$counter : Next argument in sub withref : ",$myref,"\n";
@{$myref}[2] = "WITHREF";
$counter++;
}
return;
}

The output :

before noref array 1 :12345 - array 2 :12111098765
1 : Next argument in sub noref : 1
2 : Next argument in sub noref : 2
3 : Next argument in sub noref : 3
4 : Next argument in sub noref : 4
5 : Next argument in sub noref : 5
6 : Next argument in sub noref : 12
7 : Next argument in sub noref : 11
8 : Next argument in sub noref : 10
9 : Next argument in sub noref : 9
10 : Next argument in sub noref : 8
11 : Next argument in sub noref : 7
12 : Next argument in sub noref : 6
13 : Next argument in sub noref : 5
after noref array 1 :12345 - array 2 :12111098765
1 : Next argument in sub withref : ARRAY(0x23afb9c)
2 : Next argument in sub withref : ARRAY(0x23afbdc)
after withref array 1 :12WITHREF45 - array 2 :1211WITHREF98765

Quite a difference, no?

FishMonger
08-22-2009, 01:31 PM
You don't need a prototype to pass a reference.


#!/usr/bin/perl

use warnings; # warnings pragma is preferable over the -w switch
use strict;

sub with_prototype (\@\@);

my @array1 = (1,2,3,4,5);
my @array2 = (12,11,10,9,8,7,6,5);

print "before no_prototype array 1 :",@array1," - array 2 :",@array2,"\n";
no_prototype (\@array1,\@array2);
print "\nafter no_prototype array 1 :",@array1," - array 2 :",@array2,"\n";


with_prototype (@array1,@array2);
print "\nafter with_prototype array 1 :",@array1," - array 2 :",@array2,"\n";

sub no_prototype {
my $counter = 1;
while ( @_){
my $myref = shift @_;
print "$counter : Next argument in sub no_prototype : ",$myref,"\n";
$myref = "no_prototype";
$counter++;
}
return;
}

sub with_prototype (\@\@) {
my $counter = 1;
while ( @_){
my $myref = shift @_;
print "$counter : Next argument in sub with_prototype : ",$myref,"\n";
@{$myref}[2] = "with_prototype";
$counter++;
}
return;
}


before no_prototype array 1 :12345 - array 2 :12111098765
1 : Next argument in sub no_prototype : ARRAY(0x182a2b4)
2 : Next argument in sub no_prototype : ARRAY(0x182a2d4)

after no_prototype array 1 :12345 - array 2 :12111098765
1 : Next argument in sub with_prototype : ARRAY(0x182a2b4)
2 : Next argument in sub with_prototype : ARRAY(0x182a2d4)

after with_prototype array 1 :12with_prototype45 - array 2 :1211with_prototype98765

FishMonger
08-22-2009, 02:14 PM
I missed one error in the sub without prototypes.

$myref = "no_prototype";

Should have been the same as it is in the prototype sub.

@{$myref}[2] = "no_prototype";
or more cleanly written as:

$myref->[2] = "no_prototype";

The updated output

before no_prototype array 1 :12345 - array 2 :12111098765
1 : Next argument in sub no_prototype : ARRAY(0x182a2b4)
2 : Next argument in sub no_prototype : ARRAY(0x182a2d4)

after no_prototype array 1 :12no_prototype45 - array 2 :1211no_prototype98765
1 : Next argument in sub with_prototype : ARRAY(0x182a2b4)
2 : Next argument in sub with_prototype : ARRAY(0x182a2d4)

after with_prototype array 1 :12with_prototype45 - array 2 :1211with_prototype98765

Jolle
08-23-2009, 12:09 AM
You don't need a prototype to pass a reference.
True, touche. Guess I'll have to sleep a bit longer on it next time. :) thx for the correction

FishMonger
08-23-2009, 01:06 AM
True, touche. Guess I'll have to sleep a bit longer on it next time. :)

Ya, I wasn't all that much awake either when I posted. There are a couple other corrections that should be made but I left them out because the key issue for me was the use of prototypes.


thx for the correction
Glad I was able to help. :)



EZ Archive Ads Plugin for vBulletin Copyright 2006 Computer Help Forum