Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 14 of 14
  1. #1
    New Coder
    Join Date
    Aug 2009
    Location
    in front of the keyboard
    Posts
    17
    Thanks
    1
    Thanked 1 Time in 1 Post

    Question Call perl script in loop without recompiling

    Hi all,

    I have two perl scripts :

    The first takes 2 files as an input, processes them and puts the output in a third file.

    The second takes one directory as an input, and searches all subdirectories for input files, defines the output file and then calls the first script to process the whole thing.

    Now, I loop through all subdirectories with this loop :
    Code:
    foreach (@subdirs){
           foreach (@dirfiles){
                 m/(^file1.+)/ and $opt1 = $1;
                 m/(^file2.+)/ and $opt2 = $1;
           }
           system ("perl -S script2.pl",$opt1,$opt2);
    }
    This works, but obviously script2.pl will be recompiled again every time it's called. That's rather clumsy, so I want to optimize this. I cannot combine both scripts in one file, as I need the other script seperately too. Any idea about how to call that script in such a way it is compiled only once, and then takes different arguments each time in the loop?

    I've been looking at backticks, eval{}, do{}, require{}, but I didn't find much info on when the code is compiled, or how to pass different arguments each time. They should end up in the @ARGV of script2.pl.


    Anybody an idea? Thank you in advance

  • #2
    Super Moderator
    Join Date
    May 2005
    Location
    Southern tip of Silicon Valley
    Posts
    2,872
    Thanks
    2
    Thanked 164 Times in 159 Posts
    Convert script2.pl into a module and then load it into your other script via a 'use' or 'require' statement.

  • #3
    Super Moderator
    Join Date
    May 2005
    Location
    Southern tip of Silicon Valley
    Posts
    2,872
    Thanks
    2
    Thanked 164 Times in 159 Posts
    If you want to show us both scripts, we should be able to show you how to do it more efficiently.

  • #4
    New Coder
    Join Date
    Aug 2009
    Location
    in front of the keyboard
    Posts
    17
    Thanks
    1
    Thanked 1 Time in 1 Post
    Quote Originally Posted by FishMonger View Post
    Convert script2.pl into a module and then load it into your other script via a 'use' or 'require' statement.
    That would be an option if I didn't need script2.pl on its own as well regularly. If I make it a module, I have to write a third script to run the only subroutine in the module script2.pm. That's quite a way around.

    There is some trick I've seen before, but I can't find it anymore.

  • #5
    New Coder
    Join Date
    Aug 2009
    Location
    in front of the keyboard
    Posts
    17
    Thanks
    1
    Thanked 1 Time in 1 Post
    The two scripts. JMTools is a module with some subs I use quite often, like GetNewFile (user interface when a file is not valid or exists already), GetDir (changes to that dir and globs the second argument. Implements GetNewFile is dir does not work, that's why it returns dir as well)

    Script 1 :
    Code:
    #!/usr/bin/perl -w
    use strict;
    our $directory;
    BEGIN {
        use File::Spec::Functions qw(rel2abs);
        use File::Basename qw (dirname);
        my $path = rel2abs ($0);
        $directory = dirname ($path)
    }
    use lib $directory;
    use JMTools qw(:DEFAULT :DIRS);
    #-----------------------------------------------------------
    # Initialization
    #-----------------------------------------------------------
    my $dir;
    my @subdirs;
    my $help = qq/
    BatchExtract help
    ------------------
    BatchExtract uses GenExtract to extract the genes of all sets found in
    the specified directory. Each set should be in a seperate directory, with
    the species name as directory name. Each directory should contain at least
    a genomeInfo.txt file, a genomeFASTA.txt file and a proteomeFASTA.txt file.
    After processing, the gene files and protein files are saved in seperate
    directories, specified in JMTools.
    
    use : perl BatchExtract.pl [<dirname>]
    If <dirname> is not specified, the default GENOME_DIR from JMTools is used.
    /;
    
    $dir = shift @ARGV or $dir=$GENOME_DIR;
    ($dir,@subdirs) = GetDir($dir,"*") unless ($dir eq"-h") and die $help;
    
    #-----------------------------------------------------------
    # MAIN
    #-----------------------------------------------------------
    foreach (@subdirs){
        my $g;
        my $i;
        my $o = $GENE_DIR.$_."genes.txt"; #adapt if necessary
        (-d $_) or print "$_ is not a known directory\n" && next;
        chdir $_ or print "Could not change to $_\n" && next;
        opendir DIR, "." or print "Couldn't open $_\n" && next;
        my @dirfiles = glob("*.txt");
        closedir DIR;
        foreach (@dirfiles){
            if($_ =~ /(.+genomeFASTA.txt)/){$g=$1};
            if($_ =~ /(.+genomeInfo.txt)/ ){$i=$1};
        }
        my $param = "g ".$g." i ".$i." o ".$o;
        print $param."\n";
        system("perl -S GenExtract.pl $param");
        chdir "..";
    }
    1;
    Script 2 :
    Code:
    #!/usr/bin/perl -w
    use strict;
    our $directory;
    BEGIN {
        use File::Spec::Functions qw(rel2abs);
        use File::Basename qw (dirname);
        my $path = rel2abs ($0);
        $directory = dirname ($path)
    }
    use lib $directory;
    use JMTools;
    sub CreateHeader($\%);
    sub GetEnd($\%);
    sub AddToFile($$$);
    #-----------------------------------------------------------
    # Initialization
    #-----------------------------------------------------------
    my $help =qq/
    GenExtract extracts the genes from genome files originating from
    MicrobesOnline.org. It needs as input the genomeFASTA file and the info file.
    Determine following parameters :
        g filename_genome.ext
        i filename_infosheet.ext
        o filename_out.ext
    ex: GenExtract.pl g genome.txt i info.txt o genes.txt\n/;
    
    ###### READING ARGV
    
    (@ARGV % 2) == 0 or die $help;
    my %opt = @ARGV;
    defined $opt{i} &&
        defined $opt{g} &&
        defined $opt{o} or die $help;
    
    # echo set parameters
    print qq/\n Following files were specified :
            genome file in: $opt{g}
            info file in :  $opt{i}
            genes file out: $opt{o}\n
    Big files might take a minute...\n\n/;
    
    #-----------------------------------------------------------
    # MAIN
    #-----------------------------------------------------------
    # Read in gene info file
    while(1){ # Open INI or get new file
        $opt{i}=GetNewFile($opt{i},0)
            unless (open INI,"<$opt{i}") && last;
    } # end while (1)
    
    # Creating keys for gene info
    my $labels = <INI>;
    chomp $labels;
    my @Labels = split /\t/,$labels;
    shift(@Labels) =~ /locusid/i or die "info file not valid\n"; #locusID is not a key for the data-hash
    
    # Extracting gene info from file
    my %ginfo; # hash/key=locusID with hashes/key=@Labels to store gene info
    my %pos; # hash/key=start position, value=ID for scrolling through genome seq
    my $lastpos = 0; # keeps the previous read-in position
        while (my $data=<INI>){
            chomp $data;
            my @Data = split /\t/,$data;
            my $locusID = shift @Data;
            for (0..$#Labels){  # assign ginfo for each label or define not available
                $ginfo{$locusID}{$Labels[$_]}=$Data[$_]||"NA"; 
            } # end for 0..Labels
            if ($ginfo{$locusID}{start} < $lastpos-1000) {last};
            # assign start pos seen in + sense for both + and - strands
            if ($ginfo{$locusID}{strand} eq "-"){
                $lastpos = $ginfo{$locusID}{stop};
                $pos{$lastpos} = $locusID;
            } else{
                $lastpos = $ginfo{$locusID}{start};
                $pos{$lastpos}= $locusID;
            } # end if
        } # end while INI
    close INI;
    
    ###### GENOME GENE EXTRACTION
    
    while(1){ # open ING or get new file
        $opt{g}=GetNewFile($opt{g},0)
            unless (open ING,"<$opt{g}") && last;
    } # end while (1)
    
    # reading species name from genome-file
    my $species=<ING>;
        if($species =~ /([a-z]{4,}.*)\sNCBI\s.*ID\s(\d{4,})/i){
            $species="ID:".$2."\tname:".$1;
            
        } else {
            $species="Species not known";
        } # end else
    # reading in the genome at once
    my $genome ="";
    while (<ING>){
        chomp $_ unless ($_ =~/>/) && last;
        $genome .= $_;
    }
    # opening the output file
    if (-e $opt{o}){
        $opt{o}=GetNewFile($opt{o},1);
    } #end if
    while(1){ # open OUT or get new file
        $opt{o}=GetNewFile($opt{o},0)
            unless (open OUT,">$opt{o}") && last;
    } # end while (1)
    
    # creating output file and header
    print OUT ">".$species.
                "\tMicrobesonline.com genome splitted in genes.\n";
    close OUT;
    
    # Extracting genes with one run through ING
    
    for my $start (sort { $a <=> $b } keys %pos){
        my $id = $pos{$start};
        my $idheader = CreateHeader($id,%ginfo);
        my $end = GetEnd($id,%ginfo); 
    
        # construct gene
        my $gene = substr($genome,$start-1,$end-$start+1);
                # cut noncoding part from input
        if ($ginfo{$id}{strand} eq "-"){
            $gene = reverse($gene);
            $gene =~ tr\ATGC\TACG\;
            
        }
                # ADDED 6/8 - reverse and transliterate gene if on - strand
        
        AddToFile($idheader,$gene,$opt{o});
    } # end for my start over keys %pos
    #    
    
    close ING;
    print qq/Done!\n /;
    
    #-----------------------------------------------------------
    # SUBS
    #-----------------------------------------------------------
    
    # creates the header for each gene, based on the info from infofile
    sub CreateHeader ($\%) {
        my($id,$info)=@_;
        my %info = %{$info};
        my $header = ">VIMSS:".$id."\tname:".$info{$id}{name}.
            "\taccession:".$info{$id}{accession}."\tGI:".$info{$id}{GI}.
            "\tstart:".$info{$id}{start}."\tstop:".$info{$id}{stop}."\tstrand:".$info{$id}{strand}.
            "\tdesc:".$info{$id}{desc}."\tCOG:".$info{$id}{COG}.
            "\tCOGFun:".$info{$id}{COGFun}."\tCOGDesc:".$info{$id}{COGDesc}.
            "\tGOLIST:".$info{$id}{GO};
        return $header;
    } # end sub CreateHeader
    
    # returns the first and the last position of the gene in the right order 
    sub GetEnd ($\%){
        my ($id,$info) = @_;
        my %info = %{$info};
        my $first = $info{$id}{start};
        my $last = $info{$id}{stop};
        return $last if ($info{$id}{strand} ne "-");
        return $first;
    } # end sub GetEnd
    
    sub AddToFile ($$$){
        my ($header,$gene,$file)=@_;
        open ADD,">>$file" or die $!;
        print ADD $header."\n";
        while (length($gene)>60){
            print ADD substr($gene,0,60)."\n";
            $gene = substr($gene,60);
        }
        print ADD $gene,"\n";
        close ADD;
        
    }
    
    
    1;

  • #6
    Super Moderator
    Join Date
    May 2005
    Location
    Southern tip of Silicon Valley
    Posts
    2,872
    Thanks
    2
    Thanked 164 Times in 159 Posts
    You should extend the GenExtract.pl script to accommodate the option of processing a single or multiple files.

    getopt, getopts - Process single-character switches with switch clustering
    http://search.cpan.org/~nwclark/perl.../Getopt/Std.pm

    Getopt::Long - Extended processing of command line options
    http://search.cpan.org/~jv/Getopt-Lo...Getopt/Long.pm


    Why are you using prototypes?

    Are you under the false impression that they enforce or check the parameter type?

  • Users who have thanked FishMonger for this post:

    Jolle (08-22-2009)

  • #7
    New Coder
    Join Date
    Aug 2009
    Location
    in front of the keyboard
    Posts
    17
    Thanks
    1
    Thanked 1 Time in 1 Post
    Quote Originally Posted by FishMonger View Post
    You should extend the GenExtract.pl script to accommodate the option of processing a single or multiple files.
    strictly spoken, you're right. And actually, I should rewrite a bit more, the code is rather "verbose" But it's one of the first scripts I did for my thesis, it's actually my project for the basic course in Perl I had last year. I don't want to put too much time in it, I was just curious if it would be possible. I can do it with the system thing too, this is not intended to be used by many people.

    getopt, getopts - Process single-character switches with switch clustering
    http://search.cpan.org/~nwclark/perl.../Getopt/Std.pm

    Getopt::Long - Extended processing of command line options
    http://search.cpan.org/~jv/Getopt-Lo...Getopt/Long.pm
    Now there's a tip I'm definitely going to use! thanks a lot!

  • #8
    Super Moderator
    Join Date
    May 2005
    Location
    Southern tip of Silicon Valley
    Posts
    2,872
    Thanks
    2
    Thanked 164 Times in 159 Posts
    Here's something else you should read.

    "Far More Than Everything You've Ever Wanted to Know about Prototypes in Perl"
    http://www.perl.com/language/misc/fmproto.html

  • #9
    New Coder
    Join Date
    Aug 2009
    Location
    in front of the keyboard
    Posts
    17
    Thanks
    1
    Thanked 1 Time in 1 Post
    Quote Originally Posted by FishMonger View Post
    Here's something else you should read.

    "Far More Than Everything You've Ever Wanted to Know about Prototypes in Perl"
    http://www.perl.com/language/misc/fmproto.html
    hehe. I've had that discussion before actually. It's true, that prototyping in Perl doesn't serve much. But I find it easier to make sure I pass the right things to the right sub. And indeed, only reference prototyping is useful for that, all the rest is bollocks. It might complicate the code a bit, but that part of prototyping really makes debugging a bit easier.

    Each to his own, I guess.

  • #10
    New Coder
    Join Date
    Aug 2009
    Location
    in front of the keyboard
    Posts
    17
    Thanks
    1
    Thanked 1 Time in 1 Post
    On a side note : I've been sleeping over it, and I believe Tom is leaving out quite some information when he talks about prototyping. it DOES make a huge difference, both in what you get in the routine and what you can do with it. Depending on what you want to do, you even NEED to use reference prototyping in subs :

    1) using reference prototyping, you pass a reference. With huge arrays, this can substantially lower the memory use as you don't make an extra copy of the array like without reference prototyping.

    2) Another property of this, is that you can change the arrays in the subcall directly. Sometimes useful, sometimes dangerous, but different from without prototyping.

    3) if you pass two arrays or an array and a hash into a subroutine without prototyping, you'll have to reconstruct them in your sub. Not easy if you don't know how long each is exactly.

    Next program illustrates why prototyping is sometimes even a very good idea :
    Code:
    #!/usr/bin/perl -w
    use strict;
    sub withref (\@\@);
    my @array1 = (1,2,3,4,5);
    my @array2 = (12,11,10,9,8,7,6,5);
    print "before noref array 1 :",@array1," - array 2 :",@array2,"\n";
    noref (@array1,@array2);
    print "after noref array 1 :",@array1," - array 2 :",@array2,"\n";
    withref (@array1,@array2);
    print "after withref array 1 :",@array1," - array 2 :",@array2,"\n";
    
    sub noref {
        my $counter = 1;
            while ( @_){
            my $myref = shift @_;
            print "$counter : Next argument in sub noref : ",$myref,"\n";
            $myref = "NOREF";
            $counter++;
        }
        return;
    }
    
    sub withref (\@\@) {
        my $counter = 1;
        while ( @_){
            my $myref = shift @_;
            print "$counter : Next argument in sub withref : ",$myref,"\n";
            @{$myref}[2] = "WITHREF";
            $counter++;
        }
        return;
    }
    The output :
    before noref array 1 :12345 - array 2 :12111098765
    1 : Next argument in sub noref : 1
    2 : Next argument in sub noref : 2
    3 : Next argument in sub noref : 3
    4 : Next argument in sub noref : 4
    5 : Next argument in sub noref : 5
    6 : Next argument in sub noref : 12
    7 : Next argument in sub noref : 11
    8 : Next argument in sub noref : 10
    9 : Next argument in sub noref : 9
    10 : Next argument in sub noref : 8
    11 : Next argument in sub noref : 7
    12 : Next argument in sub noref : 6
    13 : Next argument in sub noref : 5
    after noref array 1 :12345 - array 2 :12111098765
    1 : Next argument in sub withref : ARRAY(0x23afb9c)
    2 : Next argument in sub withref : ARRAY(0x23afbdc)
    after withref array 1 :12WITHREF45 - array 2 :1211WITHREF98765
    Quite a difference, no?
    Last edited by Jolle; 08-22-2009 at 12:03 PM.

  • #11
    Super Moderator
    Join Date
    May 2005
    Location
    Southern tip of Silicon Valley
    Posts
    2,872
    Thanks
    2
    Thanked 164 Times in 159 Posts
    You don't need a prototype to pass a reference.

    Code:
    #!/usr/bin/perl
    
    use warnings;  # warnings pragma is preferable over the -w switch
    use strict;
    
    sub with_prototype (\@\@);
    
    my @array1 = (1,2,3,4,5);
    my @array2 = (12,11,10,9,8,7,6,5);
    
    print "before no_prototype array 1 :",@array1," - array 2 :",@array2,"\n";
    no_prototype (\@array1,\@array2);
    print "\nafter no_prototype array 1 :",@array1," - array 2 :",@array2,"\n";
    
    
    with_prototype (@array1,@array2);
    print "\nafter with_prototype array 1 :",@array1," - array 2 :",@array2,"\n";
    
    sub no_prototype {
        my $counter = 1;
            while ( @_){
            my $myref = shift @_;
            print "$counter : Next argument in sub no_prototype : ",$myref,"\n";
            $myref = "no_prototype";
            $counter++;
        }
        return;
    }
    
    sub with_prototype (\@\@) {
        my $counter = 1;
        while ( @_){
            my $myref = shift @_;
            print "$counter : Next argument in sub with_prototype : ",$myref,"\n";
            @{$myref}[2] = "with_prototype";
            $counter++;
        }
        return;
    }
    before no_prototype array 1 :12345 - array 2 :12111098765
    1 : Next argument in sub no_prototype : ARRAY(0x182a2b4)
    2 : Next argument in sub no_prototype : ARRAY(0x182a2d4)

    after no_prototype array 1 :12345 - array 2 :12111098765
    1 : Next argument in sub with_prototype : ARRAY(0x182a2b4)
    2 : Next argument in sub with_prototype : ARRAY(0x182a2d4)

    after with_prototype array 1 :12with_prototype45 - array 2 :1211with_prototype98765

  • #12
    Super Moderator
    Join Date
    May 2005
    Location
    Southern tip of Silicon Valley
    Posts
    2,872
    Thanks
    2
    Thanked 164 Times in 159 Posts
    I missed one error in the sub without prototypes.
    Code:
    $myref = "no_prototype";
    Should have been the same as it is in the prototype sub.
    Code:
    @{$myref}[2] = "no_prototype";
    or more cleanly written as:
    Code:
    $myref->[2] = "no_prototype";
    The updated output
    before no_prototype array 1 :12345 - array 2 :12111098765
    1 : Next argument in sub no_prototype : ARRAY(0x182a2b4)
    2 : Next argument in sub no_prototype : ARRAY(0x182a2d4)

    after no_prototype array 1 :12no_prototype45 - array 2 :1211no_prototype98765
    1 : Next argument in sub with_prototype : ARRAY(0x182a2b4)
    2 : Next argument in sub with_prototype : ARRAY(0x182a2d4)

    after with_prototype array 1 :12with_prototype45 - array 2 :1211with_prototype98765
    Last edited by FishMonger; 08-22-2009 at 02:17 PM.

  • #13
    New Coder
    Join Date
    Aug 2009
    Location
    in front of the keyboard
    Posts
    17
    Thanks
    1
    Thanked 1 Time in 1 Post
    Quote Originally Posted by FishMonger View Post
    You don't need a prototype to pass a reference.
    True, touche. Guess I'll have to sleep a bit longer on it next time. thx for the correction

  • #14
    Super Moderator
    Join Date
    May 2005
    Location
    Southern tip of Silicon Valley
    Posts
    2,872
    Thanks
    2
    Thanked 164 Times in 159 Posts
    Quote Originally Posted by Jolle View Post
    True, touche. Guess I'll have to sleep a bit longer on it next time.
    Ya, I wasn't all that much awake either when I posted. There are a couple other corrections that should be made but I left them out because the key issue for me was the use of prototypes.

    Quote Originally Posted by Jolle View Post
    thx for the correction
    Glad I was able to help.


  •  

    Tags for this Thread

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •