Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 8 of 8
  1. #1
    New to the CF scene
    Join Date
    Apr 2009
    Posts
    9
    Thanks
    1
    Thanked 0 Times in 0 Posts

    help with de-bugging perl script

    I have written a perl script, and need help to debug it. I am pretty new to Perl, and therefore do not know quite where I have gone wrong. Any help would be MUCH appreciated. The code will be put in below.

    Essentially, it will first read in two protein sequences from an alignment file. It will then go to a subroutine "sub readBlosum", which will read in a scoring matrix from a file, with Z being one axis and X being the other, to get a value for each pair of protein residues. This subroutine will then score the sequence against the matrix. Finally, a bunch of calculations will be done, and then it should spit out a number at the end. However, my Perl debugger is giving me errors, and as I am new to this, I am not quite sure what is wrong.

    Code:
    #!/usr/bin/perl -w
    
    use warnings;
    use strict;
    
    my $protein1 = ("");
    my $protein2 = ("");
    my @seq1 = ("");
    my @seq2 = ("");
    my @line = ("");
    my @seq_line= ("");
    my $readBLOSUM = "blosum62.txt";
    #parse sequences by residues
    while (<STDIN>)
    
           { if ($_ =~ m/^(At[1-5][g][0-9]{5})(\S+)(.*)/g)
             {$protein1=$2;}
            elsif ($_ =~ m/^(Hs[1-9]{8})(\S+)(.*)/g)
                {$protein2 = $2;}       
                
                
                    
            if ($_ =~ /(^\|*|.|:)/n)
                    {null;}
            else
                    {
                    @line = split($_,/\s+/);                  # split the sequence and sequence length
    #number parts of the line to an array with two entries
                    @seq_line = split($line[1],//);     # put sequence to an array by character
                    }
           
           if ($line[0]=m/^(At[1-5][g][0-9]{5})(\S+)(.*)/g)
                {@seq_line-> @seq1}
             elsif ($line[0] =  m/^(Hs[1-9]{8})(\S+)(.*)/g)
                {@seq_line-> @seq2}
                
                
                
    readBlosum;            
        
        
        sub readBlosum{
    open BLOSUM , "blosum62" or die "Unable to open inputFile.";
    my $line = ;
    my @acidIndices = split(/ /, $line);
    my %blosumMatrix = ();
    
    while () {
    my($line) = $_;
    my @scores = split(/ /, $line);
    my $currAcid = $scores[0];
    for ( my $i = 1; $i < scalar @scores; $i++) {
    $blosumMatrix{$currAcid}{$acidIndices[$i]} = $scores[$i];
    
    }
    
    close BLOSUM;
    return \%blosumMatrix;
    }
    
    my $blosumMat = &readBlosum();
    $blosumMat->{”Z”}->{”X”};
    
    #calculate weights of sequence 1/sequence 2 alignment and sum
    #S1 is the similarity score
    
     total;
     lengthseq;
    expscore;
    s1norm;
    unrelscore;
    unrelnorm;
    rawdist;
    scoredist;
    
    
    
    $scoremat=$blosumMat;
    
    sub total {my ($s1, $scoremat);
    		my $s1 = 0;
    		foreach $scoremat {
    		$s1+= $scoremat;}
    		return ($s1);}
    				
    			   
    
    #calculate length
    sub lengthseq{
    	$length = @seq1 ;
    	$len=length($length);}
    	
    
    #calculate expected score
    sub expscore {$expscore= (-0.52) * $len}
    
    #calculate Normalized score
    sub s1norm {$s1norm= $s1 - $expscore}
    
    #calculate unrelated score
    sub unrelscore {$unrelscore=(([7769 + 9529)/2)}
    
    #calculate normalized unrelated score
    sub unrelnorm {$unrelnorm=$unrelscore-$s1norm}
    
    #calculate raw distance
    sub rawdist {$rawdist=-ln($s1norm/$unrelscore)*100}
    
    #calculate scoredist estimate
    sub scoredist {$scoredist=1.1775*$rawdist}
    print $scoredist;

  • #2
    Master Coder
    Join Date
    Dec 2007
    Posts
    6,682
    Thanks
    436
    Thanked 890 Times in 879 Posts
    there are few problems but first you don't close some '}', one to the while and one when you define readBlosum sub.
    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    
    my $readBLOSUM = "blosum62.txt";
    
    my $protein1 = ("");
    my $protein2 = ("");
    my @seq1 = ("");
    my @seq2 = ("");
    my @line = ("");
    my @seq_line= ("");
    
    #parse sequences by residues
    while(<STDIN>){
        if($_ =~ m/^(At[1-5][g][0-9]{5})(\S+)(.*)/g){
    	$protein1=$2;
        }elsif($_ =~ m/^(Hs[1-9]{8})(\S+)(.*)/g){
    	$protein2 = $2;
        }
    
        if($_ =~ /(^\|*|.|:)/n){
    	null;
        }else{
    	@line = split($_,/\s+/);                  # split the sequence and sequence length
    #number parts of the line to an array with two entries
    	@seq_line = split($line[1],//);     # put sequence to an array by character
        }
    
        if($line[0]=m/^(At[1-5][g][0-9]{5})(\S+)(.*)/g){
    	@seq_line-> @seq1
        }elsif($line[0] =  m/^(Hs[1-9]{8})(\S+)(.*)/g){
    	@seq_line-> @seq2
        }
    }
    
    readBlosum;
    
    sub readBlosum{
        open BLOSUM , "blosum62" or die "Unable to open inputFile.";
        my $line = ;
        my @acidIndices = split(/ /, $line);
        my %blosumMatrix = ();
    
        while(){
    	my($line) = $_;
    	my @scores = split(/ /, $line);
    	my $currAcid = $scores[0];
    	for(my $i = 1; $i < scalar @scores; $i++){
    	    $blosumMatrix{$currAcid}{$acidIndices[$i]} = $scores[$i];
    	}
    
    	close BLOSUM;
    	return \%blosumMatrix;
        }
    }
    
    my $blosumMat = &readBlosum();
    $blosumMat->{’¡ÉZ’¡É}->{’¡ÉX’¡É};
    
    #calculate weights of sequence 1/sequence 2 alignment and sum
    #S1 is the similarity score
    
    total;
    lengthseq;
    expscore;
    s1norm;
    unrelscore;
    unrelnorm;
    rawdist;
    scoredist;
    
    $scoremat=$blosumMat;
    
    sub total {
        my ($s1, $scoremat);
        my $s1 = 0;
        foreach $scoremat {
    	$s1+= $scoremat;
        }
        return ($s1);
    }
    
    
    #calculate length
    sub lengthseq {
        $length = @seq1 ;
        $len=length($length);
    }
    
    #calculate expected score
    sub expscore {
        $expscore= (-0.52) * $len;
    }
    
    #calculate Normalized score
    sub s1norm {
        $s1norm= $s1 - $expscore;
    }
    
    #calculate unrelated score
    sub unrelscore {
        $unrelscore = ((7769 + 9529)/2);
    }
    
    #calculate normalized unrelated score
    sub unrelnorm {
        $unrelnorm=$unrelscore-$s1norm;
    }
    
    #calculate raw distance
    sub rawdist {
        $rawdist=-ln($s1norm/$unrelscore)*100;
    }
    
    #calculate scoredist estimate
    sub scoredist {
        $scoredist=1.1775*$rawdist;
    }
    
    print $scoredist;
    I didn't run this script, just indented, please post blosum62.txt or some similar data.

    best regards

  • #3
    New to the CF scene
    Join Date
    Apr 2009
    Posts
    9
    Thanks
    1
    Thanked 0 Times in 0 Posts
    .....
    Last edited by roro; 04-16-2009 at 11:27 PM.

  • #4
    Master Coder
    Join Date
    Dec 2007
    Posts
    6,682
    Thanks
    436
    Thanked 890 Times in 879 Posts
    Quote Originally Posted by roro View Post
    Attached is BLOSUM62.txt. I would really appreciate it if you could help me get this running.

    Thanks a lot.
    you get data from two sources or I'm wrong. First is from stdin and second from blosum62.txt which is a matrix.
    I'm not familiar with this domain, can you also provide some input example from stdin?

    best regards

  • #5
    New to the CF scene
    Join Date
    Apr 2009
    Posts
    9
    Thanks
    1
    Thanked 0 Times in 0 Posts
    .....
    Last edited by roro; 04-16-2009 at 11:26 PM.

  • #6
    Master Coder
    Join Date
    Dec 2007
    Posts
    6,682
    Thanks
    436
    Thanked 890 Times in 879 Posts
    Quote Originally Posted by roro View Post
    Sorry, here is the sequence file! stdin would be this file being read into the script from command line.
    I try to see how it work and come back with results.
    I hope this is not a homework assignment,

    best regards

  • Users who have thanked oesxyl for this post:

    roro (04-16-2009)

  • #7
    New to the CF scene
    Join Date
    Apr 2009
    Posts
    9
    Thanks
    1
    Thanked 0 Times in 0 Posts
    .....
    Last edited by roro; 04-16-2009 at 11:24 PM.

  • #8
    New Coder
    Join Date
    Mar 2009
    Location
    Fabric Covered Box
    Posts
    69
    Thanks
    1
    Thanked 16 Times in 14 Posts
    Did anything ever happen with this?

    Getting the input working is easy, but I never could figure out how the calculations were intended to work.


  •  

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •