...

View Full Version : help with de-bugging perl script



roro
04-16-2009, 04:47 PM
I have written a perl script, and need help to debug it. I am pretty new to Perl, and therefore do not know quite where I have gone wrong. Any help would be MUCH appreciated. The code will be put in below.

Essentially, it will first read in two protein sequences from an alignment file. It will then go to a subroutine "sub readBlosum", which will read in a scoring matrix from a file, with Z being one axis and X being the other, to get a value for each pair of protein residues. This subroutine will then score the sequence against the matrix. Finally, a bunch of calculations will be done, and then it should spit out a number at the end. However, my Perl debugger is giving me errors, and as I am new to this, I am not quite sure what is wrong.



#!/usr/bin/perl -w

use warnings;
use strict;

my $protein1 = ("");
my $protein2 = ("");
my @seq1 = ("");
my @seq2 = ("");
my @line = ("");
my @seq_line= ("");
my $readBLOSUM = "blosum62.txt";
#parse sequences by residues
while (<STDIN>)

{ if ($_ =~ m/^(At[1-5][g][0-9]{5})(\S+)(.*)/g)
{$protein1=$2;}
elsif ($_ =~ m/^(Hs[1-9]{8})(\S+)(.*)/g)
{$protein2 = $2;}



if ($_ =~ /(^\|*|.|:)/n)
{null;}
else
{
@line = split($_,/\s+/); # split the sequence and sequence length
#number parts of the line to an array with two entries
@seq_line = split($line[1],//); # put sequence to an array by character
}

if ($line[0]=m/^(At[1-5][g][0-9]{5})(\S+)(.*)/g)
{@seq_line-> @seq1}
elsif ($line[0] = m/^(Hs[1-9]{8})(\S+)(.*)/g)
{@seq_line-> @seq2}



readBlosum;


sub readBlosum{
open BLOSUM , "blosum62" or die "Unable to open inputFile.";
my $line = ;
my @acidIndices = split(/ /, $line);
my %blosumMatrix = ();

while () {
my($line) = $_;
my @scores = split(/ /, $line);
my $currAcid = $scores[0];
for ( my $i = 1; $i < scalar @scores; $i++) {
$blosumMatrix{$currAcid}{$acidIndices[$i]} = $scores[$i];

}

close BLOSUM;
return \%blosumMatrix;
}

my $blosumMat = &readBlosum();
$blosumMat->{Z}->{X};

#calculate weights of sequence 1/sequence 2 alignment and sum
#S1 is the similarity score

total;
lengthseq;
expscore;
s1norm;
unrelscore;
unrelnorm;
rawdist;
scoredist;



$scoremat=$blosumMat;

sub total {my ($s1, $scoremat);
my $s1 = 0;
foreach $scoremat {
$s1+= $scoremat;}
return ($s1);}



#calculate length
sub lengthseq{
$length = @seq1 ;
$len=length($length);}


#calculate expected score
sub expscore {$expscore= (-0.52) * $len}

#calculate Normalized score
sub s1norm {$s1norm= $s1 - $expscore}

#calculate unrelated score
sub unrelscore {$unrelscore=(([7769 + 9529)/2)}

#calculate normalized unrelated score
sub unrelnorm {$unrelnorm=$unrelscore-$s1norm}

#calculate raw distance
sub rawdist {$rawdist=-ln($s1norm/$unrelscore)*100}

#calculate scoredist estimate
sub scoredist {$scoredist=1.1775*$rawdist}
print $scoredist;

oesxyl
04-16-2009, 05:15 PM
there are few problems but first you don't close some '}', one to the while and one when you define readBlosum sub.


#!/usr/bin/perl
use strict;
use warnings;

my $readBLOSUM = "blosum62.txt";

my $protein1 = ("");
my $protein2 = ("");
my @seq1 = ("");
my @seq2 = ("");
my @line = ("");
my @seq_line= ("");

#parse sequences by residues
while(<STDIN>){
if($_ =~ m/^(At[1-5][g][0-9]{5})(\S+)(.*)/g){
$protein1=$2;
}elsif($_ =~ m/^(Hs[1-9]{8})(\S+)(.*)/g){
$protein2 = $2;
}

if($_ =~ /(^\|*|.|:)/n){
null;
}else{
@line = split($_,/\s+/); # split the sequence and sequence length
#number parts of the line to an array with two entries
@seq_line = split($line[1],//); # put sequence to an array by character
}

if($line[0]=m/^(At[1-5][g][0-9]{5})(\S+)(.*)/g){
@seq_line-> @seq1
}elsif($line[0] = m/^(Hs[1-9]{8})(\S+)(.*)/g){
@seq_line-> @seq2
}
}

readBlosum;

sub readBlosum{
open BLOSUM , "blosum62" or die "Unable to open inputFile.";
my $line = ;
my @acidIndices = split(/ /, $line);
my %blosumMatrix = ();

while(){
my($line) = $_;
my @scores = split(/ /, $line);
my $currAcid = $scores[0];
for(my $i = 1; $i < scalar @scores; $i++){
$blosumMatrix{$currAcid}{$acidIndices[$i]} = $scores[$i];
}

close BLOSUM;
return \%blosumMatrix;
}
}

my $blosumMat = &readBlosum();
$blosumMat->{Z}->{X};

#calculate weights of sequence 1/sequence 2 alignment and sum
#S1 is the similarity score

total;
lengthseq;
expscore;
s1norm;
unrelscore;
unrelnorm;
rawdist;
scoredist;

$scoremat=$blosumMat;

sub total {
my ($s1, $scoremat);
my $s1 = 0;
foreach $scoremat {
$s1+= $scoremat;
}
return ($s1);
}


#calculate length
sub lengthseq {
$length = @seq1 ;
$len=length($length);
}

#calculate expected score
sub expscore {
$expscore= (-0.52) * $len;
}

#calculate Normalized score
sub s1norm {
$s1norm= $s1 - $expscore;
}

#calculate unrelated score
sub unrelscore {
$unrelscore = ((7769 + 9529)/2);
}

#calculate normalized unrelated score
sub unrelnorm {
$unrelnorm=$unrelscore-$s1norm;
}

#calculate raw distance
sub rawdist {
$rawdist=-ln($s1norm/$unrelscore)*100;
}

#calculate scoredist estimate
sub scoredist {
$scoredist=1.1775*$rawdist;
}

print $scoredist;

I didn't run this script, just indented, please post blosum62.txt or some similar data.

best regards

roro
04-16-2009, 05:30 PM
.....

oesxyl
04-16-2009, 06:03 PM
Attached is BLOSUM62.txt. I would really appreciate it if you could help me get this running.

Thanks a lot.
you get data from two sources or I'm wrong. First is from stdin and second from blosum62.txt which is a matrix.
I'm not familiar with this domain, can you also provide some input example from stdin?

best regards

roro
04-16-2009, 06:09 PM
.....

oesxyl
04-16-2009, 06:29 PM
Sorry, here is the sequence file! stdin would be this file being read into the script from command line.
I try to see how it work and come back with results.
I hope this is not a homework assignment, :)

best regards

roro
04-16-2009, 06:31 PM
.....

Shannon Blonk
04-17-2009, 07:56 PM
Did anything ever happen with this?

Getting the input working is easy, but I never could figure out how the calculations were intended to work.



EZ Archive Ads Plugin for vBulletin Copyright 2006 Computer Help Forum