Having a perl / grep problem
momo17
08-09-2006, 03:44 PM
Hello... below is a sample script that basically merges two arrays without duplicates. Each array may or may not contain different elements. The code works EXCEPT when a string contains parentheses. What can I do to get around this?
If you run the script, you will see that there will be two "111(2)" in the final array:
###############################################
my @tmp;
my $j;
my @result;
push( @tmp,'123');
push( @tmp,'124');
push( @tmp,'125');
push( @tmp,'126');
push( @tmp,'111(2)');
push( @tmp,'127');
foreach $j (@tmp) {
unless( grep(/^$j$/,@result)) {
push(@result,$j);
}
}
print "Result contains @result\nonly new elements will be inserted\n";
my @tmp2;
push( @tmp2,'123Z');
push( @tmp2,'124D');
push( @tmp2,'125B');
push( @tmp2,'126');
push( @tmp2,'111(2)');
push( @tmp2,'127DD');
foreach $j (@tmp2) {
unless( grep(/^$j$/,@result)) {
print "Inserting $j \n";
push(@result,$j);
} else {
print "Not inserting $j - already existed\n";
}
}
print "Final array shouldn't have duplicates:\n array is @result\n\n";
###############################################
TIA.
Mike
FishMonger
08-09-2006, 04:48 PM
Use a hash instead of the array.
my %result = map {$_,1} qw(123 124 125 126 111(2) 127);
my @result = sort keys %result;
print "Result contains @result\nonly new elements will be inserted\n";
foreach (qw(123Z 124D 125B 126 111(2) 127DD)) {
if (! $result{$_}) {
print "Inserting $_ \n";
$result{$_}++;
}
else {
print "Not inserting $_ - already existed\n";
}
}
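If the order of insertion matters, the hash can be paired with an array: the hash answers "have I seen this string?" and the array keeps first-seen order. A minimal sketch using the sample data from the original post; the names %seen and add_unique are made up for illustration and are not from either poster's code.

```perl
use strict;
use warnings;

my %seen;      # keys are the strings already stored in @result
my @result;    # unique elements, in first-seen order

# Hypothetical helper: push $item onto @result only if it is new.
# Returns true when the item was inserted.
sub add_unique {
    my ($item) = @_;
    return 0 if $seen{$item}++;   # hash keys are compared as exact strings
    push @result, $item;
    return 1;
}

add_unique($_) for ('123', '124', '125', '126', '111(2)', '127');

for my $item ('123Z', '124D', '125B', '126', '111(2)', '127DD') {
    if (add_unique($item)) {
        print "Inserting $item\n";
    }
    else {
        print "Not inserting $item - already existed\n";
    }
}

print "array is @result\n";
```

Because hash keys are plain strings, the parentheses in 111(2) need no special handling here, and each lookup avoids rescanning the whole array.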
KevinADC
08-09-2006, 06:30 PM
I agree with Fish, use a hash, but the explanation of your problem lies in the use of the regexp:
/^$j$/
the element 111(2) has metacharacters in it: the parentheses (). The regexp interprets them in meta context, not literal context. It looks like this to the regexp:
/^111(2)$/
perl thinks the () mark something to store in pattern memory, so this pattern matches the string 1112 and never the literal string 111(2). That is why grep never finds the existing element and it gets inserted a second time. It is easier to see with empty parentheses, where only 111 is treated as a literal part of the string. If you change your code to this you will see that only 111 is being evaluated by the regexp:
my @tmp;
my $j;
my @result;
push( @tmp,'123');
push( @tmp,'124');
push( @tmp,'125');
push( @tmp,'126');
push( @tmp,'111');
push( @tmp,'127');
foreach $j (@tmp) {
unless( grep(/^$j$/,@result)) {
push(@result,$j);
}
}
print "Result contains @result\nonly new elements will be inserted\n";
my @tmp2;
push( @tmp2,'123');
push( @tmp2,'124D');
push( @tmp2,'125B');
push( @tmp2,'111()');
push( @tmp2,'127DD');
push( @tmp2,'126');
foreach $j (@tmp2) {
unless( grep(/^$j$/,@result)) {
print "Inserting $j \n";
push(@result,$j);
}
else {
print "Not inserting $j - already existed\n";
}
}
print "Final array shouldn't have duplicates:\n array is @result\n\n";
output:
Result contains 123 124 125 126 111 127
only new elements will be inserted
Not inserting 123 - already existed
Inserting 124D
Inserting 125B
Not inserting 111() - already existed
Inserting 127DD
Not inserting 126 - already existed
Final array shouldn't have duplicates:
array is 123 124 125 126 111 127 124D 125B 127DD
As you can see, 111 and 111() are equivalent to the regexp you are using. You really should not use a regexp to compare simple strings; use the string comparison operators instead:
eq - equal
ne - not equal
gt - greater than (b is greater than a)
lt - less than (c is less than d)
unless( grep($_ eq $j,@result)) {
or you could use the \Q...\E escape sequences (the same thing quotemeta() does) to escape all metacharacters in the search pattern of a regexp:
unless( grep(/^\Q$j\E$/,@result)) {
but that is overkill when all you are working with is simple strings like in your sample code. Your code also loops through the array more often than it needs to, because grep evaluates every element of the list every time you call it. Using a method like the one FishMonger posted is better and more efficient, especially for large lists.
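To see the metacharacter problem and the \Q...\E fix in isolation, here is a small sketch that tests the same pattern against single strings. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $j = '111(2)';

# Unescaped: the parentheses form a capture group, so the pattern
# behaves like /^1112$/ with the "2" captured.
my $plain_matches_self = ($j     =~ /^$j$/) ? 1 : 0;
my $plain_matches_1112 = ('1112' =~ /^$j$/) ? 1 : 0;

# Escaped: \Q...\E backslashes the metacharacters, like quotemeta().
my $quoted_matches_self = ($j =~ /^\Q$j\E$/) ? 1 : 0;

print "plain  vs '111(2)': $plain_matches_self\n";
print "plain  vs '1112':   $plain_matches_1112\n";
print "quoted vs '111(2)': $quoted_matches_self\n";
print "quotemeta gives:    ", quotemeta($j), "\n";
```

The unescaped pattern fails against its own source string but matches 1112, which is why the original script never recognised 111(2) as already present.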
momo17
08-10-2006, 08:54 PM
thanks for the input and explanation.... will probably go with the hash technique but what exactly is going on with this one:
->unless( grep($_ eq $j,@result)) {
how is that being processed?
Mike
KevinADC
08-11-2006, 05:46 AM
unless( grep($_ eq $j,@result)) {
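grep evaluates the expression once for each element of the list (@result). While it does that, the current element is aliased to the default variable ($_). So it's checking whether each element of @result is equal to $j. Inside unless(...), grep returns the number of elements that matched, so the block only runs when that count is 0.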
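The same check can be spelled out step by step. This is only a sketch with a shortened sample list; the explicit loop is a rough equivalent of what grep does, not its actual implementation.

```perl
use strict;
use warnings;

my @result = ('123', '111(2)', '127');
my $j      = '111(2)';

# In scalar context grep returns how many elements made the expression true.
my $count = grep { $_ eq $j } @result;
print "matches for '$j': $count\n";

# Roughly the same thing written as an explicit loop.
my $found = 0;
for my $element (@result) {
    $found++ if $element eq $j;
}
print "explicit loop found: $found\n";

# unless(...) runs its block only when the count is 0 (false).
unless ($count) {
    push @result, $j;
}
print "result is @result\n";
```

Since eq compares the strings character by character, the parentheses are just ordinary characters here and no escaping is needed.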
KevinADC
08-11-2006, 05:47 AM
I see you FishMonger..... not using invisible mode anymore?
FishMonger
08-11-2006, 06:11 AM
Are you keeping tabs on me? :o
I didn't realize I turned that feature off...I better turn it back on.
KevinADC
08-11-2006, 06:28 AM
hehehe.... You know you've made it when you have a stalker. :eek: :D
I was just surprised to see the green dot by your name mate. ;)