string comparison
mikkojay
11-11-2005, 06:01 AM
Hello,
This is my first post here-
I am very new to perl, but have been coding for a number of years. At work I have a program that I have coded in Cobol, VB, and c++, and I am now trying to get it to work in perl as a learning exercise.
Here is the gist:
1) I call an ASP that shoots me an html form with search entry fields & options.
2) The submit button calls a perl script and posts the search criteria.
3) One of these variables is a string "key" that will be used as the basis for a binary search on a fixed-length, sorted text data file.
4) I use CGI to successfully pick up the key from the web browser. It was passed via a hidden textbox.
5) I start the binary search, splitting the file into halves as I go in typical fashion.
6) When I compare the scalar variable that holds my key with the "just read" value, I cannot get a successful less-than comparison on the two values, whether I use < or lt.
Let's say I have a million records. They have this format:
NNNNNNNCCCCCCC
Like
0000001dataaaaa
thru
1000000datazzzzz
The key part is the left-hand 7 bytes, the right-hand data portion doesn't matter as long as the record length is fixed.
I do a search looking for a key of, say, 0001234.
I split the file in half, and get one with 0500000.
OK, I currently have 0001234 placed in a variable called $searchkey.
I have the just-read 0500000 in a variable called $testkey.
In order to determine whether I need to go hi or low, I need to do something like this:
(pardon the pseudo- I am not familiar enough to whip it out just yet)
if ($searchkey eq $testkey)
# I have a match and I am done
else
if ($searchkey lt $testkey) #tried lt and < symbol
go up
print "$searchkey is less than $testkey"
else
go down
print "$searchkey is greater than $testkey"
What is happening is that I NEVER fall into the "go down" logic.
I even added some "print" statements to return the values onto the
screen for debugging.
I see statements that make sense, until I see the ones that were incorrectly evaluated, such as:
0022222 is less than 0000111 right there on the screen.
I thought there might be an unseen character causing the behavior, but after messing with it after work for wayyy too long, I still have yet to get the two string variables to compare in a way that makes sense to me.
If anyone has any ideas, let me know- I'd really appreciate it.
Thanks, Mike
Omaha, NE
By the way, I have only been messing with perl for about a week, and I absolutely love it! I just need to get over those syntax speedbumps.
FishMonger
11-11-2005, 07:45 AM
I don't have a clear picture of the data that you're trying to compare. Your description seems to be intermixing the comparison of strings and numbers. You say that you're doing a binary search, but none of the numbers in your example data are binary; in fact, some of them are alphanumeric. You also say that you're comparing strings, but you're using a numeric comparison operator instead of a string operator.
Can you give us an example of your actual data instead of the conflicting examples?
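To illustrate the difference between the string and numeric operators, here is a minimal sketch. The key values are the hypothetical ones from the first post, not real data. For zero-padded keys of equal width, lt and < agree; they only disagree when the widths differ. On some Perl builds very long digit strings may also lose precision when converted to numbers, so lt is likely the safer choice for fixed-width keys.

```perl
use strict;
use warnings;

# Hypothetical keys: same width, zero-padded, as described in the first post.
my $searchkey = '0001234';
my $testkey   = '0500000';

# String comparison: compares character by character.
my $str_less = ($searchkey lt $testkey) ? 1 : 0;

# Numeric comparison: converts both sides to numbers first.
my $num_less = ($searchkey < $testkey) ? 1 : 0;

print "lt says less: $str_less\n";
print "<  says less: $num_less\n";

# With keys of different widths the two operators disagree.
my $short = '9';
my $long  = '10';
my $str_short_less = ($short lt $long) ? 1 : 0;   # '9' sorts after '1'
my $num_short_less = ($short <  $long) ? 1 : 0;   # 9 is smaller than 10

print "'9' lt '10': $str_short_less\n";
print "'9' <  '10': $num_short_less\n";
```

Either operator should send 0001234 to the "less than" side of 0500000, so the operator choice alone would not explain a search that never goes the other way.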
mikkojay
11-11-2005, 02:35 PM
The data being read is just plain ascii.
I used the word binary because a binary search splits something in half until you find what you are looking for- The embedded question in there is: what would cause two variables that contain these strings
variable a = "00000111111"
and
variable b = "00000088888"
to not get compared as expected in this IF statement:
if(a lt b)
*it executes this code
else
*I thought it should have executed this code
Does that help? It was pretty late when I wrote the question last night, and I was pretty fried from trying to figure it out on my own.
Thanks!
Once I get to work I can maybe grab some screen shots etc...
mikkojay
11-11-2005, 03:39 PM
Here is a bit more info.
First, some snippets of the code:
________________________________________________________________
#the search key is taken from post data passed by the browser
$searchkey = $input{'txtkey'};
open AIFILE, $aidexfilepath or die print "Can't open file: $!";
$lowrec = 1;
$hirec = $numrecs;
while ($lowrec <= $hirec){
    $middle = int(($lowrec + $hirec) / 2);
    print "lo = $lowrec<br>\n";
    print "hi = $hirec<br>\n";
    print "mid = $middle<br>\n";
    $offset = (($middle * $reclen) - $reclen);
    #$offset++;
    print "off set is $offset<br>\n";
    seek AIFILE, $offset, $startingpoint or die print "Can't seek file: $!";
    read AIFILE, $Buffer, $reclen - 2 or die print "Can't read file: $!";
    print "here is the buffer\n";
    print "$Buffer<br>\n";
    $testkey = substr($Buffer,0,$testkeylen);
    print "srch key: $searchkey<br>\n";
    print "test key: $testkey<br>\n";
    if ($testkey == $searchkey){
        $hirec = 0;
        $lorec = 1;
        #print "got one\n<br>\n";
    }
    else{
        if($searchKey lt $testkey){
            $hirec = $middle - 1;
            print "$searchkey is less than $testkey<br>\n";}
        else{
            $lowrec = $middle + 1;
            print "$searchkey is not less than $testkey<br>\n";}
    }
}
____________________________________________________________
The problem line is this one: if($searchKey lt $testkey){
Here is a small snippet of the ascii text file being read:
0000000120000000007000432474
0000000120000000008000468421
0000000120000000009000503029
0000000120000000010000613181
0000000120000000011000754266
0000000120000000012000806372
0000000120000000013000926713
0000000120000000014000937702
0000000120000000015000959705
0000000121000000001000368127
0000000121000000002000388512
0000000121000000003000421840
0000000121000000004000899196
0000000121000000005000924014
0000000122000000001000118267
0000000122000000002000619107
0000000122000000003000741637
0000000122000000004000898594
0000000123000000001000173343
0000000123000000002000283175
0000000123000000003000395115
0000000123000000004000423840
0000000123000000005000801397
0000000124000000001000107720
0000000124000000002000218997
0000000124000000003000419622
0000000124000000004000821017
0000000125000000001000006812
0000000125000000002000038514
0000000125000000003000146493
0000000125000000004000226777
0000000125000000005000233840
0000000125000000006000253075
0000000125000000007000256982
0000000125000000008000262272
0000000125000000009000356848
0000000125000000010000365832
0000000125000000011000409660
0000000125000000012000409661
0000000125000000013000504491
0000000125000000014000521653
0000000125000000015000542776
It is just plain ascii text.
Here is a COBOL FD of the file if that makes things clearer:
01 ALTKEY2-REC-NEW.
02 ALTKEY2-KEY-NEW.
03 ALTKEY2-AMOUNT-NEW PIC 9(8)V99.
02 ALTKEY2-TICKER-NEW PIC 9(9).
02 AK2-RECORD-NUMBER-NEW PIC 9(9).
_____________________________________________________________
Here is the output:
(watch for the comparison at this record: off set is 50220)
lo = 1
hi = 857976
mid = 428988
off set is 12869610
here is the buffer 0000010000000015797000734928
srch key: 0000000123000000001
test key: 0000010000000015797
0000000123000000001 is less than 0000010000000015797
lo = 1
hi = 428987
mid = 214494
off set is 6434790
here is the buffer 0000003500000000502000084642
srch key: 0000000123000000001
test key: 0000003500000000502
0000000123000000001 is less than 0000003500000000502
lo = 1
hi = 214493
mid = 107247
off set is 3217380
here is the buffer 0000001902000000008000161807
srch key: 0000000123000000001
test key: 0000001902000000008
0000000123000000001 is less than 0000001902000000008
lo = 1
hi = 107246
mid = 53623
off set is 1608660
here is the buffer 0000001008000000013000475400
srch key: 0000000123000000001
test key: 0000001008000000013
0000000123000000001 is less than 0000001008000000013
lo = 1
hi = 53622
mid = 26811
off set is 804300
here is the buffer 0000000700000000696000621492
srch key: 0000000123000000001
test key: 0000000700000000696
0000000123000000001 is less than 0000000700000000696
lo = 1
hi = 26810
mid = 13405
off set is 402120
here is the buffer 0000000500000000595000140136
srch key: 0000000123000000001
test key: 0000000500000000595
0000000123000000001 is less than 0000000500000000595
lo = 1
hi = 13404
mid = 6702
off set is 201030
here is the buffer 0000000300000000253000453409
srch key: 0000000123000000001
test key: 0000000300000000253
0000000123000000001 is less than 0000000300000000253
lo = 1
hi = 6701
mid = 3351
off set is 100500
here is the buffer 0000000198000000018000421540
srch key: 0000000123000000001
test key: 0000000198000000018
0000000123000000001 is less than 0000000198000000018
lo = 1
hi = 3350
mid = 1675
off set is 50220
here is the buffer 0000000088000000012000870332
srch key: 0000000123000000001
test key: 0000000088000000012
0000000123000000001 is less than 0000000088000000012
______________________________________________________________
There are more iterations of the lookup, but I won't waste the space.
See the last lookup above?
It is saying that:
0000000123000000001
is less than
0000000088000000012
Whether you evaluate these as numbers, or strings, I cannot see
how the first could be considered "less than" the second.
Has this clarified anything, or have I just made an even bigger mess?
If you have read this far down the post, I thank you!
-Mike
mikkojay
11-11-2005, 04:05 PM
Oh geez, never mind- I figured it out.
It was a CaSe TyPo!!!
I found it by turning on the warnings like this:
use warnings;
This yielded some new output...
Use of uninitialized value in string lt at C:\Inetpub\wwwroot\AIWC\getpost.pl line 164.
What do you know, but it was the same line I was complaining about?
if($searchKey lt $testkey){
should have been
if($searchkey lt $testkey){
one upper-case K kicked my ***.
Fun stuff.
I'll let you know when I have a REAL problem.
Thanks for reading.
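For anyone who lands here with the same symptom: an undefined variable in a string comparison behaves like an empty string, and an empty string sorts before any non-empty one, so the "less than" branch is taken every time. A minimal sketch, where $undefined is a hypothetical stand-in for the misspelled $searchKey and the key value is borrowed from the output above:

```perl
use strict;
use warnings;

my $testkey = '0000000088000000012';
my $undefined;    # never assigned, like the misspelled $searchKey

# Collect warnings in an array so they can be shown after the comparison.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# undef is treated as the empty string here, and '' sorts before any digits.
my $is_less = ($undefined lt $testkey) ? 1 : 0;

print "undef lt '$testkey' is ", ($is_less ? "true" : "false"), "\n";
print "warning: $_" for @warnings;
```

Without "use warnings" the comparison gives the same result, but nothing is printed to point at the line.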
FishMonger
11-11-2005, 05:30 PM
Good to hear that you found the problem. I've made that exact same mistake more times than I want to admit.
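For reference, here is a sketch of how the search loop might look with the typo fixed and the equality test done with eq instead of ==. It is not the original script: the record layout (7-byte key, 8 bytes of data, newline), the sub name find_record, and the in-memory sample file are all assumptions made up so the example runs on its own. It also uses die "..." directly, since "die print ..." prints the message to standard output and then dies with the return value of print.

```perl
use strict;
use warnings;

# Hypothetical layout: 7-byte key + 8 bytes of data + newline = 16 bytes.
my $keylen = 7;
my $reclen = 16;

# Returns the 1-based number of the record whose key equals $searchkey,
# or 0 if there is none. $fh must be open on sorted, fixed-length records.
sub find_record {
    my ($fh, $numrecs, $searchkey) = @_;
    my ($lowrec, $hirec) = (1, $numrecs);

    while ($lowrec <= $hirec) {
        my $middle = int(($lowrec + $hirec) / 2);
        my $offset = ($middle - 1) * $reclen;

        seek($fh, $offset, 0) or die "Can't seek: $!";
        my $got = read($fh, my $buffer, $reclen);
        die "Can't read: $!" unless defined $got;
        die "Short read at record $middle" unless $got >= $keylen;

        my $testkey = substr($buffer, 0, $keylen);

        if    ($searchkey eq $testkey) { return $middle }
        elsif ($searchkey lt $testkey) { $hirec  = $middle - 1 }
        else                           { $lowrec = $middle + 1 }
    }
    return 0;
}

# Build a small sorted sample "file" in memory: keys 0000003, 0000006, ...
my $data = join '', map { sprintf("%07ddata%04d\n", $_ * 3, $_) } 1 .. 1000;
open my $fh, '<', \$data or die "Can't open in-memory file: $!";

my $hit  = find_record($fh, 1000, '0001233');   # 411 * 3 = 1233
my $miss = find_record($fh, 1000, '0001234');   # not a multiple of 3
print "hit at record $hit, miss gives $miss\n";
```

Because every variable is declared with my under "use strict", a misspelling such as $searchKey would stop the script at compile time instead of silently comparing an empty value.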
One thing I've found that helps to find these types of errors in seconds is to use an IDE such as Komodo for writing my scripts. The system I was on yesterday didn't have an IDE, and I spent 45 minutes tracking down an error that Komodo would have pointed out in seconds. You can download it and try it with the 21-day free trial, and if you like it, it's only $30.
http://www.activestate.com/Products/Komodo/?tn=1
EDIT:
In addition to enabling warnings, you should use the strict pragma. The strict pragma should be used in all of your scripts. If you wish, once you have your script fully debugged, you can disable warnings, but I recommend keeping them enabled.
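As a rough illustration of what strict does for this kind of typo, the sketch below compiles a small snippet containing the misspelled variable. The snippet is a made-up reduction of the problem line, and string eval is used only so the compile error can be captured and printed; in a real script you would simply put "use strict;" at the top and declare variables with my.

```perl
use strict;
use warnings;

# A reduced version of the problem: $searchkey is declared, $searchKey is not.
my $snippet = <<'END';
use strict;
my $searchkey = '0000000123000000001';
my $testkey   = '0000000088000000012';
if ($searchKey lt $testkey) { print "less\n" }
1;
END

# eval returns undef and sets $@ when the snippet fails to compile.
my $ok    = eval $snippet;
my $error = $@;

print $ok ? "compiled fine\n" : "strict refused it: $error";
```

The error names the undeclared variable, so the typo shows up before any data is read.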