Compare two files, output to a new file


vietboy505
03-01-2006, 03:39 PM
How can I compare two files and output the differences to a new file?

nameList.txt

| | |AAAAAAA* | |* | |9999| |MEP | |- | |XXXXXXXXX| | |

| | |AAAAAAA* | |* | |9999| |MEP | |- | |XXXXXXXXX| | |

| | |CCCCCCC* | |* | |9999| |NAD | |- | |XXXXXXXXX| | |

| | |CCCCCC D* | |* | |9999| |BEM | |- | |XXXXXXXXX| |YYYYY|

| | |XXXXXX A* | |* | |9999| |MEP | |- | |XXXXXXXXX| | |

| | |ZZZZZZ A* | |* | |9999| |NAD | |- | |XXXXXXXXX| | |

| | |EEASAW A* | |* | |9999| |NA* | |- | |XXXXXXXXX| |YYYYY|

| | |ASCAWF W A* | |* | |9999| |ME* | |* | |XXXXXXXXX| | |

| | |XXXXXX A A* | |* | |9999| |BE* | |* | |XXXXXXXXX| | |

| | |AWSDAW* | |* | |9999| |ME* | |- | |XXXXXXXXX| | |

| | |WFCAPI A2* | |* | |9999| |MEP | |- | |XXXXXXXXX| | |



checkList.txt

1 XXXXXXX 6 6 U 1 2 3 4 5 6 1 1 P 1 0 1 0
2 AAAAAAA -12 11 Y 1 2 3 4 5 6 469.7 481.7 P 1 0 1 0
3 FASZFAS -12 -6 Z 1 2 3 4 426.4 431.7 Z 0 0 1 0
4 JJHJHGC -12 12 Y 1 2 3 4 5 6 446.5 457.6 P 1 1 1 0
5 JHGJHZA -9 -4 Z 1 2 3 4 405.6 410.7 Z 0 0 1 0
6 7843JHE -8 8 Y 1 2 3 4 5 6 446.5 457.6 P 1 0 1 0
7 NMHZJYA -12 12 Y 1 2 3 4 5 6 446.5 457.6 P 1 1 1 0
8 WFCAPI -8 8 Y 1 2 3 4 5 6 446.5 457.6 P 1 0 1 0
9 FASTYTA -8 8 Y 1 2 3 4 5 6 446.5 457.6 P 1 0 1 0
10 89QANJGA -14 -7 Z 1 2 3 405.6 410.7 Z 0 0 1 0


I was wondering how I can do this by reading nameList.txt into an array, using the name such as AAAAAAA or ASCAWF, which begins in the third field.

Then read the second file, checkList.txt, and take the text such as FASZFAS or NAETGNA from the second column.

Then, if a name from checkList.txt doesn't match anything in nameList.txt, output that name to a new file.
This keeps going until the end of the file, appending each name to the same new file.

The functions I probably need to use are:
open()
close()
while loop
compare method

Can anyone help me get started?

vietboy505
03-02-2006, 06:44 AM
#!/usr/local/bin/perl

open(FILE0, $ARGV[0]) || die "Can't open $_: $!\n";
open(FILE1, $ARGV[1]) || die "Can't open $_: $!\n";
@file0data = <FILE0>;
@file1data = <FILE1>;
close(FILE0);
close(FILE1);

@linedata=();
@line1data=();


while(@file0data)
{
@linedata = split(/|/);

}

while(@file1data)
{
@line1data = split(/\t/);
}

while(@file1data)
{
@line1data = split(/\t/);

}

print("FILE: $linedata[3] \n\n");
print("FILE2: $line1data[1] \n\n");


##check

while(@line1data)
{
if($line1data[1] eq $linedata[3])
{
#output $line1data[1] & append to fileCheck.txt
}

}


Here is the code I tried. It does read the text from the files into the arrays, but the split doesn't work.
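A likely reason the split fails, sketched below: an unescaped | inside a regex means alternation, so /|/ matches the empty string and split breaks the text into single characters instead of fields. split with no second argument also works on $_, which the while(@file0data) loops never set, and while(@file0data) only tests that the array is non-empty without removing anything, so it never finishes. The helper names name_from_namelist and name_from_checklist are hypothetical, and dropping the trailing * is an assumption about what counts as the name.

```perl
use strict;
use warnings;

# Name column of a nameList.txt row: the field after the third pipe.
sub name_from_namelist {
    my ($line) = @_;
    my @fields = split /\|/, $line;      # \| is a literal pipe
    my $name = defined $fields[3] ? $fields[3] : '';
    $name =~ s/^\s+|\s+$//g;             # trim surrounding whitespace
    $name =~ s/\*$//;                    # assumption: the trailing * is not part of the name
    return $name;
}

# Name column of a checkList.txt row: the second whitespace-separated field.
sub name_from_checklist {
    my ($line) = @_;
    my @fields = split ' ', $line;       # ' ' splits on any run of whitespace
    return defined $fields[1] ? $fields[1] : '';
}

# foreach visits each line exactly once, unlike while(@array).
foreach my $line ('| | |AAAAAAA* | |* | |9999|', '| | |CCCCCC D* | |* | |9999|') {
    print name_from_namelist($line), "\n";
}
```

Index 3 is the same element the original script prints as $linedata[3]; the first three elements are the empty or blank pieces before the name.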

nkrgupta
03-02-2006, 07:29 AM
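#!/usr/bin/perl

# Read the first file and record the leading word of its third |-delimited column.
open(FILE0, $ARGV[0]) || die "Can't open $ARGV[0]: $!\n";

while (<FILE0>)
{
    chomp;
    my($column)=$_=~m!\|\s+\|\s+\|(\w+).*!ig;
    $file0{$column}++;
}
close(FILE0);

# Read the second file and record the word that follows the leading number.
open(FILE1, $ARGV[1]) || die "Can't open $ARGV[1]: $!\n";

while (<FILE1>)
{
    chomp;
    my($column)=$_=~m!\d+\s+(\w+).*!ig;
    $file1{$column}++;
}
close(FILE1);

# Write every name from the first file that also appears in the second.
open (W,'>filecheck.txt') || die "Can't open filecheck.txt: $!\n";
foreach (keys %file0) {
    print W $_."\n" if (exists $file1{$_});
}
close(W);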

See if this helps. Reply if you need an explanation of what's going on in the code. I've assumed that you don't want the '*' character, or anything from the first space onward in that column of the first file, to be taken into account. So for
'CCCCCCC*' only 'CCCCCCC' is taken, and for 'CCCCCC D*' only 'CCCCCC' is taken.

vietboy505
03-02-2006, 03:27 PM
nkrgupta, that is what I need, thanks.

Does this code check for differences?

If something in FILE1 doesn't match anything in FILE0, output it; otherwise continue.

Should it be "if not exists", as in the code below?


my($column)=$_=~m!\d+\s+(\w+).*!ig;

$file0{$column}++; #add up all columns??

open (W,'>filecheck.txt'); #open a file and output to filecheck.txt
foreach (keys %file0) { #loop through to find ... ...
print W $_."\n" if (exists $file1{$_}); #print if it exists in the hash of file1
}


What is the code above doing?

Also, I want to make sure that both arguments are passed in, or else exit right away.


if ($ARGV[0] eq "" && $ARGV[1] eq "") {
exit;
}


Is this right? It only exits when both arguments are missing. What if they type in only 1 argument? Is it repetitive to do this:


if ($ARGV[0] eq "") {
exit;
} elsif ($ARGV[1] eq "") {
exit;
}
elsif ($ARGV[0] eq "" && $ARGV[1] eq "") {
exit;
}


Or should I do this and exit right away:

if ($ARGV[0] eq "" || $ARGV[1] eq "") {
exit;
}

vietboy505
03-02-2006, 05:57 PM
I did a test with these names and it doesn't work.

file0.txt

| | |ABCDEFGH* | |* | |9999| |MEP | |- | |XXXXXXXXX| | |
| | |IJKLMNOPQ* | |* | |9999| |MEP | |- | |XXXXXXXXX| | |
| | |RSTUVWX* | |* | |9999| |NAD | |- | |XXXXXXXXX| | |
| | |YZABCD D* | |* | |9999| |BEM | |- | |XXXXXXXXX| |YYYYY|
| | |18ABCDE A* | |* | |9999| |MEP | |- | |XXXXXXXXX| | |
| | |8ROAST A* | |* | |9999| |NAD | |- | |XXXXXXXXX| | |
| | |ABCZZA A* | |* | |9999| |NA* | |- | |XXXXXXXXX| |YYYYY|
| | |9WASHERE W A* | |* | |9999| |ME* | |* | |XXXXXXXXX| | |
| | |SEEDAR A A* | |* | |9999| |BE* | |* | |XXXXXXXXX| | |
| | |LIFE4* | |* | |9999| |ME* | |- | |XXXXXXXXX| | |
| | |PROGRAM A2* | |* | |9999| |MEP | |- | |XXXXXXXXX| | |


file1.txt

1 8ROAST 6 6 U 1 2 3 4 5 6 1 1 P 1 0 1 0
2 ABCZZA -12 11 Y 1 2 3 4 5 6 469.7 481.7 P 1 0 1 0
3 RSTUVWX -12 -6 Z 1 2 3 4 426.4 431.7 Z 0 0 1 0
4 ABCDEFGH -12 12 Y 1 2 3 4 5 6 446.5 457.6 P 1 1 1 0
5 8RASTS -9 -4 Z 1 2 3 4 405.6 410.7 Z 0 0 1 0
6 SWEETW -8 8 Y 1 2 3 4 5 6 446.5 457.6 P 1 0 1 0
7 LIFE4 -12 12 Y 1 2 3 4 5 6 446.5 457.6 P 1 1 1 0
8 RSTUVWX -8 8 Y 1 2 3 4 5 6 446.5 457.6 P 1 0 1 0
9 569SHOULDOUTPUT Y -8 8 Y 1 2 3 4 5 6 446.5 457.6 P 1 0 1 0
10 ABCZZA -14 -7 Z 1 2 3 405.6 410.7 Z 0 0 1 0


It prints the wrong result. Based on that:

from file1.txt
lines 5, 6, 9 --> 8RASTS SWEETW 569SHOULDOUTPUT should be output, because they differ from everything in file0.txt, but they aren't.

I tried changing the code to this and it still doesn't work:

#!/usr/bin/perl

open(FILE0, $ARGV[0]) || die "Can't open $_: $!\n";

while (<FILE0>)
{
chomp;
my($column)=$_=~m!\|\s+\|\s+\|(\w+).*!ig;
$file0{$column}++;
}
close(FILE0);

open(FILE1, $ARGV[1]) || die "Can't open $_: $!\n";

while (<FILE1>)
{
chomp;
my($column)=$_=~m!\d+\s+(\w+).*!ig;
$file1{$column}++;
}
close(FILE1);

open (W,'>file_check.txt');
foreach (keys %file0) {
##change from exist to not exists
print W $_."\n" if (not exists $file1{$_});
}
close(W);
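One probable cause: the loop above walks the keys of %file0 and prints those missing from %file1, which is the opposite direction from the expected output (names in file1.txt that are absent from file0.txt). A minimal sketch of the reversed check follows. The helper name names_only_in_second is hypothetical, and it assumes, as the earlier reply did, that only the leading word of the name column matters.

```perl
use strict;
use warnings;

# Hypothetical helper: names from the second file that never appear in the first.
sub names_only_in_second {
    my ($first_lines, $second_lines) = @_;

    # Leading word of the third pipe-delimited field of each first-file row.
    my %in_first;
    for my $line (@$first_lines) {
        $in_first{$1} = 1 if $line =~ /^\|\s*\|\s*\|(\w+)/;
    }

    # Word after the leading number of each second-file row,
    # kept in file order and reported once.
    my (@missing, %seen);
    for my $line (@$second_lines) {
        next unless $line =~ /^\s*\d+\s+(\w+)/;
        my $name = $1;
        push @missing, $name unless $in_first{$name} || $seen{$name}++;
    }
    return @missing;
}

my @first  = ('| | |ABCDEFGH* | |* |', '| | |8ROAST A* | |* |');
my @second = ('1 8ROAST 6 6 U', '5 8RASTS -9 -4 Z', '6 SWEETW -8 8 Y');
print "$_\n" for names_only_in_second(\@first, \@second);
```

With the real files, the two array references would hold the lines read from $ARGV[0] and $ARGV[1], and the returned names would be printed to the output filehandle instead of STDOUT.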

bustamelon
03-10-2006, 03:17 PM
This has already been written for you. If you're on a *nix server, just do:

diff file0.txt file1.txt

...and this will output the differences to a file called diff.txt:

diff file0.txt file1.txt > diff.txt

It's not that pretty, but it does basically what you are looking for.