Need help with regex matching multiple lines


Jon Michael
04-05-2006, 06:56 AM
I'm writing a program that reads an .xml file and compares its settings against the same .xml file on other servers. It basically compares the settings and writes to a log file whether each one matches or doesn't match.

The problem I'm running into is that toward the end of the .xml file the lines are no longer unique: the same element names are reused for multiple setting instances.

For example:


<SpamIp4rLookups>
<name>blah</name>
<domainLookup>blahblah</domainLookup>
<description />
<weight>a number</weight>
<enabled>False</enabled>
<smtpenabled>False</smtpenabled>
</SpamIp4rLookups>


Another example:


<spamLevelLowAction>
<type>PrefixSubject</type>
<argument>blah-low: </argument>
</spamLevelLowAction>


Up to this point I've just been doing matches on things like <type>(.*)<\/type>, but I can no longer do that because it's only going to find the first match every time. I want it to find all instances of that setting, and I want to keep the results organized per setting block.

I hope this makes sense - I'm just learning Perl, so this part has got me a bit stumped. I assume I would have to use hash references to accomplish this, but I'm not sure.

Can anyone provide some insight?

Thanks.
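
One possible way to do what the question above asks, offered as a rough sketch rather than a definitive answer: read the whole file into one string, match each block with the /g and /s modifiers, then pull the fields out of each block into its own hash. It assumes the blocks are not nested and that each field sits in a simple <tag>value</tag> pair. The sample data, the numeric weights, and the names @lookups and %fields are made up for illustration.

```perl
use strict;
use warnings;

# Sample data standing in for the real file (values are placeholders).
my $xml = <<'XML';
<SpamIp4rLookups>
<name>blah</name>
<domainLookup>blahblah</domainLookup>
<description />
<weight>5</weight>
<enabled>False</enabled>
</SpamIp4rLookups>
<SpamIp4rLookups>
<name>blah2</name>
<domainLookup>blahblah2</domainLookup>
<description />
<weight>7</weight>
<enabled>True</enabled>
</SpamIp4rLookups>
XML

my @lookups;    # one hash reference per <SpamIp4rLookups> block

# /g finds every block, /s lets "." cross newlines, and ".*?" stops at
# the first closing tag instead of the last one.
while ( $xml =~ m{<SpamIp4rLookups>(.*?)</SpamIp4rLookups>}gs ) {
    my $block = $1;
    my %fields;

    # Inside a single block the tag names are unique again.
    while ( $block =~ m{<(\w+)>([^<]*)</\1>}g ) {
        $fields{$1} = $2;
    }
    push @lookups, \%fields;
}

for my $i ( 0 .. $#lookups ) {
    print "lookup $i: name=$lookups[$i]{name} weight=$lookups[$i]{weight}\n";
}
```

Each element of @lookups is a hash reference, so $lookups[1]{name} is the name from the second block. Empty elements such as <description /> are simply skipped by this pattern.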

nkrgupta
04-05-2006, 09:07 AM
Can you post your code and the sample xml files that you are matching? Otherwise it's very difficult to figure out.

FishMonger
04-05-2006, 04:27 PM
You haven't given us enough info to be able to help, but you might find it easier to use one of the XML modules instead of a regex.

http://search.cpan.org/~msergeant/XML-Parser-2.34/Parser.pm
http://search.cpan.org/~mirod/XML-Twig-3.23/Twig.pm
http://search.cpan.org/~nwetters/XML-LibXML-Fixup-0.03/Fixup.pm
http://search.cpan.org/search?query=xml&mode=all
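
To give a feel for what the module route might look like, here is a minimal sketch using XML::Twig, which has to be installed from CPAN. The <settings> root element, the sample values, and the list of field names are assumptions for illustration; the real file's layout may differ.

```perl
use strict;
use warnings;
use XML::Twig;    # from CPAN, not part of core Perl

# Placeholder document; <settings> is an invented root element.
my $xml = <<'XML';
<settings>
<SpamIp4rLookups>
<name>blah</name>
<domainLookup>blahblah</domainLookup>
<description />
<weight>5</weight>
</SpamIp4rLookups>
<SpamIp4rLookups>
<name>blah2</name>
<domainLookup>blahblah2</domainLookup>
<description />
<weight>7</weight>
</SpamIp4rLookups>
</settings>
XML

my @lookups;    # one hash reference per <SpamIp4rLookups> element

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once for every complete <SpamIp4rLookups> element.
        SpamIp4rLookups => sub {
            my ( $t, $elt ) = @_;
            my %fields;
            # first_child_text returns the text of the named child element.
            $fields{$_} = $elt->first_child_text($_)
                for qw(name domainLookup weight);
            push @lookups, \%fields;
        },
    },
);
$twig->parse($xml);

print "$_->{name}: weight $_->{weight}\n" for @lookups;
```

The handler fires once per block, so repeated element names inside different blocks never get mixed up.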

Jon Michael
04-08-2006, 08:05 AM
my $file = "\\\\$server\\c\$\\path\\file.xml";    # UNC path to the admin share
open my $fh, '<', $file or die "Cannot open $file: $!";

my $username;

while (<$fh>)
{
    # Capture the value of the single <username> element
    if ( $_ =~ /<username>(.*)<\/username>/i )
    {
        $username = $1;
    }

    ...

}
close($fh);

# Compare the captured value against the expected one
if ( $username =~ /\Auser\Z/i )
{
    print "$server: username Matched [$username]\n";
    print LOG "$server: username Matched [$username]\n";
}
else
{
    print "$server: username not Matched [$username]\n";
    print LOG "$server: username not Matched [$username]\n";
}


There's part of my code. It tells me whether one line of the xml file matches or doesn't match.

The problem I'm running into is that I can no longer match each setting this way, because the element names are no longer unique once a setting spans multiple lines.

Taken from my original post:


<SpamIp4rLookups>
<name>blah</name>
<domainLookup>blahblah</domainLookup>
<description />
<weight>a number</weight>
<enabled>False</enabled>
<smtpenabled>False</smtpenabled>


<name>blah2</name>
<domainLookup>blahblah2</domainLookup>
<description />
<weight>a number</weight>
<enabled>False</enabled>
<smtpenabled>False</smtpenabled>
</SpamIp4rLookups>


As shown above, there can be multiple instances of a setting, so there are multiple <name></name> and <weight></weight> lines. That means I can't just match what's in between <name></name>, because it will only ever match the first one it finds. I can't assume there is only going to be one <name></name> line.

I looked at those modules and they make absolutely no sense to me - like I said, I'm new to all of this. Hopefully this gives a little more information.
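
If the entries really do repeat inside one block as in the second sample, a line-by-line loop like the one already posted can still group them without any module. This is only a sketch, and it rests on one assumption: every entry begins with its <name> line. The @lines array stands in for reading the file, and its values, along with the names @records and $current, are invented for illustration.

```perl
use strict;
use warnings;

# Placeholder lines standing in for the file contents; values are invented.
my @lines = (
    '<SpamIp4rLookups>',
    '<name>blah</name>',
    '<weight>5</weight>',
    '<enabled>False</enabled>',
    '',
    '<name>blah2</name>',
    '<weight>7</weight>',
    '<enabled>True</enabled>',
    '</SpamIp4rLookups>',
);

my @records;    # array of hash references, one per entry
my $current;    # the entry being filled in right now

for my $line (@lines) {
    # Only simple one-line elements like <tag>value</tag> are handled.
    next unless $line =~ m{<(\w+)>([^<]*)</\1>}i;
    my ( $tag, $value ) = ( lc $1, $2 );

    # Assumption: every entry starts with its <name> line.
    if ( $tag eq 'name' ) {
        $current = {};
        push @records, $current;
    }
    $current->{$tag} = $value if $current;
}

for my $rec (@records) {
    print "name=$rec->{name} weight=$rec->{weight} enabled=$rec->{enabled}\n";
}
```

Each hash in @records then holds one entry's settings, so the same match / not matched check used for username can be run per entry, for example on $records[0]{weight}.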