Perl regex question

12-10-2009, 06:00 PM
A quick question with a regex problem that seems to be stumping me. Hopefully someone has some thoughts.

open(FDATA, $fullPath) or die "Can't open $fullPath : $!";

while(<FDATA>) {
    if($_ =~ /<BODY>(.*)<\/BODY>/gis) {
        print "Match: $1\n";
    }
}

I'm basically just running this code against an HTML or XML file with a BODY tag somewhere in it. The regex looks correct from what I can tell: I ran it through an online regex tester from RegexBuddy and it matched as expected. But for some reason it isn't returning anything here. Any thoughts?

12-10-2009, 06:05 PM
Not tested but I would try:

if($_ =~ /<BODY>|<\/BODY>/i ){
    print qq(do summat);
}

You may need to escape the < and >, but I doubt it.


12-10-2009, 06:18 PM
Sorry, I'm a little rusty with regexes, so while that may work, I'm not sure how to use it. I should have been more explicit. Am I mistaken that your code is simply matching <BODY> or </BODY>? I'm trying to return everything between the BODY tags. I'm using (.*) because I need to be able to reference the match somehow to use it elsewhere, such as with $1. If your code is returning what's between the BODY tags, how would I reference a match using your syntax?

Regardless, thanks for the input. It is appreciated.

12-10-2009, 07:44 PM
In almost all cases, using a regex to parse HTML is the wrong approach. Normally you'd want to use one of the HTML parsers on CPAN. However, in this case, you might get by with a regex (actually two regexes and the flip-flop operator).

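my $fullPath = '/some/path/file';
open( my $FDATA, '<', $fullPath)
    or die "Can't open $fullPath : $!";

my $body;
while(<$FDATA>) {
    # The range (flip-flop) operator is true from the line matching <BODY>
    # through the line matching </BODY>, inclusive.
    if(/<BODY>/i .. /<\/BODY>/i ) {
        $body .= $_;
    }
}
close $FDATA;

print $body;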
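
For what it's worth, the likely reason the original regex never matched is that while(<FDATA>) hands the regex one line at a time, so <BODY> and </BODY> are only ever seen together if they happen to sit on the same line. The /s modifier lets . match newlines, but that doesn't help when the string being tested is a single line. If you'd rather keep a single regex and get the contents in $1, one option is to read the whole file into a scalar first. Below is a rough sketch of that idea, not a tested drop-in: extract_body is a made-up helper name, the sample string stands in for your file contents, and the [^>]* part (to tolerate attributes on the BODY tag) is my own addition. The same caveat about regexes and HTML still applies.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the text between the
# first <BODY ...> tag and the first </BODY> after it, or undef if no match.
sub extract_body {
    my ($html) = @_;
    # /i ignores case, /s lets . match newlines, .*? stops at the first </BODY>.
    # [^>]* allows attributes on the opening tag (an assumption about the input).
    return $html =~ m{<BODY\b[^>]*>(.*?)</BODY>}is ? $1 : undef;
}

# In-memory sample standing in for the file contents. With a real file you
# could slurp it instead, for example:
#   my $html = do { local $/; <$FDATA> };
my $html = "<html>\n<body bgcolor=\"white\">\n<p>Hello</p>\n</body>\n</html>\n";

my $body = extract_body($html);
print defined $body ? "Match: $body\n" : "No match\n";
```

Unlike the flip-flop version, this returns only what sits between the tags rather than the whole lines that contain them. Slurping is fine for ordinary pages, but it does hold the entire file in memory.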