Regex Problem



byuhobbes85
08-07-2009, 09:10 PM
I have no idea why this code is not working. I have some XML and I'm trying to get the contents of the root tag. I'm using a very simple regular expression, but the substitution leaves the $xml string unchanged. What is causing this and how do I fix it? Thanks.



use strict;

my $xml = '<prompt xmlns="http://www.imsglobal.org/xsd/imsqti_v2p0" xmlns:awwedu="http://mylab.myuniv.edu/dokuwiki/doku.php?id=awwxmlwitharclitetags" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<awwedu:audio>http://mylab.myuniv.edu/aww/alifbaa_unit6/sounds/ABU06Dr06.01.mp3</awwedu:audio>
</prompt>';

$xml =~ s/<prompt.*>(.*)<\/prompt>/$1/;
print "\n$xml\n";

FishMonger
08-07-2009, 09:30 PM
You should use an XML parser, not a regex.

Here are 2 of the more commonly used modules.

XML::Parser - A perl module for parsing XML documents
http://search.cpan.org/~msergeant/XML-Parser-2.36/Parser.pm

XML::Simple - Easy API to maintain XML (esp config files)
http://search.cpan.org/~grantm/XML-Simple-2.18/lib/XML/Simple.pm
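
As a rough sketch of the parser route, assuming XML::Simple is installed from CPAN (it is not a core module), XMLin can read the string from the question directly. The variable names are illustrative, and the lookup key assumes XML::Simple's default behaviour of keeping namespace prefixes in element names.

```perl
use strict;
use warnings;

# XML::Simple comes from CPAN, so load it at run time and cope if it is missing.
my $have_xml_simple = eval { require XML::Simple; 1 };

my $xml = '<prompt xmlns="http://www.imsglobal.org/xsd/imsqti_v2p0" xmlns:awwedu="http://mylab.myuniv.edu/dokuwiki/doku.php?id=awwxmlwitharclitetags" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<awwedu:audio>http://mylab.myuniv.edu/aww/alifbaa_unit6/sounds/ABU06Dr06.01.mp3</awwedu:audio>
</prompt>';

my $audio;
if ($have_xml_simple) {
    # XMLin accepts a string of XML and returns a hash reference for the root element.
    my $prompt = XML::Simple::XMLin($xml);
    # Child elements become hash keys; the text of <awwedu:audio> is the value.
    $audio = $prompt->{'awwedu:audio'};
    print "$audio\n";
} else {
    print "XML::Simple is not installed\n";
}
```

XML::Simple's own documentation now steers new code toward modules such as XML::LibXML or XML::Twig, so take this as an illustration of the approach rather than a recommendation of this particular module.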

byuhobbes85
08-07-2009, 09:41 PM
This is more of an academic question now than a practical one. I'm not even going to be using Perl in my final implementation, and I probably am going to use an XML parser. I started off using Perl just to see if the Regex would work, and I want to know why it's not working. It seems like such a simple substitution.

Thanks.

byuhobbes85
08-07-2009, 10:13 PM
Problem solved. I thought .* would match newline characters, but . does not match a newline unless the 's' flag is used (the 'm' flag only changes what ^ and $ match). The regex should be s/<prompt.*?>\s*(.*)\s*<\/prompt>/$1/. The \s* match the whitespace, including the newlines, on either side of the content.
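
A short sketch of how the flags behave here, using a shortened stand-in for the XML in the question (the variable names are illustrative, not from the original code). In Perl, /m only changes where ^ and $ match, while /s is what lets . match a newline; the posted regex works because \s* absorbs the newlines.

```perl
use strict;
use warnings;

# Shortened stand-in for the three-line XML string in the question.
my $xml = qq{<prompt xmlns="http://www.imsglobal.org/xsd/imsqti_v2p0">\n}
        . qq{<awwedu:audio>ABU06Dr06.01.mp3</awwedu:audio>\n}
        . qq{</prompt>};

# Original pattern: . cannot cross the newline after the opening tag, so nothing matches.
(my $plain = $xml) =~ s/<prompt.*>(.*)<\/prompt>/$1/;

# /m only affects ^ and $; . still stops at newlines, so this fails as well.
(my $with_m = $xml) =~ s/<prompt.*>(.*)<\/prompt>/$1/m;

# /s lets . match newlines; [^>]* keeps the first part inside the opening tag.
(my $with_s = $xml) =~ s/<prompt[^>]*>\s*(.*?)\s*<\/prompt>/$1/s;

# The regex from the post: \s* consumes the newlines, so no flag is needed.
(my $posted = $xml) =~ s/<prompt.*?>\s*(.*)\s*<\/prompt>/$1/;

print "plain unchanged:  ", ($plain  eq $xml ? "yes" : "no"), "\n";
print "with /m unchanged: ", ($with_m eq $xml ? "yes" : "no"), "\n";
print "with /s: $with_s\n";
print "posted:  $posted\n";
```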

Shannon Blonk
08-07-2009, 10:21 PM
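Greedy * is worth ruling out, but with the XML as posted the pattern never matches at all, because . will not cross the newline after the opening tag. Insert this to check:

if ($xml =~ m/(<prompt.*>)(.*)<\/prompt>/) {
    print "\$1=$1\n";
    print "\$2=$2\n";
} else {
    print "no match\n";
}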
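
For what it's worth, greediness does change the captures once the tag and its contents sit on one line. A small sketch, using a made-up one-line input and illustrative variable names:

```perl
use strict;
use warnings;

# Hypothetical one-line input, so that . never has to cross a newline.
my $line = '<prompt a="1"><b>x</b></prompt>';

# Greedy: .* runs to the last > that still lets the rest of the pattern match.
my ($greedy_tag, $greedy_body) = $line =~ m/(<prompt.*>)(.*)<\/prompt>/;

# Non-greedy: .*? stops at the first >, which is the end of the opening tag.
my ($lazy_tag, $lazy_body) = $line =~ m/(<prompt.*?>)(.*)<\/prompt>/;

print "greedy: \$1=$greedy_tag \$2=$greedy_body\n";
print "lazy:   \$1=$lazy_tag \$2=$lazy_body\n";
```

With the greedy form the first group swallows the inner element and the second group is left empty; the non-greedy form splits the string at the end of the opening tag.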

-----

I gotta lern to typ fastr.


