Regex Problem

08-07-2009, 09:10 PM
I have no idea why this code is not working. I have some XML and I'm trying to get the contents of the root tag. I'm using a very simple regular expression, but the substitution leaves the $xml string unchanged. What is causing this and how do I fix it? Thanks.

use strict;

# "..." stands in for the element content, which is not shown here
my $xml = '<prompt xmlns="http://www.imsglobal.org/xsd/imsqti_v2p0" xmlns:awwedu="http://mylab.myuniv.edu/dokuwiki/doku.php?id=awwxmlwitharclitetags" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
</prompt>';

$xml =~ s/<prompt.*>(.*)<\/prompt>/$1/;
print "\n$xml\n";

08-07-2009, 09:30 PM
You should use an XML parser, not a regex.

Here are 2 of the more commonly used modules.

XML::Parser - A perl module for parsing XML documents

XML::Simple - Easy API to maintain XML (esp config files)

08-07-2009, 09:41 PM
This is more of an academic question now than a practical one. I'm not even going to be using Perl in my final implementation, and I'm probably going to use an XML parser. I started off using Perl just to see if the regex would work, and I want to know why it's not working. It seems like such a simple substitution.


08-07-2009, 10:13 PM
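Problem solved. I thought .* would match newline characters, but by default . matches anything except a newline. The flag that changes this is 's' (the 'm' flag only changes where ^ and $ match). The regex should be s/<prompt.*?>\s*(.*?)\s*<\/prompt>/$1/s. The \s* consume the whitespace around the content, and the lazy (.*?) leaves the trailing whitespace for the second \s*.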
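
For anyone who wants to see the flag difference directly, here is a minimal sketch. The $sample string is made up for illustration and is not the XML from the first post; the variable names are also just placeholders.

```perl
use strict;
use warnings;

# Hypothetical sample, not the original XML: the content sits on its own line.
my $sample = "<prompt id=\"p1\">\nWhat is 2 + 2?\n</prompt>";

# No flag: '.' never matches "\n", so (.*) cannot reach </prompt>.
my $plain = $sample =~ /<prompt.*?>(.*)<\/prompt>/ ? 1 : 0;

# /m only changes where ^ and $ match; '.' still stops at "\n".
my $with_m = $sample =~ /<prompt.*?>(.*)<\/prompt>/m ? 1 : 0;

# /s lets '.' match "\n", so the pattern can span lines.
my $with_s = $sample =~ /<prompt.*?>(.*)<\/prompt>/s ? 1 : 0;

# Substitution with /s: \s* absorbs the surrounding newlines and the
# lazy (.*?) keeps only the text between them.
(my $stripped = $sample) =~ s/<prompt.*?>\s*(.*?)\s*<\/prompt>/$1/s;

print "plain=$plain m=$with_m s=$with_s\n";
print "content=[$stripped]\n";
```

On this sample it prints plain=0 m=0 s=1 and content=[What is 2 + 2?].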

Shannon Blonk
08-07-2009, 10:21 PM
Greediness matters too: * is greedy, so <prompt.*> grabs as much as it can, up to the last > that still allows a match. Insert this to see what each group captures:

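if ($xml =~ m/(<prompt.*>)(.*)<\/prompt>/) {
    print "\$1=$1\n";    # what the greedy opening-tag pattern captured
    print "\$2=$2\n";    # what was left for the content group
} else {
    print "no match\n";  # $1 and $2 are not set by a failed match
}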
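
A rough sketch of greedy versus lazy on a one-line string, assuming a made-up sample ($line is hypothetical, not the XML from the first post) with a nested tag inside the content:

```perl
use strict;
use warnings;

# Hypothetical one-line sample with a nested tag inside the content.
my $line = '<prompt id="p1">Pick <b>one</b> answer</prompt>';

# Greedy: <prompt.*> runs to the LAST '>' that still lets the rest match.
my ($greedy_tag, $greedy_body) = $line =~ /(<prompt.*>)(.*)<\/prompt>/;

# Lazy: <prompt.*?> stops at the FIRST '>', the end of the opening tag.
my ($lazy_tag, $lazy_body) = $line =~ /(<prompt.*?>)(.*)<\/prompt>/;

print "greedy: tag=[$greedy_tag] body=[$greedy_body]\n";
print "lazy:   tag=[$lazy_tag] body=[$lazy_body]\n";
```

With the greedy version the first group swallows everything through </b> and only " answer" is left for the second group; the lazy version keeps the whole content in the second group.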


I gotta lern to typ fastr.