Thread: Regex Problem

  1. #1
    Regular Coder byuhobbes85's Avatar
    Join Date
    Oct 2006
    Location
    Ames, Iowa, USA
    Posts
    116
    Thanks
    9
    Thanked 4 Times in 4 Posts

    Regex Problem

    I have no idea why this code is not working. I have some XML and I'm trying to get the contents of the root tag. I'm using a very simple regular expression, but the substitution leaves the $xml string unchanged. What is causing this and how do I fix it? Thanks.

    Code:
    use strict;
    
    my $xml = '<prompt xmlns="http://www.imsglobal.org/xsd/imsqti_v2p0" xmlns:awwedu="http://mylab.myuniv.edu/dokuwiki/doku.php?id=awwxmlwitharclitetags" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <awwedu:audio>http://mylab.myuniv.edu/aww/alifbaa_unit6/sounds/ABU06Dr06.01.mp3</awwedu:audio>
    </prompt>';
    
    $xml =~ s/<prompt.*>(.*)<\/prompt>/$1/;
    print "\n$xml\n";
    -- </byuhobbes>

  • #2
    Super Moderator
    Join Date
    May 2005
    Location
    Southern tip of Silicon Valley
    Posts
    2,872
    Thanks
    2
    Thanked 164 Times in 159 Posts
    You should use an XML parser, not a regex.

    Here are 2 of the more commonly used modules.

    XML::Parser - A perl module for parsing XML documents
    http://search.cpan.org/~msergeant/XM...2.36/Parser.pm

    XML::Simple - Easy API to maintain XML (esp config files)
    http://search.cpan.org/~grantm/XML-S.../XML/Simple.pm

  • #3
    Regular Coder byuhobbes85's Avatar
    This is more of an academic question now than a practical one. I'm not even going to be using Perl in my final implementation, and I probably am going to use an XML parser. I started off using Perl just to see if the Regex would work, and I want to know why it's not working. It seems like such a simple substitution.

    Thanks.
    -- </byuhobbes>

  • #4
    Regular Coder byuhobbes85's Avatar
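    Problem solved. I thought .* would match newline characters, but . only does that under the 's' flag (the 'm' flag just changes where ^ and $ match). The regex should be s/<prompt.*?>\s*(.*)\s*<\/prompt>/$1/. The \s* are for the whitespace, including the newlines around the content, which is why this works without any flag.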
    Last edited by byuhobbes85; 08-07-2009 at 09:16 PM. Reason: correct regex
    -- </byuhobbes>
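
    For anyone comparing the flags, here is a minimal sketch of how the pattern behaves with no flag, with /m, and with /s. The XML is a shortened stand-in for the string in the first post (the namespace URL and file name are placeholders, not the real ones), and the variable names are only for illustration.

```perl
use strict;
use warnings;

# Shortened stand-in for the XML in the first post; URL and file name are placeholders.
my $xml = qq{<prompt xmlns="http://example.org/ns">\n  <awwedu:audio>clip.mp3</awwedu:audio>\n</prompt>};

# No flags: . stops at a newline, so the pattern cannot span the three lines.
my $plain = $xml =~ /<prompt.*>(.*)<\/prompt>/ ? 1 : 0;

# /m only lets ^ and $ match at line boundaries; . is unaffected.
my $with_m = $xml =~ /<prompt.*>(.*)<\/prompt>/m ? 1 : 0;

# /s lets . match newlines; the non-greedy .*? stops at the first > of the opening tag.
my ($inner) = $xml =~ /<prompt.*?>\s*(.*?)\s*<\/prompt>/s;

print "no flags matched: $plain\n";
print "/m matched:       $with_m\n";
print "/s inner content: $inner\n";
```

    The first two attempts report 0 (no match), and the /s version captures the awwedu:audio element with the surrounding whitespace trimmed by the \s*.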

  • #5
    New Coder
    Join Date
    Mar 2009
    Location
    Fabric Covered Box
    Posts
    69
    Thanks
    1
    Thanked 16 Times in 14 Posts
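    It didn't work the way you expected because . doesn't match a newline by default, so the pattern never matches across the lines of $xml. On top of that, * is greedy. Insert this to see what, if anything, gets captured:

    if ($xml =~ m/(<prompt.*>)(.*)<\/prompt>/) {
        print "\$1=$1\n";
        print "\$2=$2\n";
    } else {
        print "no match\n";
    }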

    -----

    I gotta lern to typ fastr.
    Last edited by Shannon Blonk; 08-07-2009 at 09:23 PM. Reason: too slow
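
    To see the greedy behaviour on its own, here is a small sketch on a hypothetical one-line input, so that newlines are not a factor. The input string and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical one-line input, so . not matching newlines plays no part here.
my $line = '<prompt a="1"><b>text</b></prompt>';

# Greedy: .* grabs as much as it can and backs off only as far as it must.
my ($g_tag, $g_body) = $line =~ /(<prompt.*>)(.*)<\/prompt>/;

# Non-greedy: .*? stops at the first > it can, leaving the rest for group 2.
my ($l_tag, $l_body) = $line =~ /(<prompt.*?>)(.*)<\/prompt>/;

print "greedy:     \$1=$g_tag \$2=$g_body\n";
print "non-greedy: \$1=$l_tag \$2=$l_body\n";
```

    With the greedy version, $1 swallows everything up to the last > before </prompt> and $2 comes out empty; with .*? the opening tag lands in $1 and the inner markup in $2.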
