Help w/ Regular Expressions
cbowen2
07-14-2005, 04:04 PM
I am trying to write a Perl script that checks XHTML pages against our coding standards. Here is an example:
$<input type="text" class="edit" name="BE_INCOME" id="be_income" />
I want to check that the tag name and attribute names are in lower case. If they are not, the script should print an error with the line number. Attribute values are permitted to be in upper case, so the above line should pass the test. But this line would fail:
$<input TYPE="text" class="edit" Name="BE_INCOME" id="be_income" />
because TYPE is capitalized.
Thank you to anyone that can help.
cbowen2
cbowen2
07-14-2005, 04:34 PM
So far, I have this piece of code:
my $line = "<input tYpe=\"text\" class=\"edit\" name=\"#be_income\" id=\"be_income\" />";
$line =~ s/="[a-zA-Z0-9_#]+"//g; #removes the ="value"
$line =~ s/\s+//g; #remove the spaces
# so now I have a line that looks like this
# <inputtYpeclassnameid/>
if( $line =~ /<[A-Z]+>/g ) {
print "$1\n";
print "Found caps in $line\n";
}
but the if statement is not finding the 'Y', and I thought it should.
Thanks again,
cbowen2
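A note on why the test above comes up empty: /<[A-Z]+>/ only matches when everything between < and > is upper-case letters, so it cannot match <inputtYpeclassnameid/>, and $1 stays unset because the pattern has no capturing parentheses. What follows is only a minimal sketch of one way to run the check described in the first post, not a full XHTML parser. The helper name find_uppercase_names and the sample @page array are made up for illustration, and it assumes each tag sits on a single line with quoted attribute values.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the tag and attribute
# names found in one line of markup that contain an upper-case letter.
sub find_uppercase_names {
    my ($line) = @_;
    my @bad;

    # Visit each tag: <name attr="value" ...>, </name>, or <name ... />
    while ( $line =~ /<\/?([A-Za-z][\w:-]*)((?:\s+[\w:-]+\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*\/?>/g ) {
        my ( $tag, $attrs ) = ( $1, $2 );    # copy before running another match
        push @bad, $tag if $tag =~ /[A-Z]/;

        # Walk the attribute list; quoted values are skipped, never inspected.
        while ( $attrs =~ /([\w:-]+)\s*=\s*(?:"[^"]*"|'[^']*')/g ) {
            my $name = $1;
            push @bad, $name if $name =~ /[A-Z]/;
        }
    }
    return @bad;
}

# Sample input: the two lines from the first post.
my @page = (
    '<input type="text" class="edit" name="BE_INCOME" id="be_income" />',
    '<input TYPE="text" class="edit" Name="BE_INCOME" id="be_income" />',
);

my $line_no = 0;
for my $line (@page) {
    $line_no++;
    my @bad = find_uppercase_names($line);
    print "Line $line_no: upper case in @bad\n" if @bad;
}
# prints: Line 2: upper case in TYPE Name
```

In a real script the loop would read the page with while (<$fh>) and use $. for the line number. Tags that span several lines, comments, and CDATA sections are not handled here, which is part of why a regex-only approach gets fragile.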
Jeff Mott
07-14-2005, 07:16 PM
Parsing HTML or XHTML correctly is a lot trickier than it appears at first. If you want to validate XHTML then you'll want to find an XML validator module. XML::Checker (http://search.cpan.org/~tjmather/XML-Checker-0.13/lib/XML/Checker.pm) is a possibility, though I haven't used it so I can't vouch for it. Carefully read the documentation for that module and keep searching until you find the one that best fits what you need.
rwedge
07-14-2005, 09:35 PM
xml and xhtml are not the same.
Tidy (http://cgi.w3.org/cgi-bin/tidy) is a 'free to use' program that cleans up xhtml documents fairly well.
Jeff Mott
07-14-2005, 10:05 PM
rwedge wrote: "xml and xhtml are not the same"
XHTML is an application of XML. Any XHTML document is also an XML document. An XML validator can check the validity of XHTML markup against the DTD.
rwedge
07-15-2005, 05:20 AM
XHTML is an application of XML, true, but the modules may be better for validating XML and not so good for XHTML.
Improperly nested tags or missing quotes around attributes would raise flags, but, for instance, <pubDate></pubDate> is valid XML, whereas <hTml></hTml> is invalid XHTML. XML does not have mandatory elements where XHTML does, etc.
I appreciate Tidy because it would be a formidable task to do this type of validation. It will not even try to decode an HTML page that is not already somewhere in the ballpark.
Jeff Mott
07-15-2005, 05:38 AM
rwedge wrote: "XML does not have mandatory elements where XHTML does, etc"
That is why I said to validate the XML document against the DTD. A program that uses the DTD of any given document to check the validity would be capable of validating any XML document, including an XHTML page.
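To make the DTD point concrete, here is a rough sketch of validating a document against a DTD. It uses XML::LibXML rather than the XML::Checker module mentioned earlier; that choice is an assumption and requires the module to be installed from CPAN. The two-element DTD is a toy stand-in for the real XHTML DTD, and the helper name is_valid_against_dtd is invented for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Toy DTD standing in for the real XHTML DTD (an assumption for illustration).
my $dtd = XML::LibXML::Dtd->parse_string(<<'DTD');
<!ELEMENT html (body)>
<!ELEMENT body (#PCDATA)>
DTD

# Hypothetical helper: true if the XML string validates against $dtd.
sub is_valid_against_dtd {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml( string => $xml );    # dies if not well-formed
    return $doc->is_valid($dtd) ? 1 : 0;
}

for my $xml ( '<html><body>hi</body></html>', '<hTml><body>hi</body></hTml>' ) {
    printf "%-32s %s\n", $xml, is_valid_against_dtd($xml) ? 'valid' : 'invalid';
}
```

Both documents are well-formed XML, but the second should be rejected: XML names are case-sensitive and the toy DTD declares only html, so hTml has no declaration. For a real page the XHTML DTD would be loaded in place of the toy one.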