Tough one- using regular expressions to remove HTML tags?



WA
01-20-2003, 01:45 PM
Ok, this is probably as much a challenge as it is a question. Within a string, I'm looking for a way to remove all potential HTML tags. For example, using the following string:

var mystring='This is a <b>very</b> interesting <a href="http://www.javascriptkit.com">site</a> for JS'

The resulting output should be instead:

var mystring='This is a very interesting site for JS'

I would define an HTML tag as anything that's surrounded by < >.

Obviously regular expressions are needed here, and I would gather that back references specifically are involved. It's definitely not my strong suit.

Thanks!

mordred
01-20-2003, 05:30 PM
You could try



mystring = mystring.replace(/\<.+?\>/g, '');


though it doesn't make use of backreferences... ;)
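For reference, here is a minimal sketch of that one-liner applied to WA's sample string. The function name stripTags is made up for illustration; the pattern is the one posted above, minus the backslashes, which aren't needed before < and >.

```javascript
// stripTags is a hypothetical helper name, not something from the thread.
function stripTags(input) {
  // "<", then one or more characters matched lazily, then the first ">" after them.
  return input.replace(/<.+?>/g, '');
}

var mystring = 'This is a <b>very</b> interesting <a href="http://www.javascriptkit.com">site</a> for JS';
var stripped = stripTags(mystring);
console.log(stripped);
// prints: This is a very interesting site for JS
```

This follows WA's definition of a tag (anything between < and >), so a stray < or > in ordinary text, or a > inside an attribute value, will trip it up.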

brothercake
01-20-2003, 07:39 PM
dude - you should get O'Reilly's book (http://www.oreilly.com/catalog/regex2/) - it changed my life :)

WA
01-21-2003, 12:35 AM
Thanks guys. mordred, your solution may just work for me! I was getting ahead of myself with the comment on back referencing. It seems I only need that when trying to replace a regular HTML tag with a custom one, such as from:

<a href="http://www.dynamicdrive.com">

to

[a url="http://www.dynamicdrive.com"]

or vice versa.

I'll try and shoot some holes into your solution a bit later. If it's bulletproof, that'd be awesome.
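The tag-swapping case WA describes is where a back reference comes in. A rough sketch, assuming the tag looks exactly like the example above (a double-quoted href and no other attributes); both function names are hypothetical:

```javascript
// Hypothetical helpers; they only handle tags shaped like <a href="..."> and [a url="..."].
function anchorToBracket(input) {
  // ([^"]*) captures the URL; $1 puts it back into the replacement text.
  return input.replace(/<a href="([^"]*)">/g, '[a url="$1"]');
}

function bracketToAnchor(input) {
  // Square brackets must be escaped, since they normally open a character class.
  return input.replace(/\[a url="([^"]*)"\]/g, '<a href="$1">');
}

var html = '<a href="http://www.dynamicdrive.com">';
var custom = anchorToBracket(html);
console.log(custom);
// prints: [a url="http://www.dynamicdrive.com"]
console.log(bracketToAnchor(custom) === html);
// prints: true
```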

beetle
01-21-2003, 02:12 AM
Regular expressions by their nature are greedy, and unfortunately there is no way to Ungreedy a regex in JS. So, I think the more appropriate syntax is
mystring = mystring.replace(/\<[^\>]+\>/g, '');
:D

mordred
01-21-2003, 11:11 AM
Of course you can. That's what the question mark after the quantifier is for in my regexp. IIRC the greedy/ungreedy switching was added in JavaScript 1.5, but hey, who targets NN4 or IE4 any longer today?
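To make the difference concrete, here is a small sketch comparing a greedy pattern, the lazy pattern, and the negated character class on the same input. The sample strings are made up for illustration.

```javascript
var sample = 'a <b>very</b> good <i>site</i>';

// Greedy: ".+" runs from the first "<" to the LAST ">" on the line.
var greedy = sample.replace(/<.+>/g, '');
// Lazy: ".+?" stops at the first ">" it can.
var lazy = sample.replace(/<.+?>/g, '');
// Negated class: "[^>]+" cannot step over a ">" at all.
var negated = sample.replace(/<[^>]+>/g, '');

console.log(JSON.stringify(greedy));  // prints: "a "
console.log(JSON.stringify(lazy));    // prints: "a very good site"
console.log(JSON.stringify(negated)); // prints: "a very good site"

// One place the two working versions differ: "." does not match a newline,
// but "[^>]" does, so only the negated class removes a tag split across lines.
var multiline = 'x <a\nhref="#">y';
var lazyMulti = multiline.replace(/<.+?>/g, '');
var negatedMulti = multiline.replace(/<[^>]+>/g, '');
console.log(lazyMulti === multiline); // prints: true
console.log(negatedMulti);            // prints: x y
```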

beetle
01-21-2003, 01:34 PM
Originally posted by mordred
Of course you can. That's what the question mark after the quantifier is for in my regexp. IIRC the greedy/ungreedy switching was added in JavaScript 1.5, but hey, who targets NN4 or IE4 any longer today?

Whoa, cool. Thanks mordred :thumbsup:

brothercake
01-21-2003, 03:26 PM
ungreediness switching?? Is that generally available in regex, or can it only be done with vendor or language specific syntax?

beetle
01-21-2003, 03:54 PM
Originally posted by brothercake
ungreediness switching?? Is that generally available in regex, or can it only be done with vendor or language specific syntax?

Not sure, but typically it's a flag or modifier. For example, that same regex in PHP would be
$mystring = preg_replace( "/\\<.*\\>/U", "", $mystring );

brothercake
01-21-2003, 03:59 PM
so it's the U in that case?

beetle
01-21-2003, 04:29 PM
Originally posted by brothercake
so it's the U in that case?

Yes. But, as with all flags, the Ungreedy modifier affects the whole pattern.

brothercake
01-21-2003, 07:22 PM
thanks

beetle
01-21-2003, 07:32 PM
I've also since learned that the double quantifier (using the ? after a quantifier) is the PCRE syntax for ungreedy matching.
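Unlike a pattern-wide flag, the ? form applies only to the quantifier it follows, so one pattern can mix greedy and ungreedy pieces. A quick JavaScript sketch (the sample string is made up):

```javascript
var text = '<b>very</b> site';

// First quantifier lazy, second greedy.
var lazyFirst = /<(.+?)>(.+)/.exec(text);
console.log(JSON.stringify(lazyFirst[1])); // prints: "b"
console.log(JSON.stringify(lazyFirst[2])); // prints: "very</b> site"

// First quantifier greedy, second lazy.
var greedyFirst = /<(.+)>(.+?)/.exec(text);
console.log(JSON.stringify(greedyFirst[1])); // prints: "b>very</b"
console.log(JSON.stringify(greedyFirst[2])); // prints: " "
```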

brothercake
01-21-2003, 08:33 PM
ahh .. hence mordred's original expression ..


