Resolved: Need help with replace()



neochronomo
01-28-2010, 11:58 AM
I found a regular expression that will strip my string of every html tag...

var stripped = htmlStr.replace(/(<([^>]+)>)/ig,"");

Well, I also want it to strip every thing between parentheses, square brackets, and replace "&#160;" with a space.

It would be really cool if you could explain how to do this as well, because the stuff in the above code after "replace" confuses me. I have no idea how that's working.

Old Pedant
01-28-2010, 09:25 PM
The only thing you can't do in the same operation is the replacement of &#160;, because it needs a different replacement string.



var stripped = htmlStr.replace(/(\<[^\>]+\>|\([^\)]+\)|\[[^\]]+\])/g,"")
    .replace(/\&\#160\;/g, " ");

To read that:
(...|...|...) means "any one of the ... choices"
\<[^\>]+\> is the first of the three choices; the others are the same except for the delimiters
\< means "match a < character". You probably don't need the \ there, but I'm paranoid.
[^xyz] means "match any character *except* x, y, or z", so
[^\>] means "match any character except >" (again, the \ may not be needed)
+ means "one or more", so
[^\>]+ means "one or more characters that are not > characters"
\> means "match one > character", so
\<[^\>]+\> means "look for < followed by one or more non-> characters followed by >"
similarly
\([^\)]+\) means "look for ( followed by one or more non-) characters followed by )"
\[[^\]]+\] means "look for [ followed by one or more non-] characters followed by ]"
and then
(\<[^\>]+\>|\([^\)]+\)|\[[^\]]+\]) means "find any one of those 3 patterns just described" and
/(\<[^\>]+\>|\([^\)]+\)|\[[^\]]+\])/g means "and do this 'globally' (that is, as many times as you find the pattern(s))"

Finally, when you do
string.replace( regexp, replacement )
that says "replace whatever the regexp matches with the specified replacement".

So this says, in essence, "replace all occurrences of <...> or (...) or [...] with a blank string".

And then the second replace is easier: It just says "replace all occurrences of &#160; with a space".
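
To see both steps together, here is a minimal sketch run on a made-up input string (the sample text is an assumption, not something from the original question). It uses + in all three alternatives, as the walkthrough describes.

```javascript
// Sample input invented for illustration only.
var htmlStr = "<b>Bold</b> text (aside) and [note] here&#160;now";

var stripped = htmlStr
    // First pass: drop <...>, (...) and [...] runs that contain at least one character.
    .replace(/(\<[^\>]+\>|\([^\)]+\)|\[[^\]]+\])/g, "")
    // Second pass: turn the literal text "&#160;" into an ordinary space.
    .replace(/\&\#160\;/g, " ");

console.log(JSON.stringify(stripped));
// prints "Bold text  and  here now"
```

Note the doubled spaces in the result: the spaces on either side of a removed (...) or [...] are left in place, so you may want a further replace to collapse them.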

Note that we use the \ character to "escape" special characters. There are many special characters in regular expressions, certainly including [ and ] and ( and ) as you saw, so putting the \ in front of a special character removes its special meaning, saying "take the next character only as a regular character without special meaning."

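In many places in regexps a special character may not actually have its special meaning, so you may not need to escape it. But escaping a punctuation character that doesn't need it does no harm (letters and digits are another matter, since sequences like \d and \b have meanings of their own), and I prefer to err on the side of safety.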
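
A small sketch of which backslashes matter, again on a made-up sample string. The escaped and unescaped < and > forms should behave identically, while dropping the backslash from ( and ) changes the meaning completely, because bare parentheses form a group.

```javascript
// Sample input invented for illustration only.
var sample = "a <i>b</i> (c) [d] e";

// < and > are not special in a regexp, so these two are equivalent.
var paranoid = sample.replace(/\<[^\>]+\>/g, "");
var minimal = sample.replace(/<[^>]+>/g, "");

// ( and ) ARE special outside a character class, so the escapes are required here.
var parens = sample.replace(/\([^)]+\)/g, "");

// Without the escapes, (c) is a group that matches just the letter c.
var unescaped = sample.replace(/(c)/g, "");

console.log(paranoid);   // a b (c) [d] e
console.log(minimal);    // a b (c) [d] e
console.log(parens);     // a <i>b</i>  [d] e
console.log(unescaped);  // a <i>b</i> () [d] e
```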

Okay?

jmrker
01-28-2010, 10:10 PM
While 'Old Pedant' was supplying the answer to your question,
I was plodding away on a version of my own. Got it all but the special character. :mad:

Oh well, better late than never. :o


<html>
<head>
<title>Tag, (, [ Replacer</title>
<script type="text/javascript">
// From: http://codingforums.com/showthread.php?t=187906

function ReplacementFunction() {
var htmlStr = document.getElementById('AreaSource').innerHTML;
var stripped = htmlStr.replace(/(<br>)/g,'\n');
stripped = stripped.replace(/(<([^>]+)>)/ig,"");
stripped = stripped.replace(/(\([^\)]+\))/ig,"");
stripped = stripped.replace(/(\[[^\]]+\])/ig,"");

stripped = stripped.replace(/\&\#160\;/g, " "); // not working yet!

stripped = stripped.replace(/\n/g,'<br>');
document.getElementById('AreaReplace').innerHTML = stripped;
}

</script>
</head>
<body>
<h1>Tag, (, [ &amp; space Replacer</h1>
<div id="AreaSource" style="border:1px solid blue; width:300px;">
Tag: <h3>tag information replaced</h3><br>
Parenthesis: (parenthesis replaced)<br>
Bracket: [bracket replaced]<br>
Space: &#160; replaced
</div>
<p>
<button onclick="ReplacementFunction()">Do It!</button>
<p>
<div id="AreaReplace" style="border:5px solid red; width:300px;">
Test Display Area
</div>

</body>
</html>

His is probably a better solution. Mine may be slightly more understandable (?).

Old Pedant
01-28-2010, 10:44 PM
Well, actually I don't think you'll ever see &#160; if you are looking at HTML. You'll see &nbsp; instead. But I did what he asked.
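
If that is what is happening, one way around it is to match every form the non-breaking space might take. This is only a sketch: normalizeNbsp is a hypothetical helper name, and it assumes the text may contain the numeric entity, the named entity, or the raw U+00A0 character.

```javascript
// Hypothetical helper (not from the thread): replaces all three spellings
// of a non-breaking space with an ordinary space.
function normalizeNbsp(text) {
    return text.replace(/&#160;|&nbsp;|\u00A0/g, " ");
}

// Made-up sample containing one of each spelling.
var spaced = normalizeNbsp("a&#160;b&nbsp;c\u00A0d");
console.log(spaced); // a b c d
```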

One comment:

There's no need for the parens in those regexps if you do them one at a time. And the "i" won't do anything as there are no alphabetic characters involved:


stripped = stripped.replace(/(<([^>]+)>)/ig,"");
is the same as
stripped = stripped.replace(/<[^>]+>/g,"");

I know, you were using the original as a starting point. And there's nothing wrong about the added parens and "i", just as there's nothing wrong about my paranoia with the back slashes.
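
A quick sketch to check that claim on a made-up string with mixed-case tags. The last line shows the one situation where the parens would matter: when the replacement string refers back to the captured group with $1.

```javascript
// Sample input invented for illustration only.
var sample = "<B>bold</B> and <i>italic</i>";

// With the extra parens and the "i" flag...
var original = sample.replace(/(<([^>]+)>)/ig, "");
// ...and without them: same result, since the pattern contains no letters.
var trimmed = sample.replace(/<[^>]+>/g, "");

// Parens only make a difference if the replacement uses the capture.
var bracketed = sample.replace(/<([^>]+)>/g, "{$1}");

console.log(original);   // bold and italic
console.log(trimmed);    // bold and italic
console.log(bracketed);  // {B}bold{/B} and {i}italic{/i}
```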

Old Pedant
01-28-2010, 10:47 PM
Oh...and one more thing: The sequences <> and [] and () will *NOT* be zapped by any of the above. Because we are insisting on at least one character between the pairs.

If you want to zap <> and [] and () then just change the + to an * in the patterns

\[[^\]]*\]
and similar.
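
A short sketch of the difference, using a made-up string that contains empty pairs as well as one non-empty pair:

```javascript
// Sample input invented for illustration only.
var sample = "a<>b()c[]d(x)e";

// With +, each pair must contain at least one character, so empty pairs survive.
var plus = sample.replace(/(\<[^\>]+\>|\([^\)]+\)|\[[^\]]+\])/g, "");

// With *, zero characters between the delimiters is allowed, so empty pairs go too.
var star = sample.replace(/(\<[^\>]*\>|\([^\)]*\)|\[[^\]]*\])/g, "");

console.log(plus); // a<>b()c[]de
console.log(star); // abcde
```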

neochronomo
01-29-2010, 04:26 AM
Thanks everybody! I've raised your reputations :)