Regex to remove punctuation before/after dictionary words
I know. I should just learn it. But every tutorial I look at just makes my head spin.
So here's the thing: I get a bunch of strings that are single words. I don't know what they are, so it has to be dynamic. But I have to strip out the punctuation that they come with, outside of the word boundaries.
So:
(anyway) should become anyway
and/ becomes and
or, becomes or
'cool' becomes cool
but they're remains they're
and co-produce stays co-produce
Seems simple, but Google is not my friend, once again.
Thanks in advance for any suggestions. And if anybody knows of a non-head-spinny regex tutorial, I'd love to see it.
Code:
<script type = "text/javascript">
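// Sample text containing the punctuation cases from the question.
var x = "So: I wonder how I can remove (these brackets)... also and/ or, \"this\" 'cool' apostrophe, [they're] co-produce (anyway).";
// Remove punctuation runs that touch a word on one side only; runs with a word character on both sides (they're, co-produce) are kept.
x = x.replace(/\b[-.,:;()&$#!\[\]\/{}"']+\B|\B[-.,:;()&$#!\[\]\/{}"']+\b/g, "");
alert(x);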
</script>
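Since the question says the input arrives as single words, here is a minimal sketch that wraps the same regex in a helper and runs it over the examples from the question. The names `EDGE_PUNCT`, `stripEdgePunctuation`, `samples` and `cleaned` are made up for illustration; the pattern itself is copied unchanged from the snippet above.

```javascript
// Same punctuation class as the regex posted above (ASCII punctuation only).
const EDGE_PUNCT = /\b[-.,:;()&$#!\[\]\/{}"']+\B|\B[-.,:;()&$#!\[\]\/{}"']+\b/g;

// Hypothetical helper: strips punctuation that sits outside the word,
// leaving punctuation that has a word character on both sides.
function stripEdgePunctuation(word) {
  return word.replace(EDGE_PUNCT, "");
}

// The six cases listed in the question.
const samples = ["(anyway)", "and/", "or,", "'cool'", "they're", "co-produce"];
const cleaned = samples.map(stripEdgePunctuation);

console.log(cleaned);
// → [ 'anyway', 'and', 'or', 'cool', "they're", 'co-produce' ]
```

The apostrophe in they're and the hyphen in co-produce survive because each has a word character on both sides, so neither `\b…\B` nor `\B…\b` can match there.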
Thanks for the links, too, although I have already seen those ones and they too make my head spin. I'm beginning to think it's not the people who are explaining it who have the problem.
Seems to be OK. The code alerts all the same words as the Firefox inline spellchecker, which seems good enough. One thing, though: it seems the use of \w makes the code think that café ends at "f". Any way around that one?
Code:
var x = "So: Théo, I wonder how I can remove (these brackets)... but not the é in café and/ or, \"this\" 'cool' apostrophe, [they're] co-terminous; (I believe!).";
// Shorter alternative: does not delete accented characters at the end of words.
x = x.replace(/\b[^\w\s\u00E0-\u00FC]+\B|\B[^\w\s]+\b/g, "");
alert(x);
If you only want small letter e with acute (é), the Unicode escape is \u00E9. I don't think that there are any other accented characters which can appear at the end of a word in (imported) English, except perhaps e with grave (è), which is \u00E8. Obviously many foreign languages use accented characters. In Italian, è means "is". You might perhaps want to retain La donna è mobile. My code covers the accented lower-case characters from à (\u00E0) to ü (\u00FC), which should handle most cases of an accented letter at the end of a word.
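For what it's worth, the range-based regex above still seems to leave some gaps: as far as I can tell, "café," keeps its comma (there is no word boundary between é and the comma), and a word that starts with an accented letter, such as école, loses that letter. Here is a sketch of a different approach, not the code from the post: it assumes each string is a single word, as the question states, and that the JavaScript engine supports Unicode property escapes (`\p{...}` with the `u` flag, ES2018). The name `stripEdgesUnicode` is hypothetical.

```javascript
// Assumption: the input is ONE word, so ^ and $ mark its two ends.
// Strips any leading or trailing run of characters that are not
// letters (\p{L}), combining marks (\p{M}) or digits (\p{N}).
function stripEdgesUnicode(word) {
  return word.replace(/^[^\p{L}\p{M}\p{N}]+|[^\p{L}\p{M}\p{N}]+$/gu, "");
}

console.log(stripEdgesUnicode("caf\u00E9,"));    // café, with the comma removed
console.log(stripEdgesUnicode("(\u00E9cole)"));  // école, leading é kept
console.log(stripEdgesUnicode("they're"));       // unchanged
console.log(stripEdgesUnicode("co-produce"));    // unchanged
```

Because only the two ends of the string are touched, apostrophes and hyphens inside the word are never considered. For a whole sentence you would split on whitespace first and apply the helper to each piece.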
@Old Pedant - my understanding is that we are talking about dictionary words, not proper names. I don't see how any spell checker can check proper names. Some people even mis-spell Philip.
Is there usually a comma in the rendering of Wm. P. Norquist, III?
@xelawoo - Might I respectfully suggest that you change your thread title to something more indicative of the content, such as "Regex to remove punctuation before/after dictionary words", which would perhaps be more helpful to people using the search feature of this forum.
__________________
All the code given in this post has been tested and is intended to address the question asked.
Unless stated otherwise it is not just a demonstration.
You might, and I have. And I appreciate the respectful nature of the request. I have seen you make similar ones in not-so-diplomatic terms.
Thanks for the new regex, too. Does exactly what it needs to do.
My usual comment is:-
Do please read the posting guidelines regarding silly thread titles. The thread title is supposed to help people who have a similar problem in future. Yours is useless for this purpose. You can (and should) edit it to make it more meaningful.
That is aimed at newcomers, and silly thread titles such as "Help me" and "Urgent...deadline tomorrow!" (as per forum posting guidelines).
Your original thread title was not silly, but could be made more useful as I suggested.
You are right to deduce that I do not suffer fools gladly, although in your case I am willing to make an exception.
Long ago, a senior manager of my company said to me "The trouble with you, Philip, is that you don't suffer fools gladly".
My response was "Oh, I wouldn't say that. I always thought that we got on pretty well together."
WOW! I loved that! You should send that to Scott Adams (the guy who created Dilbert) and suggest he use it.
__________________
An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.