CodingForums.com > Client side development > JavaScript programming

Old 02-04-2013, 03:30 AM   PM User | #1
xelawho
Senior Coder

 
xelawho's Avatar
 
Join Date: Nov 2010
Posts: 2,461
Thanks: 52
Thanked 457 Times in 455 Posts
xelawho will become famous soon enough
Regex to remove punctuation before/after dictionary words

I know. I should just learn it. But every tutorial I look at just makes my head spin.

So here's the thing: I get a bunch of strings that are single words. I don't know what they are, so it has to be dynamic. But I have to strip out the punctuation that they come with, outside of the word boundaries.

So:
(anyway) should become anyway
and/ becomes and
or, becomes or
'cool' becomes cool
but they're remains they're
and co-produce stays co-produce

Seems simple, but Google is not my friend, once again.

Thanks in advance for any suggestions. And if anybody knows of a non-head-spinny regex tutorial, I'd love to see it.

Last edited by xelawho; 02-05-2013 at 07:51 PM..
Old 02-04-2013, 07:52 AM   PM User | #2
Philip M
Supreme Master coder!

 
Philip M's Avatar
 
Join Date: Jun 2002
Location: London, England
Posts: 17,100
Thanks: 197
Thanked 2,421 Times in 2,399 Posts
Philip M has a spectacular aura about
Code:
<script type = "text/javascript">

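// Sample sentence containing the punctuation cases from the question
var x = "So: I wonder how I can remove (these brackets)... also and/ or, \"this\" 'cool' apostrophe, [they're] co-produce (anyway).";
// Remove punctuation runs on the outer edge of a word; punctuation between two word characters is kept
x = x.replace(/\b[-.,:;()&$#!\[\]\/{}"']+\B|\B[-.,:;()&$#!\[\]\/{}"']+\b/g, "");

alert(x);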

</script>

You will find excellent regex tutorials not a million miles from here at
http://www.javascriptkit.com/javatutors/re.shtml
http://www.javascriptkit.com/javatutors/redev.shtml

You can test your regular expressions at: http://www.claughton.clara.net/regextester.html
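
A rough breakdown of how the pattern in the code above behaves, offered as a sketch rather than a definitive implementation: the first alternative catches punctuation that trails a word, the second catches punctuation that leads one, and punctuation with a word character on both sides (the apostrophe in they're, the hyphen in co-produce) matches neither. The helper name stripOuterPunctuation and the use of console.log are assumptions made for illustration; they are not part of the posted code.

```javascript
// Hypothetical helper wrapping the posted pattern.
function stripOuterPunctuation(text) {
  // \b[...]+\B : punctuation run preceded by a word character and NOT followed by one (trailing)
  // \B[...]+\b : punctuation run NOT preceded by a word character and followed by one (leading)
  return text.replace(/\b[-.,:;()&$#!\[\]\/{}"']+\B|\B[-.,:;()&$#!\[\]\/{}"']+\b/g, "");
}

// The six cases listed in the question.
var samples = ["(anyway)", "and/", "or,", "'cool'", "they're", "co-produce"];
var cleaned = samples.map(stripOuterPunctuation);

console.log(cleaned.join(" ")); // anyway and or cool they're co-produce
```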


A man generally has two reasons for doing things - the one that sounds good, and the real one. - J.P.Morgan
__________________

All the code given in this post has been tested and is intended to address the question asked.
Unless stated otherwise it is not just a demonstration.

Last edited by Philip M; 02-04-2013 at 08:53 AM..
Old 02-04-2013, 09:10 AM   PM User | #3
xelawho
Thanks, Philip. The regex works great.

Thanks for the links, too, although I have already seen those and they also make my head spin. I'm beginning to think it's not the people explaining it who have the problem.
Old 02-04-2013, 10:42 AM   PM User | #4
Philip M
You could shorten the regex to

Code:
x = x.replace(/\b[^\w\s]+\B|\B[^\w\s]+\b/g, "");
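
One difference worth noting, sketched under the assumption that both patterns are used exactly as posted: the short version removes any non-word, non-space character at the edge of a word, including characters missing from the explicit list, such as % or *. The sample strings and variable names below are made up for illustration.

```javascript
var longForm = /\b[-.,:;()&$#!\[\]\/{}"']+\B|\B[-.,:;()&$#!\[\]\/{}"']+\b/g;
var shortForm = /\b[^\w\s]+\B|\B[^\w\s]+\b/g;

// On the cases from the question, the two patterns agree.
var sentence = "(anyway) and/ or, 'cool' they're co-produce";
var viaLong = sentence.replace(longForm, "");
var viaShort = sentence.replace(shortForm, "");
console.log(viaLong === viaShort); // true

// "%" and "*" are not in the explicit list, so only the short form strips them.
var extra = "100% *bold*";
var extraLong = extra.replace(longForm, "");
var extraShort = extra.replace(shortForm, "");
console.log(extraLong);  // 100% *bold*
console.log(extraShort); // 100 bold
```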
Old 02-05-2013, 02:53 AM   PM User | #5
Old Pedant
Supreme Master coder!

 
Old Pedant's Avatar
 
Join Date: Feb 2009
Posts: 23,556
Thanks: 62
Thanked 4,055 Times in 4,024 Posts
Old Pedant is a name known to all
Hmmm...and what about V.I.Warshavski or Wm.P.Norquist, III ??

Would you strip the periods and commas from them?

Philip's code leaves the periods alone but zaps the comma. Is that what you want?

But if the text is V. I. Warshavski and Wm. P. Norquist, III (note the spaces after the periods), then the periods get zapped, as well.
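
The behaviour described here can be checked with a short sketch. It assumes the shortened pattern from post #4; the variable names are illustrative only.

```javascript
var pattern = /\b[^\w\s]+\B|\B[^\w\s]+\b/g;

var names = [
  "V.I.Warshavski",       // periods sit between two letters
  "Wm.P.Norquist, III",   // comma is followed by a space
  "V. I. Warshavski",     // periods are followed by a space
  "Wm. P. Norquist, III"
];

// A period with a letter on each side matches neither alternative and survives;
// a period or comma followed by a space counts as trailing punctuation and is removed.
var results = names.map(function (name) {
  return name.replace(pattern, "");
});

results.forEach(function (r) {
  console.log(r);
});
// V.I.Warshavski
// Wm.P.Norquist III
// V I Warshavski
// Wm P Norquist III
```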
__________________
An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.

Last edited by Old Pedant; 02-05-2013 at 03:00 AM..
Old 02-05-2013, 06:32 AM   PM User | #6
xelawho
Senior Coder

 
xelawho's Avatar
 
Join Date: Nov 2010
Posts: 2,461
Thanks: 52
Thanked 457 Times in 455 Posts
xelawho will become famous soon enoughxelawho will become famous soon enough
Seems to be OK. The code alerts all the same words as the Firefox inline spellchecker, which seems good enough. One thing, though: it seems the use of \w makes the code think that café ends at "f". Any way around that one?
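
A likely explanation, sketched here with the shortened pattern from post #4: in JavaScript, \w (and therefore \b) only recognises ASCII letters, digits and the underscore, so é is treated like punctuation. The string is written with a \u00E9 escape only to make the sketch independent of file encoding.

```javascript
var pattern = /\b[^\w\s]+\B|\B[^\w\s]+\b/g;

var plainIsWordChar = /\w/.test("e");
var accentedIsWordChar = /\w/.test("\u00E9"); // é
console.log(plainIsWordChar);    // true
console.log(accentedIsWordChar); // false: é is outside [A-Za-z0-9_]

// The trailing é is therefore removed as if it were trailing punctuation.
var stripped = "caf\u00E9".replace(pattern, "");
console.log(stripped); // caf
```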
Old 02-05-2013, 07:36 AM   PM User | #7
Philip M
Quote:
Originally Posted by xelawho View Post
Seems to be OK. The code alerts all the same words as the Firefox inline spellchecker, which seems good enough. One thing, though: it seems the use of \w makes the code think that café ends at "f". Any way around that one?
Code:
var x = "So: Théo, I wonder how I can remove (these brackets)... but not the é in café and/ or, \"this\" 'cool' apostrophe, [they're] co-terminous; (I believe!).";

// Shorter alternative. \w matches only ASCII word characters, so \u00E0-\u00FC (à to ü) is excluded
// from the trailing-punctuation class to avoid deleting accented characters at the end of words.
x = x.replace(/\b[^\w\s\u00E0-\u00FC]+\B|\B[^\w\s]+\b/g, "");

alert(x);
If you only want small letter e with acute (é), the Unicode escape is \u00E9. I don't think there are any other accented characters that can appear at the end of a word in (imported) English, except perhaps e with grave (è), which is \u00E8. Obviously many foreign languages use accented characters. In Italian, è means "is", so you might want to retain it in La donna è mobile. My code covers the accented lower-case letters from à (\u00E0) to ü (\u00FC) at the end of a word, which should cover most eventualities.
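
Since the strings in the question are single words, another option is to trim everything that is not a letter, combining mark or digit from each end and leave the middle alone. This is only a sketch and not the code from this post: it assumes an engine that supports Unicode property escapes (\p{L}, \p{M}, \p{N} with the u flag, added in ES2018), and the function name trimWordPunctuation is made up for illustration. It also sidesteps one gap in the range-based pattern above, which protects an accented letter at the end of a word but not at the start.

```javascript
// Hypothetical helper: strips characters that are not letters, combining marks
// or digits from both ends of ONE word. Requires the "u" flag with \p{...}.
function trimWordPunctuation(word) {
  return word.replace(/^[^\p{L}\p{M}\p{N}]+|[^\p{L}\p{M}\p{N}]+$/gu, "");
}

var words = ["(anyway)", "and/", "or,", "'cool'", "they're", "co-produce",
             "caf\u00E9,", "(\u00E9cole)"];
var trimmed = words.map(trimWordPunctuation);
console.log(trimmed.join(" ")); // anyway and or cool they're co-produce café école

// For comparison, the range-based pattern drops a leading accented letter,
// because only its first alternative excludes \u00E0-\u00FC.
var rangePattern = /\b[^\w\s\u00E0-\u00FC]+\B|\B[^\w\s]+\b/g;
var leadingAccent = "\u00E9cole".replace(rangePattern, "");
console.log(leadingAccent); // cole
```

Unlike the \b-based patterns, this works on one word at a time, so a full sentence would have to be split on whitespace first.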

@Old Pedant - my understanding is that we are talking about dictionary words, not proper names. I don't see how any spell checker can check proper names. Some people even mis-spell Philip.
Is there usually a comma in the rendering of Wm. P. Norquist, III?

@xelawho - Might I respectfully suggest that you change your thread title to something more indicative of the content, such as "Regex to remove punctuation before/after dictionary words", which would perhaps be more helpful to people using the search feature of this forum.

Last edited by Philip M; 02-05-2013 at 10:16 AM..
Users who have thanked Philip M for this post:
xelawho (02-05-2013)
Old 02-05-2013, 07:57 PM   PM User | #8
xelawho
Quote:
Originally Posted by Philip M View Post
@xelawho - Might I respectfully suggest that you change your thread title to something more indicative of the content, such as "Regex to remove punctuation before/after dictionary words", which would perhaps be more helpful to people using the search feature of this forum.
you might and I have. And I appreciate the respectful nature of the request. I have seen you make similar ones in not-so-diplomatic terms.

Thanks for the new regex, too. Does exactly what it needs to do
Old 02-06-2013, 07:42 AM   PM User | #9
Philip M
Quote:
Originally Posted by xelawho View Post
you might and I have. And I appreciate the respectful nature of the request. I have seen you make similar ones in not-so-diplomatic terms.

Thanks for the new regex, too. Does exactly what it needs to do
My usual comment is:-
Do please read the posting guidelines regarding silly thread titles. The thread title is supposed to help people who have a similar problem in future. Yours is useless for this purpose. You can (and should) edit it to make it more meaningful.

That is aimed at newcomers, and silly thread titles such as "Help me" and "Urgent...deadline tomorrow!" (as per forum posting guidelines).
Your original thread title was not silly, but could be made more useful as I suggested.

You are right to deduce that I do not suffer fools gladly, although in your case I am willing to make an exception.

Long ago, a senior manager of my company said to me "The trouble with you, Philip, is that you don't suffer fools gladly".
My response was "Oh, I wouldn't say that. I always thought that we got on pretty well together."

Last edited by Philip M; 02-06-2013 at 07:55 AM..
Old 02-06-2013, 08:59 PM   PM User | #10
Old Pedant
Quote:
Originally Posted by Philip M View Post
Long ago, a senior manager of my company said to me "The trouble with you, Philip, is that you don't suffer fools gladly".
My response was "Oh, I wouldn't say that. I always thought that we got on pretty well together."
WOW! I loved that! You should send that to Scott Adams (the guy who created Dilbert) and suggest he use it.
All times are GMT +1. The time now is 05:14 AM.