  1. #1
    Senior Coder xelawho's Avatar
    Join Date
    Nov 2010
    Posts
    2,775
    Thanks
    55
    Thanked 519 Times in 516 Posts

    Regex to remove punctuation before/after dictionary words

    I know. I should just learn it. But every tutorial I look at just makes my head spin.

    So here's the thing: I get a bunch of strings that are single words. I don't know what they are, so it has to be dynamic. But I have to strip out the punctuation that they come with, outside of the word boundaries.

    So:
    (anyway) should become anyway
    and/ becomes and
    or, becomes or
    'cool' becomes cool
    but they're remains they're
    and co-produce stays co-produce

    Seems simple, but Google is not my friend, once again.

    Thanks in advance for any suggestions. And if anybody knows of a non-head-spinny regex tutorial, I'd love to see it.
    Last edited by xelawho; 02-05-2013 at 07:51 PM.

  • #2
    Supreme Master coder! Philip M's Avatar
    Join Date
    Jun 2002
    Location
    London, England
    Posts
    17,734
    Thanks
    202
    Thanked 2,508 Times in 2,486 Posts
    Code:
    <script type = "text/javascript">
    
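    var x = "So: I wonder how I can remove (these brackets)... also and/ or, \"this\" 'cool' apostrophe, [they're] co-produce (anyway).";
    // First alternative: punctuation that follows a word character and is not followed by one (trailing).
    // Second alternative: punctuation that precedes a word character and is not preceded by one (leading).
    x = x.replace(/\b[-.,:;()&$#!\[\]\/{}"']+\B|\B[-.,:;()&$#!\[\]\/{}"']+\b/g, "");

    alert(x);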
    
    </script>
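
    Since the strings in the question arrive one word at a time, the same expression can be wrapped in a small helper and checked against the examples from the first post. This is only a sketch: the names OUTER_PUNCTUATION and stripOuterPunctuation are made up for illustration and are not part of the code above.

```javascript
// Hypothetical names; the regex itself is the one from the post above.
var OUTER_PUNCTUATION = /\b[-.,:;()&$#!\[\]\/{}"']+\B|\B[-.,:;()&$#!\[\]\/{}"']+\b/g;

// Removes listed punctuation that sits outside the word, leaving
// punctuation between two word characters (they're, co-produce) alone.
function stripOuterPunctuation(word) {
  return word.replace(OUTER_PUNCTUATION, "");
}

var samples = ["(anyway)", "and/", "or,", "'cool'", "they're", "co-produce"];
var cleaned = samples.map(stripOuterPunctuation);

console.log(cleaned.join(" "));
// anyway and or cool they're co-produce
```

    The apostrophe in they're and the hyphen in co-produce survive because each has a word character on both sides, so neither alternative can match there.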

    You will find excellent regex tutorials not a million miles from here at
    http://www.javascriptkit.com/javatutors/re.shtml
    http://www.javascriptkit.com/javatutors/redev.shtml

    You can test your regular expressions at: http://www.claughton.clara.net/regextester.html


    A man generally has two reasons for doing things - the one that sounds good, and the real one. - J.P.Morgan
    Last edited by Philip M; 02-04-2013 at 08:53 AM.

    All the code given in this post has been tested and is intended to address the question asked.
    Unless stated otherwise it is not just a demonstration.

  • #3
    Senior Coder xelawho's Avatar
    Thanks, Philip. The regex works great.

    Thanks for the links, too, although I have already seen those and they too make my head spin. I'm beginning to think it's not the people explaining it who have the problem.

  • #4
    Supreme Master coder! Philip M's Avatar
    You could shorten the regex to

    Code:
    x = x.replace(/\b[^\w\s]+\B|\B[^\w\s]+\b/g, "");
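
    Note that the shorter version is not strictly equivalent: [^\w\s] means "anything that is not an ASCII letter, digit, underscore or whitespace", so it also strips characters that the explicit list leaves alone, such as ? or *. A quick comparison, with variable names chosen only for illustration:

```javascript
// The two expressions from this thread, side by side.
var explicitList = /\b[-.,:;()&$#!\[\]\/{}"']+\B|\B[-.,:;()&$#!\[\]\/{}"']+\b/g;
var negatedClass = /\b[^\w\s]+\B|\B[^\w\s]+\b/g;

var sample = "really? *wow* (anyway)";

var withExplicitList = sample.replace(explicitList, "");
var withNegatedClass = sample.replace(negatedClass, "");

console.log(withExplicitList); // really? *wow* anyway
console.log(withNegatedClass); // really wow anyway
```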


  • #5
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,020
    Thanks
    75
    Thanked 4,323 Times in 4,289 Posts
    Hmmm...and what about V.I.Warshavski or Wm.P.Norquist, III ??

    Would you strip the periods and commas from them?

    Philip's code leaves the periods alone but zaps the comma. Is that what you want?

    But if the text is V. I. Warshavski and Wm. P. Norquist, III (note the spaces after the periods), then the periods get zapped, as well.
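
    The difference can be reproduced with the shortened regex from post #4. This is a sketch for illustration, and the variable names are made up:

```javascript
var pattern = /\b[^\w\s]+\B|\B[^\w\s]+\b/g;

// Periods directly between two letters have a word character on both sides.
var compact = "V.I.Warshavski or Wm.P.Norquist, III";
// Periods followed by a space look like trailing punctuation.
var spaced = "V. I. Warshavski and Wm. P. Norquist, III";

var compactResult = compact.replace(pattern, "");
var spacedResult = spaced.replace(pattern, "");

console.log(compactResult); // V.I.Warshavski or Wm.P.Norquist III
console.log(spacedResult);  // V I Warshavski and Wm P Norquist III
```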
    Last edited by Old Pedant; 02-05-2013 at 03:00 AM.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #6
    Senior Coder xelawho's Avatar
    Seems to be OK. The code alerts all the same words as the Firefox inline spellchecker, which seems good enough. One thing, though: it seems the use of \w makes the code think that café ends at "f". Any way around that one?

  • #7
    Supreme Master coder! Philip M's Avatar
    Quote Originally Posted by xelawho View Post
    Seems to be OK. The code alerts all the same words as the Firefox inline spellchecker, which seems good enough. One thing, though: it seems the use of \w makes the code think that café ends at "f". Any way around that one?
    Code:
    var x = "So: Théo, I wonder how I can remove (these brackets)... but not the é in café and/ or, \"this\" 'cool' apostrophe, [they're] co-terminous; (I believe!).";

    // Shorter alternative; \u00E0-\u00FC in the first class stops accented characters being deleted at the end of words.
    x = x.replace(/\b[^\w\s\u00E0-\u00FC]+\B|\B[^\w\s]+\b/g, "");

    alert(x);
    If you only want small letter e with acute (é), the escape is \u00E9. I don't think any other accented character can appear at the end of a word in (imported) English, except perhaps e with grave (è), which is \u00E8. Obviously many foreign languages use accented characters. In Italian è means "is", so you might want to retain it in La donna è mobile. My code protects the range \u00E0-\u00FC, which covers the accented lower-case letters from à to ü, when they come at the end of a word.
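
    For the single-word case in the original question, one alternative is to avoid \w and \b altogether (both are ASCII-only in JavaScript) and trim from the start and end of the string using Unicode property escapes. This is a different technique from the regex above, not a drop-in replacement: it assumes an ES2018 or later engine, it only makes sense when each string holds exactly one word, and the names below are hypothetical.

```javascript
// Assumes ES2018+ (Unicode property escapes need the "u" flag).
// \p{L} = any Unicode letter, \p{N} = any Unicode number.
const OUTER_NON_ALPHANUMERIC = /^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/gu;

// Hypothetical helper: expects a single word, not a whole sentence.
function stripOuterPunctuationUnicode(word) {
  return word.replace(OUTER_NON_ALPHANUMERIC, "");
}

const words = ["café,", "(école)", "they're", "co-produce", "'cool'"];
const trimmed = words.map(stripOuterPunctuationUnicode);

console.log(trimmed.join(" "));
// café école they're co-produce cool
```

    Unlike \w, this treats a leading or trailing underscore as punctuation. It also keeps an accented letter at the start of a word such as école, which the \b-based regex appears to strip, because only its first alternative excludes the accented range.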

    @Old Pedant - my understanding is that we are talking about dictionary words, not proper names. I don't see how any spell checker can check proper names. Some people even mis-spell Philip.
    Is there usually a comma in the rendering of Wm. P. Norquist, III ?

    @xelawho - Might I respectfully suggest that you change your thread title to something more indicative of the content - such as "Regex to remove punctuation before/after dictionary words" which would perhaps be more helpful to people using the search feature of this forum.
    Last edited by Philip M; 02-05-2013 at 10:16 AM.



  • #8
    Senior Coder xelawho's Avatar
    Quote Originally Posted by Philip M View Post
    @xelawho - Might I respectfully suggest that you change your thread title to something more indicative of the content - such as "Regex to remove punctuation before/after dictionary words" which would perhaps be more helpful to people using the search feature of this forum.
    You might, and I have. And I appreciate the respectful nature of the request. I have seen you make similar ones in not-so-diplomatic terms.

    Thanks for the new regex, too. Does exactly what it needs to do.

  • #9
    Supreme Master coder! Philip M's Avatar
    Quote Originally Posted by xelawho View Post
    You might, and I have. And I appreciate the respectful nature of the request. I have seen you make similar ones in not-so-diplomatic terms.

    Thanks for the new regex, too. Does exactly what it needs to do.
    My usual comment is:-
    Do please read the posting guidelines regarding silly thread titles. The thread title is supposed to help people who have a similar problem in future. Yours is useless for this purpose. You can (and should) edit it to make it more meaningful.

    That is aimed at newcomers, and silly thread titles such as "Help me" and "Urgent...deadline tomorrow!" (as per forum posting guidelines).
    Your original thread title was not silly, but could be made more useful as I suggested.

    You are right to deduce that I do not suffer fools gladly, although in your case I am willing to make an exception.

    Long ago, a senior manager of my company said to me "The trouble with you, Philip, is that you don't suffer fools gladly".
    My response was "Oh, I wouldn't say that. I always thought that we got on pretty well together."
    Last edited by Philip M; 02-06-2013 at 07:55 AM.


  • #10
    Supreme Master coder! Old Pedant's Avatar
    Quote Originally Posted by Philip M View Post
    Long ago, a senior manager of my company said to me "The trouble with you, Philip, is that you don't suffer fools gladly".
    My response was "Oh, I wouldn't say that. I always thought that we got on pretty well together."
    WOW! I loved that! You should send that to Scott Adams (the guy who created Dilbert) and suggest he use it.

