CodingForums.com > :: Client side development > JavaScript programming

Old 01-31-2013, 04:41 PM   PM User | #1
xelawho
Senior Coder

 
 
Join Date: Nov 2010
Posts: 2,437
Thanks: 52
Thanked 453 Times in 451 Posts
another regex question

I am slowly getting my head around regex, but really it is mostly a mystery to me.

Here's the thing: I have a string (although I have no idea how that string will look). All I know is that the string will contain a word (I don't know what that word is either). I don't know if the string will be a paragraph, a sentence or a sentence fragment (the sentence may be cut off, either at the start or the end).

But I need to get as much of the sentence containing the word as possible, without getting too much.

So I figure that these are the "rules":

- Start capturing from the closest word before the variable word that starts with a capital (uppercase) letter.
- If there is no word that starts with a capital before the variable word, start capturing from the start of the string.
- Equally, if the part of the string after the variable word contains a full stop/period, finish capturing at the full stop.
- If not, capture until the end of the string.
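
Taken literally, the four rules above could be sketched without any regex, roughly like this. The function name contextFor is made up for illustration, and the plain case-sensitive substring search is an assumption, not part of the rules.

```javascript
// Hypothetical helper: applies the four rules literally, no regex for the search itself.
function contextFor(text, word) {
    var at = text.indexOf(word);              // assumption: plain, case-sensitive search
    if (at === -1) { return null; }

    // Rules 1 and 2: walk backwards to the nearest word starting with a capital;
    // if there is none, start stays at 0 (the start of the string).
    var start = 0;
    for (var i = at - 1; i >= 0; i--) {
        var isWordStart = (i === 0) || /\s/.test(text.charAt(i - 1));
        if (isWordStart && /[A-Z]/.test(text.charAt(i))) {
            start = i;
            break;
        }
    }

    // Rules 3 and 4: stop at the first full stop after the word, else at the end.
    var stop = text.indexOf(".", at + word.length);
    var end = (stop === -1) ? text.length : stop + 1;

    return text.substring(start, end);
}

console.log(contextFor("The dog jumped over the moon. He was happy to see me. I left in a hurry", "happy"));
// He was happy to see me.
```

A capitalised name in the middle of a sentence would cut the context short with this approach, which is the kind of imperfection the rules already accept.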

I know it's not perfect logic, but it doesn't have to be - all I want to do is to be able to show the word in some sort of context, like Word does when you do spellcheck.

Any suggestions?

Last edited by xelawho; 01-31-2013 at 04:50 PM.. Reason: clarifying
Old 01-31-2013, 06:43 PM   PM User | #2
AndrewGSW
Senior Coder

 
Join Date: Apr 2011
Location: London, England
Posts: 2,120
Thanks: 15
Thanked 354 Times in 353 Posts
Something like this:

Code:
(?:^|\.)\s?([^.]*wibble[^.]*)(?:$|\.)
You can test it here.

But I haven't tried to match a capital letter.
__________________
"I'm here to save your life. But if I'm going to do that, I'll need total uninanonynymity." Me Myself & Irene.
Validate your HTML and CSS
Old 01-31-2013, 06:48 PM   PM User | #3
AndrewGSW
This version

Code:
(?:^|\.|\;)\s?([A-Z][^.]*wibble[^.]*)(?:$|\.)
looks for either a full stop or a semicolon, and the sentence should start with a capital letter.
Old 01-31-2013, 07:11 PM   PM User | #4
Philip M
Supreme Master coder!

 
 
Join Date: Jun 2002
Location: London, England
Posts: 17,036
Thanks: 197
Thanked 2,411 Times in 2,389 Posts
Here's my suggestion:-

Code:
<html>
<head>
</head>
<body>

Enter word to find <input type = "text" id = "theword" onblur = "findit()">

<script type = "text/javascript">

var text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam ipsum leo, scelerisque at dapibus ac, consectetur vel ipsum. Morbi et metus ut diam molestie ullamcorper. Suspendisse rutrum semper semper. Donec volutpat neque in lorem tempus scelerisque. Curabitur dignissim rhoncus quam ac suscipit. Donec viverra quam lobortis neque porta a sagittis urna tristique. Suspendisse nec lacus nisi. Pellentesque fermentum massa sit amet magna hendrerit vestibulum. Sed elit libero, scelerisque eu eleifend ut, interdum gravida nunc. Etiam ut nisi sapien, et tempus sem. Nam vel mi est. Mauris congue felis ut ante bibendum vehicula. Nullam nec sapien arcu, eget cursus lorem. Donec blandit, dolor tristique ornare dictum, arcu sapien vulputate dolor, et placerat risus odio ut magna. Ut magna mauris, pellentesque at ultricies vitae, fermentum vitae dolor."

//var ts = text.split(/\.|;/);   // split at period or semi-colon
var ts = text.split(".");  // split at period only

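function findit() {
    var tofind = document.getElementById("theword").value;   // read the input once, not on every pass
    if (!tofind) { return; }   // an empty pattern would match every sentence
    // escape regex metacharacters so the input is searched for as literal text
    var escaped = tofind.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    var regexp = new RegExp(escaped, "i");   // case insensitive; the global flag is not needed with test()
    var intext = false;
    for (var i = 0; i < ts.length; i++) {
        if (regexp.test(ts[i])) {
            intext = true;
            alert("The word " + tofind + " was found in the sentence:- " + "\n" + ts[i]);
        }
    }
    if (!intext) { alert("The word " + tofind + " was not found."); }
}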

</script>

</body>
</html>
__________________

All the code given in this post has been tested and is intended to address the question asked.
Unless stated otherwise it is not just a demonstration.

Last edited by Philip M; 01-31-2013 at 07:21 PM..
Old 01-31-2013, 07:53 PM   PM User | #5
xelawho
thanks Andrew - the first one was very close. I changed it to
Code:
(?:|^)?[\w]([^.]*wibble[^.]*)($:|\.|\?|\!|$)
to start the capture at the beginning of the sentence or the beginning of the string, instead of the end of the previous one, and to end on a full stop, exclamation mark, question mark or just the end of the string

seems right to me. Thank you both for your suggestions.
Old 01-31-2013, 08:01 PM   PM User | #6
xelawho
no, wait - that doesn't work. it ends if the sentence ends with a full stop, but keeps going if it is a ! or ?
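
A likely reason, as a small sketch: both [^.]* classes exclude only the period, so they happily run past a ! or ? and the match stops at the next full stop instead. Excluding all three terminators in the classes changes that. The sample string and the variable names below are made up for illustration.

```javascript
// The revision from post #5, unchanged.
var original = /(?:|^)?[\w]([^.]*wibble[^.]*)($:|\.|\?|\!|$)/;

// Assumed fix: the character classes exclude ? and ! as well as the period.
var fixed = /[\w]([^.?!]*wibble[^.?!]*)(\.|\?|!|$)/;

var sample = "Is it wibble? Yes it is. And more";

var originalMatch = original.exec(sample)[0];
var fixedMatch = fixed.exec(sample)[0];

console.log(originalMatch);   // Is it wibble? Yes it is.
console.log(fixedMatch);      // Is it wibble?
```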
Old 01-31-2013, 08:02 PM   PM User | #7
Philip M
Use mine!

Code:
var ts = text.split(/\.|;|\?|!/);   // split at period or semi-colon or ? or !
Does your regex allow you to find a variable word? Or a phrase? Not just wibble!
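
On the variable word or phrase: a regex can be built at run time with new RegExp, as long as any metacharacters in the phrase are escaped first. A minimal sketch using the first pattern from post #2; escapeRegExp and sentencePattern are made-up helper names, not built-in functions.

```javascript
// Hypothetical helper: backslash-escape characters that are special in a regex.
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: the pattern from post #2 with the phrase substituted for "wibble".
function sentencePattern(phrase) {
    return new RegExp("(?:^|\\.)\\s?([^.]*" + escapeRegExp(phrase) + "[^.]*)(?:$|\\.)");
}

var text = "The dog jumped over the moon. He was happy to see me. I left in a hurry";
var m = sentencePattern("happy to see").exec(text);

console.log(m[1]);   // He was happy to see me
```

Capture group 1 holds the sentence without its closing period, because the terminator sits in the non-capturing group.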

Last edited by Philip M; 01-31-2013 at 08:07 PM..
Old 01-31-2013, 08:23 PM   PM User | #8
xelawho
Here's the thing: Let's say the string is this:
"The dog jumped over the moon. He was happy to see me. I left in a hurry"

and the word is "happy"

in that case, all I want is
"He was happy to see me."

If it's
"was happy to see me. I left in a hurry"

all I want is
"was happy to see me."

If it's
"The dog jumped over the moon. He was happy to see"

all I want is:
"He was happy to see"

splitting it on the punctuation is probably the safest way, but then I have to loop through the array to find out which split is the one that I want. Which is why regex seems to be the answer...
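
For comparison, the loop over the pieces can stay fairly small. This is only a sketch: it uses match rather than split so that each piece keeps its closing punctuation, sentenceContaining is a made-up name, and the search is a plain case-sensitive substring test.

```javascript
// Hypothetical helper: cut the text into sentence-like pieces, return the first one containing the word.
function sentenceContaining(text, word) {
    // Each piece is a run of non-terminators plus an optional closing . ? or !
    var pieces = text.match(/[^.?!]+[.?!]?/g) || [];
    for (var i = 0; i < pieces.length; i++) {
        if (pieces[i].indexOf(word) !== -1) {
            return pieces[i].replace(/^\s+/, "");   // drop the space left after the previous sentence
        }
    }
    return null;
}

console.log(sentenceContaining("The dog jumped over the moon. He was happy to see me. I left in a hurry", "happy"));
// He was happy to see me.
```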
Old 01-31-2013, 08:43 PM   PM User | #9
Old Pedant
Supreme Master coder!

 
 
Join Date: Feb 2009
Posts: 23,198
Thanks: 59
Thanked 3,996 Times in 3,965 Posts
And what about
"aardvarks whistle. happy dogs bark"
???

What do you want to get out of that?

Logically, it would be "happy dogs bark", as the period before "happy" belongs in another sentence. But it's your call.
__________________
An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.
Old 01-31-2013, 08:49 PM   PM User | #10
xelawho
in that case I would want happy dogs bark

but sentences will always begin with a capital, and end with . or ! or ?

the problem is that the string that contains the word may not be a complete sentence.
Old 01-31-2013, 09:18 PM   PM User | #11
Old Pedant
Here's my answer.

I'll let you figure out if you can combine the 4 regexps into one.

Note that I stop on the first match, because some text patterns will match more than one of the regexps, but the regexps are purposely ordered by most desirable match.

The hack to get rid of a leading period is just that: a hack. But it works.

Code:
<script type="text/javascript">
function findSentenceByWord( text, word )
{
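    // re1: starts at a capital letter or at a period (stripped afterwards), ends at the next . ? or !
    var re1 = new RegExp( "[A-Z\\.][^A-Z\\.]+?" + word + "[^\\.\\?\\!]*[\\.\\?\\!]", "" );
    // re2: no sentence start before the word, so run from the start of the string to the next . ? or !
    var re2 = new RegExp( "^[\\s\\S]*?" + word + "[^\\.\\?\\!]*[\\.\\?\\!]", "" );
    // re3: sentence start found but no terminator after the word, so run to the end of the string
    var re3 = new RegExp( "[A-Z\\.][^A-Z\\.]+?" + word + "[\\s\\S]*$", "" );
    // re4: neither found, so the whole string is the context
    var re4 = new RegExp( "^[\\s\\S]*?" + word + "[\\s\\S]*$", "" );
    // Note: word goes into the patterns unescaped, so regex metacharacters in it are not matched literally.
    var res = [ re1, re2, re3, re4 ];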
    for ( var r = 0; r < res.length; ++r )
    {
        var re = res[r];
        if ( re.test( text ) )
        {
            document.write("Match on regexp " + (r+1) + "<br/>");
            var m = text.match(re)[0];
            if ( m.charAt(0) == "." ) { m = m.substring(1); }
            document.write( m + "<br/>");
            return;
        }
    }
}

function demo( text, word )
{
    document.write( "<hr/>Testing <i><b>" + text + "</b></i> for word " + word + "<br/>" );
    findSentenceByWord( text, word );
}    

demo( "The dog jumped over the moon. He was happy to see me. I left in a hurry", "happy" );
demo( "was happy to see me. I left in a hurry", "happy" );
demo( "The dog jumped over the moon. He was happy to see", "happy" );
demo( "aardvarks whistle. happy dogs bark", "happy" );
demo( "happy happy happy! and even more happy?", "happy" );
demo( "all the happy dogs", "happy" );
</script>
I dump out which regexp matched so that you can see that indeed all 4 are needed, depending on the input.
Old 01-31-2013, 09:20 PM   PM User | #12
Old Pedant
Quote:
Originally Posted by xelawho
in that case I would want happy dogs bark

but sentences will always begin with a capital, and end with . or ! or ?
If that is true, why did you include this example:
Quote:
If it's
"was happy to see me. I left in a hurry"
"was happy to see me." does not start with a capital letter.

My answer includes code to handle that case. It could be less code if you were *SURE* that a sentence always starts with a capital letter.
Old 01-31-2013, 09:22 PM   PM User | #13
AndrewGSW
This revision
Code:
(?:|^)?[\w]([^.]*wibble[^.]*)($:|\.|\?|\!|$)
is incorrect. Should be
Code:
(?:^|\.|\?\!)?[\w]([^.]*wibble[^.]*)(?:\.|\?|\!|$)
(?: denotes a non-capturing group, and the | at the beginning was incorrect. So the previous sentence might also end with a ? or !
Old 01-31-2013, 09:34 PM   PM User | #14
Old Pedant
Here's a slightly better version. Handles the sentence *before* "happy" ending with ? or ! (not just period).

Has the interesting effect of changing *which* "happy" is found in the "happy happy happy! and even more happy?" demo (demo #5 in my previous post, demo #6 here). If you really wanted the first one found, I could fix it to do that. But I'm assuming that's a case you aren't too worried about.
Code:
<script>
function findSentenceByWord( text, word )
{
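    // re1: starts at a capital letter or at a . ? or ! (stripped afterwards), ends at the next . ? or !
    var re1 = new RegExp( "[A-Z\\.\\?\\!][^A-Z\\.\\?\\!]+?" + word + "[^\\.\\?\\!]*[\\.\\?\\!]", "" );
    // re2: no sentence start before the word, so run from the start of the string to the next . ? or !
    var re2 = new RegExp( "^[\\s\\S]*?" + word + "[^\\.\\?\\!]*[\\.\\?\\!]", "" );
    // re3: sentence start found but no terminator after the word, so run to the end of the string
    var re3 = new RegExp( "[A-Z\\.\\?\\!][^A-Z\\.\\?\\!]+?" + word + "[\\s\\S]*$", "" );
    // re4: neither found, so the whole string is the context
    var re4 = new RegExp( "^[\\s\\S]*?" + word + "[\\s\\S]*$", "" );
    // Note: word goes into the patterns unescaped, so regex metacharacters in it are not matched literally.
    var res = [ re1, re2, re3, re4 ];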
    for ( var r = 0; r < res.length; ++r )
    {
        var re = res[r];
        if ( re.test( text ) )
        {
            document.write("Match on regexp " + (r+1) + "<br/>");
            var m = text.match(re)[0];
            m = m.replace( /^[\.\?\!]?\s*/, "" );
            document.write( m + "<br/>");
            return;
        }
    }
}

function demo( text, word )
{
    document.write( "<hr/>Testing <i><b>" + text + "</b></i> for word " + word + "<br/>" );
    findSentenceByWord( text, word );
}    

demo( "The dog jumped over the moon. He was happy to see me. I left in a hurry", "happy" );
demo( "was happy to see me. I left in a hurry", "happy" );
demo( "The dog jumped over the moon. He was happy to see", "happy" );
demo( "aardvarks whistle. happy dogs bark", "happy" );
demo( "aardvarks whistle dixie! happy dogs bark", "happy" );
demo( "happy happy happy! and even more happy?", "happy" );
demo( "all the happy dogs", "happy" );
</script>
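
On combining the four patterns into one: a possible sketch, assuming sentences are delimited only by . ? or ! and dropping the capital-letter test entirely. The helper names escapeRegExp and findSentence are made up, and this has only been checked against the demo strings in this thread.

```javascript
// Hypothetical helper: backslash-escape characters that are special in a regex.
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Single pattern: start at the string start or just after a terminator,
// skip whitespace, then capture up to and including the next terminator if there is one.
function findSentence(text, word) {
    var re = new RegExp("(?:^|[.?!])\\s*([^.?!]*" + escapeRegExp(word) + "[^.?!]*[.?!]?)");
    var m = re.exec(text);
    return m ? m[1] : null;
}

console.log(findSentence("The dog jumped over the moon. He was happy to see me. I left in a hurry", "happy"));
// He was happy to see me.
console.log(findSentence("aardvarks whistle dixie! happy dogs bark", "happy"));
// happy dogs bark
```

Unlike the second version above, this returns the first sentence containing the word, so "happy happy happy! and even more happy?" gives "happy happy happy!". Abbreviations with periods would still split a sentence early.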
Old 01-31-2013, 09:41 PM   PM User | #15
Old Pedant
Andrew: I'm pretty sure this is wrong:
(?:^|\.|\?\!)

The ^ character only means negation when used inside of [ ].

In any case, you forgot the | between \? and \! if you were looking for "or" conditions. And also, in any case, you are missing parens.

But I'm pretty sure that should be
(?:[^\.\?\!])
But I think that
(?!(\.|\?|\!))
would also work. (?!...) is a *negative lookahead*: it asserts that the text ahead does not match, without capturing or consuming anything. There the ! is the negation character, not the ^

Did you test it? Against many samples, as I did?

*********

EDIT: I did test it.

I tested both your version:
/(?:^|\.|\?|\!)?[\w]([^.]*happy[^.]*)(?:\.|\?|\!|$)/
(I added the missing | before the first \!)

And my modification:
/(?:[^\.\?\!])?[\w]([^.]*happy[^.]*)(?:([\.|\?|\!]|$))/;

Neither passed all tests.
Neither could find "happy" in aardvarks whistle. happy dogs bark

Neither isolated the sentence in either
aardvarks whistle dixie! happy dogs bark
or
happy happy happy! and even more happy?
(that is, in both cases they returned the entire test string)

I will say that your (?:^|\.|\?|\!) seemed to have mostly worked. Surprised me.
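
That result fits how the three constructs behave in JavaScript, which a quick sketch makes visible. The sample strings and variable names are made up for illustration.

```javascript
// 1. ^ outside [ ] is an anchor: it matches at the start of the input.
var anchored = /(?:^|\.)\s?(\w+)/;
var firstWord = anchored.exec("happy dogs")[1];

// 2. ^ as the first character inside [ ] negates the class.
var negatedClass = /[^.?!]+/;
var untilTerminator = negatedClass.exec("Stop! Go")[0];

// 3. (?!...) is a negative lookahead: it checks what follows without consuming it.
var notBeforeTerminator = /dogs(?![.?!])/;
var midSentence = notBeforeTerminator.test("dogs bark");
var atSentenceEnd = notBeforeTerminator.test("dogs.");

console.log(firstWord);         // happy
console.log(untilTerminator);   // Stop
console.log(midSentence);       // true
console.log(atSentenceEnd);     // false
```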

Last edited by Old Pedant; 01-31-2013 at 09:56 PM..