  1. #1
    Senior Coder
    Join Date
    May 2006
    Posts
    1,673
    Thanks
    28
    Thanked 4 Times in 4 Posts

    Function to strip out non-alphabetical characters ?

    Hi,

    Is there a function that strips out non-alphabetical characters so that
    only the words are left?

    I don't need anything containing numbers or other non-alphabetical
    characters. All I want is the words.

    Is there a built-in function for that, or do I have to use a regex?

  • #2
    met
    Regular Coder
    Join Date
    Oct 2009
    Location
    United Kingdom
    Posts
    728
    Thanks
    4
    Thanked 119 Times in 119 Posts
    A quick Google search turned up this:

    PHP Code:
    function allowAlphabets($string) {
        // Build the list of allowed characters: a-z and A-Z.
        $allow_characters = array_merge(range('a', 'z'), range('A', 'Z'));

        // Start of the regex pattern: a negated character class.
        $pattern = "@[^";

        // Append each allowed character, escaped for the "@" delimiter.
        foreach ($allow_characters as $char) {
            $pattern .= preg_quote($char, "@");
        }

        // Close the character class and the pattern.
        $pattern .= "]@";

        // Replace every non-alphabetic character with a space.
        $after = preg_replace($pattern, " ", $string);

        // Return the string, now containing only letters and spaces.
        return $after;
    }
    http://blog.sachinkraj.com/how-to-st...from-a-string/

    but yes it could be accomplished with regex as well.

  • #3
    Senior Coder kbluhm's Avatar
    Join Date
    Apr 2007
    Location
    Philadelphia, PA, USA
    Posts
    1,509
    Thanks
    3
    Thanked 258 Times in 254 Posts
    PHP Code:
    $alpha = preg_replace( '/[^a-z\s]/i', '', $text );

    $words = preg_split( '/\s+/', $alpha, -1, PREG_SPLIT_NO_EMPTY );

    print_r( $words );
    Note that punctuated words such as "don't" will turn up unpunctuated (is that even a word?), i.e. "dont".
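
    If contractions matter, one option is to match the words directly and allow an apostrophe between letters. This is only a minimal sketch, assuming plain ASCII text; the helper name extractWords is made up for illustration and is not from any library:

```php
<?php
// Hypothetical helper: returns the words in $text, keeping in-word
// apostrophes (don't) and skipping digits and other punctuation.
function extractWords(string $text): array
{
    // A word is a run of letters, optionally joined by single apostrophes.
    preg_match_all("/[a-z]+(?:'[a-z]+)*/i", $text, $matches);
    return $matches[0];
}

print_r(extractWords("I don't need 42 of these, thanks!"));
// Words found: I, don't, need, of, these, thanks
```

    Unlike the strip-then-split version, this treats a digit or symbol inside a token as a separator, so "abc123def" comes back as two words rather than "abcdef".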
    Last edited by kbluhm; 11-04-2009 at 10:20 PM.

  • #4
    Senior Coder tomws's Avatar
    Join Date
    Nov 2007
    Location
    Arkansas
    Posts
    2,644
    Thanks
    29
    Thanked 330 Times in 326 Posts
    That seems a bit more complicated than necessary if just looking for letters and spaces. Won't a simple preg_replace like this work?
    PHP Code:
    $str = preg_replace('/[^\w ]/i','',$str);
    Test with this:
    PHP Code:
    $str = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus commodo ipsum vel lorem fermentum pretium. Ut lacus lorem, tempus et condimentum at, aliquam in quam. Sed vulputate orci non lectus varius non blandit odio ornare. Nulla vulputate mi tristique magna facilisis pulvinar. Nullam mattis tincidunt cursus. Ut semper mollis sollicitudin. Vivamus varius velit in velit lacinia sed tincidunt libero tincidunt. Nullam nulla urna, consectetur ut mattis sed, sollicitudin et mauris. Cras laoreet placerat tellus, in vulputate ipsum pharetra id. In tellus metus, bibendum ut vulputate vel, congue ac odio. Etiam tempor consequat tellus, vel ultrices mauris laoreet eget. Cras felis felis, tristique eu aliquet vitae, tincidunt non purus. Ut bibendum pellentesque risus ut porta. Cras vel nibh mauris. Etiam sollicitudin gravida felis quis dictum. Nullam hendrerit scelerisque tellus ac mollis. Duis scelerisque, ante dictum mollis mattis, nibh risus eleifend nunc, at egestas tortor quam nec augue. Nullam eleifend est ut neque facilisis ac volutpat orci commodo. ';
    echo $str, "<br/><br/>";
    $str = preg_replace('/[^\w ]/i','',$str);
    echo $str, "<br/><br/>";
    When you say "words", I presume you mean you want to keep the spaces, too. If not, remove the space from the regex.

    EDIT: kbluhm got it while I was pasting.
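
    One caveat worth checking for yourself: \w matches digits and the underscore as well as letters, so the pattern above leaves those in. A small sketch comparing it with an explicit letter class (the sample string is made up for illustration):

```php
<?php
$sample = 'abc_123 d-e f!';

// \w keeps letters, digits and underscores; only "-" and "!" are removed.
$withW = preg_replace('/[^\w ]/', '', $sample);
echo $withW, "\n";        // abc_123 de f

// An explicit letter class keeps ASCII letters (and spaces) only.
$lettersOnly = preg_replace('/[^a-z ]/i', '', $sample);
echo $lettersOnly, "\n";  // abc de f
```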
    Are you a Help Vampire?

  • #5
    Senior Coder
    Join Date
    May 2006
    Posts
    1,673
    Thanks
    28
    Thanked 4 Times in 4 Posts
    Thanks for all the replies

    To clarify, I don't want to change any data, just ignore the non-alphabetical
    blocks, so "%6fgTw" and "Gr^>>ht))" should both be ignored but "henry"
    would make it in.

    What does the w
    in ('/[^\w ]/i','',$str); mean ?

    I guess that it means word ?
    Is that a dictionary word or any alphabetical block of letters surrounded by spaces ?

    I was passed this regex which finds 3 and 4 word phrases
    that have 1-5 letters in them.

    #((?:\b\w{1,5}\b\s+){3,4})#

    If someone can talk me through this, I would really like to understand it.
    what is the b for ?

    Thanks

  • #6
    Senior Coder tomws's Avatar
    Join Date
    Nov 2007
    Location
    Arkansas
    Posts
    2,644
    Thanks
    29
    Thanked 330 Times in 326 Posts
    Quote Originally Posted by jeddi View Post
    To clarify, I don't want to change any data, just ignore the non-alphabetical
    blocks, so "%6fgTw" and "Gr^>>ht))" should both be ignored but "henry"
    would make it in.
    That's different from your original problem description of removing non-alphabetic characters. You rather want to remove any words which contain any non-alphabets. EDIT: Correcting myself after a re-read, you don't even want to remove, but just ignore. Clarify what you're trying to do - in specific terms.

    Quote Originally Posted by jeddi View Post
    What does the w
    in ('/[^\w ]/i','',$str); mean ?

    I guess that it means word ?
    Is that a dictionary word or any alphabetical block of letters surrounded by spaces ?
    That's shorthand for a "word" character: a letter, a digit, or an underscore. It has nothing to do with dictionary words.

    Quote Originally Posted by jeddi View Post
    I was passed this regex which finds 3 and 4 word phrases
    that have 1-5 letters in them.

    #((?:\b\w{1,5}\b\s+){3,4})#

    If someone can talk me through this, I would really like to understand it.
    what is the b for ?
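    \b is a word boundary: a zero-width position between a word character and a non-word character (such as a space or tab), or at the start or end of the string. I'm not a regex specialist, but I think replacing the {1,5} with a + will allow that to grab words of any length. Note that \w also matches digits and underscores, so swap it for [a-z] (with the i modifier) if you only want letters. Removing the {3,4} ought to remove the 3 or 4 words restriction.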
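
    For the clarified requirement (skip any block that isn't purely alphabetical, without altering the rest), one way is to split on whitespace and test each token as a whole. A minimal sketch, assuming ASCII letters only; the function name alphaWordsOnly is hypothetical:

```php
<?php
// Hypothetical helper: keep only whitespace-separated tokens made up
// entirely of ASCII letters. Other tokens are skipped, not repaired.
function alphaWordsOnly(string $text): array
{
    $tokens = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $alpha = array_filter($tokens, function ($token) {
        // The anchors force the whole token to be letters.
        return preg_match('/^[a-z]+$/i', $token) === 1;
    });

    // array_values() reindexes the result from 0 after filtering.
    return array_values($alpha);
}

print_r(alphaWordsOnly('%6fgTw henry Gr^>>ht)) was here'));
// Tokens kept: henry, was, here
```

    Bear in mind that a token with trailing punctuation, such as "henry,", is skipped as well under this rule.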

  • #7
    Senior Coder
    Join Date
    May 2006
    Posts
    1,673
    Thanks
    28
    Thanked 4 Times in 4 Posts
    That's great,
    Thanks Tom for helping out
    I do look at the manual a lot but finding these
    little bits is sometimes hard, so I appreciate the explanation.

    BTW -

    in:
    #((?:\b\w{1,5}\b\s+){3,4})#

    any idea what the s means ?

  • #8
    Senior Coder kbluhm's Avatar
    Join Date
    Apr 2007
    Location
    Philadelphia, PA, USA
    Posts
    1,509
    Thanks
    3
    Thanked 258 Times in 254 Posts
    Quote Originally Posted by jeddi View Post
    in:
    #((?:\b\w{1,5}\b\s+){3,4})#

    any idea what the s means ?
    Any whitespace character.

    http://www.php.net/manual/en/regexp.....backslash.php
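
    Putting the pieces together, here is a rough walk-through of the whole pattern with a test run. The sample sentence is made up for illustration:

```php
<?php
// #...#           pattern delimiters
// ( ... )         capture group 1: the whole phrase
// (?: ... ){3,4}  non-capturing group, repeated 3 or 4 times
// \b\w{1,5}\b     one "word" of 1 to 5 word characters, bounded on both sides
// \s+             one or more whitespace characters after each word
$pattern = '#((?:\b\w{1,5}\b\s+){3,4})#';

$text = 'one two three elephant red fox ran away now';
preg_match_all($pattern, $text, $matches);

// "elephant" is longer than 5 characters, so it ends the first phrase.
// "now" has no whitespace after it, so it is not part of any match.
print_r($matches[1]);
// Phrases found: "one two three " and "red fox ran away "
```

    Each phrase keeps its trailing whitespace because the \s+ sits inside the capture group; trim() removes it if needed.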

