
regex to truncate a string



dschmierer
12-20-2006, 03:03 AM
Can any regex experts out there help me cut off a string after ten words?

For example I have string such as:
one two three four five six seven eight nine ten eleven twelve thirteen
(note: it has no commas or punctuation; only spaces separate the words)

I'd like to use an ereg_replace function (or something like that) and a regular expression to get rid of everything after the word "ten" in this string.

Is there a way to match everything after the first 10 spaces and get rid of it?

Thanks

mlseim
12-20-2006, 03:55 AM
I found this example on the internet.
I like this example because you can pick how many words to truncate:



<?php

$string="one two three four five six seven eight nine ten eleven twelve thirteen";

echo "The answer is: <b>".word_trim($string,10)."</b>";

function word_trim($string, $count, $ellipsis = FALSE){
$words = explode(' ', $string);
if (count($words) > $count){
array_splice($words, $count);
$string = implode(' ', $words);
if (is_string($ellipsis)){
$string .= $ellipsis;
}
elseif ($ellipsis){
$string .= '&hellip;';
}
}
return $string;
}

?>


Here's the output:

The answer is: one two three four five six seven eight nine ten

NOTE:
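I forgot to mention: if $ellipsis is TRUE, it appends &hellip; (which a browser shows as ...) to the end,
like "eight nine ten...". If you pass a string as $ellipsis instead, that string is appended.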
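Here is a small sketch of the three ways the $ellipsis argument behaves. The function is the same word_trim() as above, repeated only so the snippet runs on its own; the ' [more]' suffix is just an example value I made up.

```php
<?php
// Same word_trim() as in the post above, repeated so this snippet is self-contained.
function word_trim($string, $count, $ellipsis = FALSE) {
    $words = explode(' ', $string);
    if (count($words) > $count) {
        array_splice($words, $count);      // keep only the first $count words
        $string = implode(' ', $words);
        if (is_string($ellipsis)) {
            $string .= $ellipsis;          // caller-supplied suffix
        } elseif ($ellipsis) {
            $string .= '&hellip;';         // HTML entity, drawn as an ellipsis by a browser
        }
    }
    return $string;
}

$string = "one two three four five six seven eight nine ten eleven twelve thirteen";

echo word_trim($string, 3), "\n";             // one two three
echo word_trim($string, 3, TRUE), "\n";       // one two three&hellip;
echo word_trim($string, 3, ' [more]'), "\n";  // one two three [more]
echo word_trim("one two", 3, TRUE), "\n";     // one two (nothing was cut, so nothing is appended)
```

Note that the suffix is only added when something was actually cut off.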



_Aerospace_Eng_
12-20-2006, 03:58 AM
Use preg_split() and then use the array_pop() function in a for loop to get rid of the last elements past 10.

<?php
$string = "one two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen";
echo "The string is:<br>$string<br><br>";
$array = preg_split('/[\s]+/',$string);
$total = count($array) - 10;
for($i = 0; $i < $total; $i++) // we want to pop the last element of the array until we get to the count of the array minus 10
{
array_pop($array); // takes off last element in array until for loop condition is met
}
echo "The new string is:<br>";
foreach($array as $key) // grabs all array values and echos them
{
echo "$key ";
}
?>

dschmierer
12-20-2006, 05:07 AM
thanks for the info guys. i went on a brief journey around the net to look up some stuff also and here's what i was able to cobble together. my original impression that using regex functions to accomplish the string shortening would be easier and more elegant is probably not the case. anyway, for what it's worth:

regex pattern to match the first 10 words: "^([a-zA-Z0-9]* ){0,9}[a-zA-Z0-9]*"
([a-zA-Z0-9]* ) - matches a run of letters and digits of any length (including zero) followed by a single space ' '
{0,9} - repeats that group 0 to 9 times
[a-zA-Z0-9]* - matches the 10th run of letters and digits (note: doesn't capture the whitespace after it, if there is any)

then you can put this in a function like so:
function chop_str($str, $items){
$pattern = "^([a-zA-Z0-9]* ){0,".($items-1)."}[a-zA-Z0-9]*";
$matches = array();
if(eregi($pattern, $str, $matches)){
return rtrim($matches[0]); //possible to capture extra white space at end so trim it off
} else {
return $str;
}
}

$matches[0] contains the text that matched the whole pattern
you can read more about the eregi function on the php.net website
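One caveat: the ereg family (ereg, eregi, ereg_replace) was deprecated in PHP 5.3 and removed in PHP 7, so the function above will not run on current versions. A rough PCRE equivalent might look like the sketch below. The name chop_str_preg is made up for illustration, and it keeps the same assumption as the original (words are letters and digits separated by single spaces, no punctuation). It uses + rather than * so that an empty "word" is never counted.

```php
<?php
// Hypothetical PCRE rewrite of chop_str(); a sketch, not the original poster's code.
// Assumes words consist of letters/digits and are separated by single spaces.
function chop_str_preg($str, $items) {
    if ($items < 1) {
        return '';
    }
    // Up to ($items - 1) "word + space" groups, followed by one final word.
    // The /i flag makes the match case-insensitive, like eregi().
    $pattern = '/^(?:[a-z0-9]+ ){0,' . ($items - 1) . '}[a-z0-9]+/i';
    if (preg_match($pattern, $str, $matches)) {
        return $matches[0];   // text matched by the whole pattern
    }
    return $str;              // no match: return the input unchanged
}

$str = "one two three four five six seven eight nine ten eleven twelve thirteen";
echo chop_str_preg($str, 10), "\n";   // one two three four five six seven eight nine ten
```

Because the pattern ends on a word rather than a space, no rtrim() is needed here.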

_Aerospace_Eng_
12-20-2006, 05:31 AM
Hmm, I think my solution is a little better since it handles any amount of whitespace between words. It's also a little more compact (less code). If you wanted to, you could put it in a function like so:

<?php
$string = "one two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen";
trimString($string);
function trimString($s)
{
$array = preg_split('/[\s]+/',$s); // assigns each word to array cell
$total = count($array) - 10;
for($i = 0; $i < $total; $i++) // we want to pop the last element of the array until we get to the count of the array minus 10
{
array_pop($array); // takes off last element in array until for loop condition is met
}
foreach($array as $key)
{
echo "$key ";
}
}
?>

marek_mar
12-20-2006, 06:59 AM
array_splice() should be better than the for loop, implode() should be better than the foreach() loop.


<?php
$string = "one two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen";
print trimString($string);
function trimString($s)
{
$array = preg_split('/[\s]+/',$s); // assigns each word to array cell
array_splice($array, count($array) - 10); // takes off last 10 elements? Shouldn't this just leave the first 10?
return implode(' ', $array);
}
?>

_Aerospace_Eng_
12-20-2006, 07:39 AM
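Hmm, that removes everything from offset count - 10 onward, i.e. the last 10 elements, so only count - 10 words are left. Reversing it to 10 - count gives a negative offset, which is counted from the end of the array, and that leaves the first 10 words: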

array_splice($array, 10 - count($array));
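One edge case with the 10 - count($array) offset: when the string has exactly 10 words the offset becomes 0, and array_splice() with offset 0 and no length removes every element, so the result is empty. Passing the limit itself as a positive offset avoids that. A possible sketch follows; the function name first_words and the $limit parameter are my own, and the PREG_SPLIT_NO_EMPTY flag is an addition to ignore leading or trailing whitespace.

```php
<?php
// Hypothetical helper: keep at most $limit whitespace-separated words.
function first_words($s, $limit = 10) {
    // PREG_SPLIT_NO_EMPTY drops the empty pieces that leading/trailing whitespace would produce.
    $words = preg_split('/\s+/', $s, -1, PREG_SPLIT_NO_EMPTY);
    // Positive offset: remove everything from index $limit to the end.
    // If the array has $limit elements or fewer, nothing is removed.
    array_splice($words, $limit);
    return implode(' ', $words);
}

$string = "one two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen";
echo first_words($string), "\n";   // one two three four five six seven eight nine ten
```

With this form there is no need to call count() at all.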

ralph l mayo
12-20-2006, 06:03 PM
It's easier to match the first ten words and keep them than to match everything else and discard it:



$str = 'one two three four five six seven eight nine ten eleven twelve thirteen fourteen';
preg_match('/\A(\w+\s+){0,10}/xms', $str, $match);
echo $match[0];


edit: oops, missed the post above with nearly the same answer. carry on
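One thing to watch with the pattern above: every word has to be followed by whitespace, so a string of exactly ten words with no trailing space only matches nine of them, and the match keeps the whitespace after the last word. A variant that matches one word and then up to nine "whitespace + word" pairs sidesteps both. This is only a sketch, and first_ten_words is a made-up name.

```php
<?php
// Hypothetical helper: return the first ten \w+ words of $str, or '' if there are none.
function first_ten_words($str) {
    // Optional leading whitespace, one word, then up to nine more "whitespace + word" pairs.
    if (preg_match('/\A\s*\w+(?:\s+\w+){0,9}/', $str, $match)) {
        return ltrim($match[0]);   // drop any leading whitespace that was matched
    }
    return '';
}

$str = 'one two three four five six seven eight nine ten eleven twelve thirteen fourteen';
echo first_ten_words($str), "\n";   // one two three four five six seven eight nine ten
```

Like the original, this treats a word as a run of \w characters, so it stops at the first punctuation mark it meets.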


