article continued... ?
craigh@mac.com
09-15-2002, 10:05 PM
Hi,
I'm looking for a PHP code sample on how to do the following:
1. compare a string length (in words) to a constant (say 248).
2. Find the end of sentence (period) that is closest to that point.
3. truncate the string at that sentence end.
4. return the truncated string and let me know if it has been truncated or not.
any ideas? Thanks!
Spookster
09-16-2002, 01:44 AM
So you want to count the number of words in a string, strip off everything after the period, compare the count to a constant, and then return the processed string?
What is the big picture you are trying to accomplish here or what is this for?
craigh@mac.com
09-16-2002, 02:01 AM
it is part of a content management system. I want to display the first several sentences of an article and then put in a link to view the rest on a separate page. You've probably seen it done on many websites like slashdot and others. I just am hung up on the way to display only the first paragraph or so...
Spookster
09-16-2002, 03:29 AM
Oh ok. That makes more sense now. :) Is your CMS using a DB backend and is the content being stored there?
craigh@mac.com
09-16-2002, 11:20 AM
yes
craigh@mac.com
09-17-2002, 11:44 PM
so... now that you've got all the info, are you able to help me? or were you just curious? :confused:
Spookster
09-18-2002, 04:16 AM
I was just curious to see if you were going to ask if I could help you. :D
ok ok
I threw together a quick example. I created a function to take whatever string you pass to it and count the words assuming single spaces between words and then return a shortened version based off of how many words you want returned.
The tricky part is cleaning up the string before you count the words. You can add to the cleaning up part if you wish. For this example I removed carriage returns and double spaces and replaced them with single spaces.
Anyways this will give you a start on what you want to do. Obviously your string will not be coming from a form but instead a query from your database but I didn't feel like setting all that up on my system here so I just used a form. Easier to test it that way before pulling from a DB.
<form name="foo" method="post" action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>">
<textarea name="input" cols="50" rows="6" wrap="VIRTUAL" id="input"></textarea>
<br>
<input type="submit" name="Submit" value="Submit">
</form>
<?php
// Empty string until the form has actually been submitted.
$input = isset($_POST["input"]) ? $_POST["input"] : "";
function process_string($string) {
// Number of words to return.
$word_limit = 5;
// Remove line breaks, double spaces, etc from the string
// You can add to this section to clean up the string
// before attempting to count the number of words.
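// Carriage returns and line breaks become single spaces.
$string = str_replace(array("\r\n", "\r", "\n"), " ", $string);
// Any run of two or more spaces collapses to one space.
$string = preg_replace('/ {2,}/', ' ', $string);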
// Split the string into an array of words.
// It is set to split when a single space is found
$word_array = explode(" ", $string);
// Counts the number of elements in the array
// which should be about how many words were in the string
$num_of_words = count($word_array);
// Trims the array of words. Removes all words after
// what is specified in the $word_limit
$word_array_trimmed = array_splice ($word_array, 0, $word_limit);
// Converts the array of words back into a string
$final_string = implode(" ",$word_array_trimmed);
return $final_string;
}
echo "<br><br>" . process_string($input);
?>
craigh@mac.com
09-18-2002, 09:42 PM
OK - now we're getting somewhere. I understand what you've done, but you've missed a step from my original question. I don't want to just truncate the string after a certain number of words - I want to truncate it at the end of the sentence that is nearest to (either side of) the selected cutoff point. So - is there a way to find the nearest word that ends in a period (".") and then cut it off there?
Spookster
09-18-2002, 10:17 PM
Personally I would just truncate it as I had done it and then just add ... <a href="">Read Full Story</a> at the end of it.
Trying to truncate it so that a sentence stays intact will be very tricky. You would have to search the string for a period, but unfortunately a period doesn't necessarily mean the end of a sentence. It could be a decimal point, or someone could just arbitrarily put in a period, which would throw everything off.
Are these articles being submitted by users of the site or by many people? Are they being proofread and cleaned up before being posted?
What defines being closest to the word limit? Can it go over the limit or does it need to stay under the limit? I've written several string parsing programs in Java so the logic is easy but every parser needs rules or guidelines it has to follow.
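To make those rules concrete, here is a rough sketch of what the parser could look like once they are decided. Nothing here comes from the code posted in this thread: the function name truncate_at_sentence() and the $allow_over flag are made up for illustration, and it naively assumes that any word ending in "." ends a sentence. It returns both the shortened text and a flag saying whether anything was cut.

```php
<?php
// Hypothetical helper (not from the thread): cut $text at the sentence
// end closest to $word_limit words.
// Assumption: a word whose last character is "." ends a sentence.
// $allow_over decides whether the result may run past the limit.
function truncate_at_sentence($text, $word_limit, $allow_over = true) {
    // Split on any run of whitespace so line breaks and double spaces
    // never produce empty "words".
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    // Already short enough: return it unchanged, flag = false.
    if (count($words) <= $word_limit) {
        return array(implode(' ', $words), false);
    }

    $best = null; // number of words to keep
    foreach ($words as $i => $word) {
        if (substr($word, -1) !== '.') {
            continue;
        }
        $keep = $i + 1;
        if (!$allow_over && $keep > $word_limit) {
            break; // rule: never exceed the limit
        }
        // Strict "<" means a tie goes to the earlier, shorter cut.
        if ($best === null || abs($keep - $word_limit) < abs($best - $word_limit)) {
            $best = $keep;
        }
    }

    // No usable sentence end found: fall back to a plain word cut.
    if ($best === null) {
        $best = $word_limit;
    }

    $truncated = $best < count($words);
    return array(implode(' ', array_slice($words, 0, $best)), $truncated);
}
```

Usage would be something like list($teaser, $was_cut) = truncate_at_sentence($article, 248); and then only printing the "Read Full Story" link when $was_cut is true.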
craigh@mac.com
09-18-2002, 11:05 PM
You are right about the period being tricky. I'm not sure how to handle that, but I have added to your code to get the basic functionality. There is some troubleshooting code still left in there. Any suggestions on how to get around special cases of decimal points and abbreviations? Thanks for all your help!
<HTML>
<HEAD>
<TITLE>.</TITLE>
</HEAD>
<BODY>
<form name="foo" method="post" action="<?php print($PHP_SELF); ?>">
<textarea name="input" cols="50" rows="6" wrap="VIRTUAL" id="input"></textarea>
<br>
<input type="submit" name="Submit" value="Submit">
</form>
<?php
//$input=$POST["input"];
function process_string($string) {
// Number of words to return.
$word_limit=145;
print ($word_limit."<BR><BR>");
// Remove line breaks, double spaces, etc from the string
// You can add to this section to clean up the string
// before attempting to count the number of words.
$string=str_replace("\n"," ",$string);
$string=str_replace("  "," ",$string);
// Split the string into an array of words.
// It is set to split when a single space is found
$word_array=explode(" ",$string);
// Counts the number of elements in the array
// which should be about how many words were in the string
$num_of_words=count($word_array);
// find the words that end in a period - assume they are the end of sentences
function find_per($str) {
return (substr($str, -1)==".");
}
$endings=array_filter($word_array, "find_per");
reset($endings);
print ("<PRE>");
print_r($endings);
print ("</PRE>");
reset($endings);
// find the closest sentence end to the word limit
$x=1000000; // start with a very large value
while (list($k,$v)=each($endings)) {
$y=abs($word_limit-$k); // y is the distance from the word limit to the end of the sentence.
if ($y<$x) {
$x=$y; // x is now the distance from the word limit to the end of the sentence
$cut_key=$k; // cut_key stores the key of the nearest end-of-sentence
}
}
$cut_key++;
print ($cut_key."<BR><BR>");
// Trims the array of words. Removes all words after
// what is specified in the $word_limit
$word_array_trimmed=array_splice($word_array,0,$cut_key);
// Converts the array of words back into a string
$final_string=implode(" ",$word_array_trimmed);
return ($final_string);
}
if ($input) {
print ("<br><br>".process_string($input));
print ("<HR>".$input);
} // end if input
?>
</BODY>
</HTML>
Spookster
09-18-2002, 11:49 PM
Just a couple of comments first. You have a function defined inside a function. In PHP the inner function gets declared when the outer one runs, so calling process_string() a second time would fail with a "cannot redeclare" error. I wouldn't use two functions for this anyway, since the purpose of all of this is to process the string. Also, I'm guessing your server has register_globals set to on. Newer versions of PHP (4.2.0 and up) default that setting to off for security reasons, so I would get into the habit of referring to POST and GET variables properly, or it will bite you in the behind later on.
On PHP 4.1.0 and later (works whether register_globals is on or off):
$input = $_POST["input"];
On older versions:
$input = $HTTP_POST_VARS["input"];
With register_globals on, PHP lets you access the variable directly as $input, but if that setting ever gets changed it will come back to haunt you, because you will have to hunt through all of your code to change every occurrence of the variable. This holds true for any programming language: it is proper to declare and initialize your variables in one location and then refer to them from there. That way you only have to change one place instead of many.
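A minimal sketch of the "one location" idea, assuming the form field is still called "input" as in the examples above. The helper name get_input() is invented here; it just wraps the lookup so the rest of the script never touches the request arrays directly.

```php
<?php
// Hypothetical helper: the single place that reads the form field.
// $post is passed in (normally $_POST) so the function is easy to test.
function get_input($post, $field = 'input') {
    // Missing or non-string values become an empty string.
    if (isset($post[$field]) && is_string($post[$field])) {
        return $post[$field];
    }
    return '';
}

// Everything below this line refers to $input only.
$input = get_input($_POST);
```

If the field name or the source of the text changes later (a database query instead of a form, say), only this one spot needs editing.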
Ok enough rambling. I hadn't thought of abbreviations. Yes, that one alone would be pretty much impossible to detect reliably. There is really no way of determining whether it is a word at the end of a sentence or just an abbreviation. For example:
Foo my bar.
Foo my bar in oct. of this year.
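One partial workaround is a heuristic rather than a real fix. The sketch below is an assumption, not something from this thread: ends_sentence() and the $known_abbreviations list are hypothetical names, and the list would have to be filled in for your own content. It treats a word as a sentence end only if it ends in a period, is not a known abbreviation, and is followed by nothing or by a word starting with a capital letter or digit. Decimals such as 3.14 are already skipped once you split on spaces, because the word does not end in a period.

```php
<?php
// Hypothetical list; extend it to suit your own articles.
$known_abbreviations = array(
    'jan.', 'feb.', 'aug.', 'sept.', 'oct.', 'nov.', 'dec.',
    'mr.', 'mrs.', 'dr.', 'etc.', 'e.g.', 'i.e.',
);

// Heuristic only: guesses whether $word closes a sentence.
// $next_word is the following word, or null at the end of the text.
function ends_sentence($word, $next_word, $abbreviations) {
    // Must end in a period at all ("3.14" does not).
    if (substr($word, -1) !== '.') {
        return false;
    }
    // Known abbreviations are never treated as sentence ends.
    if (in_array(strtolower($word), $abbreviations, true)) {
        return false;
    }
    // A new sentence normally starts with a capital letter or a digit,
    // optionally after an opening quote or bracket.
    if ($next_word !== null && !preg_match('/^["\'(]?[A-Z0-9]/', $next_word)) {
        return false;
    }
    return true;
}
```

With this, "oct." in the second example is rejected twice over (it is in the list, and "of" starts with a lowercase letter), while "bar." and "year." at the end of the text still count. It will get things wrong when a sentence genuinely ends in an abbreviation such as "etc.", so treat it as a way to reduce bad cuts, not eliminate them.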