perl substr - match

09-17-2009, 05:59 PM

I am reading a text file with a Perl script and want to show only the first few thousand characters. The text file contains HTML tags such as <p> and <a href..>ada</a>, and the content changes daily. Sometimes the cut falls in the middle of an <a href ..> tag, so the tag is never closed properly and it messes up the content below it by turning it into a link. Is there any way to cut the text off cleanly, for example at a "." (period) or at a closing tag (</a> or </p>)?

Please help me out.

The one line of code that gets the content is:

$content .= substr($text,0,2100);

09-17-2009, 08:04 PM
Any help, please?

09-18-2009, 03:10 AM
You'll need to post your code because we can't guess what it does.


09-18-2009, 03:16 PM

I have $text with the content, and then I am using substr to show 2100 chars. I am not sure what else I should provide here other than what I have below. $text gets its content from the text file, which contains HTML tags (mostly <p> and <a>).

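$text_file = "story.txt";

# open the file and read its first line
open(FILE, $text_file) or die "Cannot open $text_file: $!";
$content1[0] = <FILE>;
close(FILE);

$content1_html = '';

foreach $content1 (@content1) {
    # each record has the form date===text
    ($date,$text) = split('===',$content1);
    # this is the line that cuts the text at a fixed length
    $content1_html .= substr($text,0,2000);
}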

09-18-2009, 07:45 PM
I've re-read your post and I understand it differently this time. :rolleyes:

I think you will need to use a regex to split the text after a </p> tag, perhaps the first one that comes after, say, 1500 characters.
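A rough sketch of that idea, in case it helps. The sub name cut_after_paragraph and the $min parameter are made up for illustration, and it assumes every <a> tag is opened and closed inside a paragraph, so ending on a </p> leaves nothing unclosed:

```perl
use strict;
use warnings;

# Hypothetical helper: keep at least $min characters, then cut right
# after the next closing </p> tag. If there is no such tag, the whole
# text is returned unchanged.
sub cut_after_paragraph {
    my ($text, $min) = @_;
    return $text if length($text) <= $min;

    pos($text) = $min;            # start the search at offset $min
    if ($text =~ m{</p>}gi) {     # scalar //g continues from pos()
        return substr($text, 0, pos($text));
    }
    return $text;
}
```

In the script from the earlier post, this would stand in for the substr line, something like $content1_html .= cut_after_paragraph($text, 1500);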

I am hopeless at regexes, but I'll try to think it through overnight and post back if you haven't had an answer.


09-24-2009, 06:24 PM

I couldn't get it to work. Please let me know if you have any ideas now.


09-25-2009, 01:09 AM
I see 2 key problems.

1. You're trying to manually parse an HTML file instead of using one of Perl's HTML parser modules.

2. Manually truncating the file to a hard-coded and possibly arbitrary length is bound to cause problems that may not have a viable solution.

What is the source of the "story.txt" file?

After processing the file, what does the script do with it?

Why truncate the file at 2100 characters?

Have you looked at any of Perl's HTML parsers and text formatting modules?
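For what it's worth, here is a minimal sketch of what the parser route could look like with HTML::Parser from CPAN (it is not a core module, so it has to be installed). The sub name truncate_html and the $limit parameter are hypothetical. It copies the markup through until roughly $limit characters of visible text have been emitted, then closes whatever tags are still open:

```perl
use strict;
use warnings;
use HTML::Parser;    # CPAN module, not part of core Perl

# Hypothetical helper: emit at most $limit characters of visible text,
# keep the tags around them, and close any tag left open at the cut.
sub truncate_html {
    my ($html, $limit) = @_;
    my ($out, $count, $done) = ('', 0, 0);
    my @open;                                  # stack of open tag names
    my %void = map { $_ => 1 } qw(br hr img);  # tags with no closing tag

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $text) = @_;
            return if $done;
            $out .= $text;
            push @open, $tag unless $void{$tag};
        }, 'tagname, text' ],
        end_h => [ sub {
            my ($tag, $text) = @_;
            return if $done;
            $out .= $text;
            pop @open if @open && $open[-1] eq $tag;
        }, 'tagname, text' ],
        text_h => [ sub {
            my ($text) = @_;
            return if $done;
            my $room = $limit - $count;
            if (length($text) >= $room) {
                # limit reached: keep what fits and stop copying
                $out .= substr($text, 0, $room);
                $done = 1;
            }
            else {
                $out .= $text;
                $count += length $text;
            }
        }, 'text' ],
    );
    $p->parse($html);
    $p->eof;

    # close anything still open, innermost tag first
    $out .= "</$_>" for reverse @open;
    return $out;
}
```

Because the limit counts only visible text, the cut can never land inside a tag, which is the failure described in the first post. It can still land in the middle of a word or an entity like &amp;, so treat it as a starting point only.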

10-06-2009, 06:36 PM

Thanks for the reply.
This is what I am doing:
1) I open a text file (it gets its contents from another script when that script is run).
2) I show the contents of the text file.

Because the text file has HTML tags like <a href="...">sample text</a> or <p>sample text</p>, cutting it off at a fixed number of characters causes problems. I cut it off at 2100 characters because I want to show just a snippet, not the complete text. If the text around the 2100th character contains an <a> tag, the tag is left unclosed and the link flows into the next section of the page. The other solution for me would be to cut the text after the 4th or 5th <p> tag, as the text file will contain at least 8 paragraphs.
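Cutting after the 4th or 5th paragraph could be sketched roughly like this. The sub name first_paragraphs is made up for illustration, and it assumes paragraphs are not nested and that each one ends with a literal </p>:

```perl
use strict;
use warnings;

# Hypothetical helper: return the text up to and including the
# $count-th closing </p> tag. If the text has fewer paragraphs than
# that, it is returned unchanged.
sub first_paragraphs {
    my ($text, $count) = @_;
    my $seen = 0;
    while ($text =~ m{</p>}gi) {
        # pos() is the offset just past the tag that was matched
        return substr($text, 0, pos($text)) if ++$seen == $count;
    }
    return $text;
}
```

For example, first_paragraphs($text, 4) would keep the first four paragraphs, however long they are, so the snippet length will vary from day to day.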

Hope this is clear.