Counting Paragraphs...


welshhuw
03-17-2009, 06:12 PM
Hello,

I need to count the number of paragraphs in a text file.
Below is my code.
I have tried using regexps for this, but I can't get it to work correctly.

Any help on this would be appreciated.


open(FILE, "<$filename") || die "ERROR: Could not open $filename: $!"; #Opens the file for reading.



my($data);
while ($data = getc(FILE))
{
if ($data eq "?" || $data eq "!" || $data eq ".") #Checks how many sentences.
{
$sentences++;
}
if ($data =~ m/\n/) #Counts lines.
{
$lines++;
}
if ($data =~ m/\w/) #Counts words.
{
$words++;
}
if ($data =~ m/\w*\s*/) #Counts characters.
{
$chars++;
}
if ($data =~ m//) #Counts Paragraphs.....??????HELP!
{
$paras++;
}
}



close(FILE); #Closes the file and prints all the statistics.

print("\n\nFile Statistics:\n\n");
print("Characters: $chars\t");
print("Sentences: $sentences\t");
print("Words: $words\t");
print("Lines: $lines\t");
print("Paragraphs: $paras\t");

FishMonger
03-17-2009, 06:52 PM
perldoc -q "How can I read in a file by paragraphs?"

welshhuw
03-18-2009, 12:30 PM
OK, so I've got paragraphs sorted. Now the sentence count is throwing a warning: Uninitialized value $sentences at line 84 (which is where it is printed to the screen).

Any ideas what's wrong?


$/ = "";
open(FILE, "<$filename") || die "ERROR: Could not open $filename: $!"; #Opens the file for reading.

$lines = 1;

my($data);
while ($data = getc(FILE))
{
if ($data =~ m/[.?!]/) #Checks how many sentences.
{
$sentences++;
}
if ($data =~ m/\n/) #Counts lines.
{
$lines++;
}
if ($data =~ m/\w/) #Counts words.
{
$words++;
}
if ($data =~ m/\w*\s*/) #Counts characters.
{
$chars++;
}

while (<FILE>)
{
$paras = $.; #Counts paragraphs.
}

}


close(FILE);

FishMonger
03-18-2009, 12:48 PM
You don't want to use getc.

All you need to do is set the input record separator as shown in the documentation I referenced, then use a while loop to read in the file via the <> diamond operator.

On a side note, when creating the filehandle, it's better to use a lexical variable instead of the bareword and to use the 3-arg form of open.

open my $FILE, '<', $filename or die "ERROR: Could not open '$filename' $!";

$/ = "";

while ( my $paragraph = <$FILE> ) {

    # process the paragraph

}
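
For readers who want to try paragraph mode without a file on disk, here is a minimal sketch. It assumes an in-memory filehandle and a made-up helper named count_paragraphs; neither comes from the thread. It is meant to show that with $/ set to the empty string, any run of blank lines between paragraphs acts as a single separator.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count paragraphs in a string
# by reading it through an in-memory filehandle in paragraph mode.
sub count_paragraphs {
    my ($text) = @_;
    local $/ = "";    # paragraph mode: runs of blank lines act as one separator
    open my $fh, '<', \$text or die "Could not open in-memory handle: $!";
    my $count = 0;
    while ( defined( my $paragraph = <$fh> ) ) {
        $count++;
    }
    close $fh;
    return $count;
}

# Three paragraphs, separated by one blank line and then three blank lines.
my $sample = "First paragraph.\n\nSecond one,\nover two lines.\n\n\n\nThird.\n";
print count_paragraphs($sample), "\n";    # prints 3
```

One thing to keep in mind: a line holding only spaces or tabs is not an empty line as far as paragraph mode is concerned, so it will not split paragraphs.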

welshhuw
03-18-2009, 01:05 PM
Hi, I got that figured out just before you posted!
But now it's throwing my sentence count out. See my edited post above.

thanks for your help with all of this.

FishMonger
03-18-2009, 02:39 PM
As I previously said, you don't want to use getc() so start by removing this:

while ($data = getc(FILE))
{
if ($data =~ m/[.?!]/) #Checks how many sentences.
{
$sentences++;
}
if ($data =~ m/\n/) #Counts lines.
{
$lines++;
}
if ($data =~ m/\w/) #Counts words.
{
$words++;
}
if ($data =~ m/\w*\s*/) #Counts characters.
{
$chars++;
}
l

FishMonger
03-18-2009, 03:11 PM
if ($data =~ m/\w/) #Counts words.
No, it doesn't.

if ($data =~ m/\w*\s*/) #Counts characters.
No it doesn't.

You may want to look at using tr/// for 1 or 2 of your requirements.

C:\>perldoc -q "How can I count the number of occurrences of a substring within a string?"
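
As a rough illustration of the counting idioms that FAQ entry covers, the sketch below counts single characters with tr/// and whole words with a global match. The helper name text_counts and the sample string are assumptions made up for this example, and treating every run of \w characters as a word is a simplification.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the number of sentence
# terminators, words, and characters in a string.
sub text_counts {
    my ($text) = @_;

    # tr/// with an empty replacement list leaves the string unchanged and
    # returns how many characters matched the search list.
    my $terminators = ( $text =~ tr/?!.// );

    # A global match in list context returns every match; assigning that
    # list to () in scalar context yields the number of matches.
    my $words = () = $text =~ /\w+/g;

    # length() counts every character, including the trailing newline.
    my $chars = length $text;

    return ( $terminators, $words, $chars );
}

my ( $terminators, $words, $chars ) = text_counts("Is it done? Yes. Ship it!\n");
print "terminators=$terminators words=$words chars=$chars\n";
# prints: terminators=3 words=6 chars=26
```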

welshhuw
03-18-2009, 03:27 PM
OK,
Taken the getc() function out.
Now I have all my answers but...
Sentences are 4 (But there are 5)
Paragraphs are 4 (But there are 3)

!?!?!?

open(FILE, "<$filename") or die "Can't open file: $!";


my($data); #variable to data file


while ($data = <FILE>)
{

$line = $.; ###counts lines

if ($data =~ m/^$/) ##Counts paragraphs.
{
$para++;
}


if($data =~ /[?!.]/) #Counts sentences.
{
$sentences++;
}

$chars+=length($data); ###counts each character

if ($data =~ m/\w/) #Counts words.
{
$words++;
}

}

close(FILE);

welshhuw
03-19-2009, 02:15 PM
Hi again!
I have nearly finished this program now!
The last problem I'm getting is that my paragraph count is incorrect if there is more than one blank line between the paragraphs.

How can I get it to count paragraphs no matter how many newlines are between the actual paragraphs in the text file?
I've tried loads of different regexps but cannot get it to work correctly.


if ($data =~ m/^$/) ##Counts paragraphs.
{
$para++;
}

FishMonger
03-19-2009, 02:35 PM
"How can I get it to count paragraphs no matter how many newlines are between the actual paragraphs in the text file?"
Why aren't you using the method I've already shown?

welshhuw
03-19-2009, 02:37 PM
I've tried 3 different methods;
the other two seem to conflict with my sentence counter.

FishMonger
03-19-2009, 02:47 PM
Then you need to adjust how you're doing 1 or more of the other calculations.

1) Read/loop through the file in paragraph mode. (Hint: you're already using a built-in var that keeps track of the count.)

2) Calculate the number of sentences in the paragraph and add it to its total count.

3) Calculate the number of words in the paragraph (or in each sentence) and add it to its total count.

4) Calculate the number of characters and add it to its total count.
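
The four steps above can be sketched roughly as follows. This is only one possible arrangement, not the code anyone in the thread posted: the helper name paragraph_stats, the in-memory filehandle, and the sample text are assumptions for illustration, and words are simply taken to be runs of \w characters.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): one pass in paragraph mode,
# adding each paragraph's counts to running totals.
sub paragraph_stats {
    my ($text) = @_;
    my %total = ( paragraphs => 0, sentences => 0, words => 0, chars => 0 );

    local $/ = "";    # step 1: paragraph mode
    open my $fh, '<', \$text or die "Could not open in-memory handle: $!";
    while ( defined( my $paragraph = <$fh> ) ) {
        $total{paragraphs} = $.;                            # records read so far
        $total{sentences} += ( $paragraph =~ tr/?!.// );    # step 2
        my $word_count = () = $paragraph =~ /\w+/g;         # step 3
        $total{words} += $word_count;
        $total{chars} += length $paragraph;                 # step 4: only what paragraph mode returns
    }
    close $fh;
    return %total;
}

my %stats = paragraph_stats("Hi there. Bye!\n\nOne more?\n");
printf "%-10s %d\n", $_, $stats{$_} for sort keys %stats;
```

Because paragraph mode throws away surplus blank lines, the character total here reflects the records that were read, which can be smaller than the file size when paragraphs are separated by more than one blank line.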

FishMonger
03-19-2009, 02:56 PM
Here are a few docs that could be helpful.

perldoc -f split
perldoc -f scalar
perldoc -q "How can I count the number of occurrences of a substring within a string?"

Search for tr/SEARCHLIST/REPLACEMENTLIST/cds in this doc
perldoc perlop

welshhuw
03-19-2009, 03:02 PM
OK, so I've started again. Again!
And got the paragraphs counted great!
Now I cannot get the sentences working!
I'm sorry for being a pain, as I guess you are pretty good at this and I am totally in over my head with it!

I appreciate your help greatly.

FishMonger
03-19-2009, 03:14 PM
Have you changed how you're counting the sentences?

The method you used in your prior post should work, but it's not the most efficient. Since this is for a homework assignment, I hesitate to show the most efficient one, but I have given you hints, i.e., the documentation I pointed out.

welshhuw
03-19-2009, 03:53 PM
No, it doesn't work.
I will read the docs you advised and start again. Again!


open(FILE, "<$filename") or die "Can't open file: $!";

my ($data); #variable to data file


$/ = "";

while (<FILE>) #Counts paragrpahs.
{
$para = $.;
}

if(<FILE> =~ /[?!.]/) #Counts sentences.
{
$sentences++;
}



close(FILE);

FishMonger
03-19-2009, 04:17 PM
You have a misunderstanding of how to loop through the file using the <> diamond operator.

Start with this:


while (my $paragraph = <FILE>) {

    # $para = $.;

    $sentences += scalar split /[?!.]/, $paragraph; #Counts sentences.

    # now, calculate the words
    # now, calculate the characters
}


Or
$sentences += $paragraph =~ tr/?!.//;
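
The two one-liners are not quite interchangeable, which the following sketch tries to show. split counts the pieces of text between terminators, so anything left after the last terminator (even just the newline) becomes one extra piece, while tr/// counts the terminator characters themselves. The helper names and the sample paragraph are made up for this comparison.

```perl
use strict;
use warnings;

# Hypothetical helpers (not from the thread) comparing the two approaches.
sub count_with_split {
    my ($paragraph) = @_;
    my @pieces = split /[?!.]/, $paragraph;    # trailing empty fields are dropped
    return scalar @pieces;
}

sub count_with_tr {
    my ($paragraph) = @_;
    return ( $paragraph =~ tr/?!.// );         # counts the terminators themselves
}

my $paragraph = "One. Two! Three?\n";
printf "split: %d, tr: %d\n",
    count_with_split($paragraph), count_with_tr($paragraph);
# prints: split: 4, tr: 3
```

Here the trailing newline forms a fourth piece for split. Without it, both approaches would report 3.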

welshhuw
03-19-2009, 04:55 PM
Right. The last bit of code worked for me.
Thanks very, very much!

No doubt I'll be back later though!!

welshhuw
03-19-2009, 05:49 PM
OK. I've done everything now except counting lines.
The code I used in my previous programs worked, but I cannot get it to work in this one.

I've tried while, if and for(split) methods.
Perhaps I'm missing a little code in them?

Any help would be greatly appreciated thanks.

KevinADC
03-19-2009, 07:51 PM
You don't need to count lines; the $. special variable does that for you, as has been shown in previous posts. Of course, that counts all lines in the file. If you don't want to include blank lines, you shouldn't use it; instead, increment a counter for lines that are not blank.
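
A small sketch of that idea, under the assumption that the file is read in ordinary line mode (with $/ left at its default of "\n"). The helper name line_counts and the in-memory filehandle are inventions for this example, and "blank" is taken to mean "contains no non-whitespace character".

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the total number of
# lines and the number of lines that contain something other than whitespace.
sub line_counts {
    my ($text) = @_;
    local $/ = "\n";    # ordinary line mode, whatever $/ was set to elsewhere
    open my $fh, '<', \$text or die "Could not open in-memory handle: $!";
    my $non_blank = 0;
    while ( defined( my $line = <$fh> ) ) {
        $non_blank++ if $line =~ /\S/;
    }
    my $all = $.;       # read $. before close(), which resets it
    close $fh;
    return ( $all, $non_blank );
}

my ( $all, $non_blank ) = line_counts("one\n\ntwo\n   \nthree\n");
print "all=$all non_blank=$non_blank\n";    # prints: all=5 non_blank=3
```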

welshhuw
03-19-2009, 08:03 PM
KevinADC, I have tried $lines = $.;
But I have already used $paras = $.; to count the paragraphs.

So I can't seem to get the count of lines, including the blank lines.

FishMonger
03-19-2009, 08:52 PM
You need to count the number of \n characters.

welshhuw
03-19-2009, 09:04 PM
OK, I've tried this:


if ($paragraph =~ m/\n/)
{
    $lines++;
}

I've also tried $_ and $., but none of them returns a correct result, if any result at all.

FishMonger
03-19-2009, 09:19 PM
A paragraph will have more than 1 line just like it has more than 1 sentence. Note, that's a big hint.

welshhuw
03-19-2009, 09:23 PM
I've tried a split() too.
Maybe I should work on that more?!

I feel so thick! :p

welshhuw
03-19-2009, 09:43 PM
C'Mon FishMonger.

Put me out of my misery!!

Ha!

KevinADC
03-19-2009, 10:18 PM
open(FH, '<', 'somefile') or die "Could not open somefile: $!";
while (<FH>) {
    # some processing
}
print "There are $. lines in the file, including blank lines\n"; # read $. before close, which resets it
close FH;

FishMonger
03-20-2009, 12:27 AM
Kevin, I believe you either forgot the meaning of $. or that we changed $/ to paragraph mode.

$. is also called $INPUT_LINE_NUMBER, but more accurately it holds the current record number for the last filehandle you read from and is affected by the value in $/ ($INPUT_RECORD_SEPARATOR), which we reset.

So in this case, $. does not hold the count of the number of lines in the file.
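
To make the point concrete, here is a minimal sketch that reads the same text twice and reports $. each time. The helper name records_read, the in-memory filehandle, and the sample text are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the final value of $.
# after reading $text to the end with the given record separator.
sub records_read {
    my ( $text, $separator ) = @_;
    local $/ = $separator;
    open my $fh, '<', \$text or die "Could not open in-memory handle: $!";
    while ( defined( my $record = <$fh> ) ) {
        # nothing to do; we only care about how many records were read
    }
    my $records = $.;    # read before close(), which resets $.
    close $fh;
    return $records;
}

my $text = "one\ntwo\n\nthree\n";
printf "line mode: %d, paragraph mode: %d\n",
    records_read( $text, "\n" ), records_read( $text, "" );
# prints: line mode: 4, paragraph mode: 2
```

The same four-line text gives 4 in line mode and 2 in paragraph mode, because $. counts records as defined by $/, not physical lines.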

FishMonger
03-20-2009, 12:31 AM
welshhuw,

Since this script is for a homework assignment, I can't (or won't) simply give you the complete answer, since that defeats the purpose of your class and is cheating.

I need to work on some personal things, but will give you some more pointers when I get back.

KevinADC
03-20-2009, 01:18 AM
That's what I get for not reading the whole thread. I didn't notice that you guys were messing with $/. :o

welshhuw
03-20-2009, 12:32 PM
OK,
I have been really racking my brains with this. I have tried all ways to get it to count!
I think I don't have a very good understanding of programming!

I have tried to figure out your hints. I understand I need to count the \n's in the file, but I cannot get this count to work.

Will keep on trying today though.

FishMonger
03-20-2009, 12:38 PM
Please post your code so I can see what adjustments it needs.

welshhuw
03-20-2009, 12:45 PM
I have no code for the line count, as nothing I tried worked!


open($FILE, "<$filename") or die "Can't open file: $!";


$/ = "";

while (my $paragraph = <$FILE>) #Counts Paragraphs.
{

$para = $.;

$sentences += $paragraph =~ tr/?!.//; #Counts Sentences.

for (split(/\s+/, $paragraph)) #Counts words.
{
$words++;
}

$chars += length($paragraph); #counts each character.




}



close($FILE);

print("\n\nStatistics for file:\n\n");
print("Characters:\t $chars\n");
print("Words:\t\t $words\n");
print("Lines:\t\t $lines\n");
print("Sentences:\t $sentences\n");
print("Paragraphs:\t $para\n");

FishMonger
03-20-2009, 01:28 PM
The code needed to count the lines is done exactly like the code used to count the sentences except you need to make an adjustment to the tr/// portion of the statement.

tr/?!.//

becomes
tr/\n//

$para = $.; can be moved outside of the while loop. Put it just before the close statement.

welshhuw
03-20-2009, 01:47 PM
I've moved the $para statement outside the while loop and added this code to the while loop. I don't think it's counting correctly, though, and I can't see a pattern as to why: sometimes it counts blank lines, other times it doesn't.
I have tried this line of code before.

$lines += $paragraph =~ tr/\n//;

FishMonger
03-20-2009, 02:18 PM
See if this gives the results you want.

Change:
$/ = "";


To:
$/ = "\n\n";
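
The difference between the two settings can be seen in a small experiment like the one below. With "", Perl treats a run of empty lines as one separator and discards the surplus newlines, so a tr/\n// tally misses them. With "\n\n", every newline is kept, but extra blank lines show up as extra records. The helper name records_and_newlines and the sample text are made up for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read $text with the given $/
# and return (number of records, total newlines seen in those records).
sub records_and_newlines {
    my ( $text, $separator ) = @_;
    local $/ = $separator;
    open my $fh, '<', \$text or die "Could not open in-memory handle: $!";
    my ( $records, $newlines ) = ( 0, 0 );
    while ( defined( my $record = <$fh> ) ) {
        $records++;
        $newlines += ( $record =~ tr/\n// );
    }
    close $fh;
    return ( $records, $newlines );
}

# Two paragraphs separated by three blank lines: five newlines in total.
my $text = "alpha\n\n\n\nbeta\n";
printf "%-8s records=%d newlines=%d\n", 'empty:',  records_and_newlines( $text, "" );
printf "%-8s records=%d newlines=%d\n", 'double:', records_and_newlines( $text, "\n\n" );
# prints:
# empty:   records=2 newlines=3
# double:  records=3 newlines=5
```

So "\n\n" gives a faithful newline count, at the cost of $. overstating the number of paragraphs whenever more than one blank line separates them. If both numbers must be right for arbitrary input, reading line by line and tracking blank-to-text transitions is one alternative.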

welshhuw
03-20-2009, 03:44 PM
OK, I've changed the $/ and added $lines = 1;

and it's working!!

Well, a big thank you, FishMonger, for all your help.
I know I really have to learn Perl, as my understanding is a bit faint!!

Thanks again.

FishMonger
03-20-2009, 04:07 PM
Glad I was able to help. Now I just wonder who's going to get the grade for the assignment, you or me? :o

welshhuw
03-20-2009, 04:30 PM
Yeah I know !!

But thanks.

I will deffo read more into Perl, because I do want to learn it!!
I just had to get this assignment off to the tutor.

Thanks again.