
View Full Version : Reading Files, check for Plain Text?


Pontifex
08-12-2007, 04:51 AM
I'm compiling some functions for a basic toolkit. Functions I use a lot in many different packages. I was doing some reading here:

http://perldoc.perl.org/perlopentut.html

Where I came across this:


"When is a file not a file? Well, you could say when it exists but isn't a plain file. We'll check whether it's a symbolic link first, just in case.

if (-l $file || ! -f _) {
print "$file is not a plain file\n";
}

What other kinds of files are there than, well, files? Directories, symbolic links, named pipes, Unix-domain sockets, and block and character devices. Those are all files, too--just not plain files. This isn't the same issue as being a text file. Not all text files are plain files. Not all plain files are text files. That's why there are separate -f and -T file tests."


My code is pretty simple, for reading files:


sub read_file ($) {
    my ($file_name) = shift;
    my $file_handle;

    open ($file_handle, "<", $file_name) or return undef;

    my @contents = <$file_handle>;

    close ($file_handle);

    return wantarray ? @contents : join (' ', @contents);
}#end of read_file ($)


But this has got me wondering if I should put a line like this before the open:


#check that file is a text file fit for reading
return undef unless (-f $file_name and -T $file_name);


I'm fairly certain that open would catch reading the non-text files detailed in the passage above and just throw an error. But I'm unsure of its behavior when reading a symlink or pipe. I don't want unexpected return values.

I suppose I'm just asking for some advice on how to make my code there more bulletproof and more stylistically pleasing.

--Pontifex

Pontifex
08-12-2007, 04:52 AM
After posting I noticed something:


PrivoxyWindowOpen()


Replaced my 'open' statement. What is that? Why did it do that?

--Pontifex

FishMonger
08-12-2007, 05:26 AM
After posting I noticed something:


PrivoxyWindowOpen()


Replaced my 'open' statement. What is that? Why did it do that?

--Pontifex

I've never used and don't know anything about it, but I assume you have Privoxy installed. This thread might be of interest.

http://cygwin.com/ml/cygwin/2003-11/msg00872.html

FishMonger
08-12-2007, 06:13 AM
I'm fairly certain that open would catch reading the non-text files detailed in the passage above and just throw an error.

No it won't. It will happily open a binary file or symlink without complaining.

If there's a possibility of passing the wrong item to the sub, then yes you should perform the proper tests prior to opening the file.
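A minimal sketch of what such tests might look like, assuming the sub should accept only plain text files. The name read_text_file is hypothetical, and rejecting symlinks outright is a policy choice rather than a requirement. Note that -l inspects the link itself, while -f and -T follow a link to its target, which is why the link check comes first. This sketch also joins lines with an empty string, since each line read still carries its own newline.

```perl
use strict;
use warnings;

# Hypothetical variant of read_file that refuses anything other than
# a plain text file. Returns the lines in list context, the whole
# content in scalar context, and nothing (undef / empty list) on failure.
sub read_text_file {
    my ($file_name) = @_;

    # Reject a missing name or a reference passed by mistake.
    return if !defined($file_name) || ref($file_name);

    # Reject symbolic links first: -f and -T would follow the link.
    return if -l $file_name;

    # Require a plain file whose content looks like text.
    return unless -f $file_name and -T $file_name;

    open(my $file_handle, '<', $file_name) or return;
    my @contents = <$file_handle>;
    close($file_handle);

    return wantarray ? @contents : join('', @contents);
}
```

A bare return yields undef in scalar context and an empty list in list context, so a caller can test the result the same way in either context.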

Why are you using prototypes?

Do you understand the pros/cons and the proper method to use (declare and call) subs with prototypes?

Why use the prototype to "restrict" the call to a scalar and then assign $file_name in list context?

What do you think would happen if you pass an array or ref to an array?
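As a rough illustration of the array question, here is a small sketch with two hypothetical subs, with_proto and without_proto. It assumes the sub is declared before the call and called without a leading &, which is when a prototype takes effect. A ($) prototype does not reject an array. It evaluates the argument in scalar context, so the sub receives the element count.

```perl
use strict;
use warnings;

# Hypothetical subs for illustration only.
sub with_proto ($) { my ($arg) = @_; return $arg; }
sub without_proto  { my ($arg) = @_; return $arg; }

my @files = ('a.txt', 'b.txt', 'c.txt');

# The ($) prototype forces scalar context: the array becomes its count.
my $seen_with = with_proto(@files);

# No prototype: the array is flattened into @_ and the first element wins.
my $seen_without = without_proto(@files);

# Calling with & bypasses the prototype entirely.
my $seen_amp = &with_proto(@files);

print "with prototype:    $seen_with\n";     # prints 3
print "without prototype: $seen_without\n";  # prints a.txt
print "called with &:     $seen_amp\n";      # prints a.txt
```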

KevinADC
08-12-2007, 09:32 PM
No it won't. It will happily open a binary file or symlink without complaining.

I know the -T operator is not perfect, but I would think it would catch a binary or symlink file and not open it. No?

FishMonger
08-12-2007, 10:16 PM
I know the -T operator is not perfect, but I would think it would catch a binary or symlink file and not open it. No?

Yes, the -T file test will fail on a binary or symlink file, but I was referring to open which doesn't do that test and won't throw an error just because it's opening a binary file.

Here's a simple example script that you can test:
#!/usr/bin/perl

use strict;
use warnings;

my $binfile = '/bin/cat';

open(CAT, '<', $binfile) || die $!;
print <CAT>;
close CAT;

KevinADC
08-13-2007, 01:23 AM
ahh, I gotcha.

Pontifex
08-13-2007, 05:07 AM
I've never used and don't know anything about it, but I assume you have Privoxy installed. This thread might be of interest.

http://cygwin.com/ml/cygwin/2003-11/msg00872.html

Bizarre. Thanks. Let me test that:


open


Okay everything looks good. =)

Actually that's happened before, now that I think about it. I was always confused about why people were calling the Privoxy function in their code. I thought it was some C-object I was missing out on!


Why are you using prototypes?


I actually phased that out in the latest iteration of the code. I don't seem to be using the functionality, so I'm not using it anymore.


Do you understand the pros/cons and the proper method to use (declare and call) subs with prototypes?


Probably not completely, but I assume (like C) it would restrict the number of arguments to the function and cause execution to terminate if called improperly.


Why use the prototype to "restrict" the call to a scalar and then assign $file_name in list context?


Actually I'm not sure. I just learned the idiom of shifting the arguments off like that when I was beginning to learn Perl. I really don't know of a better way, besides assigning multiple variables in list context:


my ($some_var, $another_var) = @_;


For example.


What do you think would happen if you pass an array or ref to an array?


Well this prototype implies (from http://perldoc.perl.org/functions/open.html):


open FILEHANDLE,MODE,REFERENCE


That I could pass 'open' a reference and have it do something. But what, I have no idea, since 'REFERENCE' isn't mentioned in the perldoc besides the prototype. Testing shows that it doesn't throw an error and returns nothing:


Use of uninitialized value in concatenation (.) or string at ./testing.pl line 21.


When attempting to print the return value.

I recall that the input is flattened into the @_ array. But I'm unsure how I would make my program behave normally if I passed it a reference or an array for that matter.
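For what it's worth, here is a small sketch of two related points. When the third argument to a three-argument open is a reference to a scalar, Perl opens an in-memory file over that scalar's contents. To keep a file-reading sub predictable, one option is to check the argument with ref() before opening. The helper name looks_like_file_name is hypothetical.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# A reference to a scalar opens an in-memory file over its contents.
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
my @lines = <$fh>;
close($fh);

print "read ", scalar(@lines), " lines from the scalar\n";

# Hypothetical guard: ref() returns an empty string for a plain string
# and a type name such as SCALAR or ARRAY for a reference.
sub looks_like_file_name {
    my ($candidate) = @_;
    return defined($candidate) && !ref($candidate) && length($candidate) > 0;
}

print "plain string accepted\n" if looks_like_file_name('notes.txt');
print "array reference rejected\n" unless looks_like_file_name(['notes.txt']);
```

A sub that starts with a guard like this can return early with a bare return when handed a reference, instead of passing it on to open.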

So in summary, my read file is only intended for regular text files (at this time) so it would be a good idea to put in some tests to check for whether the input is plain text or not (As above).

Though this little discussion has turned into something else. Which is good! I look forward to your reply and to learning more about the points you brought up.

--Pontifex