Not understanding Data::Dumper output


shadkeene
09-25-2007, 01:51 AM
I took FishMonger's advice and went with HTML::TokeParser::Simple, and next I'll place the data into hashes, but I'm having problems getting to the individual pieces of data.

After using HTML::TokeParser::Simple, then applying a regex and pushing the captured data into a new array, I can't access individual elements of the array. Using Data::Dumper, I see several undefined elements in addition to the 3-digit wind directions that I want to access individually via $wnddir[2], etc.
My goal is to have only what I'm trying to parse (wind directions in this case) in the array I'm pushing to. Here's my code:

#!/usr/bin/perl

use warnings;
use strict;
use CGI qw(:standard);
use CGI::Carp qw(fatalsToBrowser);
use LWP::Simple qw(!head);
use HTML::TokeParser::Simple;
use Data::Dumper;

print header;
print start_html("WindshftObs");
my @wnddir = ();
my @times = ();
my $sjc="sjc";
my $sfo="sfo";
my $sql="sql";



# call sub to loop through ob data and parse wind direction and time of observation
my @data = Winds($sfo, $sjc);


foreach my $datum (@data) {
my ($wnds) = $datum =~ (/(\d{3})+\d{2}KT|(\d{3})+\d{2}G\d{2}KT/);
push @wnddir, $wnds;
my ($Offtime) = $datum =~ (/\d{2}(\d{2})\d{2}Z/);
push @times, $Offtime;
}
print Dumper @wnddir;
print "@wnddir<br>";






sub Winds {
return "Error: No argument sent to Winds" unless @_;
my @apt = @_;
my @data;

foreach my $icao (@apt) {
my $url = "http://www.wrh.noaa.gov/mesowest/getobext.php?wfo=&sid=K$icao&num=3&raw=3&dbn=m&banner=off";
my $content= get($url) or die "Error getting file: $!";
my $p = HTML::TokeParser::Simple->new(\$content) || die "Can't open: $!";

#$p->empty_element_tags(1); # configure its behaviour

while (my $token = $p->get_token) {
next unless $token->is_text;
push @data, $token->as_is; #. "<br>\n" if $token->[1] = /^k$icao/i;
}

}
return @data;
}

print end_html;

And here are my results:
$VAR1 = undef;
$VAR2 = undef;
$VAR3 = undef;
$VAR4 = '280';
$VAR5 = '280';
$VAR6 = '340';
$VAR7 = undef;
$VAR8 = undef;
$VAR9 = undef;
$VAR10 = undef;
$VAR11 = '330';
$VAR12 = '330';
$VAR13 = '340';
$VAR14 = undef;
280 280 340 330 330 340

Thanks for your time and any explanations you can give...
Shad

FishMonger
09-25-2007, 08:12 AM
You have 2 problems.


1. You're pushing every text token onto @data, even tokens that don't contain the required data. You can verify this by dumping $token->as_is when you push it onto the array.

2. If your regex fails, $wnds will be undef, and you then push that undef onto the array. You need to change the code so that it pushes onto the array only when the regex matches.
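
To illustrate the second point: a failed match in list context returns an empty list, so the variable being assigned ends up undef. The following is only a minimal sketch; the two sample tokens and the array names @unguarded and @guarded are made up for illustration and are not part of the script.

```perl
use strict;
use warnings;

# Hypothetical sample tokens: one observation line and one unrelated text token.
my @tokens = ('KSFO 260056Z 28017KT 10SM CLR', 'Some unrelated text');

my (@unguarded, @guarded);
for my $t (@tokens) {
    # A failed match in list context returns an empty list, so $dir is undef.
    my ($dir) = $t =~ /(\d{3})\d{2}KT/;
    push @unguarded, $dir;

    # Guarded version: push only when the match succeeds.
    push @guarded, $1 if $t =~ /(\d{3})\d{2}KT/;
}

printf "unguarded: %d elements, guarded: %d elements\n",
    scalar @unguarded, scalar @guarded;
```

The unguarded array gets one element per token, including an undef for the token that did not match, while the guarded array holds only the captured direction.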


#!/usr/bin/perl

use warnings;
use strict;
use CGI qw(:standard);
use CGI::Carp qw(fatalsToBrowser);
use LWP::Simple qw(!head);
use HTML::TokeParser::Simple;
use Data::Dumper;

print header;
print start_html("WindshftObs");
my @wnddir = ();
my @times = ();
my $sjc="sjc";
my $sfo="sfo";
my $sql="sql";



# call sub to loop through ob data and parse wind direction and time of observation
my @data = Winds($sfo, $sjc, $sql);


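foreach my $datum (@data) {
    # Capture the 3-digit direction. The optional (?:G\d{2})? covers gust
    # reports, so $1 is set whichever form matches.
    push @wnddir, $1 if $datum =~ /(\d{3})\d{2}(?:G\d{2})?KT/;
    # Capture the hour from the ddhhmmZ time group.
    push @times, $1 if $datum =~ /\d{2}(\d{2})\d{2}Z/;
}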
#print Dumper @wnddir;
print "$_<br>" for @wnddir;






sub Winds {
return "Error: No argument sent to Winds" unless @_;
my @apt = @_;
my @data;

foreach my $icao (@apt) {
my $url = "http://www.wrh.noaa.gov/mesowest/getobext.php?wfo=&sid=K$icao&num=3&raw=3&dbn=m&banner=off";
my $content= get($url) or die "Error getting file: $!";
my $p = HTML::TokeParser::Simple->new(\$content) || die "Can't open: $!";

#$p->empty_element_tags(1); # configure its behaviour

while (my $token = $p->get_token) {
next unless $token->is_text;
push @data, $token->as_is if $token->as_is =~ /^k$icao/i;
print ('<br>' . $token->as_is) if $token->as_is =~ /^k$icao/i;
}

}
return @data;
}

print end_html;

FishMonger
09-25-2007, 06:12 PM
I posted an update to your question in the perl.beginners usenet group, but for some reason it takes 12-18 hours before it shows up.

My recommendation was to move the building of the @wnddir and @times arrays into the sub and use a more complex but better-suited data structure, i.e., a hash of arrays or a hash of hashes of arrays.
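
As a rough sketch of the hash-of-arrays idea (the station/direction pairs in @obs are hypothetical stand-ins for parsed observations, and %wnddir is just an illustrative name):

```perl
use strict;
use warnings;

# Hypothetical (station, direction) pairs standing in for parsed observations.
my @obs = ( [ sfo => '280' ], [ sfo => '300' ], [ sjc => '320' ] );

my %wnddir;    # hash of arrays: station => [ directions ]
for my $ob (@obs) {
    my ($icao, $dir) = @$ob;
    # The inner array is autovivified on the first push for each station.
    push @{ $wnddir{$icao} }, $dir;
}

# Print each station followed by its directions.
for my $icao (sort keys %wnddir) {
    print "$icao: @{ $wnddir{$icao} }\n";
}
```

Keying by station keeps each airport's values together, so there is no need for one array per airport.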

shadkeene
09-25-2007, 06:56 PM
Fishmonger,
Thanks as always...I will get to testing out some new subroutines that create hashes. I believe I'm writing way too much code in the main parts of my scripts that instead can be simplified and placed in the subroutines...thanks for pointing me in that direction.

Shad

FishMonger
09-26-2007, 03:32 AM
This might need a little tweak, but I think it does what you need.
#!/usr/bin/perl

use warnings;
use strict;
use CGI qw(:standard);
use CGI::Carp qw(fatalsToBrowser);
use LWP::Simple qw(!head);
use HTML::TokeParser::Simple;
use Data::Dumper;

print header;
print start_html("WindshftObs");
my $sjc = 'sjc';
my $sfo = 'sfo';
my $sql = 'sql';

# call sub to loop through ob data and parse wind direction and time of observation
my $wind_ref = winds($sfo, $sjc, $sql);

print Dumper $wind_ref;

sub winds {
my @apt = @_;
my $wind = {}; # initialize as a hash reference

foreach my $icao (@apt) {
my $url = "http://www.wrh.noaa.gov/mesowest/getobext.php?wfo=&sid=K$icao&num=3&raw=3&dbn=m&banner=off";
my $content= get($url) or die "Error getting file: $!";
my $p = HTML::TokeParser::Simple->new(\$content) || die "Can't open: $!";

while (my $token = $p->get_token) {
next unless $token->is_text;
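# $1 = first three digits of the ddhhmmZ time group, $2 = 3-digit wind direction.
# (?:G\d+)? lets gust reports such as 28017G25KT match as well.
if( $token->as_is =~ /^K$icao (\d{3})\d+Z (\d{3})\d+(?:G\d+)?KT/i ) {
print $token->as_is . "<br>\n";
push @{$wind->{$icao}{speed}}, $1; # note: holds time-group digits, not a speed
push @{$wind->{$icao}{time}}, $2; # note: holds wind direction, not a time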
}
}
}
return $wind; # return a reference to a hash of hashes of arrays
}


This is what I got when running from the command line.
[root@perlman emadmin]# ~root/test2.pl
Content-Type: text/html; charset=ISO-8859-1

<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
<head>
<title>WindshftObs</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body>
KSFO 260056Z 28017KT 10SM CLR 23/03 A3000 AO2 SLP157 T02330033<br>
KSFO 252356Z 28016KT 10SM CLR 24/04 A3000 AO2 SLP158 T02440044 10267 20189 56013<br>
KSFO 252256Z 30013KT 10SM CLR 26/06 A3001 AO2 WSHFT 2216 SLP161 T02560061<br>
KSFO 252236Z 30012KT 10SM CLR 26/06 A3001 AO2 WSHFT 2216<br>
KSJC 260053Z 32010KT 10SM CLR 27/04 A2998 AO2 SLP150 T02720039<br>
KSJC 252353Z 33010KT 10SM CLR 29/02 A2998 AO2 SLP150 T02890017 10294 20222 56014<br>
KSJC 252253Z 33008KT 10SM FEW200 29/03 A2998 AO2 SLP152 T02890028<br>
KSQL 260047Z 34010KT 30SM SKC 28/05 A2999 <br>
KSQL 252347Z 34012KT 30SM SKC 29/04 A2999 <br>
KSQL 252247Z 34007KT 30SM SKC 29/01 A3000 <br>
$VAR1 = {
'sfo' => {
'time' => [
'280',
'280',
'300',
'300'
],
'speed' => [
'260',
'252',
'252',
'252'
]
},
'sjc' => {
'time' => [
'320',
'330',
'330'
],
'speed' => [
'260',
'252',
'252'
]
},
'sql' => {
'time' => [
'340',
'340',
'340'
],
'speed' => [
'260',
'252',
'252'
]
}
};
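
Comparing the dump with the raw observations, the values under 'time' (280, 300, ...) are the wind directions, and the values under 'speed' (260, 252) are the first three digits of the day/time group, so the captures and key names are probably the part that needs the tweak. Below is one possible sketch, not the script's actual code: it parses two of the observation lines shown above and then reads individual elements back out of the reference. The key names hour and dir, and the gust handling, are assumptions for illustration.

```perl
use strict;
use warnings;

# Two of the raw observations printed above, used as sample input.
my @lines = (
    'KSFO 260056Z 28017KT 10SM CLR 23/03 A3000 AO2 SLP157 T02330033',
    'KSJC 260053Z 32010KT 10SM CLR 27/04 A2998 AO2 SLP150 T02720039',
);

my $wind = {};    # hash of hashes of arrays
for my $line (@lines) {
    # Captures: station, hour (from ddhhmmZ), 3-digit direction.
    # (?:G\d{2,3})? allows an optional gust group before KT.
    if ( $line =~ /^K(\w{3}) \d{2}(\d{2})\d{2}Z (\d{3})\d{2,3}(?:G\d{2,3})?KT/ ) {
        my ($icao, $hour, $dir) = (lc $1, $2, $3);
        push @{ $wind->{$icao}{hour} }, $hour;    # assumed key name
        push @{ $wind->{$icao}{dir} },  $dir;     # assumed key name
    }
}

# Access a single element, then walk the whole structure.
print "First SFO direction: ", $wind->{sfo}{dir}[0], "\n";
for my $icao (sort keys %$wind) {
    my $last = $#{ $wind->{$icao}{dir} };
    for my $i (0 .. $last) {
        printf "%s %sZ %s\n", $icao, $wind->{$icao}{hour}[$i], $wind->{$icao}{dir}[$i];
    }
}
```

An individual direction is then reachable as $wind->{sfo}{dir}[2] and so on, which is the kind of per-element access the original question was after.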

shadkeene
09-27-2007, 03:11 PM
Fishmonger,
Thanks very much for your time helping me out, can't wait to implement,
Shad