  1. #1
    New to the CF scene
    Join Date
    Oct 2009
    Posts
    6
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Question Parsing XML with Perl

    Hello,

    First off, I am by no means a seasoned Perl coder, but I am hoping you guys can point me in the right direction. I am trying to parse an XML file from an RSS feed, extract the link from a certain item, and have a subroutine return that link as a string. For example, my XML looks something like this:

    Code:
    <?xml version="1.0" encoding="iso-8859-1"?>
    <rss version="2.0">
    	<channel>
    		<title>RSS Feed Example</title>
    		<link>http://www.rss.com</link>
    		<language>en-usde</language>
    		<copyright>Copyright © 2008</copyright>
    		<webMaster>support@rss.com</webMaster>
    		<image>
    			<title>RSS Feed</title>
    			<url>http://www.rss.com/favicon.ico</url>
    			<link>http://www.rss.com</link>
    			<width>16</width>
    			<height>16</height>
    		</image>
    		<item>
    			<title>String.To.Match</title>
    			<pubDate>Thu, 01 Jan 1970 00:00:00 +0000</pubDate>
    			<link>http://www.rss.com/download.php?id=178283</link>
    			<guid>http://www.rss.com/download.php?id=178283</guid>
    		</item>
    		<item>
    			<title>String.NOT.To.Match</title>
    			<pubDate>Fri, 02 Jan 1970 00:00:00 +0000</pubDate>
    			<link>http://www.rss.com/download.php?id=178284</link>
    			<guid>http://www.rss.com/download.php?id=178284</guid>
    		</item>
    	</channel>
    </rss>
    So I want the script to find the item titled "String.To.Match" and have the routine return the string in <link> for that item. Any help would be appreciated. I just need to be pointed in the right direction; I am not expecting someone to write the code for me. (Feel free though, ha!)

    Thanks!
    Last edited by TaterSalad; 10-05-2009 at 05:57 PM.

  • #2
    Master Coder
    Join Date
    Dec 2007
    Posts
    6,682
    Thanks
    436
    Thanked 890 Times in 879 Posts
    You can use XML::LibXML:

    Code:
    use strict;
    use warnings;
    use XML::LibXML;
    
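    my $rssfilename = 'feed.xml';    # path to the saved RSS feed (placeholder)
    my $parser = XML::LibXML->new();
    my $feed = $parser->parse_file($rssfilename);
    my $links = $feed->findnodes('//link');
    foreach my $link ($links->get_nodelist) {
        # textContent returns the text inside the element
        print $link->textContent, "\n";
    }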
    It's not tested.

    There are many ways to do that; this is only one of them.

    Edit: I focused on parsing and extracting because searching is easy once you know that.

    best regards
    Last edited by oesxyl; 10-05-2009 at 06:26 PM.


  • #3
    Thanks for the speedy reply, oesxyl!

    Correct me if I am wrong, but wouldn't that display all <link> nodes? How would I filter only <link> nodes that have "String.To.Match" in <item>?

  • #4
    Originally posted by TaterSalad:
    Thanks for the speedy reply, oesxyl!

    Correct me if I am wrong, but wouldn't that display all <link> nodes? How would I filter only <link> nodes that have "String.To.Match" in <item>?
    Yes, you are right, this will display all link nodes.
    item is only a container, so which element do you want to search in?
    For example, if you search for a given string in title, you can use
    Code:
    //item[contains(title,"$mystring")]/link
    instead of //link ($mystring is a Perl variable interpolated into the expression, so it needs the quotes around it). You can use any valid XPath expression.
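
    To make the XPath idea concrete, here is a minimal sketch of a subroutine that returns the link for one title. The name link_for_title and the trimmed-down inline feed are made up for illustration, and the sketch uses an exact match (title = "...") rather than contains(), which is an assumption about what the original poster wants.

```perl
use strict;
use warnings;
use XML::LibXML;

# Return the text of <link> for the first <item> whose <title> is exactly
# $wanted, or undef when no item matches. (Hypothetical helper name.)
sub link_for_title {
    my ($xml, $wanted) = @_;
    # The title is interpolated into the XPath string, so refuse values
    # that would break out of the double-quoted XPath literal.
    die "title must not contain a double quote\n" if $wanted =~ /"/;
    my $doc = XML::LibXML->load_xml(string => $xml);
    my ($link) = $doc->findnodes(qq{//item[title = "$wanted"]/link});
    return defined $link ? $link->textContent : undef;
}

# Trimmed-down copy of the feed from the first post.
my $xml = <<'XML';
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <link>http://www.rss.com</link>
    <item>
      <title>String.To.Match</title>
      <link>http://www.rss.com/download.php?id=178283</link>
    </item>
    <item>
      <title>String.NOT.To.Match</title>
      <link>http://www.rss.com/download.php?id=178284</link>
    </item>
  </channel>
</rss>
XML

print link_for_title($xml, 'String.To.Match'), "\n";
```

    To read the feed from a file instead, load_xml(location => $filename) should work in place of the string form.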

    best regards


  • #5
    Thanks again oesxyl!

    But now I have a new problem. I am trying to download a file from a link. The file downloads, but it is not the file I want. Instead, its contents state that I need to log in before I can download the file.

    For example, I have:

    Code:
    use LWP::Simple;
    my $link = "http://www.website.com/download.php?id=178452";
    # $filepath (the target directory) is set elsewhere in the script
    LWP::Simple::getstore($link, $filepath . "download.txt");
    The file downloads, but its contents are the HTML of Website.com's login page. I guess my question is: is there any way I can log in with Perl so I can download the file properly?

    I am logged in with Firefox, but I am guessing the script is not picking up the cookies from there. Any ideas?

  • #6
    I think it is a better idea to use the LWP::UserAgent module instead of LWP::Simple; see its man page. It's also a good idea to look over the lwpcook man page.
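
    As a rough illustration of the difference, LWP::UserAgent hands back a response object that can be checked before the download is trusted. The sketch below is not from the man page; fetch_to_file is a hypothetical helper name, the URL and file name in the usage comment are placeholders, and it does not deal with the login yet.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Fetch $url with the given agent and write the body to $path.
# Dies with the HTTP status line when the request fails.
sub fetch_to_file {
    my ($ua, $url, $path) = @_;
    # ':content_file' makes LWP write the response body to $path
    my $res = $ua->get($url, ':content_file' => $path);
    die "GET $url failed: ", $res->status_line, "\n" unless $res->is_success;
    return $res;
}

# Usage (placeholder URL and file name):
# my $ua = LWP::UserAgent->new(timeout => 30);
# fetch_to_file($ua, 'http://www.website.com/download.php?id=178452', 'download.txt');
```

    A successful status only means the server answered; the saved file can still be a login page, so its contents need a look as well.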

    Edit: in my opinion it is a bad idea to hide an RSS feed behind a login page, but this is probably because of the bad habit of programmers who put scripts on their websites that fetch the feed on each page request.


    best regards
    Last edited by oesxyl; 10-05-2009 at 11:13 PM.

  • #7
    I've tried a few different things but I can't get it to work. Any suggestions? Here is what I have. The script runs but doesn't seem to authenticate.

    Code:
    # GET REQUEST
    
    use URI::URL;
    
    my $url = url('http://www.website.com/login.php');
    $url->query_form(username => 'my_username', password => 'my_pass');
    my $content = get($url);  
    
    #-----------------------------
    
    # POST REQUEST
    
    use HTTP::Request::Common qw(POST);
    use LWP::UserAgent;
    
    my $ua = LWP::UserAgent->new();
    my $req = POST 'http://www.website.com/login.php',
       [ username => 'my_username', password => 'my_pass' ];
    $content = $ua->request($req)->as_string;

    And here is the HTML of the login page I am trying to authenticate against:


    Code:
    	<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    	<html xmlns="http://www.w3.org/1999/xhtml"><head>
    	
    	
    	<title>Welcome to the Website</title>
    	<link rel="stylesheet" type="text/css" href="templates/default/common.css">
    	<link rel="shortcut icon" href="pic/roundcube.ico">
    	<style type="text/css">
    	
    	#login-form {
    	  margin-left: auto;
    	  margin-right: auto;
    	  margin-top: 50px;
    	  width: 350px;
    	}
    	
    	</style>
    	<meta http-equiv="content-type" content="text/html; charset=UTF-8">
    	
    	
    	</head><body>
    
    		
    	<img src="pic/roundcube_logo.png" id="rcmbtn104" alt="RoundCube Webmail" border="0" height="55" hspace="10" width="165">
    	<br /><br /><br />
    		
    <form method="post" action="takelogin.php">
    	<br />		
    <table align="center"><tbody>
    	<tr>
    	<td class="title">Username</td><td><input type="text" size="26" name="username" style="width: 200px; border: 1px solid gray" /></td>
    </tr>
    <tr>
    	<td class="title">Password</td><td><input type="password" size="26" name="password" style="width: 200px; border: 1px solid gray"/></td>
    
    </tr>
    <tr>
    	<td colspan="2" align=left><input type="checkbox" name="logout" value="yes"><h0>Log me out after 15 minutes inactivity<h0></td>
    </tr>
    <tr>
    	<td colspan="2" align=left><input type="checkbox" name="securelogin" value="yes" /><h0>Secure Login <h0></td>
    </tr>
    <tr>
    	<td colspan="2" align="center"><input type="submit" value="Log in!" class="button"> <input type="reset" value="Reset" class="button"></td>
    </tr>
    <tr>
    
    	<td colspan="2" align="center"><br /><br /></td>
    </tr>
    </tbody></table>
    <center><h4>Forget your password? Recover <a href="recover.php"><b>via email</b></a></h4><nobr><center>
    <center><h4>Need help? <a href="http://embed.site.com/?server=irc.site.net&channel=%23gft-support&noServerNotices=true&noServerMotd=true"><b>Click here</a></h4></center>
    </form>
    	
    </body></html>

    Thanks!

  • #8
    You must post a valid username and password, plus the submit value, to the script named in the action attribute of the form, takelogin.php.
    Code:
    'username' => ..., 'password' => ...., 'submit' => 'Log in!'
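
    One way to check what would actually be sent is to build the request and print it before handing it to the user agent. This is only a sketch: build_login_request is a made-up helper name and the credentials are the placeholders from the earlier posts. Note that the submit button in the posted form has no name attribute, so a browser would probably not send a submit field at all; it is included here only because it is suggested above.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Build (but do not send) the login request for the form's action script.
sub build_login_request {
    my ($base, $user, $pass) = @_;
    return POST "$base/takelogin.php",
        [ username => $user, password => $pass, submit => 'Log in!' ];
}

my $req = build_login_request('http://www.website.com', 'my_username', 'my_pass');

# Dump the request line, headers and form-encoded body for inspection.
print $req->as_string;
```

    The printed body can then be compared with what the browser sends for the same form.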
    best regards

  • #9
    So...

    Code:
    # GET REQUEST
    
    use URI::URL;
    
    my $url = url('http://www.website.com/login.php');
    $url->query_form(username => 'my_username', password => 'my_pass');
    my $content = get($url);  
    
    #-----------------------------
    
    # POST REQUEST
    
    use HTTP::Request::Common qw(POST);
    use LWP::UserAgent;
    
    my $ua = LWP::UserAgent->new();
    my $req = POST 'http://www.website.com/takelogin.php',
       [ username => 'my_username', password => 'my_pass', submit => 'Log in!' ];
    $content = $ua->request($req)->as_string;
    Tried that but that didn't work. I'm a bit lost. Not really sure how to debug or how to tell if it's working, other than my script returning a file that is a "not logged in" webpage.

  • #10
    Originally posted by TaterSalad:
    Tried that but that didn't work. I'm a bit lost. Not really sure how to debug or how to tell if it's working, other than my script returning a file that is a "not logged in" webpage.
    Print $content; maybe you can see why it didn't work.
    It could be a proxy, cookies or something else.
    Usually webmasters try to stop this, since it can be used by spam bots.
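
    If cookies are the cause, one thing worth trying is a single user agent with a cookie jar, used for both the login and the download, so that any session cookie set by takelogin.php is sent back on the next request. The sketch below is untested against the real site; login_and_fetch and is_login_page are hypothetical names, and the check for the login form is an assumption based on the HTML posted earlier.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Heuristic: the login page posted above contains this form action.
sub is_login_page {
    my ($html) = @_;
    return $html =~ /action="takelogin\.php"/ ? 1 : 0;
}

# Log in, then fetch $download_url with the same agent so that cookies
# set during the login are reused. Returns the downloaded content.
sub login_and_fetch {
    my ($base, $user, $pass, $download_url) = @_;

    # cookie_jar => {} gives the agent an in-memory cookie jar
    my $ua = LWP::UserAgent->new(cookie_jar => {});

    my $login = $ua->post("$base/takelogin.php",
        { username => $user, password => $pass });
    # A redirect after a login is common, so only 4xx/5xx count as failure.
    die "login failed: ", $login->status_line, "\n" if $login->is_error;

    my $res = $ua->get($download_url);
    die "download failed: ", $res->status_line, "\n" unless $res->is_success;

    my $body = $res->decoded_content // '';
    die "still getting the login page\n" if is_login_page($body);
    return $body;
}

# Usage (placeholder values):
# my $data = login_and_fetch('http://www.website.com', 'my_username',
#     'my_pass', 'http://www.website.com/download.php?id=178452');
```

    If this still returns the login page, printing $login->as_string should show whether the server set a cookie at all.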

    best regards

