07-07-2002, 02:03 PM | #1
Sancho
Parsing parts of an HTML file?

I have a huge webpage (over 300 KB of just links), and I want to pull just pieces of that big page into a template or another page. Basically, I want to put comments, anchors, or some other markers in the big HTML file to tell a CGI script where to start and stop parsing. I also want it to work with variables, so that I can request Part A or Part B rather than the entire page. I have found many scripts that use CGI and SSI to parse entire webpages, but nothing that will parse custom-defined parts of a page. Is this possible? If so, could somebody please point me toward a script that already does this, or some code I could use to start writing one?

To help you visualize what I want to do: I want to use a CGI script to pull different parts out of this lyrics page (www.smasonline.com/lyrics/list.html), so I can divide it into sections for each letter of the alphabet.


If you could help me I would be forever grateful.

Thanks in advance,
Sancho
07-08-2002, 03:13 AM | #2
mr_ego

What you could do is something like this:
(not guaranteed to work, and you'd definitely have to test it)

Code:
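#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;   # exports get()

my $addr = "http://www.somewhere.com/";

# A CGI script must send a header before any other output
print "Content-type: text/html\n\n";

my $html = get($addr);
die "Could not fetch $addr\n" unless defined $html;

my $printing = 0;   # true while we are between the two markers

foreach my $line (split /\n/, $html) {
    if ($line =~ /<!--\s*LIST START\s*-->/i) {
        $printing = 1;
    } elsif ($line =~ /<!--\s*LIST END\s*-->/i) {
        $printing = 0;
    }
    # split() removed the newlines, so put them back when printing
    print "$line\n" if $printing;
}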
Note: you have to put comments (e.g. <!--LIST START--> and <!--LIST END-->) where the content or links start and end.
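
The same marker idea can be extended to named parts (Part A, Part B) using only core Perl and no LWP. The following is a minimal sketch, not a tested production script. The marker format (<!--A START--> ... <!--A END-->), the function name extract_section, and the sample data in $page are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the text between <!--NAME START--> and <!--NAME END-->,
# or nothing if the markers are not found. The marker format is an
# assumption made for this sketch.
sub extract_section {
    my ($html, $name) = @_;
    my $q = quotemeta $name;   # treat the name as literal text
    if ($html =~ /<!--\s*$q START\s*-->(.*?)<!--\s*$q END\s*-->/s) {
        return $1;
    }
    return;
}

# Sample data standing in for the big links page (hypothetical).
my $page = join "\n",
    '<html><body>',
    '<!--A START-->',
    '<a href="abc.html">ABC</a>',
    '<!--A END-->',
    '<!--B START-->',
    '<a href="bad.html">Bad</a>',
    '<!--B END-->',
    '</body></html>';

my $section = extract_section($page, 'B');
print defined $section ? $section : "No such section\n";
```

In a CGI script the section name could come from $ENV{QUERY_STRING}. It would be wise to check it against something like /^\w+$/ before using it, and the real page would be read from the local file rather than built in the script.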

Last edited by mr_ego; 07-14-2002 at 04:25 AM..
07-08-2002, 11:40 AM | #3
Mouldy_Goat
What exactly are you trying to do here?

Do you want to split the whole page into a group of pages or just print out the content within the <!-- LIST ... --> comments?

By the way, it is LWP::Simple that you want here.

If you want to parse HTML documents, there are a few modules out there which can help you.
07-08-2002, 02:21 PM | #4
Sancho
Hmm...

I want to be able to split the page into lots of smaller pages, but I want a script that will do it for me. I want to continue to maintain the big webpage full of links and have a script split it into smaller pages using comments. I want a page for each letter of the alphabet.

That way, when I get new lyrics I can just update the big page and all the other pages will include the new lyrics as well, because they are just parsing what's in between the comments. The idea is to use the big HTML file in the same kind of way I would use a database, except that I just want to pull things from it instead of searching it or anything like that.
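
The "big file as a database" idea above can be sketched as a single pass that collects every marked section into a hash, one entry per letter. This is a rough illustration in core Perl only. The marker format (<!--A START--> ... <!--A END-->), the name split_sections, the sample data, and the output file names are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect every marked section into a hash keyed by its name.
# \1 makes sure each END marker matches its own START marker.
sub split_sections {
    my ($html) = @_;
    my %sections;
    while ($html =~ /<!--\s*(\w+) START\s*-->(.*?)<!--\s*\1 END\s*-->/gs) {
        $sections{$1} = $2;
    }
    return %sections;
}

# Sample data standing in for the big links page (hypothetical).
my $big_page = join "\n",
    '<!--A START-->',
    '<a href="abc.html">ABC</a>',
    '<!--A END-->',
    '<!--B START-->',
    '<a href="bad.html">Bad</a>',
    '<!--B END-->';

my %pages = split_sections($big_page);

for my $letter (sort keys %pages) {
    my $file = lc($letter) . ".html";   # hypothetical output name
    print "$file would contain:$pages{$letter}\n";
    # To write the page for real instead of printing it:
    # open my $out, '>', $file or die "Cannot write $file: $!";
    # print {$out} $pages{$letter};
    # close $out;
}
```

Run once after each update of the big page, a script along these lines would regenerate all of the per-letter pages, so the small pages need no CGI at all.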

I know this all sounds confusing, sorry. Hopefully you will understand what I mean.

As far as modules go, I can't use them. Thanks for the idea though. The site is being hosted by a crappy webhost company. So I can't change anything like that, or use PHP or use anything useful besides Perl and SSI.

I have tried the script you posted, mr_ego. Thanks for pointing me in the right direction. But I know very little about Perl. I've always just used other people's scripts and never took the time to learn any language. Anyway, I set up my own web server to test it on temporarily. I always get a 500 error, and when I check the Apache error log, I get "Syntax error on line 23 of EOF". Has anybody got any ideas how to fix this, or what I'm doing wrong?

I have posted this same question in multiple forums, and you guys are the first people that even responded. Thanks a lot.
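
For the 500 error described above, two general checks may help. A Perl syntax error reported at the end of the file often points to an unclosed brace or quote, and running perl -c on the script checks its syntax without going through the web server. Separately, a CGI script has to print a Content-type header before any other output. The sketch below shows that skeleton in core Perl; build_page is a hypothetical stand-in for the real work, and the cause of this particular error is only a guess.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Header first: the web server returns a 500 error if the script
# prints anything else before it.
print "Content-type: text/html\n\n";

# Run the real work inside eval so that a runtime failure is shown
# in the browser instead of appearing only in the error log.
my $body = eval { build_page() };
print defined $body ? $body : "<p>Script error: $@</p>\n";

# Hypothetical stand-in for the code that builds the page.
sub build_page {
    return "<p>It works.</p>\n";
}
```

Note that eval only catches errors at run time. A syntax error stops the script before anything is printed, which is why the perl -c check comes first.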