  1. #1
    New Coder
    Join Date
    Sep 2006
    Posts
    24
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Exporting body of HTML pages - sitewide

    I have a need to export only the contents of the main content cell from each page throughout my site. This is for a translation agency.

    Does anyone know of a speedy way I can export only this specific section and save each file as HTML?

    I'm thinking of crawling with WinHTTrack site copier, and then somehow I'd need to strip out all the code above and below the main content cell.

    Thanks for any ideas...

  • #2
    Master Coder
    Join Date
    Jun 2003
    Location
    Cottage Grove, Minnesota
    Posts
    9,472
    Thanks
    8
    Thanked 1,085 Times in 1,076 Posts
    Do all of these pages already exist, or do you have the ability to
    put some sort of code before and after each section?

    I'm thinking if you were to flag the sections, you could use PHP to
    extract those easily, sort of like an RSS Feeder.

    Like:

    <div id='english'>
    This is the content in English.
    </div>

    The PHP script could find all text between the <div>'s that have an
    id = 'english' and do whatever you want with the content.

    This is what an RSS Feeder does when it parses an HTML page and
    creates the XML for the RSS Reader.
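
    The flagged-<div> idea above can be sketched roughly as follows. This is only an illustration, assuming PHP's DOM extension is installed; extractById is a hypothetical helper name, not something from this thread, and the sample page is made up.

```php
<?php
// Hypothetical helper: returns the inner HTML of the first element
// with the given id, or null when no such element exists.
// Assumes $id contains no double quotes (it is placed into an XPath string).
function extractById(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    // Real-world markup is rarely perfect; collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // Serialize each child so the markup inside the element is kept.
    $inner = '';
    foreach ($nodes->item(0)->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return trim($inner);
}

// Made-up sample page for illustration
$page = "<html><body><div id='english'>This is the content in English.</div></body></html>";
echo extractById($page, 'english'), "\n";
```

    A DOM parser is used here instead of a regular expression so that nested <div> tags inside the flagged section do not cut the match short.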

  • #3
    New Coder
    Join Date
    Sep 2006
    Posts
    24
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Yes, I have an html comment before and after this section like

    <!-- start main section body -->
    <!-- end main section body -->

    Would this work? Do you know of example code that would be of use to me?

    Thanks for your suggestion! It sounds like it could work...

  • #4
    Master Coder
    Join Date
    Jun 2003
    Location
    Cottage Grove, Minnesota
    Posts
    9,472
    Thanks
    8
    Thanked 1,085 Times in 1,076 Posts
    Here is an example page I copied from the internet and
    added the two lines in the middle of it (view HTML source):
    You'll see the <!-- start ... and <!-- end ... parts:

    http://www.catpin.com/lorem.html

    Then,
    This is the PHP script (see below) that extracts the part you want:

    http://www.catpin.com/lorem.php

    Here is the PHP script source:
    PHP Code:
    <?php

    // Get the page you want to parse
    $url = "http://www.catpin.com/lorem.html";
    $data = implode("", file($url));

    // Get all content between <body> and </body>
    // ([^`]*? matches any character except a backtick, newlines included)
    preg_match_all("/<body>([^`]*?)<\/body>/", $data, $matches);

    // Loop through the page
    foreach ($matches[0] as $match) {

        // Get Content between your comment lines
        preg_match("/<!-- start main section body -->([^`]*?)<!-- end main section body -->/", $match, $temp);
        $content = $temp[1];
        $content = strip_tags($content);
        $content = trim($content);

        // Print Content Found
        echo $content;

        // you would save the content instead of printing it
        // or do whatever you want with it.

    }

    ?>
    EDIT:

    There is an extra "grab content" part because you may want to extract more than
    just the part between your two comment lines ... example, you may also want to
    grab between <title> and </title>. This allows the ability to do that. If each page
    only has one section between one set of comment lines, you could do it with fewer
    lines of code ... but I took this from pieces of an RSS Feeder, so it's what it is ....
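
    For the single-section case mentioned above, a shorter variant might look like this. It is a sketch only: extractBetween is a hypothetical helper, the sample HTML is made up, and it uses plain string search instead of the regular expressions in the script above. Tags are left in place (no strip_tags).

```php
<?php
// Hypothetical helper: returns the text between two literal markers,
// or null if either marker is missing.
function extractBetween(string $html, string $start, string $end): ?string
{
    $from = strpos($html, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);

    $to = strpos($html, $end, $from);
    if ($to === false) {
        return null;
    }
    return trim(substr($html, $from, $to - $from));
}

// Made-up sample page for illustration
$html = "<html><head><title>Lorem</title></head><body>header"
      . "<!-- start main section body --><p>Main text.</p><!-- end main section body -->"
      . "footer</body></html>";

echo extractBetween($html, '<title>', '</title>'), "\n";
echo extractBetween($html, '<!-- start main section body -->', '<!-- end main section body -->'), "\n";
```

    The same helper covers both the <title> case and the comment-marker case, since it only needs the two literal strings that surround the part you want.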


    Last edited by mlseim; 05-23-2007 at 06:53 PM.

  • #5
    New Coder
    Join Date
    Sep 2006
    Posts
    24
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Thanks mlseim!

    I removed the strip_tags line because I wanted to actually preserve the html for this content.

    Now for the clincher, how can I get this to go through my entire site, parsing all pages and saving them as html files?

  • #6
    Master Coder
    Join Date
    Jun 2003
    Location
    Cottage Grove, Minnesota
    Posts
    9,472
    Thanks
    8
    Thanked 1,085 Times in 1,076 Posts
    First, tackle the save-as HTML part.
    Instead of printing, determine the path
    and filename you'll be giving it (or them),
    open the file and write the HTML you want
    along with $content.
    Look for tutorials here:
    http://www.google.com/search?hl=en&q...le&btnG=Search

    You didn't mention whether the $content should be
    written into one file, multiple files, various directories ...
    so you'll have to figure that part out.
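
    As a rough sketch of the save step, assuming one output file per page: saveFragment, the output directory, and the file name are all hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: writes $content to $outDir/$name, creating the
// directory if needed, and returns the path that was written.
function saveFragment(string $outDir, string $name, string $content): string
{
    if (!is_dir($outDir) && !mkdir($outDir, 0777, true) && !is_dir($outDir)) {
        throw new RuntimeException("Cannot create directory: $outDir");
    }

    $path = rtrim($outDir, '/\\') . DIRECTORY_SEPARATOR . $name;
    if (file_put_contents($path, $content) === false) {
        throw new RuntimeException("Cannot write file: $path");
    }
    return $path;
}

// Assumed output location and file name, for illustration only
$path = saveFragment(sys_get_temp_dir() . '/export', 'lorem.html', '<p>Main text.</p>');
echo "Saved to $path\n";
```

    If the saved file needs wrapper HTML around the fragment, it can be concatenated onto $content before the call.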

    Then, tackle the loop where PHP looks for all files ending
    with .html and goes through them one by one.
    Start by looking for possible examples here:
    http://www.google.com/search?q=php+l...es&btnG=Search
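
    One possible shape for that loop, assuming the pages sit in a local directory tree (for example a crawled copy of the site): findHtmlFiles and the 'site' directory name are hypothetical.

```php
<?php
// Hypothetical helper: returns the paths of all .html files under $root
// (subdirectories included), sorted alphabetically.
function findHtmlFiles(string $root): array
{
    $files = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile() && strtolower($file->getExtension()) === 'html') {
            $files[] = $file->getPathname();
        }
    }
    sort($files);
    return $files;
}

// Assumed location of the local copy of the site
$root = __DIR__ . '/site';
if (is_dir($root)) {
    foreach (findHtmlFiles($root) as $path) {
        $html = file_get_contents($path);
        // Extract the main section from $html and save it here.
        echo $path, "\n";
    }
}
```

    Each $html string can then go through the same extraction used for the single page, with the result written to a matching output file.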

    I don't have time to write any examples for these.
    This would be a good time for you to learn PHP.

    The intent of the forum is to help with existing code ...
    I gave you a good start -- to get you going.

