  1. #1
    Regular Coder
    Join Date
    Dec 2002
    Posts
    169
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Question: How to strip the extraneous stuff out of FrontPage- and Word-generated HTML?

    I have an awful task... I need to sanitize some html generated by FrontPage and/or Word. It is terrible!

    Do you know of a standalone tool (or a set of regular expressions) to strip the extraneous stuff out of FrontPage and Word generated html?

    Thanks [in advance] for any help.

  • #2
    Regular Coder
    Join Date
    Sep 2002
    Location
    Saskatoon SK Canada
    Posts
    174
    Thanks
    2
    Thanked 0 Times in 0 Posts
    If you have access to Dreamweaver there is a tool in Dreamweaver specifically designed to do this. I took it in school but can't remember exactly how it's done. Do a search in Dreamweaver help and it will come up with how to do it.

    In Word, save the document as HTML.

    Open Dreamweaver and start a new page. Then I think there is a file->import->Word HTML or something like that.

    And... never copy and paste from Word into FrontPage 2000. You can in FrontPage 2002, but make sure to click on that little paste options icon that shows up when you do and choose "keep text only"!!!


  • #3
    Senior Coder
    Join Date
    Jun 2002
    Location
    UK
    Posts
    1,137
    Thanks
    0
    Thanked 0 Times in 0 Posts
    If you know what is wrong with the code and what it should look like instead, post both, and somebody should be able to help construct some regular expressions. Doing this halves the workload of the person writing the regular expressions.

    scroots
    Spammers next time you spam me consider the implications:
    (1) that you will be pursued by me (in a legitimate manner)
    (2) It is worthless to you, when I have finished

  • #4
    Regular Coder
    Join Date
    Dec 2002
    Posts
    169
    Thanks
    0
    Thanked 0 Times in 0 Posts

    An excellent idea! However...

    The HTML content in question is company proprietary. I am not authorized to publish it. However, I might be able to produce a Latinus Nonsensicus version which maintains the markup but replaces the proprietary content with gibberish.

    That should be an acceptable way to handle this problem.
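
For what it's worth, here is a rough sketch of how such a "Latinus Nonsensicus" copy could be produced. It is only an illustration, not a tested tool: the names `scramble`, `LOREM`, `TAG` and `TOKEN` are made up for this example, and it assumes that everything between `<` and `>` is markup. Text inside attribute values (alt, title, href), comments containing `>`, and script or style blocks is not handled, so check the result by eye before posting anything.

```python
import itertools
import re

# Hypothetical filler vocabulary; any word list would do.
LOREM = "lorem ipsum dolor sit amet consectetur adipiscing elit".split()

# Assumption: a tag is anything from "<" to the next ">".
TAG = re.compile(r"(<[^>]*>)")
# Either a character entity (left untouched) or a run of word characters.
TOKEN = re.compile(r"&#?\w+;|\w+")


def scramble(html):
    """Replace the words between tags with filler, leaving tags as they are."""
    words = itertools.cycle(LOREM)

    def swap(match):
        token = match.group(0)
        if token.startswith("&"):
            return token  # keep entities such as &nbsp; intact
        return next(words)

    parts = TAG.split(html)
    # Because TAG has a capturing group, the tags sit at the odd indexes.
    return "".join(
        part if i % 2 else TOKEN.sub(swap, part)
        for i, part in enumerate(parts)
    )
```

With this sketch, `scramble('<P align = center>Secret plan&nbsp;2003</P>')` gives `<P align = center>lorem ipsum&nbsp;dolor</P>`, so the odd markup survives while the words do not.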

    Thanks for the idea!

  • #5
    Senior Coder
    Join Date
    Jun 2002
    Location
    UK
    Posts
    1,137
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Just the code would do, e.g. if the problem was that it produced tags like the following:
    <P align = center> </ P>



    scroots
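
As a starting point for the regular expressions discussed in this thread, here is a minimal sketch in Python. Only the empty-paragraph case comes from the example tag above; the other patterns (conditional comments, `<o:p>`-style tags, `Mso` classes, `mso-` styles) are assumptions about what Word and FrontPage output commonly contains, and `RULES` and `clean_html` are names invented for this example. Regular expressions cannot parse HTML in general, so treat this as a first pass and compare the output with the original.

```python
import re

# Hypothetical cleanup rules, applied in order. Adjust them to the markup
# you actually see; these patterns are guesses at typical Office residue.
RULES = [
    # Conditional comments, e.g. <!--[if gte mso 9]> ... <![endif]-->
    re.compile(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.I | re.S),
    # Namespaced Office tags, e.g. <o:p> and </o:p> (contents are kept)
    re.compile(r"</?\w+:\w+[^>]*>"),
    # class attributes whose value starts with "Mso", e.g. class=MsoNormal
    re.compile(r"""\s+class\s*=\s*("Mso[^"]*"|'Mso[^']*'|Mso\w+)""", re.I),
    # style attributes that mention an mso- property
    re.compile(r"""\s+style\s*=\s*("[^"]*mso-[^"]*"|'[^']*mso-[^']*')""", re.I),
    # Paragraphs holding only whitespace or &nbsp;, e.g. <P align = center> </ P>
    re.compile(r"<p\b[^>]*>(\s|&nbsp;)*</\s*p\s*>", re.I),
]


def clean_html(html):
    """Delete every match of every rule, one rule after another."""
    for pattern in RULES:
        html = pattern.sub("", html)
    return html
```

For example, `clean_html('<P align = center> </ P>')` returns an empty string, and `clean_html('<p class=MsoNormal style="mso-fareast-font-family:Arial">Hi<o:p></o:p></p>')` returns `<p>Hi</p>`. The empty-paragraph rule runs last on purpose, so paragraphs that only contained Office tags are removed as well.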

  • #6
    Senior Coder
    Join Date
    Jun 2002
    Location
    Wichita
    Posts
    3,880
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Here are some links that should help.

    (Google is your friend!)

