
View Full Version : How to strip the extraneous stuff out of FrontPage and Word generated html?



ca_redwards
03-10-2003, 08:24 PM
I have an awful task... I need to sanitize some html generated by FrontPage and/or Word. It is terrible!

Do you know of a standalone tool (or a set of regular expressions) to strip the extraneous stuff out of FrontPage and Word generated html?

Thanks [in advance] for any help.

dreamingdigital
03-10-2003, 08:34 PM
If you have access to Dreamweaver there is a tool in Dreamweaver specifically designed to do this. I took it in school but can't remember exactly how it's done. Do a search in Dreamweaver help and it will come up with how to do it.

In Word, save the document as HTML.

Open Dreamweaver and start a new page. Then I think there is a file->import->Word HTML or something like that.

And... never copy and paste from Word into FrontPage 2000. You can in FrontPage 2002, but make sure to click on the little paste options icon that shows up when you do, and choose "keep text only"!!!

:thumbsup:

scroots
03-10-2003, 08:35 PM
If you know what is wrong with the code and what it should be, post both, and somebody should be able to help construct some regular expressions. By doing this you halve the workload of the person writing the regular expressions.

scroots

ca_redwards
03-10-2003, 09:09 PM
The html content in question is company proprietary. I am not authorized to publish it. However, I might be able to produce a Latinus Nonsensicus version which maintains the markup but replaces the proprietary content with gibberish.

That should be an acceptable way to handle this problem.


Originally posted by scroots
If you know what is wrong with the code and what it should be, post both, and somebody should be able to help construct some regular expressions. By doing this you halve the workload of the person writing the regular expressions.

scroots

Thanks for the idea!

scroots
03-10-2003, 09:13 PM
Just the code would do, if the problem is that it produces tags like this:
<P align = center> </ P>



scroots
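
For the kind of tag scroots shows above, a regular expression along these lines might be a starting point. This is only a minimal sketch in Python: the function name `strip_empty_paragraphs` is made up for illustration, and it assumes an "empty" paragraph holds nothing but whitespace or `&nbsp;`.

```python
import re

# Matches a paragraph whose body is only whitespace or &nbsp;, tolerating
# stray spaces inside the tags, as in "<P align = center> </ P>".
EMPTY_PARAGRAPH = re.compile(
    r"<\s*p\b[^>]*>(?:\s|&nbsp;)*<\s*/\s*p\s*>",
    re.IGNORECASE,
)


def strip_empty_paragraphs(html: str) -> str:
    """Remove paragraphs that contain no visible text (hypothetical helper)."""
    return EMPTY_PARAGRAPH.sub("", html)


if __name__ == "__main__":
    sample = "<p>Hello</p><P align = center> </ P>"
    print(strip_empty_paragraphs(sample))
```

Paragraphs that contain any text or nested tags are left alone, so this only removes the spacer paragraphs and nothing else.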

Roy Sinclair
03-10-2003, 09:41 PM
Here (http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=MS+Word+2000+html+clean+up) are some links that should help.

(Google is your friend!)
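
For anyone who wants to try the regular-expression route from the original question, here is a rough sketch in Python. It is not a complete cleaner. The rule list is an assumption about what typically counts as "extraneous" in Word-generated html (conditional comments, namespaced tags like `<o:p>`, `Mso...` classes, and `mso-` styles), and the names `WORD_CLEANUP_RULES` and `clean_word_html` are invented for this example.

```python
import re

# Each (pattern, replacement) pair targets one kind of Word residue.
# The list is an assumption, not an exhaustive catalogue.
WORD_CLEANUP_RULES = [
    # Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    (re.compile(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE), ""),
    # Namespaced tags such as <o:p>, </o:p>, <v:shape ...>; only the tags
    # are removed, any text between them is kept.
    (re.compile(r"</?\w+:\w+[^>]*>"), ""),
    # class attributes whose value starts with "Mso", quoted or unquoted.
    (re.compile(r"""\s+class\s*=\s*(?:"Mso[^"]*"|'Mso[^']*'|Mso[\w-]*)""", re.IGNORECASE), ""),
    # Double-quoted style attributes that mention an mso- property. This drops
    # the whole attribute, including any non-mso declarations in it.
    (re.compile(r'\s+style\s*=\s*"[^"]*mso-[^"]*"', re.IGNORECASE), ""),
]


def clean_word_html(html: str) -> str:
    """Apply each cleanup rule in order and return the result."""
    for pattern, replacement in WORD_CLEANUP_RULES:
        html = pattern.sub(replacement, html)
    return html


if __name__ == "__main__":
    sample = (
        '<p class=MsoNormal style="mso-margin-top-alt:auto">Hi<o:p></o:p></p>'
        "<!--[if gte mso 9]><xml>x</xml><![endif]-->"
    )
    print(clean_word_html(sample))
```

Regular expressions are fragile on html, so it would be wise to run something like this on a copy and compare the before and after in a browser.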


