If you were to do this using JavaScript, a regular expression would be the best way. I would certainly suggest loading the text to be formatted into a textarea, rather than loading the file into memory using an ActiveX control or equivalent.
It depends on how large the files are. If they are under about 100 KB, use JavaScript without a second thought. For larger files (megabytes and above), I would recommend Perl or C, as these languages work on files at a lower level and will handle them more efficiently.
You can do a quick, rough speed test by running a regular expression such as the one below on some text. Paste HTML page sources of varying sizes into the first textarea to find out how much text can be processed efficiently:
Code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>Untitled Document</title>
<script type="text/javascript">
function cleanHTML(objIn, objOut) {
  objOut.value = objIn.value.replace(/<[^>]*>/g, function tagMatch(s) {
    if (/^<img\b/i.test(s)) { // leave images alone, whatever the case of the tag name
      return s;
    } else if (/^<\//.test(s)) { // replace closing tags with a newline
      return "\n";
    } else {
      return ""; // remove all other tags (opening tags, <br>, etc.)
    }
  });
}
</script>
</head>
<body>
<form name="frm" onsubmit="cleanHTML(this.txtInput, this.txtOutput);return false;">
<textarea name="txtInput" cols="100" rows="10">
<p>Hello world</p>
<h3>line 2!</h3>
<br>
Line 3!
<img src="img.gif">
</textarea>
<br><br>
<textarea name="txtOutput" cols="100" rows="10"></textarea>
<br><br>
<input type="submit" name="submit" value="submit">
</form>
</body>
</html>
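If you would rather get numbers than judge by eye, here is a rough sketch of a timing loop built around the same kind of tag-stripping rule. The names stripTags, makeSample and timeStrip are made up for this example, the sample text is just the test markup repeated, and the tag tests are anchored to the start of each tag so that a slash inside an attribute does not count as a closing tag. Timings will vary a lot between browsers and machines, so treat the results as a guide only.

```javascript
// Tag-stripping rule as a plain string function so it can be timed
// without the form. stripTags is a hypothetical name for this sketch.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, function (s) {
    if (/^<img\b/i.test(s)) { return s; }   // leave images alone
    if (/^<\//.test(s)) { return "\n"; }    // closing tag becomes a newline
    return "";                              // drop every other tag
  });
}

// Build a test document of at least `bytes` characters by repeating
// a small sample of markup (an assumption, not real page source).
function makeSample(bytes) {
  var unit = '<p>Hello world</p><h3>line 2!</h3><br>Line 3!<img src="img.gif">\n';
  var out = [];
  for (var n = 0; n < bytes; n += unit.length) { out.push(unit); }
  return out.join("");
}

// Time one pass of stripTags over a sample of the given size.
function timeStrip(bytes) {
  var text = makeSample(bytes);
  var start = new Date().getTime();
  var result = stripTags(text);
  var elapsed = new Date().getTime() - start;
  return { chars: text.length, ms: elapsed, outChars: result.length };
}

// Try roughly 10 KB, 100 KB and 1 MB of input.
var sizes = [10 * 1024, 100 * 1024, 1024 * 1024];
for (var i = 0; i < sizes.length; i++) {
  var r = timeStrip(sizes[i]);
  console.log(r.chars + " chars processed in " + r.ms + " ms");
}
```

Run it in a browser console, or paste real page source into a string in place of makeSample to test with your own files.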
Hope this helps
m_n