Modify all <td> tags (not </td>) via regular expressions.

01-12-2007, 01:46 AM
To anyone who can help: I would greatly appreciate it, as I'm not quite sure how to write this. OK, here's the situation.

I need a function that will search a single variable containing HTML code and find all opening <td> tags. I'll need to use regular expressions, since each <td> tag could contain varying attributes, spacing, etc. For each tag that is found, I need to append a short fixed string immediately after it (e.g. <td>XXXXX).

After some other functions run, I will need to apply the same in reverse, removing just the XXXXX strings.

Thank you!!

If you're curious as to why: I've come across an excellent JScript that removes all of the unnecessary code from bloated web pages (usually from MS Office docs). The only snag is that the script removes empty TD tags, even if they contain non-breaking spaces. As a result, cells shift in directions they shouldn't. What I'm asking for above should resolve the problem. The script I'm speaking of can be found here: http://ethilien.net/websoft/wordcleaner/cleaner.htm
Thanks again!

01-12-2007, 03:06 AM
There are undoubtedly a few ways to do this. I think CSS selectors can locate all <td> elements, but here is a way to edit them all with JavaScript:

var cells = document.getElementsByTagName("td");
for (var i=0; i<cells.length; i++) {
    cells[i].innerHTML = 'blahblahblah' + cells[i].innerHTML;
}

This would need to be run after all the tables load (or just after the page loads). After you run your reduction script, call this:

var cells = document.getElementsByTagName("td");
for (var i=0; i<cells.length; i++) {
    // Remove the marker by skipping past it; substring(indexOf(...)) alone would keep it.
    if (cells[i].innerHTML.indexOf('blahblahblah') === 0) {
        cells[i].innerHTML = cells[i].innerHTML.substring('blahblahblah'.length);
    }
}

On a different note, I'm required to ask this: must you use tables? I don't know what your web page contains, but you should use <div> tags for layout, and tables only for tabular (spreadsheet-like) data.

01-12-2007, 03:38 AM
I appreciate your suggestion, but the contents between the opening and closing TD tags cannot be replaced. I only need to have the string XXXX added to the right of all opening <TD> tags (<TD>XXXX LKLKLKLKLK</TD>), where LKLKLKLKLK represents existing content that needs to stay.

Also, the function needs to read the contents of a variable containing the HTML code and td tags. When finished, the updated code replaces the existing variable, or is contained in a new one.
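
A minimal sketch of that string-in, string-out requirement. The names addMarkers, removeMarkers, MARKER and OPEN_TD are made up for illustration, and the pattern assumes that no attribute value contains a literal "<" or ">". Each function returns a new string, so the caller can either overwrite the original variable or keep both.

```javascript
// Hypothetical marker; pick something that cannot occur in the real content.
// It is used inside a RegExp below, so keep it free of regex metacharacters.
var MARKER = 'XXXXX';

// Matches an opening <td ...> tag in any letter case, with or without
// attributes. It never matches </td>, because "/" is not part of the pattern.
var OPEN_TD = /<td\b[^<>]*>/gi;

// Return a new string with MARKER appended right after every opening <td> tag.
function addMarkers(html) {
  return html.replace(OPEN_TD, function (tag) {
    return tag + MARKER;
  });
}

// Return a new string in which every MARKER that directly follows an
// opening <td> tag has been removed again.
function removeMarkers(html) {
  var flaggedTd = new RegExp('(<td\\b[^<>]*>)' + MARKER, 'gi');
  return html.replace(flaggedTd, '$1');
}
```

Typical use would be something like var flagged = addMarkers(html); followed by the cleaning step, then html = removeMarkers(cleaned);. Whether the cleaning script leaves the marker text untouched is something to check against the actual script.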

01-12-2007, 06:15 AM
Here are the results I'm getting so far. Though it works as desired, the attributes within the TD tags are being stripped out. Attributes such as class, height, etc. need to stay when the flagTags function is run.

I only want to replace the ">" portion of each <td> tag with ">__".

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>Untitled Document</title>
<script type="text/JavaScript">

// Known issue: the replacement is the literal '<td>__', so attributes are dropped.
function flagTags(){
var content = document.myform.code.value;
var re= new RegExp('<td[^><]*>|<td[^><]*>','g')
content = content.replace(re,'<td>__');
document.myform.code.value = content;
}

function unflagTags(){
var content = document.myform.code.value;
content = content.replace(/<td>__/g,'<td>');
document.myform.code.value = content;
}

</script>
</head>
<body>
<form name="myform">
<textarea name="code" cols="100" rows="9" value=""><table border="1" width="100%" id="table1">
<td class='something'>&nbsp;</td>
<td class='something'>Some Content</td>
<td>Some More Content</td>
</table></textarea>
<input type="button" value="Add TD Flags" onclick="flagTags()">
<input type="button" value="Remove TD Flags" onclick="unflagTags()">
<input type="reset" value="Reset">
</form>
</body>
</html>
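
The attributes most likely disappear because flagTags replaces each matched tag with the literal string '<td>__'. One possible fix, sketched below on a plain string rather than the form field, is to refer back to the matched text in the replacement: in String.prototype.replace, "$&" stands for the whole match and "$1" for the first capture group. The variable names and the sample input are assumptions modeled on the textarea contents in the post.

```javascript
// Sample input modeled on the textarea contents in the post above.
var content =
  "<td class='something'>&nbsp;</td>\n" +
  "<td class='something'>Some Content</td>\n" +
  "<td>Some More Content</td>";

// Opening <td> tag, any letter case, with or without attributes.
// Assumes attribute values contain no literal "<" or ">".
var openTd = /<td\b[^<>]*>/gi;

// Behaviour described in the post: every match becomes the literal "<td>__",
// so whatever attributes the tag carried are thrown away.
var stripped = content.replace(openTd, '<td>__');

// "$&" re-emits the matched tag unchanged, then "__" is appended to it.
var flagged = content.replace(openTd, '$&__');

// Reverse step: capture the tag, match the flag after it, keep only the tag.
var restored = flagged.replace(/(<td\b[^<>]*>)__/gi, '$1');
```

In the page itself, the same two replace calls could be applied to document.myform.code.value inside flagTags and unflagTags. One caveat: a cell whose real content already begins with "__" would lose those characters in the reverse step, so a flag that never occurs in the content is the safer choice.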