MS Office tables/spreadsheets to HTML
Zvona
01-01-2003, 04:47 PM
This idea came up yesterday in #html, when someone needed a tool to convert a Word table to HTML without Microsoft's own attributes and so on.
The tool isn't ready yet, but it can already convert simple tables/spreadsheets to valid XHTML. It's based on regular expressions, and I'm not very familiar with them, so I'd appreciate some help. Suggestions are welcome too:
http://www24.brinkster.com/zvona/parser.html
Zvona
01-01-2003, 04:47 PM
Example
Before :
<TABLE class=MsoTableGrid style="BORDER-RIGHT: medium none; BORDER-TOP: medium none; BORDER-LEFT: medium none; BORDER-BOTTOM: medium none; BORDER-COLLAPSE: collapse; mso-border-alt: solid windowtext .5pt; mso-yfti-tbllook: 480; mso-padding-alt: 0cm 5.4pt 0cm 5.4pt; mso-border-insideh: .5pt solid windowtext; mso-border-insidev: .5pt solid windowtext" cellSpacing=0 cellPadding=0 border=1>
<TBODY>
<TR style="mso-yfti-irow: 0">
<TD style="BORDER-RIGHT: windowtext 1pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: windowtext 1pt solid; PADDING-LEFT: 5.4pt; BACKGROUND: #999999; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 1pt solid; WIDTH: 130.7pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 1pt solid; mso-border-alt: solid windowtext .5pt" vAlign=top width=174>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><B style="mso-bidi-font-weight: normal">alfa<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p></B></P></TD>
<TD style="BORDER-RIGHT: windowtext 1pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: windowtext 1pt solid; PADDING-LEFT: 5.4pt; BACKGROUND: #999999; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 130.7pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 1pt solid; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt" vAlign=top width=174>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><B style="mso-bidi-font-weight: normal">beta<o:p></o:p></B></P></TD>
<TD style="BORDER-RIGHT: windowtext 1pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: windowtext 1pt solid; PADDING-LEFT: 5.4pt; BACKGROUND: #999999; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 130.75pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 1pt solid; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt" vAlign=top width=174>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><B style="mso-bidi-font-weight: normal">gamma<o:p></o:p></B></P></TD>
<TD style="BORDER-RIGHT: windowtext 1pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: windowtext 1pt solid; PADDING-LEFT: 5.4pt; BACKGROUND: #999999; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 130.75pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 1pt solid; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt" vAlign=top width=174>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><B style="mso-bidi-font-weight: normal">delta<o:p></o:p></B></P></TD></TR>
<TR style="mso-yfti-irow: 1">
<TD style="BORDER-RIGHT: windowtext 1pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 1pt solid; WIDTH: 130.7pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 1pt solid; BACKGROUND-COLOR: transparent; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" vAlign=top width=174>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><I style="mso-bidi-font-style: normal">a<o:p></o:p></I></P></TD>
<TD style="BORDER-RIGHT: windowtext 1pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 130.7pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 1pt solid; BACKGROUND-COLOR: transparent; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" vAlign=top width=174>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><U>b<o:p></o:p></U></P></TD>
<TD style="BORDER-RIGHT: windowtext 1pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 130.75pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 1pt solid; BACKGROUND-COLOR: transparent; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" vAlign=top width=174>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><I style="mso-bidi-font-style: normal">c<o:p></o:p></I></P></TD>
<TD style="BORDER-RIGHT: windowtext 1pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 130.75pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 1pt solid; BACKGROUND-COLOR: transparent; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" vAlign=top width=174>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><U>d<o:p></o:p></U></P></TD></TR>
<TR style="mso-yfti-irow: 2">
<TD style="BORDER-RIGHT: windowtext 1pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 1pt solid; WIDTH: 130.7pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 1pt solid; BACKGROUND-COLOR: transparent; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" vAlign=top width=174>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt">e<o:p></o:p></P></TD>
<TD style="BORDER-RIGHT: windowtext 1pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 130.7pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 1pt solid; BACKGROUND-COLOR: transparent; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" vAlign=top width=174>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt">f<o:p></o:p></P></TD>
<TD style="BORDER-RIGHT: windowtext 1pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 130.75pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 1pt solid; BACKGROUND-COLOR: transparent; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" vAlign=top width=174>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt">G<o:p></o:p></P></TD>
<TD style="BORDER-RIGHT: windowtext 1pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 130.75pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 1pt solid; BACKGROUND-COLOR: transparent; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" vAlign=top width=174>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt">h<o:p></o:p></P></TD></TR>
<TR style="mso-yfti-irow: 3; mso-yfti-lastrow: yes">
<TD style="BORDER-RIGHT: windowtext 1pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; BACKGROUND: #99cc00; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 1pt solid; WIDTH: 130.7pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 1pt solid; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" vAlign=top width=174>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt">I<o:p></o:p></P></TD>
<TD style="BORDER-RIGHT: windowtext 1pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; BACKGROUND: #99cc00; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 130.7pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 1pt solid; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" vAlign=top width=174>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt">j<o:p></o:p></P></TD>
<TD style="BORDER-RIGHT: windowtext 1pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; BACKGROUND: #99cc00; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 130.75pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 1pt solid; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" vAlign=top width=174>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt">k<o:p></o:p></P></TD>
<TD style="BORDER-RIGHT: windowtext 1pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; BACKGROUND: #99cc00; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 130.75pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 1pt solid; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" vAlign=top width=174>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt">l<o:p></o:p></P></TD></TR></TBODY></TABLE>
..and after :
<table cellspacing="0" cellpadding="0" style="border:1px;">
<tbody>
<tr>
<td style="background-color:#999999;width:174px;">
<b>alfa</b>
</td>
<td style="background-color:#999999;width:174px;">
<b>beta</b>
</td>
<td style="background-color:#999999;width:174px;">
<b>gamma</b>
</td>
<td style="background-color:#999999;width:174px;">
<b>delta</b>
</td>
</tr>
<tr>
<td style="width:174px;">
<i>a</i>
</td>
<td style="width:174px;">
<span style="text-decoration:underline;">b</span>
</td>
<td style="width:174px;">
<i>c</i>
</td>
<td style="width:174px;">
<span style="text-decoration:underline;">d</span>
</td>
</tr>
<tr>
<td style="width:174px;">
e
</td>
<td style="width:174px;">
f
</td>
<td style="width:174px;">
g
</td>
<td style="width:174px;">
h
</td>
</tr>
<tr>
<td style="background-color:#99cc00;width:174px;">
i
</td>
<td style="background-color:#99cc00;width:174px;">
j
</td>
<td style="background-color:#99cc00;width:174px;">
k
</td>
<td style="background-color:#99cc00;width:174px;">
l
</td>
</tr>
</tbody>
</table>
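For readers who want a feel for how a regex-based cleanup like the one above might work, here is a minimal sketch. It is not the parser's actual code: the function name cleanOfficeHtml and every pattern in it are assumptions made for illustration, and it covers only a few of the steps (it does not remove the P wrappers or rewrite the remaining styles the way the output above does).

```javascript
// Hypothetical sketch, not the code behind parser.html.
function cleanOfficeHtml(html) {
  return html
    // drop <?xml:namespace ... /> declarations
    .replace(/<\?xml[^>]*>/gi, "")
    // drop Office <o:p> and </o:p> tags
    .replace(/<\/?o:p[^>]*>/gi, "")
    // drop mso-* declarations inside style attributes
    .replace(/\s*mso-[^:;"]+:[^;"]*;?/gi, "")
    // drop style attributes that are now empty
    .replace(/\s+style="\s*"/gi, "")
    // drop Word's class=MsoNormal, class=MsoTableGrid and similar
    .replace(/\s+class="?Mso\w+"?/gi, "")
    // lowercase tag names only; cell text is left alone
    .replace(/<(\/?)([a-z][a-z0-9]*)/gi, function (match, slash, name) {
      return "<" + slash + name.toLowerCase();
    });
}

var sample = '<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt">' +
  '<B style="mso-bidi-font-weight: normal">alfa<o:p></o:p></B></P>';
console.log(cleanOfficeHtml(sample));
// -> <p style="MARGIN: 0cm 0cm 0pt"><b>alfa</b></p>
```

Chaining small, single-purpose replacements keeps each pattern simple enough to check by eye, at the cost of scanning the string several times.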
chrismiceli
01-01-2003, 06:50 PM
You use onkeyup to enable the rendered view, but when someone uses the mouse instead of the easy keyboard shortcuts, it doesn't count as an onkeyup, so the rendered view isn't enabled. You should use onchange instead. It looks good though, great work! :)
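One way to cover both the keyboard and the mouse is to attach the same handler to more than one event. This is only a sketch: bindPreview and the names in the usage comment are hypothetical, not taken from the parser page. Note that change usually fires only once the field loses focus, while input (in browsers that support it) fires as soon as the content changes.

```javascript
// Hypothetical helper: call `render` whenever the source field may have changed.
function bindPreview(field, render) {
  var events = ["keyup", "change", "input"];
  for (var i = 0; i < events.length; i++) {
    field.addEventListener(events[i], render);
  }
}

// Usage in a page (element id and handler name are assumptions):
// bindPreview(document.getElementById("source"), updatePreview);
```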
kwhubby
01-02-2003, 07:09 AM
Wow!!! That's really cool!! I did not know JavaScript could do such a thing. A question, though, about certain syntax and its meaning:
The syntax here really confuses me; it does not look like JavaScript:
var rgStyle = /\s?style="[\w|\-|\s|\;|\.|\:|\#]*[\!|\"]?/gi;
What is all that after the var rgStyle = ??? It looks like garbage!
Basically, from line 22 to 30 I see weird junk like that. Please describe what all that mumbo jumbo does and how. The first one looks like a really weird if/else statement, but I can't tell.
beetle
01-02-2003, 05:52 PM
Good work, Zvona.
kwhubby, all that 'garbage' you see is what's called a regular expression. It's a pretty big topic to cover, so instead of attempting to tell you myself, I'll give you some links.
http://devedge.netscape.com/library/manuals/2000/javascript/1.5/guide/regexp.html#1010922
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/jscript7/html/jsjsgrpRegExpSyntax.asp
http://www.webreference.com/js/column5/index.html
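As a small taste of what those links cover, the sketch below takes the exact rgStyle line quoted above and runs it over a sample cell (the sample string is made up for illustration). Everything between the two slashes is the pattern, and the trailing gi are flags: g means "every match, not just the first" and i means "ignore case". Inside square brackets, | is just a literal character, not an "or".

```javascript
// The pattern quoted in the question, unchanged.
var rgStyle = /\s?style="[\w|\-|\s|\;|\.|\:|\#]*[\!|\"]?/gi;

// Made-up sample input in the style of Word's markup.
var cell = '<TD style="WIDTH: 174px;" vAlign=top>x</TD>';

// replace() swaps every match for the second argument, here an empty string.
var stripped = cell.replace(rgStyle, "");
console.log(stripped);
// -> <TD vAlign=top>x</TD>
```

Read piece by piece: an optional whitespace character, the literal text style=", any run of the characters listed in the first brackets, then an optional closing quote.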
Zvona, I took the liberty of redoing some stuff in your code. Here's what I did:
Added an arrayReplace() method and made the appropriate changes to accommodate its use
Finessed the regular expressions (many of them were too busy)
Added a tagsToLowerCase() method. You do handle some conversion to lowercase, but it also lowercases all the data from Excel or Word, which a parser like this should not do
Changed all FONT tags to SPAN tags for XHTML compliance
http://www.peterbailey.net/dhtml/parser.htm
There's still more to be done for this to create fully valid XHTML, but it's getting close!
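The changes listed above could look roughly like the sketch below. The names arrayReplace and tagsToLowerCase come from the post, but the bodies, the signatures, and the fontToSpan rule list are guesses made for illustration, not the code behind the link.

```javascript
// Hypothetical reconstruction: apply [pattern, replacement] pairs in order.
function arrayReplace(text, rules) {
  for (var i = 0; i < rules.length; i++) {
    text = text.replace(rules[i][0], rules[i][1]);
  }
  return text;
}

// Hypothetical reconstruction: lowercase tag names only, so the cell data
// coming from Excel or Word keeps its original case.
function tagsToLowerCase(html) {
  return html.replace(/<\/?[a-z][a-z0-9]*/gi, function (match) {
    return match.toLowerCase();
  });
}

// Assumed rule list for the FONT to SPAN change. FONT attributes such as
// color or face would still have to be turned into a style attribute.
var fontToSpan = [
  [/<font\b/gi, "<span"],
  [/<\/font>/gi, "</span>"]
];

console.log(tagsToLowerCase("<TD>Hello <B>World</B></TD>"));
// -> <td>Hello <b>World</b></td>
console.log(arrayReplace("<FONT>Hi</FONT>", fontToSpan));
// -> <span>Hi</span>
```

Keeping the rules in an array means a new cleanup step is one more pair in the list rather than another replace() call in the code.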
theexo51
07-27-2005, 03:05 PM
Sorry to drag up an old post, but I was using this cool parser...
However, when it loads my table in, it has narrowed some of the columns so that they are 2 rows deep. The only way I have found to make them have the text on one line is to manually go down the code and adjust the settings of each individual cell. Kinda ruins the usefulness of the parser :P
Anyone got ideas on how to do this in a more time-efficient manner?