View Full Version : Excel to XML with Python

11-11-2010, 10:34 PM
I really need some help. I am a relatively new Python programmer, but I am not that familiar with XML. I have been tasked with creating an XML file from an Excel spreadsheet. The structure of the XML file is set by my vendor. I have a sample XML file, as well as an XSD file, but I am not sure how to progress.

I have been able to pull the data out of the Excel file in to a CSV using lxrd, but I am stuck there. I have created a header row in my CSV to create the appropriate tags in XML, but I cannot get the format to match the sample XML.

My next thought was to replace all the data in the sample XML file with variables (e.g. $2C), and to do a search and replace using the CSV. This seems to be very awkward (and truthfully, I don't know how to script that).

I am not opposed to doing the work, and I would love to actually learn how to do this, but I could really use some assistance.

Thanks in advance!!

11-12-2010, 11:43 PM
It can all be as easy as this:

# xlrd usage, see: http://scienceoss.com/read-excel-files-from-python/
import xlrd

# Open the XLS file as a "workbook" and select the first sheet as our source.
# We'll get our tag names from the sheet's first row contents.
wb = xlrd.open_workbook("my_excel_file.xls")
sh = wb.sheet_by_index(0)
tags = [n.replace(" ", "").lower() for n in sh.row_values(0)]

# This is going to come out as a string, which will write to a file in the end.
result = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<myItems>\n"

# Now, we'll just create a string that looks like an XML node for each row
# in the sheet. Of course, a lot of things will depend on the prescribed XML
# format but since we have no idea what it is, we'll just do this:
for row in range(1, sh.nrows):
result += " <item>\n"
for i in range(len(tags)):
tag = tags[i].encode("utf-8")
val = sh.row_values(row)[i].encode("utf-8")
result += " <%s>%s</%s>\n" % (tag, val, tag)
result += " </item>\n"

# Close our pseudo-XML string.
result += "</myItems>"

# Write the result string to a file using the standard I/O.
f = open("myfile.xml", "w")

But as I put in the comments, a lot depends on the prescribed XML format. Once we know that, we might (and I guess, should) use a real XML-processing module like xml.dom.minidom instead of writing strings out to file like this.

11-15-2010, 03:39 PM
Thanks for the response Samhain!! If I sent you the XML and XLS, would that be more helpful?? I also have an XSD for validation (but that is a lesser issue at this point).

11-16-2010, 03:40 AM
^ Of course, the XML and sample XLS would be helpful. Maybe you can attach them in your next reply so we can have a look?

11-16-2010, 02:44 PM
Here you go!! I appreciate all the help and advice!!

11-17-2010, 03:35 PM
Woohoo, now that's an example! Anyway, it was a good exercise.

First things first: in order to keep my sanity, I needed to write a helper class that provides a few methods for manipulating the XML (and the nodes) that we have to build. I've attached the script as a TXT file, you should replace the extension to .py if you want to try it.

I also attached a sample usage script (just in case I get disconnected and not be able to get back right away). It has lots and lots of comments, which I'll just be repeating here in parts. Like the helper script, just change the extension to .py.

Now, on using the helper script.

We'll import our helper class, instaniate it, create our basic XML structure:

from xs2ml import xs2ml

# xs2ml takes the XLS filename as an argument on __init__.
x = xs2ml("sample.xls")

# Give this method the root node name and it's attribute-values, if any.
x.create_root_node("PIES", xmlns="http://www.aftermarket.org/eCommerce/Pies")

# Create our basic node structure. This method takes in the target parent node
# and the node names that need to be created under it. We'll get a list in return.
root_nodes = x.add_nodes(x.dom_.firstChild, "Header", "Items", "Trailer")

Start filling-up the Items node. This is a bit complicated because the values are coming from multiple worksheets, and some values are text nodes while others are attribute-value nodes. Anyway, we'll just use the `Main Item Data Tab` as the basis for our row count. Also, we'll start at row[2] because we know that's were the real data starts.

for i in range(2, x.sheets["Main Item Data Tab"].nrows):
# Create the Item node in Items. We'll modify this node as we go along.
item = x.add_nodes(root_nodes[1], "Item")[0]

# Create an attribute called MaintenanceType. We know that the data for
# this node is in the Digital Assets Tab, column[1].
x.cell_to_node(item, "Digital Assets Tab", col=1, row=i,

# It seems the MaintenanceType value is reusable, so let's save it.
MaintenanceType = item.attributes["MaintenanceType"].value

# Create the ItemLevelGTIN node. We know the value is in column[8] of
# Main Item Data. We also know about its GTINQualifier attribute in col[9].
ItemLevelGTIN = x.add_nodes(item, "ItemLevelGTIN")[0]
x.cell_to_node(ItemLevelGTIN, "Main Item Data Tab", col=8, row=i)
x.cell_to_node(ItemLevelGTIN, "Main Item Data Tab", col=9, row=i,

# This is going to be a long for loop... but you get the idea.

# One last example, Descriptions nodes:
Descriptions = x.add_nodes(item, "Descriptions")[0]
d_nodes = x.add_nodes(Descriptions, "Description", "Description")

# We know what DescriptionCode to put in. LanguageCode, I don't know
# where to find it but either use the setAttribute way or the cell_to_node
# way if you know the sheet and cell the value is in.
d_nodes[0].setAttribute("DescriptionCode", "DES")
d_nodes[1].setAttribute("DescriptionCode", "SHO")

# MaintenanceType, we'll just get from item since we know that already.
d_nodes[0].setAttribute("MaintenanceType", MaintenanceType)
d_nodes[1].setAttribute("MaintenanceType", MaintenanceType)

# Short and long descriptions, according to the XLS file.
x.cell_to_node(d_nodes[0], "Main Item Data Tab", col=16, row=i)
x.cell_to_node(d_nodes[1], "Main Item Data Tab", col=15, row=i)

The cell_to_node method, by the way, takes the following arguments:
parent node, worksheet name, column, row, and attribute name (optional, None is default)

cell_to_node reads the cell value of column:row. By default, this value is turned into a text node and appended to the parent node, which results in something like <parent>value</parent>. But if the attribute name is given, an attribute is created within parent and given the cell value, like: <parent attribute_name="value" />


Composing the Header and Trailer is more straight forward because their child node values come from a single worksheet and we can easily where the values are coming from. We'll just use some dictionaries here that contain each element's respective column and row. Then we'll feed these maps to a convenience method for generating simple, mapped nodes.

# Header.
h_map = {
"PIESVersion": {"col": 1, "row": 2},
"ParentAAIAID": {"col": 1, "row": 3},
"BrandOwnerAAIAID": {"col": 1, "row": 4},
"TechnicalContact": {"col": 1, "row": 5},
"ContactEmail": {"col": 1, "row": 6},
"BlanketEffectiveDate": {"col": 1, "row": 7},

# Trailer.
t_map = {
"ItemCount": {"col": 1, "row": 10},
"TransactionDate": {"col": 1, "row": 11},

x.map_simple(root_nodes[0], "Header and Trailer Tab", h_map)
x.map_simple(root_nodes[1], "Header and Trailer Tab", t_map)

The map_simple method takes these arguments:
parent node, worksheet name, and a map that we can feed to cell_to_node as **kargs.


Finally, we can choose to save the resulting XML to file or to print it on standard out. Then we'll free our resources (unlink the XML, etc.).

# x.save_to_file("filename.xml", overwrite=False)
print x.dom_.toprettyxml()

Sorry if I didn't make too much sense but I hope this helps you even if it's only a bit. :D

11-17-2010, 05:53 PM
Wow!! Thank you so much!! I have read through the code, and it makes sense. I will try it out, and modify the code to accept the rest of the data. I cannot express how much I appreciate the help on this. I have read so many blogs, how-to sites, etc, but it didn't make a whole lot of sense.

I will let you know how it progresses!! :thumbsup:

11-18-2010, 09:13 PM
Ok, so I have taken the code, and completed the project. I had to make a couple of small tweaks to the code, but all in all the class you created worked beautifully!!

The only question I had was concerning looping. In one area, I need to manually set the Language Code (as an attribute) to EN. Currently, I am using this:

# Manually set the language code to EN
e_nodes[0].setAttribute("LanguageCode", "EN")
e_nodes[1].setAttribute("LanguageCode", "EN")
e_nodes[2].setAttribute("LanguageCode", "EN")
e_nodes[3].setAttribute("LanguageCode", "EN")
e_nodes[4].setAttribute("LanguageCode", "EN")
e_nodes[5].setAttribute("LanguageCode", "EN")
e_nodes[6].setAttribute("LanguageCode", "EN")
e_nodes[7].setAttribute("LanguageCode", "EN")
e_nodes[8].setAttribute("LanguageCode", "EN")
e_nodes[9].setAttribute("LanguageCode", "EN")

I attempted to create a loop by:

for node in e_nodes:
e_node(node).setAttribute("LanguageCode", "EN")

However, I recieved an error. What would be the proper way to loop this?? Or did I do it correctly??

Once again, I can't thank you enough for your assistance!!!!

11-19-2010, 03:00 AM
I'm glad the thing worked! :D

for node in e_nodes:
node.setAttribute("LanguageCode", "EN")

11-19-2010, 03:52 AM
*smacks his forehead*

D'oh!! Kind of embarrassed that I missed that..., once again, I'm indebted. Thanks!!

11-19-2010, 02:24 PM
This is what I have as final code. I'm sure there are things that could be tweaked, but I will work on that.

I added 2 functions to xs2ml for placing static text into nodes, and I created the rest of the code in xs2ml_usage to complete the xml.

Thanks again for all your help!! :thumbsup:

11-19-2010, 04:14 PM
^Cool! This was actually a very enlightening side project, I'm now working on something that gets stuff from OpenDocument Spreadsheets. Putting what I've learned from here to practice. :D

11-08-2013, 12:25 PM
I really appreciate the work you have done here. i have a similar project at hand and wouldn't mind any help from you.

and if there is a way i can link up with you, outside of this group, kindly let me know... i already tried your yahoo messenger and Skype with no success