Go Back   CodingForums.com > :: Server side development > Other server side languages/ issues > Python

Before you post, read our: Rules & Posting Guidelines

Reply
 
Thread Tools Rate Thread
Enjoy an ad free experience by logging in. Not a member yet? Register.
Old 11-11-2010, 09:34 PM   PM User | #1
grazopper
New to the CF scene

 
Join Date: Nov 2010
Posts: 7
Thanks: 1
Thanked 0 Times in 0 Posts
grazopper is an unknown quantity at this point
Excel to XML with Python

I really need some help. I am a relatively new Python programmer, but I am not that familiar with XML. I have been tasked with creating an XML file from an Excel spreadsheet. The structure of the XML file is set by my vendor. I have a sample XML file, as well as an XSD file, but I am not sure how to progress.

I have been able to pull the data out of the Excel file in to a CSV using lxrd, but I am stuck there. I have created a header row in my CSV to create the appropriate tags in XML, but I cannot get the format to match the sample XML.

My next thought was to replace all the data in the sample XML file with variables (e.g. $2C), and to do a search and replace using the CSV. This seems to be very awkward (and truthfully, I don't know how to script that).

I am not opposed to doing the work, and I would love to actually learn how to do this, but I could really use some assistance.

Thanks in advance!!
grazopper is offline   Reply With Quote
Old 11-12-2010, 10:43 PM   PM User | #2
Samhain13
Regular Coder

 
Samhain13's Avatar
 
Join Date: Aug 2008
Location: Pilipinas
Posts: 165
Thanks: 4
Thanked 18 Times in 18 Posts
Samhain13 is on a distinguished road
It can all be as easy as this:

Code:
# xlrd usage, see: http://scienceoss.com/read-excel-files-from-python/
import xlrd

# Open the XLS file as a "workbook" and select the first sheet as our source.
# We'll get our tag names from the sheet's first row contents.
wb = xlrd.open_workbook("my_excel_file.xls")
sh = wb.sheet_by_index(0)
tags = [n.replace(" ", "").lower() for n in sh.row_values(0)]

# This is going to come out as a string, which will write to a file in the end.
result = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<myItems>\n"

# Now, we'll just create a string that looks like an XML node for each row
# in the sheet. Of course, a lot of things will depend on the prescribed XML
# format but since we have no idea what it is, we'll just do this:
for row in range(1, sh.nrows):
    result += "  <item>\n"
    for i in range(len(tags)):
        tag = tags[i].encode("utf-8")
        val = sh.row_values(row)[i].encode("utf-8")
        result += "    <%s>%s</%s>\n" % (tag, val, tag)
    result += "  </item>\n"

# Close our pseudo-XML string.
result += "</myItems>"

# Write the result string to a file using the standard I/O.
f = open("myfile.xml", "w")
f.write(result)
f.close()
But as I put in the comments, a lot depends on the prescribed XML format. Once we know that, we might (and I guess, should) use a real XML-processing module like xml.dom.minidom instead of writing strings out to file like this.
__________________
I am a Man of Truth. I am a Free Human Person. I am a Peacemaker.
** Independent Multimedia Artist in Pasig **
Samhain13 is offline   Reply With Quote
Old 11-15-2010, 02:39 PM   PM User | #3
grazopper
New to the CF scene

 
Join Date: Nov 2010
Posts: 7
Thanks: 1
Thanked 0 Times in 0 Posts
grazopper is an unknown quantity at this point
Thanks for the response Samhain!! If I sent you the XML and XLS, would that be more helpful?? I also have an XSD for validation (but that is a lesser issue at this point).
grazopper is offline   Reply With Quote
Old 11-16-2010, 02:40 AM   PM User | #4
Samhain13
Regular Coder

 
Samhain13's Avatar
 
Join Date: Aug 2008
Location: Pilipinas
Posts: 165
Thanks: 4
Thanked 18 Times in 18 Posts
Samhain13 is on a distinguished road
^ Of course, the XML and sample XLS would be helpful. Maybe you can attach them in your next reply so we can have a look?
__________________
I am a Man of Truth. I am a Free Human Person. I am a Peacemaker.
** Independent Multimedia Artist in Pasig **
Samhain13 is offline   Reply With Quote
Old 11-16-2010, 01:44 PM   PM User | #5
grazopper
New to the CF scene

 
Join Date: Nov 2010
Posts: 7
Thanks: 1
Thanked 0 Times in 0 Posts
grazopper is an unknown quantity at this point
Here you go!! I appreciate all the help and advice!!
Attached Files
File Type: zip AAP PILS Sample File.zip (45.0 KB, 238 views)
grazopper is offline   Reply With Quote
Old 11-17-2010, 02:35 PM   PM User | #6
Samhain13
Regular Coder

 
Samhain13's Avatar
 
Join Date: Aug 2008
Location: Pilipinas
Posts: 165
Thanks: 4
Thanked 18 Times in 18 Posts
Samhain13 is on a distinguished road
Woohoo, now that's an example! Anyway, it was a good exercise.

First things first: in order to keep my sanity, I needed to write a helper class that provides a few methods for manipulating the XML (and the nodes) that we have to build. I've attached the script as a TXT file, you should replace the extension to .py if you want to try it.

I also attached a sample usage script (just in case I get disconnected and not be able to get back right away). It has lots and lots of comments, which I'll just be repeating here in parts. Like the helper script, just change the extension to .py.

Now, on using the helper script.

We'll import our helper class, instaniate it, create our basic XML structure:
Code:
from xs2ml import xs2ml

# xs2ml takes the XLS filename as an argument on __init__.
x = xs2ml("sample.xls")

# Give this method the root node name and it's attribute-values, if any.
x.create_root_node("PIES", xmlns="http://www.aftermarket.org/eCommerce/Pies")

# Create our basic node structure. This method takes in the target parent node
# and the node names that need to be created under it. We'll get a list in return.
root_nodes = x.add_nodes(x.dom_.firstChild, "Header", "Items", "Trailer")
Start filling-up the Items node. This is a bit complicated because the values are coming from multiple worksheets, and some values are text nodes while others are attribute-value nodes. Anyway, we'll just use the `Main Item Data Tab` as the basis for our row count. Also, we'll start at row[2] because we know that's were the real data starts.

Code:
for i in range(2, x.sheets["Main Item Data Tab"].nrows):
    # Create the Item node in Items. We'll modify this node as we go along.
    item = x.add_nodes(root_nodes[1], "Item")[0]
    
    # Create an attribute called MaintenanceType. We know that the data for
    # this node is in the Digital Assets Tab, column[1].
    x.cell_to_node(item, "Digital Assets Tab", col=1, row=i, 
                   attr_name="MaintenanceType")
    
    # It seems the MaintenanceType value is reusable, so let's save it.
    MaintenanceType = item.attributes["MaintenanceType"].value
    
    # Create the ItemLevelGTIN node. We know the value is in column[8] of
    # Main Item Data. We also know about its GTINQualifier attribute in col[9].
    ItemLevelGTIN = x.add_nodes(item, "ItemLevelGTIN")[0]
    x.cell_to_node(ItemLevelGTIN, "Main Item Data Tab", col=8, row=i)
    x.cell_to_node(ItemLevelGTIN, "Main Item Data Tab", col=9, row=i,
                   attr_name="GTINQualifier")
    
    # This is going to be a long for loop... but you get the idea.
    
    # One last example, Descriptions nodes:
    Descriptions = x.add_nodes(item, "Descriptions")[0]
    d_nodes = x.add_nodes(Descriptions, "Description", "Description")
    
    # We know what DescriptionCode to put in. LanguageCode, I don't know
    # where to find it but either use the setAttribute way or the cell_to_node
    # way if you know the sheet and cell the value is in.
    d_nodes[0].setAttribute("DescriptionCode", "DES")
    d_nodes[1].setAttribute("DescriptionCode", "SHO")
    
    # MaintenanceType, we'll just get from item since we know that already.
    d_nodes[0].setAttribute("MaintenanceType", MaintenanceType)
    d_nodes[1].setAttribute("MaintenanceType", MaintenanceType)
    
    # Short and long descriptions, according to the XLS file.
    x.cell_to_node(d_nodes[0], "Main Item Data Tab", col=16, row=i)
    x.cell_to_node(d_nodes[1], "Main Item Data Tab", col=15, row=i)
The cell_to_node method, by the way, takes the following arguments:
parent node, worksheet name, column, row, and attribute name (optional, None is default)

cell_to_node reads the cell value of column:row. By default, this value is turned into a text node and appended to the parent node, which results in something like <parent>value</parent>. But if the attribute name is given, an attribute is created within parent and given the cell value, like: <parent attribute_name="value" />

---

Composing the Header and Trailer is more straight forward because their child node values come from a single worksheet and we can easily where the values are coming from. We'll just use some dictionaries here that contain each element's respective column and row. Then we'll feed these maps to a convenience method for generating simple, mapped nodes.

Code:
# Header.
h_map = {
    "PIESVersion": {"col": 1, "row": 2},
    "ParentAAIAID": {"col": 1, "row": 3},
    "BrandOwnerAAIAID": {"col": 1, "row": 4},
    "TechnicalContact": {"col": 1, "row": 5},
    "ContactEmail": {"col": 1, "row": 6},
    "BlanketEffectiveDate": {"col": 1, "row": 7},
}

# Trailer.
t_map = {
    "ItemCount": {"col": 1, "row": 10},
    "TransactionDate": {"col": 1, "row": 11},
}

x.map_simple(root_nodes[0], "Header and Trailer Tab", h_map)
x.map_simple(root_nodes[1], "Header and Trailer Tab", t_map)
The map_simple method takes these arguments:
parent node, worksheet name, and a map that we can feed to cell_to_node as **kargs.

---

Finally, we can choose to save the resulting XML to file or to print it on standard out. Then we'll free our resources (unlink the XML, etc.).

Code:
# x.save_to_file("filename.xml", overwrite=False)
print x.dom_.toprettyxml()
x.done()
Sorry if I didn't make too much sense but I hope this helps you even if it's only a bit.
Attached Files
File Type: txt xs2ml.txt (6.0 KB, 237 views)
File Type: txt xs2ml_usage.txt (3.4 KB, 183 views)
__________________
I am a Man of Truth. I am a Free Human Person. I am a Peacemaker.
** Independent Multimedia Artist in Pasig **

Last edited by Samhain13; 11-17-2010 at 03:10 PM..
Samhain13 is offline   Reply With Quote
Users who have thanked Samhain13 for this post:
grazopper (11-17-2010)
Old 11-17-2010, 04:53 PM   PM User | #7
grazopper
New to the CF scene

 
Join Date: Nov 2010
Posts: 7
Thanks: 1
Thanked 0 Times in 0 Posts
grazopper is an unknown quantity at this point
Wow!! Thank you so much!! I have read through the code, and it makes sense. I will try it out, and modify the code to accept the rest of the data. I cannot express how much I appreciate the help on this. I have read so many blogs, how-to sites, etc, but it didn't make a whole lot of sense.

I will let you know how it progresses!!
grazopper is offline   Reply With Quote
Old 11-18-2010, 08:13 PM   PM User | #8
grazopper
New to the CF scene

 
Join Date: Nov 2010
Posts: 7
Thanks: 1
Thanked 0 Times in 0 Posts
grazopper is an unknown quantity at this point
Ok, so I have taken the code, and completed the project. I had to make a couple of small tweaks to the code, but all in all the class you created worked beautifully!!

The only question I had was concerning looping. In one area, I need to manually set the Language Code (as an attribute) to EN. Currently, I am using this:


Code:
# Manually set the language code to EN
    e_nodes[0].setAttribute("LanguageCode", "EN")
    e_nodes[1].setAttribute("LanguageCode", "EN")
    e_nodes[2].setAttribute("LanguageCode", "EN")
    e_nodes[3].setAttribute("LanguageCode", "EN")
    e_nodes[4].setAttribute("LanguageCode", "EN")
    e_nodes[5].setAttribute("LanguageCode", "EN")
    e_nodes[6].setAttribute("LanguageCode", "EN")
    e_nodes[7].setAttribute("LanguageCode", "EN")
    e_nodes[8].setAttribute("LanguageCode", "EN")
    e_nodes[9].setAttribute("LanguageCode", "EN")
I attempted to create a loop by:

Code:
for node in e_nodes:
    e_node(node).setAttribute("LanguageCode", "EN")
However, I recieved an error. What would be the proper way to loop this?? Or did I do it correctly??

Once again, I can't thank you enough for your assistance!!!!
grazopper is offline   Reply With Quote
Old 11-19-2010, 02:00 AM   PM User | #9
Samhain13
Regular Coder

 
Samhain13's Avatar
 
Join Date: Aug 2008
Location: Pilipinas
Posts: 165
Thanks: 4
Thanked 18 Times in 18 Posts
Samhain13 is on a distinguished road
I'm glad the thing worked!

Code:
for node in e_nodes:
    node.setAttribute("LanguageCode", "EN")
__________________
I am a Man of Truth. I am a Free Human Person. I am a Peacemaker.
** Independent Multimedia Artist in Pasig **
Samhain13 is offline   Reply With Quote
Old 11-19-2010, 02:52 AM   PM User | #10
grazopper
New to the CF scene

 
Join Date: Nov 2010
Posts: 7
Thanks: 1
Thanked 0 Times in 0 Posts
grazopper is an unknown quantity at this point
Thumbs up

*smacks his forehead*

D'oh!! Kind of embarrassed that I missed that..., once again, I'm indebted. Thanks!!
grazopper is offline   Reply With Quote
Old 11-19-2010, 01:24 PM   PM User | #11
grazopper
New to the CF scene

 
Join Date: Nov 2010
Posts: 7
Thanks: 1
Thanked 0 Times in 0 Posts
grazopper is an unknown quantity at this point
Smile

This is what I have as final code. I'm sure there are things that could be tweaked, but I will work on that.

I added 2 functions to xs2ml for placing static text into nodes, and I created the rest of the code in xs2ml_usage to complete the xml.


Thanks again for all your help!!
Attached Files
File Type: txt xs2ml.txt (6.3 KB, 253 views)
File Type: txt xs2ml_usage.txt (16.7 KB, 220 views)
grazopper is offline   Reply With Quote
Old 11-19-2010, 03:14 PM   PM User | #12
Samhain13
Regular Coder

 
Samhain13's Avatar
 
Join Date: Aug 2008
Location: Pilipinas
Posts: 165
Thanks: 4
Thanked 18 Times in 18 Posts
Samhain13 is on a distinguished road
^Cool! This was actually a very enlightening side project, I'm now working on something that gets stuff from OpenDocument Spreadsheets. Putting what I've learned from here to practice.
__________________
I am a Man of Truth. I am a Free Human Person. I am a Peacemaker.
** Independent Multimedia Artist in Pasig **
Samhain13 is offline   Reply With Quote
Reply

Bookmarks

Tags
csv, python, xml

Jump To Top of Thread


Thread Tools
Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +1. The time now is 01:54 PM.


Advertisement
Log in to turn off these ads.