Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 13 of 13
  1. #1
    New to the CF scene
    Join Date
    Nov 2010
    Posts
    7
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Excel to XML with Python

    I really need some help. I am a relatively new Python programmer, but I am not that familiar with XML. I have been tasked with creating an XML file from an Excel spreadsheet. The structure of the XML file is set by my vendor. I have a sample XML file, as well as an XSD file, but I am not sure how to progress.

    I have been able to pull the data out of the Excel file in to a CSV using lxrd, but I am stuck there. I have created a header row in my CSV to create the appropriate tags in XML, but I cannot get the format to match the sample XML.

    My next thought was to replace all the data in the sample XML file with variables (e.g. $2C), and to do a search and replace using the CSV. This seems to be very awkward (and truthfully, I don't know how to script that).

    I am not opposed to doing the work, and I would love to actually learn how to do this, but I could really use some assistance.

    Thanks in advance!!

  • #2
    Regular Coder Samhain13's Avatar
    Join Date
    Aug 2008
    Location
    Pilipinas
    Posts
    169
    Thanks
    4
    Thanked 18 Times in 18 Posts
    It can all be as easy as this:

    Code:
    # xlrd usage, see: http://scienceoss.com/read-excel-files-from-python/
    import xlrd
    
    # Open the XLS file as a "workbook" and select the first sheet as our source.
    # We'll get our tag names from the sheet's first row contents.
    wb = xlrd.open_workbook("my_excel_file.xls")
    sh = wb.sheet_by_index(0)
    tags = [n.replace(" ", "").lower() for n in sh.row_values(0)]
    
    # This is going to come out as a string, which will write to a file in the end.
    result = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<myItems>\n"
    
    # Now, we'll just create a string that looks like an XML node for each row
    # in the sheet. Of course, a lot of things will depend on the prescribed XML
    # format but since we have no idea what it is, we'll just do this:
    for row in range(1, sh.nrows):
        result += "  <item>\n"
        for i in range(len(tags)):
            tag = tags[i].encode("utf-8")
            val = sh.row_values(row)[i].encode("utf-8")
            result += "    <%s>%s</%s>\n" % (tag, val, tag)
        result += "  </item>\n"
    
    # Close our pseudo-XML string.
    result += "</myItems>"
    
    # Write the result string to a file using the standard I/O.
    f = open("myfile.xml", "w")
    f.write(result)
    f.close()
    But as I put in the comments, a lot depends on the prescribed XML format. Once we know that, we might (and I guess, should) use a real XML-processing module like xml.dom.minidom instead of writing strings out to file like this.
    I am a Man of Truth. I am a Free Human Person. I am a Peacemaker.
    ** Independent Multimedia Artist in Pasig **

  • #3
    New to the CF scene
    Join Date
    Nov 2010
    Posts
    7
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Thanks for the response Samhain!! If I sent you the XML and XLS, would that be more helpful?? I also have an XSD for validation (but that is a lesser issue at this point).

  • #4
    Regular Coder Samhain13's Avatar
    Join Date
    Aug 2008
    Location
    Pilipinas
    Posts
    169
    Thanks
    4
    Thanked 18 Times in 18 Posts
    ^ Of course, the XML and sample XLS would be helpful. Maybe you can attach them in your next reply so we can have a look?
    I am a Man of Truth. I am a Free Human Person. I am a Peacemaker.
    ** Independent Multimedia Artist in Pasig **

  • #5
    New to the CF scene
    Join Date
    Nov 2010
    Posts
    7
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Here you go!! I appreciate all the help and advice!!
    Attached Files Attached Files

  • #6
    Regular Coder Samhain13's Avatar
    Join Date
    Aug 2008
    Location
    Pilipinas
    Posts
    169
    Thanks
    4
    Thanked 18 Times in 18 Posts
    Woohoo, now that's an example! Anyway, it was a good exercise.

    First things first: in order to keep my sanity, I needed to write a helper class that provides a few methods for manipulating the XML (and the nodes) that we have to build. I've attached the script as a TXT file, you should replace the extension to .py if you want to try it.

    I also attached a sample usage script (just in case I get disconnected and not be able to get back right away). It has lots and lots of comments, which I'll just be repeating here in parts. Like the helper script, just change the extension to .py.

    Now, on using the helper script.

    We'll import our helper class, instaniate it, create our basic XML structure:
    Code:
    from xs2ml import xs2ml
    
    # xs2ml takes the XLS filename as an argument on __init__.
    x = xs2ml("sample.xls")
    
    # Give this method the root node name and it's attribute-values, if any.
    x.create_root_node("PIES", xmlns="http://www.aftermarket.org/eCommerce/Pies")
    
    # Create our basic node structure. This method takes in the target parent node
    # and the node names that need to be created under it. We'll get a list in return.
    root_nodes = x.add_nodes(x.dom_.firstChild, "Header", "Items", "Trailer")
    Start filling-up the Items node. This is a bit complicated because the values are coming from multiple worksheets, and some values are text nodes while others are attribute-value nodes. Anyway, we'll just use the `Main Item Data Tab` as the basis for our row count. Also, we'll start at row[2] because we know that's were the real data starts.

    Code:
    for i in range(2, x.sheets["Main Item Data Tab"].nrows):
        # Create the Item node in Items. We'll modify this node as we go along.
        item = x.add_nodes(root_nodes[1], "Item")[0]
        
        # Create an attribute called MaintenanceType. We know that the data for
        # this node is in the Digital Assets Tab, column[1].
        x.cell_to_node(item, "Digital Assets Tab", col=1, row=i, 
                       attr_name="MaintenanceType")
        
        # It seems the MaintenanceType value is reusable, so let's save it.
        MaintenanceType = item.attributes["MaintenanceType"].value
        
        # Create the ItemLevelGTIN node. We know the value is in column[8] of
        # Main Item Data. We also know about its GTINQualifier attribute in col[9].
        ItemLevelGTIN = x.add_nodes(item, "ItemLevelGTIN")[0]
        x.cell_to_node(ItemLevelGTIN, "Main Item Data Tab", col=8, row=i)
        x.cell_to_node(ItemLevelGTIN, "Main Item Data Tab", col=9, row=i,
                       attr_name="GTINQualifier")
        
        # This is going to be a long for loop... but you get the idea.
        
        # One last example, Descriptions nodes:
        Descriptions = x.add_nodes(item, "Descriptions")[0]
        d_nodes = x.add_nodes(Descriptions, "Description", "Description")
        
        # We know what DescriptionCode to put in. LanguageCode, I don't know
        # where to find it but either use the setAttribute way or the cell_to_node
        # way if you know the sheet and cell the value is in.
        d_nodes[0].setAttribute("DescriptionCode", "DES")
        d_nodes[1].setAttribute("DescriptionCode", "SHO")
        
        # MaintenanceType, we'll just get from item since we know that already.
        d_nodes[0].setAttribute("MaintenanceType", MaintenanceType)
        d_nodes[1].setAttribute("MaintenanceType", MaintenanceType)
        
        # Short and long descriptions, according to the XLS file.
        x.cell_to_node(d_nodes[0], "Main Item Data Tab", col=16, row=i)
        x.cell_to_node(d_nodes[1], "Main Item Data Tab", col=15, row=i)
    The cell_to_node method, by the way, takes the following arguments:
    parent node, worksheet name, column, row, and attribute name (optional, None is default)

    cell_to_node reads the cell value of column:row. By default, this value is turned into a text node and appended to the parent node, which results in something like <parent>value</parent>. But if the attribute name is given, an attribute is created within parent and given the cell value, like: <parent attribute_name="value" />

    ---

    Composing the Header and Trailer is more straight forward because their child node values come from a single worksheet and we can easily where the values are coming from. We'll just use some dictionaries here that contain each element's respective column and row. Then we'll feed these maps to a convenience method for generating simple, mapped nodes.

    Code:
    # Header.
    h_map = {
        "PIESVersion": {"col": 1, "row": 2},
        "ParentAAIAID": {"col": 1, "row": 3},
        "BrandOwnerAAIAID": {"col": 1, "row": 4},
        "TechnicalContact": {"col": 1, "row": 5},
        "ContactEmail": {"col": 1, "row": 6},
        "BlanketEffectiveDate": {"col": 1, "row": 7},
    }
    
    # Trailer.
    t_map = {
        "ItemCount": {"col": 1, "row": 10},
        "TransactionDate": {"col": 1, "row": 11},
    }
    
    x.map_simple(root_nodes[0], "Header and Trailer Tab", h_map)
    x.map_simple(root_nodes[1], "Header and Trailer Tab", t_map)
    The map_simple method takes these arguments:
    parent node, worksheet name, and a map that we can feed to cell_to_node as **kargs.

    ---

    Finally, we can choose to save the resulting XML to file or to print it on standard out. Then we'll free our resources (unlink the XML, etc.).

    Code:
    # x.save_to_file("filename.xml", overwrite=False)
    print x.dom_.toprettyxml()
    x.done()
    Sorry if I didn't make too much sense but I hope this helps you even if it's only a bit.
    Attached Files Attached Files
    Last edited by Samhain13; 11-17-2010 at 03:10 PM.
    I am a Man of Truth. I am a Free Human Person. I am a Peacemaker.
    ** Independent Multimedia Artist in Pasig **

  • Users who have thanked Samhain13 for this post:

    grazopper (11-17-2010)

  • #7
    New to the CF scene
    Join Date
    Nov 2010
    Posts
    7
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Wow!! Thank you so much!! I have read through the code, and it makes sense. I will try it out, and modify the code to accept the rest of the data. I cannot express how much I appreciate the help on this. I have read so many blogs, how-to sites, etc, but it didn't make a whole lot of sense.

    I will let you know how it progresses!!

  • #8
    New to the CF scene
    Join Date
    Nov 2010
    Posts
    7
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Ok, so I have taken the code, and completed the project. I had to make a couple of small tweaks to the code, but all in all the class you created worked beautifully!!

    The only question I had was concerning looping. In one area, I need to manually set the Language Code (as an attribute) to EN. Currently, I am using this:


    Code:
    # Manually set the language code to EN
        e_nodes[0].setAttribute("LanguageCode", "EN")
        e_nodes[1].setAttribute("LanguageCode", "EN")
        e_nodes[2].setAttribute("LanguageCode", "EN")
        e_nodes[3].setAttribute("LanguageCode", "EN")
        e_nodes[4].setAttribute("LanguageCode", "EN")
        e_nodes[5].setAttribute("LanguageCode", "EN")
        e_nodes[6].setAttribute("LanguageCode", "EN")
        e_nodes[7].setAttribute("LanguageCode", "EN")
        e_nodes[8].setAttribute("LanguageCode", "EN")
        e_nodes[9].setAttribute("LanguageCode", "EN")
    I attempted to create a loop by:

    Code:
    for node in e_nodes:
        e_node(node).setAttribute("LanguageCode", "EN")
    However, I recieved an error. What would be the proper way to loop this?? Or did I do it correctly??

    Once again, I can't thank you enough for your assistance!!!!

  • #9
    Regular Coder Samhain13's Avatar
    Join Date
    Aug 2008
    Location
    Pilipinas
    Posts
    169
    Thanks
    4
    Thanked 18 Times in 18 Posts
    I'm glad the thing worked!

    Code:
    for node in e_nodes:
        node.setAttribute("LanguageCode", "EN")
    I am a Man of Truth. I am a Free Human Person. I am a Peacemaker.
    ** Independent Multimedia Artist in Pasig **

  • #10
    New to the CF scene
    Join Date
    Nov 2010
    Posts
    7
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Thumbs up

    *smacks his forehead*

    D'oh!! Kind of embarrassed that I missed that..., once again, I'm indebted. Thanks!!

  • #11
    New to the CF scene
    Join Date
    Nov 2010
    Posts
    7
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Smile

    This is what I have as final code. I'm sure there are things that could be tweaked, but I will work on that.

    I added 2 functions to xs2ml for placing static text into nodes, and I created the rest of the code in xs2ml_usage to complete the xml.


    Thanks again for all your help!!
    Attached Files Attached Files

  • #12
    Regular Coder Samhain13's Avatar
    Join Date
    Aug 2008
    Location
    Pilipinas
    Posts
    169
    Thanks
    4
    Thanked 18 Times in 18 Posts
    ^Cool! This was actually a very enlightening side project, I'm now working on something that gets stuff from OpenDocument Spreadsheets. Putting what I've learned from here to practice.
    I am a Man of Truth. I am a Free Human Person. I am a Peacemaker.
    ** Independent Multimedia Artist in Pasig **

  • #13
    New to the CF scene
    Join Date
    Oct 2013
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Samhain13,
    I really appreciate the work you have done here. i have a similar project at hand and wouldn't mind any help from you.

    and if there is a way i can link up with you, outside of this group, kindly let me know... i already tried your yahoo messenger and Skype with no success


  •  

    Tags for this Thread

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •