lxml - How to Generate Elements with Namespaces

lxml is an amazing Python package. However, if you work seriously with XML namespaces, you will need more than the two examples included in the otherwise great tutorial. That's why I want to share with the community what I have learned so far.

Laying the foundation

Let's define a few shortcuts to keep the examples short:

from lxml import etree
Element    = etree.Element
SubElement = etree.SubElement

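def pprint (root) :
    # tostring returns bytes here, so decode them before printing (Python 3)
    xml_bytes = etree.tostring (root, pretty_print = True, xml_declaration = True)
    print (xml_bytes.decode ("ascii"))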

Here is our simple XML document:

videos  = Element ("VIDEOS")
movie   = SubElement (videos, "MOVIE")
series  = SubElement (videos, "TV-SERIES")
pprint (videos)

Our pprint function renders it as:

<?xml version='1.0' encoding='ASCII'?>
<VIDEOS>
  <MOVIE/>
  <TV-SERIES/>
</VIDEOS>

Now we can start decorating the XML tree with namespaces.

Our first namespace

Our first step is to assign a namespace to the top-level element:

top_ns         = "http://duenas.at/media"
namespace_dict = {"m" : top_ns}
videos         = Element ( "{%s}%s" % (top_ns, "VIDEOS")
                         , nsmap = namespace_dict
                         )
movie          = SubElement (videos, "MOVIE")
series         = SubElement (videos, "TV-SERIES")
pprint (videos)

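Note the namespace declaration and the m: prefix on the top-level element. The subelements were created with plain tags, so they do not belong to the namespace yet: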

<?xml version='1.0' encoding='ASCII'?>
<m:VIDEOS xmlns:m="http://duenas.at/media">
  <MOVIE/>
  <TV-SERIES/>
</m:VIDEOS>
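
To check which elements actually ended up in the namespace, the tree can also be inspected directly. The following is a minimal sketch, assuming the same namespace URI as above; etree.QName is lxml's helper class for splitting a qualified tag, and the variable names videos_name and movie_name are only illustrative.

```python
from lxml import etree

top_ns = "http://duenas.at/media"
videos = etree.Element("{%s}VIDEOS" % top_ns, nsmap={"m": top_ns})
movie = etree.SubElement(videos, "MOVIE")  # plain tag: no namespace

# The tag of a namespaced element keeps the full URI in braces.
print(videos.tag)     # {http://duenas.at/media}VIDEOS
print(videos.prefix)  # m
print(movie.tag)      # MOVIE

# QName splits a qualified tag into its namespace and local name.
videos_name = etree.QName(videos)
movie_name = etree.QName(movie)
print(videos_name.namespace, videos_name.localname)
print(movie_name.namespace)  # None
```

The prefix only matters for serialization; internally lxml identifies the element by the namespace URI plus the local name.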

Before moving on, let's wrap the construction of the somewhat cryptic first argument of the Element function in a small helper:

def lxml_tag (namespace, tag) :
    return "{%s}%s" % (namespace, tag)
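
As an alternative to a hand-written helper, lxml ships the etree.QName class, which builds the same "{namespace}tag" string. The sketch below assumes the same namespace URI as above; the variable name qualified is only illustrative.

```python
from lxml import etree

top_ns = "http://duenas.at/media"

# QName(namespace, tag) builds the "{namespace}tag" form for us.
qualified = etree.QName(top_ns, "VIDEOS")
print(qualified.text)       # {http://duenas.at/media}VIDEOS
print(qualified.localname)  # VIDEOS

# A QName object can be passed where a tag string is expected.
videos = etree.Element(qualified, nsmap={"m": top_ns})
print(videos.tag == "{%s}VIDEOS" % top_ns)  # True
```

Which of the two to use is mostly a matter of taste; the helper keeps the examples in this article free of extra classes.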

Now we can assign the subelements to the top-level namespace in a simple and readable way.

videos = Element ( lxml_tag (top_ns, "VIDEOS")
                 , nsmap = namespace_dict
                 )
movie  = SubElement ( videos
                    , lxml_tag (top_ns, "MOVIE")
                    , nsmap = namespace_dict
                    )
series = SubElement ( videos
                    , lxml_tag (top_ns, "TV-SERIES")
                    , nsmap = namespace_dict
                    )

This is how our pprint function renders the XML document's latest version:

<?xml version='1.0' encoding='ASCII'?>
<m:VIDEOS xmlns:m="http://duenas.at/media">
  <m:MOVIE/>
  <m:TV-SERIES/>
</m:VIDEOS>
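
Subelements can reuse a prefix that is already declared on an ancestor, so passing nsmap to every SubElement call appears to be optional in this case. A minimal sketch under that assumption, with the tags spelled out so that it runs on its own:

```python
from lxml import etree

top_ns = "http://duenas.at/media"
videos = etree.Element("{%s}VIDEOS" % top_ns, nsmap={"m": top_ns})

# No nsmap here: the children reuse the declaration made on VIDEOS.
movie = etree.SubElement(videos, "{%s}MOVIE" % top_ns)
series = etree.SubElement(videos, "{%s}TV-SERIES" % top_ns)

print(movie.prefix)  # m
print(movie.nsmap)   # {'m': 'http://duenas.at/media'}

xml_text = etree.tostring(videos, pretty_print=True).decode("ascii")
print(xml_text)
```

Passing the dictionary explicitly, as in the listing above, does no harm and keeps each call self-describing.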

Adding a second namespace

If we use two namespaces then the corresponding dictionary must have two entries:

top_ns         = "http://duenas.at/media"
db_ns          = "http://duenas.at/database"
namespace_dict = { "m"  : top_ns
                 , "db" : db_ns
                 }

Now we can introduce a new element which belongs to the second namespace:

videos    = Element ( lxml_tag (top_ns, "VIDEOS")
                    , nsmap = namespace_dict
                    )
movie     = SubElement ( videos
                       , lxml_tag (top_ns, "MOVIE")
                       , nsmap = namespace_dict
                       )
name      = SubElement ( movie
                       , lxml_tag (db_ns, "NAME")
                       , nsmap = namespace_dict
                       )
name.text = "The Godfather"
series    = SubElement ( videos
                       , lxml_tag (top_ns, "TV-SERIES")
                       , nsmap = namespace_dict
                       )

Note that both namespaces are declared on the top-level element and that the new element carries the db: prefix:

<?xml version='1.0' encoding='ASCII'?>
<m:VIDEOS xmlns:db="http://duenas.at/database" xmlns:m="http://duenas.at/media">
  <m:MOVIE>
    <db:NAME>The Godfather</db:NAME>
  </m:MOVIE>
  <m:TV-SERIES/>
</m:VIDEOS>
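
The same prefix dictionary is also useful when reading the tree back. The sketch below rebuilds a reduced version of the document and looks up the NAME element with find and xpath, both of which accept a namespaces argument; the variable names found and titles are only illustrative.

```python
from lxml import etree

top_ns = "http://duenas.at/media"
db_ns = "http://duenas.at/database"
namespace_dict = {"m": top_ns, "db": db_ns}

videos = etree.Element("{%s}VIDEOS" % top_ns, nsmap=namespace_dict)
movie = etree.SubElement(videos, "{%s}MOVIE" % top_ns)
name = etree.SubElement(movie, "{%s}NAME" % db_ns)
name.text = "The Godfather"

# find() resolves the prefixes through the mapping we pass in.
found = videos.find("m:MOVIE/db:NAME", namespaces=namespace_dict)
print(found.text)  # The Godfather

# xpath() takes the same mapping for full XPath expressions.
titles = videos.xpath("//db:NAME/text()", namespaces=namespace_dict)
print(titles)  # ['The Godfather']
```

The prefixes used in a query only need to match the dictionary passed to the call, not the prefixes that appear in the serialized document.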

That's all for now! This is already a non-trivial XML document.

Final Word

If you have read this far, you are most likely a serious lxml user. Please consider supporting the lxml project (look for the donate button on the lxml website).

P.S.: If you liked this page, share it on Twitter (hashtag: #lxmlNamespaces).