Link Search Menu Expand Document

XML Files

Python XML with ElementTree: Beginner’s Guide

# scores dictionary
scores_xml = """
<?xml version="1.0"?>
<scores>
    <score>
        <date>2019-05-01</date>
        <home-team>Pirates</home-team>
        <home-score>0</home-score>
        <away-team>Cubs</away-team>
        <away-score>10</away-score>
    </score>
    <score>
        <date>2019-05-15</date>
        <home-team>Reds</home-team>
        <home-score>7</home-score>
        <away-team>Pirates</away-team>
        <away-score>0</away-score>
    </score>
</scores>
"""
print(scores_xml)
<?xml version="1.0"?>
<scores>
    <score>
        <date>2019-05-01</date>
        <home-team>Pirates</home-team>
        <home-score>0</home-score>
        <away-team>Cubs</away-team>
        <away-score>10</away-score>
    </score>
    <score>
        <date>2019-05-15</date>
        <home-team>Reds</home-team>
        <home-score>7</home-score>
        <away-team>Pirates</away-team>
        <away-score>0</away-score>
    </score>
</scores>
scores_xsd = """
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="scores">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="score" maxOccurs="unbounded" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:date" name="date"/>
              <xs:element type="xs:string" name="home-team"/>
              <xs:element type="xs:int" name="home-score"/>
              <xs:element type="xs:string" name="away-team"/>
              <xs:element type="xs:int" name="away-score"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""
print(scores_xsd)
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="scores">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="score" maxOccurs="unbounded" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:date" name="date"/>
              <xs:element type="xs:string" name="home-team"/>
              <xs:element type="xs:int" name="home-score"/>
              <xs:element type="xs:string" name="away-team"/>
              <xs:element type="xs:int" name="away-score"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Parsing using ElementTree

import xml.etree.ElementTree as ET
tree = ET.parse('scores.xml')
root = tree.getroot()
print(root.tag)
scores
print(root.attrib)
{'{http://www.w3.org/2001/XMLSchema-instance}noNamespaceSchemaLocation': 'scores.xsd'}

for loops

for child in root:
    print(child.tag, child.attrib)
score {}
score {}

Typically it is helpful to know all the elements in the entire tree. A useful function for doing that is root.iter(). You can put this function into a “for” loop and it will iterate over the entire tree.

for elem in root.iter():
    print(elem.tag)
scores
score
date
home-team
home-score
away-team
away-score
score
date
home-team
home-score
away-team
away-score

If you need to print the entire string you can use ElementTree.tostring(). However, you must specify both the encoding and decoding of the document you are displaying as the string because ElementTree is a powerful library that can interpret more than just XML, For XMLs, use utf8 - this is the typical document format type for an XML.

print(ET.tostring(root, encoding='utf8').decode('utf8'))
<?xml version='1.0' encoding='utf8'?>
<scores xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="scores.xsd">
    <score>
        <date>2019-05-01</date>
        <home-team>Pirates</home-team>
        <home-score>0</home-score>
        <away-team>Cubs</away-team>
        <away-score>10</away-score>
    </score>
    <score>
        <date>2019-05-15</date>
        <home-team>Reds</home-team>
        <home-score>7</home-score>
        <away-team>Pirates</away-team>
        <away-score>0</away-score>
    </score>
</scores>

Iterating over specific elements

for game_date in root.iter('date'):
    print(game_date.text)
2019-05-01
2019-05-15

XPath Expressions

for score in root.findall('./score[date="2019-05-01"]/date'):
    print(score.text)
2019-05-01

Writing xml files

tree.write("scores2.xml")