Tuesday, March 3, 2015

py30. Reading and parsing an XML document in Python

We will use a test XML file, country_data.xml, taken from the Python documentation and saved locally as ex30data.xml (the name the script below opens).
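If you do not have the file at hand, the sketch below writes a trimmed stand-in to ex30data.xml, the file name the script opens. It is an assumption reconstructed from the output shown in this post: it keeps only the parts the script reads (the country name attribute, the year element and the neighbor elements), while the original country_data.xml in the Python documentation contains additional elements.

```python
# Write a minimal stand-in for ex30data.xml.
# Assumption: only the elements that ex30.py reads are included,
# reconstructed from the output printed by the script.
SAMPLE_XML = """<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <year>2011</year>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <year>2011</year>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
"""

with open("ex30data.xml", "w") as f:
    f.write(SAMPLE_XML)
```

Running ex30.py against this file should list the same countries, years and neighbours as the output below.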


The module bs4 (Beautiful Soup 4) is used to parse the document. If it is not installed, install it using one of the methods described on its website, such as pip. If you use a distribution like Anaconda, use the pip that comes with that distribution. Because the script asks Beautiful Soup for its "xml" parser, the lxml package must be installed as well. Other modules can also be used, as described in the Python documentation.
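As one example of the other modules, here is a sketch using the standard-library xml.etree.ElementTree module, which needs no extra installation. It gathers the same information (name, year and neighbours) into a list of dictionaries before printing it. The function name read_countries and the inline XML string are illustrative assumptions, not part of the original post.

```python
import xml.etree.ElementTree as ET

# Illustrative input with the same structure as ex30data.xml (assumed, trimmed).
XML_TEXT = """<data>
    <country name="Liechtenstein">
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <year>2011</year>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <year>2011</year>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>"""


def read_countries(root):
    """Return a list of dicts holding each country's name, year and neighbours."""
    countries = []
    for c in root.findall("country"):
        countries.append({
            "name": c.get("name"),            # value of the name attribute
            "year": c.find("year").text,      # text inside <year>
            # copy each <neighbor>'s attributes (name and direction)
            "neighbors": [dict(n.attrib) for n in c.findall("neighbor")],
        })
    return countries


# For the file on disk, use ET.parse('ex30data.xml').getroot() instead.
root = ET.fromstring(XML_TEXT)
for country in read_countries(root):
    print("\n" + country["name"])
    print("year :", country["year"])
    for i, n in enumerate(country["neighbors"]):
        print("neighbor :", i, n)
```

Returning plain dictionaries keeps the parsing separate from the printing, so the same data can be reused elsewhere.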


For each of the three countries in the document, the script prints the country name, the year and the neighbours.

# ex30.py
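from __future__ import print_function, division
from bs4 import BeautifulSoup

# The "xml" feature makes Beautiful Soup use lxml's XML parser (lxml must be installed)
with open('ex30data.xml') as f:
    doc = BeautifulSoup(f, "xml")

countries = doc.find_all("country")
for c in countries:
    # blank line, then the country name taken from the name attribute
    print("\n", c.attrs['name'], sep="")
    year = c.find('year')
    print('year :', year.text)
    # each <neighbor> element is empty; its data is stored in the attributes
    for i, ni in enumerate(c.find_all('neighbor')):
        print("neighbor :", i, ni.attrs)

# Output (Python 2):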
#Liechtenstein
#year : 2008
#neighbor : 0 {'direction': u'E', 'name': u'Austria'}
#neighbor : 1 {'direction': u'W', 'name': u'Switzerland'}
#
#Singapore
#year : 2011
#neighbor : 0 {'direction': u'N', 'name': u'Malaysia'}
#
#Panama
#year : 2011
#neighbor : 0 {'direction': u'W', 'name': u'Costa Rica'}
#neighbor : 1 {'direction': u'E', 'name': u'Colombia'}
