Saturday, February 7, 2015

py10. Reading a csv table in Python

The data comes from https://raw.githubusercontent.com/datasets/population/master/data/population.csv. Download the csv file into the working directory, or pass the URL in place of the filename in the read_csv call.


The read_csv function from the pandas module reads a csv text file and returns a DataFrame.
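
As a minimal sketch of passing the URL rather than a filename: read_csv accepts a local path, a URL, or an open file-like object. The names POPULATION_URL and load_population are hypothetical, introduced here only for illustration.

```python
from pandas import read_csv

# URL taken from this post; POPULATION_URL and load_population are
# illustrative names, not part of pandas.
POPULATION_URL = ("https://raw.githubusercontent.com/datasets/"
                  "population/master/data/population.csv")


def load_population(source=POPULATION_URL):
    """Return the population table as a DataFrame.

    `source` may be a local filename, a URL, or an open file-like object;
    read_csv handles all three.
    """
    return read_csv(source)


# Typical calls (the first needs network access, the second a downloaded copy):
# df = load_population()
# df = load_population('population.csv')
```

Reading from the URL fetches the file on every run, so a downloaded copy is the better choice when working offline or running the script repeatedly.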


In IPython, typing the name of the DataFrame (such as df), then a period, and then pressing Tab lists the methods and attributes that can be used on it. To get help on a particular method, append ? to its name, as in df.head?
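
Outside IPython, roughly the same information is available from plain Python. The sketch below is an approximation of what tab completion and ? show, not a reproduction of IPython's behaviour. The small DataFrame uses two rows copied from the output in this post, and the name public_names is illustrative.

```python
from pandas import DataFrame

# Two rows copied from the output shown in this post, just to have a DataFrame.
df = DataFrame({'Year': [1960, 1961],
                'Value': [96388069.0, 98882541.4]})

# dir() lists every attribute; dropping names that start with an underscore
# leaves roughly what tab completion offers after typing "df."
public_names = [name for name in dir(df) if not name.startswith('_')]
print(public_names[:10])

# The docstring is the main text that "df.head?" displays in IPython;
# print only its first line here.
print(df.head.__doc__.strip().splitlines()[0])
```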

# ex10.py
from __future__ import division, print_function
import matplotlib.pyplot as plt
from pandas import read_csv
# Read the csv file from the working directory into a DataFrame
df = read_csv('population.csv')
print('***type of df = \n',type(df))
print('\n***The first 5 rows are\n',df.head())
print('\n***The last 5 rows are\n',df.tail())
print('\n***There are %d rows.' % len(df))
# Keep only the rows for each of the two regions to be plotted
df1 = df[df['Country Name'] == 'United States']
df2 = df[df['Country Name'] == 'Arab World']
# Plot population in millions against year: USA in blue, Arab World in green
plt.plot(df1['Year'],df1['Value']/1e6,'b',
         df2['Year'],df2['Value']/1e6,'g')
plt.xlabel('Year')
plt.ylabel('Population (millions)')
plt.title('USA - blue, Arab World - green')
plt.show()
#***type of df =
#  <class 'pandas.core.frame.DataFrame'>
#
#***The first 5 rows are
#   Country Name Country Code  Year        Value
#0   Arab World          ARB  1960   96388069.0
#1   Arab World          ARB  1961   98882541.4
#2   Arab World          ARB  1962  101474075.8
#3   Arab World          ARB  1963  104169209.2
#4   Arab World          ARB  1964  106978104.6
#
#***The last 5 rows are
#       Country Name Country Code  Year     Value
#12402     Zimbabwe          ZWE  2006  12529655
#12403     Zimbabwe          ZWE  2007  12481245
#12404     Zimbabwe          ZWE  2008  12451543
#12405     Zimbabwe          ZWE  2009  12473992
#12406     Zimbabwe          ZWE  2010  12571000
#
#***There are 12407 rows.
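
The row selection in ex10.py, df[df['Country Name'] == 'United States'], works in two steps: the comparison builds a boolean Series with one True or False per row, and indexing the DataFrame with that Series keeps only the True rows. Here is a small sketch of those two steps, using four rows copied from the output above and read from an in-memory string so it runs without the downloaded file. The names sample_csv, mask and zimbabwe are illustrative.

```python
from io import StringIO
from pandas import read_csv

# Four rows copied from the head() and tail() output in this post.
sample_csv = """Country Name,Country Code,Year,Value
Arab World,ARB,1960,96388069.0
Arab World,ARB,1961,98882541.4
Zimbabwe,ZWE,2009,12473992
Zimbabwe,ZWE,2010,12571000
"""

# read_csv accepts a file-like object as well as a filename.
df = read_csv(StringIO(sample_csv))

# Step 1: the comparison gives one True/False value per row.
mask = df['Country Name'] == 'Zimbabwe'

# Step 2: indexing with the mask keeps only the rows where it is True.
zimbabwe = df[mask]
print(zimbabwe)
```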

Output: [figure: population in millions versus year, USA in blue, Arab World in green]
