Tuesday, May 26, 2015

nlp12. Fileids in Python NLTK

We can access a specific text within a corpus by using a fileid.


The length of the whole inaugural corpus, that is, len(inaugural.words()), is 145735. However, by passing a fileid in the call to the words method, we can select only one particular text.


The particular text we selected has a word length of 1538, that is, len(inaugural.words('1789-Washington.txt')) is equal to 1538. We can call the fileids method of inaugural, or of whatever the corpus happens to be, to get a list of the text names.


The program below lists the first five fileids and then prints the first 20 words of the first inaugural address.

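# nlp12.py
from __future__ import print_function, division
from nltk.corpus import inaugural
A = inaugural.fileids()        # names of all texts in the corpus
s = 2*' '                      # two-space separator
for a in A[:5]:                # show the first five fileids
    print(s+a)
B = inaugural.words(A[0])      # words of the first text only
for b in B[:20]:               # print its first 20 words
    print(b, end = s)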

#  1789-Washington.txt
#  1793-Washington.txt
#  1797-Adams.txt
#  1801-Jefferson.txt
#  1805-Jefferson.txt
# Fellow  -  Citizens  of  the  Senate  and  of
# the  House  of  Representatives  :  Among  the
# vicissitudes  incident  to  life  no  
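
The same fileid pattern should work for any NLTK corpus reader, not just inaugural. As a rough sketch, we can build a tiny throwaway corpus with PlaintextCorpusReader and compare the word count of each text with the count for the whole corpus. The file names and sentences here are made up for illustration, and the helper names build_toy_corpus and word_counts are not part of NLTK.

```python
import os
import tempfile
from nltk.corpus.reader import PlaintextCorpusReader

def build_toy_corpus(root):
    """Write two made-up text files into root and wrap them in a reader."""
    texts = {
        'a-first.txt': 'Fellow-Citizens of the Senate.',   # hypothetical text
        'b-second.txt': 'We the people.',                  # hypothetical text
    }
    for name, text in texts.items():
        with open(os.path.join(root, name), 'w') as f:
            f.write(text)
    # Every file matching the regular expression becomes a fileid.
    return PlaintextCorpusReader(root, r'.*\.txt')

def word_counts(corpus):
    """Map each fileid to the number of tokens in that text."""
    return {fid: len(corpus.words(fid)) for fid in corpus.fileids()}

with tempfile.TemporaryDirectory() as root:
    corpus = build_toy_corpus(root)
    fileids = corpus.fileids()                 # list of text names
    counts = word_counts(corpus)               # one count per fileid
    total = len(corpus.words())                # no fileid: whole corpus
    first_words = list(corpus.words(fileids[0]))

print(fileids)
print(counts)
print(total)
print(first_words)
```

Punctuation marks count as words here, just as the hyphen and colon do in the inaugural output above, so the per-text counts add up to the count for the whole corpus.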
