We can access a specific text within a corpus by using a fileid.
Calling len(inaugural.words()) gives the length of the whole corpus in tokens, which is 145735. Passing a fileid in the call to the words method selects only that particular text.
For example, len(inaugural.words('1789-Washington.txt')) is equal to 1538. We can call the fileids method of inaugural, or of whatever the corpus happens to be, to get a list of the text names.
The program below prints the first five fileids, followed by the first 20 tokens of the first inaugural address.
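# nlp12.py
from __future__ import print_function, division
from nltk.corpus import inaugural

A = inaugural.fileids()        # list of text names in the corpus
s = 2*' '                      # two spaces, used as indent and separator
for a in A[:5]:
    print(s+a)
B = inaugural.words(A[0])      # tokens of the first inaugural address only
for b in B[:20]:
    print(b, end = s)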
# 1789-Washington.txt
# 1793-Washington.txt
# 1797-Adams.txt
# 1801-Jefferson.txt
# 1805-Jefferson.txt
# Fellow - Citizens of the Senate and of
# the House of Representatives : Among the
# vicissitudes incident to life no
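
The same fileid mechanism can be used to compare the lengths of the individual texts. The following is a minimal sketch, not part of nlp12.py: the helper names words_per_fileid and longest_text are hypothetical, and the only assumption about the corpus object is that it offers the fileids() and words(fileid) methods used above.

```python
def words_per_fileid(corpus, fileids=None):
    """Return a dict mapping each fileid to its token count.

    `corpus` is assumed to be any object with NLTK-style `fileids()` and
    `words(fileid)` methods, such as `nltk.corpus.inaugural`.
    If `fileids` is None, every text in the corpus is counted.
    """
    if fileids is None:
        fileids = corpus.fileids()
    # words(fid) restricts the token sequence to one text, as in nlp12.py
    return {fid: len(corpus.words(fid)) for fid in fileids}


def longest_text(corpus):
    """Return the (fileid, token count) pair with the most tokens."""
    counts = words_per_fileid(corpus)
    return max(counts.items(), key=lambda item: item[1])
```

With NLTK installed and the inaugural corpus downloaded, words_per_fileid(inaugural, inaugural.fileids()[:5]) would give the token counts of the five texts listed above, and longest_text(inaugural) would name the longest address.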