Sunday, May 31, 2015

nlp26. Reading a corpus using WordListCorpusReader in Python NLTK

We use WordListCorpusReader to read the corpus that was created in the last program. We import the reader under the alias WLCR.


We pass the list of file IDs as the second parameter of WLCR. The first parameter is the corpus root directory.
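The second parameter does not have to be an explicit list. NLTK's corpus readers also accept a regular expression, which is matched against the file names under the root directory. The sketch below is self-contained: instead of using the NLTK data path, it builds a throwaway corpus in a temporary directory. The file names Test1 and Test2 and their contents are assumed to mirror the corpus from the last program. The expected outputs in the comments are for Python 3.

```python
import os
import tempfile

from nltk.corpus.reader import WordListCorpusReader as WLCR

# Build a small word-list corpus in a temporary directory (stand-in for MyTestCorpus)
corpus_root = tempfile.mkdtemp()
samples = {'Test1': ['One', 'Two', 'Five'], 'Test2': ['Three', 'Four', 'Seven']}
for name, words in samples.items():
    with open(os.path.join(corpus_root, name), 'w') as f:
        # One word per line, with a trailing newline so files do not run together
        f.write('\n'.join(words) + '\n')

# Pass a regular expression instead of a list: every file named Test<digit> matches
reader = WLCR(corpus_root, r'Test\d')

print(reader.fileids())        # ['Test1', 'Test2']
print(reader.words('Test2'))   # ['Three', 'Four', 'Seven']
print(reader.words())          # all files: ['One', 'Two', 'Five', 'Three', 'Four', 'Seven']
```

Calling words() with no argument reads every file the reader knows about, in the order returned by fileids().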


LazyCorpusLoader, rather than WordListCorpusReader, is the main mechanism NLTK uses to load corpora. We will go over it later.
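As a rough preview, LazyCorpusLoader takes a corpus name, a reader class, and the arguments for that reader. It only builds the real reader on first use, by looking for corpora/<name> on nltk.data.path. The sketch below is only an illustration and makes two assumptions so that your real NLTK data is not touched: a temporary data directory and a hypothetical corpus named mytestcorpus.

```python
import os
import tempfile

import nltk
from nltk.corpus.reader import WordListCorpusReader
from nltk.corpus.util import LazyCorpusLoader

# Hypothetical data directory laid out like nltk_data: <data_dir>/corpora/<corpus name>/
data_dir = tempfile.mkdtemp()
corpus_dir = os.path.join(data_dir, 'corpora', 'mytestcorpus')
os.makedirs(corpus_dir)
with open(os.path.join(corpus_dir, 'Test1'), 'w') as f:
    f.write('One\nTwo\nFive\n')

# Make the temporary directory searchable, like any other NLTK data path
nltk.data.path.append(data_dir)

# Nothing is read yet; the loader only records the name, reader class and arguments
my_corpus = LazyCorpusLoader('mytestcorpus', WordListCorpusReader, ['Test1'])

# The first attribute access finds corpora/mytestcorpus and builds the real reader
print(my_corpus.words('Test1'))   # ['One', 'Two', 'Five'] on Python 3
```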

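# nlp26.py
from __future__ import print_function
import os
from nltk.data import path as nltk_data_path
from nltk.corpus.reader import WordListCorpusReader as WLCR

# Corpus directory created by the last program, under the first NLTK data path
corpus_root = os.path.join(nltk_data_path[0], 'MyTestCorpus')
fileids = ['Test1', 'Test2']

# First argument: corpus root directory; second: list of file IDs
reader = WLCR(corpus_root, fileids)

# words() returns one entry per non-blank line of the file
word1 = reader.words(fileids[0])
print("words1 =\n", word1)
word2 = reader.words(fileids[1])
print("words2 =\n", word2)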

# words1 =
#  [u'One', u'Two', u'Five']
# words2 =
#  [u'Three', u'Four', u'Seven']
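The output shows that a word list corpus is simply one word per line. (The u'' prefixes come from Python 2; Python 3 prints plain strings.) The helper below gives an NLTK-free approximation of what words() does with each file. It is a hypothetical function written for illustration, not NLTK's own code, and it assumes Python 3 and UTF-8 files. It splits the text into lines and drops whitespace-only lines, which is roughly how NLTK's line tokenizer treats a word-list file.

```python
import os
import tempfile


def read_word_list(filename, encoding='utf8'):
    """Hypothetical helper: return the non-blank lines of a word-list file.

    Lines are split with str.splitlines() and whitespace-only lines are dropped,
    which approximates how words() tokenizes a word-list file.
    """
    with open(filename, encoding=encoding) as f:
        text = f.read()
    return [line for line in text.splitlines() if line.rstrip()]


# Usage: write a small file with a blank line in the middle and read it back
sample_path = os.path.join(tempfile.mkdtemp(), 'Test1')
with open(sample_path, 'w', encoding='utf8') as f:
    f.write('One\nTwo\n\nFive\n')

print(read_word_list(sample_path))  # ['One', 'Two', 'Five']
```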
