Thursday, May 28, 2015

nlp20. Chunking in Python NLTK

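We can break a tagged text into chunks, which are short phrases defined by their part-of-speech tags. This is the chunk structure we use: an optional determiner (0 or 1), any number of adjectives (0 or more), then a noun.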
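To get a feel for what this pattern means, here is a minimal sketch that applies the same idea with Python's standard re module instead of NLTK. The tags are written by hand for illustration, and find_chunks is a hypothetical helper name, not part of NLTK. The NLTK parser is more capable than this; the sketch only shows how "0 or 1", "0 or more" and "exactly one" map onto a regular expression over tags.

```python
import re

# Hand-tagged tokens (tags written by hand for illustration).
tagged = [("The", "DT"), ("big", "JJ"), ("dog", "NN"),
          ("barked", "VBD"), ("at", "IN"),
          ("the", "DT"), ("little", "JJ"), ("cat", "NN")]

# Optional determiner, any number of adjectives, exactly one noun.
# Each tag is wrapped in angle brackets so that <NN> cannot match <NNS>.
pattern = re.compile(r"(?:<DT>)?(?:<JJ>)*<NN>")

def find_chunks(tagged_tokens):
    """Return the words of every tag run that matches the pattern."""
    tags = "".join("<%s>" % tag for _, tag in tagged_tokens)
    chunks = []
    for m in pattern.finditer(tags):
        # Turn character offsets back into token indexes by counting '<'.
        start = tags[:m.start()].count("<")
        end = start + m.group().count("<")
        chunks.append([word for word, _ in tagged_tokens[start:end]])
    return chunks

print(find_chunks(tagged))
# [['The', 'big', 'dog'], ['the', 'little', 'cat']]
```

A lone noun such as [("cat", "NN")] also forms a chunk here, because the determiner and the adjectives are both optional.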


This part-of-speech pattern is written as a grammar string and passed to nltk.chunk.RegexpParser. We then call the parse method of the resulting parser on our tagged text.


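We get 3 chunks. Note that the only required element of a chunk in this example is the noun. The determiner and the adjectives are optional, which is why the third chunk, "The cat", matches without an adjective.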

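# nlp20.py
# Requires the NLTK tokenizer and tagger data (see nltk.download()).
from __future__ import print_function
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag
from nltk.chunk import RegexpParser

text = "The big dog barked at the little cat. The cat ran away."
# pos_tag already returns a list of (word, tag) tuples.
tag_text = pos_tag(word_tokenize(text))

# CHNK = optional determiner, zero or more adjectives, one noun.
chunk = "CHNK: {<DT>?<JJ>*<NN>}"
cp = RegexpParser(chunk)
result = cp.parse(tag_text)
print(result)
# Opens a window with the parse tree (needs a graphical display).
result.draw()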

#(S
#  (CHNK The/DT big/JJ dog/NN)
#  barked/VBD
#  at/IN
#  (CHNK the/DT little/JJ cat/NN)
#  ./.
#  (CHNK The/DT cat/NN)
#  ran/VBD
#  away/RB
#  ./.)
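Once the text is parsed, the chunks can be pulled out of the tree as plain strings. The following is a minimal sketch, not part of nlp20.py: it starts from a hand-tagged token list copied from the output above, so it does not need the tokenizer or tagger data, and the variable name phrases is my own.

```python
from nltk.chunk import RegexpParser

# Hand-tagged tokens, copied from the printed tree above.
tagged = [("The", "DT"), ("big", "JJ"), ("dog", "NN"),
          ("barked", "VBD"), ("at", "IN"),
          ("the", "DT"), ("little", "JJ"), ("cat", "NN"), (".", "."),
          ("The", "DT"), ("cat", "NN"),
          ("ran", "VBD"), ("away", "RB"), (".", ".")]

cp = RegexpParser("CHNK: {<DT>?<JJ>*<NN>}")
tree = cp.parse(tagged)

# Keep only the CHNK subtrees and join the words of their leaves.
phrases = [" ".join(word for word, tag in subtree.leaves())
           for subtree in tree.subtrees(filter=lambda t: t.label() == "CHNK")]
print(phrases)
# ['The big dog', 'the little cat', 'The cat']
```

Filtering on the label skips the root S node, which subtrees() would otherwise return as well.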

