Thursday, May 28, 2015

nlp18. Stemmer in Python NLTK

We use NLTK's PorterStemmer to stem a short list of words.


PorterStemmer is a class, as its leading capital letter suggests (the usual Python naming convention), so we first have to create an object and then call a method on it. Since we only need the stem method, we do not give the object a name of its own; we store only its bound method.
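To make the "store only the method" idea concrete, here is a minimal sketch that needs nothing but plain Python. SuffixStripper is a made-up toy class for illustration; it only drops a trailing "s" and is not the Porter algorithm.

```python
# Toy stand-in for a stemmer class; NOT the Porter algorithm.
class SuffixStripper(object):
    """Hypothetical class that strips one trailing 's' from a word."""

    def stem(self, word):
        # Only strip when the word is longer than two characters.
        if len(word) > 2 and word.endswith('s'):
            return word[:-1]
        return word


# Two steps: create the object, then call its method.
stripper = SuffixStripper()
print(stripper.stem('cats'))

# One step: create the object and keep only its bound method.
strip = SuffixStripper().stem
print(strip('cats'))
```

Both calls print cat. The bound method still holds a reference to the object it came from, so the object stays alive even though it never got a name.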

# nlp18.py
from __future__ import print_function
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
text = """
cats catlike cat stemmer stemming stemmed stem
fishing fished fisher fish argue argued argues
arguing argument arguments
"""
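# Create a PorterStemmer object and keep only its bound stem method.
PS = PorterStemmer().stem
# Print each token right-aligned next to its stem.
for a in word_tokenize(text):
    print('%10s --> %10s' % (a, PS(a)))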

#      cats -->        cat
#   catlike -->     catlik
#       cat -->        cat
#   stemmer -->    stemmer
#  stemming -->       stem
#   stemmed -->       stem
#      stem -->       stem
#   fishing -->       fish
#    fished -->       fish
#    fisher -->     fisher
#      fish -->       fish
#     argue -->       argu
#    argued -->       argu
#    argues -->       argu
#   arguing -->       argu
#  argument -->   argument
# arguments -->   argument
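
In the output, several words collapse to the same stem. The sketch below groups words by stem. So that it runs without NLTK, it looks the stems up in a small table copied from the output above instead of calling the stemmer; group_by_stem is a hypothetical helper name, not part of NLTK.

```python
from collections import OrderedDict

# Word -> stem pairs copied from the output listed above.
STEMS = {
    'argue': 'argu', 'argued': 'argu', 'argues': 'argu',
    'arguing': 'argu', 'argument': 'argument', 'arguments': 'argument',
}


def group_by_stem(words, stem):
    """Map each stem to the words that reduce to it (hypothetical helper)."""
    groups = OrderedDict()
    for word in words:
        # stem can be any callable that takes a word and returns its stem.
        groups.setdefault(stem(word), []).append(word)
    return groups


words = ['argue', 'argued', 'argues', 'arguing', 'argument', 'arguments']
groups = group_by_stem(words, STEMS.get)
for root, members in groups.items():
    print('%10s <-- %s' % (root, ', '.join(members)))
```

With this table, argu collects argue, argued, argues and arguing, while argument and arguments form a second group. Passing PorterStemmer().stem in place of STEMS.get should apply the same grouping to any word list.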
