Sunday, May 24, 2015

nlp2. Concordance in Python NLTK

A concordance shows every occurrence of a word in a corpus together with its surrounding context.


Here, we iterate over three strings in a Python list and look up each one in the Wall Street Journal corpus (text7 in nltk.book).


Unlike the count method, which returns an integer, the concordance method returns None; it only prints its results.

# nlp2.py
from __future__ import print_function, division
from nltk.book import text7

print('text7 =', text7)
print('text 7 length =', len(text7))

words = ["Indonesia", "Singapore", "Malaysia"]
for word in words:
    n = text7.count(word)  # count() returns an integer
    print("The string %s occurs %d times" % (word, n))
    print("The occurrences:")
    text7.concordance(word, 50)  # prints the matches, 50 characters per line; returns None

#    text7 = <Text: Wall Street Journal>
#    text 7 length = 100676
#    The string Indonesia occurs 2 times
#    The occurrences:
#    Displaying 2 of 2 matches:
#     and export them to Indonesia . `` The effect wil
#    aysia , Singapore , Indonesia , the Philippines a
#    The string Singapore occurs 4 times
#    The occurrences:
#    Displaying 4 of 4 matches:
#     tobacco smoke . In Singapore , a new law require
#    cial said 0 *T*-1 . Singapore already bans smokin
#    ailand , Malaysia , Singapore , Indonesia , the P
#    es closed higher in Singapore , Taipei and Wellin
#    The string Malaysia occurs 6 times
#    The occurrences:
#    Displaying 6 of 6 matches:
#    ing slow progress in Malaysia . '' She did n't ela
#    eocassette piracy in Malaysia and disregard for U.
#    ood restaurants . In Malaysia , Siti Zaharah Sulai
#    such as Thailand and Malaysia , the investment wil
#    assemble the sets in Malaysia and export them to I
#    ations -- Thailand , Malaysia , Singapore , Indone
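
Because concordance only prints, its lines cannot be stored or filtered directly. The following is a minimal sketch of a keyword-in-context lookup in plain Python that returns the lines instead. The function name kwic, its parameters and the sample sentence are made up for illustration; this is not how NLTK implements concordance, only an approximation of the idea.

```python
# kwic_sketch.py: hypothetical helper, not part of NLTK
def kwic(tokens, word, width=50):
    """Return one context line per exact occurrence of `word` in `tokens`."""
    # Characters of context on each side of the word (at least 1).
    half = max((width - len(word) - 2) // 2, 1)
    lines = []
    for i, tok in enumerate(tokens):
        if tok != word:
            continue
        # `half` tokens always cover at least `half` characters of context.
        left = " ".join(tokens[max(0, i - half):i])[-half:]
        right = " ".join(tokens[i + 1:i + 1 + half])[:half]
        lines.append("%s %s %s" % (left.rjust(half), word, right))
    return lines

# Made-up sample text, tokenized on whitespace.
tokens = "In Malaysia , exports rose . Singapore and Malaysia signed a deal .".split()
hits = kwic(tokens, "Malaysia", width=30)
print(len(hits), "matches")
for line in hits:
    print(line)
```

Since kwic returns a list, the number of lines can be compared with tokens.count("Malaysia"), or the lines can be written to a file. Recent NLTK releases also appear to provide a Text.concordance_list method for a similar purpose; check the documentation of the installed version before relying on it.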
