Getting Started with Python for the Internet of Things

How to do it...

Import the sentence tokenizer:

from nltk.tokenize import sent_tokenize

Split the input text into sentences:

# text holds the input string to tokenize
tokenize_list_sent = sent_tokenize(text)
print("\nSentence tokenizer:")
print(tokenize_list_sent)
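
To make the idea of sentence tokenization concrete, here is a minimal sketch that uses only the standard library. It is not how sent_tokenize works: NLTK relies on the pre-trained Punkt model, while this hypothetical naive_sent_tokenize helper simply splits on whitespace that follows a period, question mark, or exclamation mark. The sample string is also an assumption made for illustration.

```python
import re

def naive_sent_tokenize(text):
    """Naive illustration only: split after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings that can appear for empty input
    return [part for part in parts if part]

# Hypothetical sample text, not taken from the recipe
sample = "Are you curious? This is a recipe. It works!"
sentences = naive_sent_tokenize(sample)
print(sentences)

# The naive rule breaks on abbreviations, which is why a trained model helps
print(naive_sent_tokenize("Dr. Smith arrived."))
```

The second call splits after "Dr.", which shows the kind of case a rule this simple gets wrong.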

Split the text into words with the word tokenizer:

from nltk.tokenize import word_tokenize
print("\nWord tokenizer:")
print(word_tokenize(text))

Create a WordPunct tokenizer, which also separates punctuation into its own tokens:

from nltk.tokenize import WordPunctTokenizer
word_punct_tokenizer = WordPunctTokenizer()
print("\nWord punct tokenizer:")
print(word_punct_tokenizer.tokenize(text))
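
The behavior of the WordPunct tokenizer can be sketched without NLTK. The NLTK documentation describes WordPunctTokenizer as a regular-expression tokenizer built on the pattern \w+|[^\w\s]+, so the following standard-library sketch applies that same pattern directly. The word_punct_tokenize helper and the sample sentence are assumptions introduced for illustration, not part of the recipe.

```python
import re

# Pattern documented for NLTK's WordPunctTokenizer: runs of word characters,
# or runs of characters that are neither word characters nor whitespace.
WORD_PUNCT_PATTERN = re.compile(r"\w+|[^\w\s]+")

def word_punct_tokenize(text):
    """Return alphanumeric runs and punctuation runs as separate tokens."""
    return WORD_PUNCT_PATTERN.findall(text)

# Hypothetical sample text, not taken from the recipe
sample = "Isn't tokenization fun? Yes, it is!"
tokens = word_punct_tokenize(sample)
print(tokens)
```

Note how the apostrophe in "Isn't" becomes its own token, so the contraction is split into three pieces.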

The output of the tokenizers is shown here. The text is split into sentences and into individual word tokens: