Python Data Structures and Algorithms
Dictionaries for text analysis

A common use of dictionaries is to count the occurrences of like items in a sequence; a typical example is counting the occurrences of words in a body of text. The following code creates a dictionary where each word in the text is used as a key and the number of occurrences as its value. This uses a very common idiom of nested loops. Here the outer loop traverses the lines in a file and the inner loop traverses the words in each line, updating the dictionary as it goes:

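def wordcount(fname):
    # Open the file; report the problem and stop if it cannot be read
    try:
        fhand = open(fname)
    except OSError:
        print('File cannot be opened')
        raise SystemExit(1)

    count = dict()
    # The with block closes the file once every line has been read
    with fhand:
        for line in fhand:
            words = line.split()
            for word in words:
                if word not in count:
                    count[word] = 1
                else:
                    count[word] += 1
    return count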

This returns a dictionary with an entry for each unique word in the text file. A common task is to filter items such as these into the subsets we are interested in. You will need a text file saved in the directory from which you run the code. Here we have used alice.txt, a short excerpt from Alice in Wonderland. To obtain the same results, you can download alice.txt from davejulian.net/bo5630, or use a text file of your own. In the following code, we create another dictionary, filtered, containing a subset of items from count:

count = wordcount('alice.txt')
filtered = {key: value for key, value in count.items() if 15 < value < 20}

When we print the filtered dictionary, we get the following:

Note the dictionary comprehension used to construct the filtered dictionary. Dictionary comprehensions work in the same way as the list comprehensions we looked at in Chapter 1, Python Objects, Types, and Expressions.
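
To make the parallel between the two kinds of comprehension concrete, here is a minimal sketch that runs without any file. The sample text, the threshold of 2, and the names frequent_words and frequent are assumptions made for illustration. It also swaps in collections.Counter from the standard library for the hand-written counting loop, which is an alternative technique rather than the approach used above:

```python
from collections import Counter

# Hypothetical sample text standing in for a file such as alice.txt
text = """the cat sat on the mat
the dog sat on the log
the cat saw the dog"""

# Counter is a dict subclass that tallies each item of an iterable
count = Counter(text.split())

# List comprehension: keeps only the words that pass the test
frequent_words = [word for word, n in count.items() if n >= 2]

# Dictionary comprehension: same loop and condition,
# but each result is a key: value pair
frequent = {word: n for word, n in count.items() if n >= 2}

print(frequent_words)
print(frequent)
```

The only syntactic differences are the braces and the key: value expression at the front; the for clause and the if filter are written the same way in both forms.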