# Hash Table Word Count

Problem:

Write a program which takes as its input a `String` of natural language text and outputs a `HashMap<String,Integer>` whose keys are the unique words in the input and whose values are the number of times those words occur. The algorithm should be case-insensitive (e.g. “Program” and “program” would count as the same word) and ignore punctuation and whitespace.

Example:

Given the input `"To be or not to be, that is the question"`, the outputted `HashMap` would contain 8 entries, with two words having a count of 2 and six words having a count of 1:

``````"to"        → 2
"be"        → 2
"or"        → 1
"not"       → 1
"that"      → 1
"is"        → 1
"the"       → 1
"question"  → 1``````

### 1. Understand

``````# Write a program that takes a String as input
# And outputs a dictionary with String:Int key:value pairs
# The keys are unique words (case insensitive
# and ignoring whitespace and punctuation)
# Values are frequency counts of that word in the string

## Example input: "To be or not to be, that is the question"
## Should output:
# "to"        → 2
# "be"        → 2
# "or"        → 1
# "not"       → 1
# "that"      → 1
# "is"        → 1
# "the"       → 1
# "question"  → 1``````

### 2. Match

``````# This is a string processing problem
# Can use Python's split() function to get individual words
# in a list and use .lower() to convert each word to lowercase
# and strip() to strip whitespace and punctuation``````

### 3. Plan / Pseudocode

``````# Start with empty dictionary and input string
# First, convert string to a list with split()
# Second, iterate through each word in the list and:
# * convert to lowercase lower()
# * strip whitespace and punct strip()
# * check if its in the dictionary
#   * if it's not, add it with a value of 1
#   * if it is, increment the value associated with the key
# return the dictionary``````

### 6. Evaluate

Currently all non-alphabetical characters are stripped from the string by using regex.

The algorithm runs at:

O(N) Time and Space