Veritas Logic & Computation Data Systems & Algorithms Homework 2:
Tom Sawyer Most Common Word Phrases

Welcome to hash maps. A hash map stores pairs of data: a "key" and a "value". Each key points to its own value, and looking up, inserting, or updating a key takes O(1) time on average. Under the hood, a hash map is usually backed by an array, with a hash function turning each key into an array index. Keys and values can be the same type or different types, e.g., String keys pointing to Int values. For example, you can use a hash map to track inventory: make each item the key and the quantity you have in stock the value.

    const fruits = new Map([
      ["apples", 500],
      ["bananas", 300],
      ["oranges", 200]
    ]);

With this hash map, you have 500 apples, 300 bananas, and 200 oranges.

Your assignment: given the entire text of The Adventures of Tom Sawyer, write a program that returns the most common phrase of each length from 1 word up to 10 words (the top 1-word phrase, the top 2-word phrase, and so on through the top 10-word phrase). The challenge: your program must finish in under two minutes.

Here is the novel: Tom Sawyer novel!!! Save the full text as its own .txt file, then look up how to read a .txt file as input in your IDE and language.

Expect to struggle a bit; this assignment is harder than the previous one. If your code doesn't seem to be working, remember that it's perfectly OK to restart from scratch.

Hints:
- Start small: get the 1-word phrase counter working before tackling multi-word phrases.
- Think about how you can use a sliding window to grab N consecutive words from the text.
- Be mindful of punctuation and capitalization. Should "The" and "the" count as the same word?
- A hash map is your best friend here (HashMap<String, Integer> in Java, Map in JavaScript). The key is the phrase; the value is how many times you've seen it.
- To find the "top" phrase, you don't need to sort the entire map, but sorting it is one valid approach.
- If your program is too slow, think about where you're doing redundant work. Are you scanning the entire text more times than you need to?
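Here is a small sketch of the basic operations on the `fruits` map, written in JavaScript like the example above. The `addOne` helper is a hypothetical name used only for illustration. Its update step reads the current count, or 0 if the key is missing, then adds 1. This is the same pattern a phrase counter uses.

```javascript
// The inventory map from the example: item name -> number in stock.
const fruits = new Map([
  ["apples", 500],
  ["bananas", 300],
  ["oranges", 200],
]);

// Look up a value by its key (O(1) on average).
console.log(fruits.get("apples")); // 500

// Update a value: read the old count, then write back the new one.
fruits.set("bananas", fruits.get("bananas") - 50); // sold 50 bananas

// Hypothetical helper: add 1 to a key's count, treating a missing key as 0.
// A phrase counter does exactly this for every phrase it sees.
function addOne(map, key) {
  map.set(key, (map.get(key) || 0) + 1);
}
addOne(fruits, "pears");
addOne(fruits, "pears");

console.log(fruits.get("bananas")); // 250
console.log(fruits.get("pears"));   // 2
console.log(fruits.has("grapes"));  // false
```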
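One possible way to combine the hints is sketched below in JavaScript. It is a sketch under several assumptions, not a reference solution:

- A "word" is a run of ASCII letters with optional inner apostrophes.
- All text is lowercased.
- Phrase keys are words joined by single spaces.
- "Top" means the single most frequent phrase of each length, with ties going to the phrase seen first.

The names `countPhrases`, `topPhrase`, and `topPhrasesByLength` are hypothetical.

```javascript
// Count every N-word phrase by sliding a window of N words across the list.
// Assumption: a phrase's map key is its words joined by single spaces.
function countPhrases(words, n) {
  const counts = new Map();
  for (let i = 0; i + n <= words.length; i++) {
    // The window covers words[i] .. words[i + n - 1].
    const phrase = words.slice(i, i + n).join(" ");
    counts.set(phrase, (counts.get(phrase) || 0) + 1);
  }
  return counts;
}

// Find the most frequent phrase in one pass over the map, without sorting.
// Map iterates in insertion order, so ties go to the phrase seen first.
// Returns [null, 0] if the map is empty.
function topPhrase(counts) {
  let best = null;
  let bestCount = 0;
  for (const [phrase, count] of counts) {
    if (count > bestCount) {
      best = phrase;
      bestCount = count;
    }
  }
  return [best, bestCount];
}

// Split the text into words once, then reuse that array for every N
// instead of re-scanning the raw text for each phrase length.
// Assumption: words are lowercased runs of a-z with optional inner
// apostrophes ("don't"); all other characters act as separators.
function topPhrasesByLength(text, maxN = 10) {
  const words = text.toLowerCase().match(/[a-z]+(?:'[a-z]+)*/g) || [];
  const results = [];
  for (let n = 1; n <= maxN; n++) {
    // Each entry is [n, topPhrase, timesSeen].
    results.push([n, ...topPhrase(countPhrases(words, n))]);
  }
  return results;
}

// Small example: returns [[1, "the", 2], [2, "the cat", 2]]
const demo = topPhrasesByLength("The cat. The cat sat!", 2);
```

To run this on the novel in Node.js, read your downloaded file with `require("fs").readFileSync(path, "utf8")` and pass the resulting string to `topPhrasesByLength`.

The sketch splits the text into words only once and reuses that array for every N, which avoids the kind of redundant work the last hint describes. Building a new string with `slice` and `join` for every window is simple rather than optimal. For a novel-length text it will likely stay well within the two-minute limit, but measure it on your own machine.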