This year’s PyCon was again, extraordinary. Our heartfelt thanks go out to the entire community for such an amazing event. We especially enjoyed getting to know other engineers from all over the world.
In fact, we’ve found that one of the best ways to start a conversation with an engineer is to start by talking about code. Thumbtack’s second annual code challenge gave us a great opportunity to do this.
Last year’s challenge was to determine whether there was a winner on a given Connect 4 game board. This year’s challenge required looking for anagrams of word pairs in large bodies of text. Big thanks to Jeff Kramer, who posted the problem description online.
In response to feedback from last year’s challenge, this year’s anagram challenge had two components: an easier challenge that required finding anagrams in a paragraph of text, and a harder challenge that looked for anagrams in the full corpus of Alice in Wonderland.
We offered two prize sets to match the two difficulty levels: a mug and sunglasses for the easier challenge, and a beer mug and shot glass for the harder challenge.
Solutions
The general form of solutions to this year’s challenge was fairly consistent. In fact, there were no solutions that deviated from this general pattern:
- Parse the input text to filter and transform input words
- Remove duplicate words (can be done at this point or later)
- Find all possible anagram pairs of those words (often using
itertools.combinations()
) - Create a data structure to hold matching anagram pairs, usually keyed off the sorted letters of the pair
We received 53 total (working) solutions to the Alice In Wonderland challenge.
The best solutions were all on the order of O(N2) (where N is the number of unique words in the corpus), while slower solutions were often O(N4). The difference was almost entirely due to the data structures used in step 4: faster solutions used a dictionary (search is constant time), while slower solutions used a list (search takes time O(N2), where search is linear on the number of word pairs).
We tested all the solutions on a modern Mac laptop and pulled out some of the fastest solutions to highlight here. O(N2) solutions generally run in the 15-40s range. We see a massive spike (literally, off this chart) for slower solutions.
Hall of Fame
Chris Moultrie’s solution was the fastest of all the solutions we received. On our testing machine, his solution to the Alice problem routinely ran in under 12 seconds. There are several clever optimizations specific to the challenge, so, while this is not a general-purpose solution, it is extremely quick. (Find Chris on GitHub and Twitter.)
Second place for speed goes to Łukasz Balcerzak. His solution routinely took ~13.5s. This is a more general-purpose solution that is also well-factored and readable. (Find Łukasz on GitHub and Twitter.)
Samuel Merritt’s solution is also commendable for its speed (ranked in the top 5) as well as its clarity and excellent word choice. (Find Samuel on GitHub and Twitter.)
Congratulations to everyone who participated! If you submitted your solution but were not able to claim your prize at PyCon, please email us and we’ll try to get your prize to you.
We had a great time with this challenge and hope to see you all again at next year’s PyCon in Montreal.