Kurt Schlosser at GeekWire: How hard is the New York Times crossword? This is a description of the Puzzle Difficulty Index that Puzzazz, a puzzle-solving app, has been calculating. Unsurprisingly to anyone who knows that puzzle, later-in-the-week crosswords take longer and are less frequently solved (with the exception that a few more people solve on Thursdays than Wednesdays, which I’d attribute to either noise or the fact that Thursday puzzles tend to have some sort of “gimmick” and are not just halfway between Wednesdays and Fridays). Both links are worth reading, although there’s some redundancy. I’ve thought for a while that this sort of thing would be possible if I had enough data.

The next frontier in this sort of analysis would be seeing which individual clues are the hardest – what do people solve immediately and what do they leave until the end, when they have a lot of crossing letters? I’m not sure if crossword constructors would be interested in this, although anecdotally they seem to be a mathy bunch…
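As a rough sketch of what such a per-clue analysis might look like, here is a small Python function. It assumes a hypothetical log format (I don't know what Puzzazz actually records): for each solver, the list of clue IDs in the order that solver completed them. Each clue gets its average normalized position in the solve order, so clues that tend to be filled in last float to the top.

```python
from collections import defaultdict
from statistics import mean


def rank_clues_by_fill_position(solve_logs):
    """Rank clues from 'usually filled last' to 'usually filled first'.

    solve_logs is a hypothetical format: one list per solver, holding
    clue IDs in the order that solver completed them.
    """
    positions = defaultdict(list)
    for order in solve_logs:
        n = len(order)
        for i, clue in enumerate(order):
            # Normalize to [0, 1]: 0 = filled first, 1 = filled last.
            positions[clue].append(i / (n - 1) if n > 1 else 0.0)
    averaged = ((clue, mean(p)) for clue, p in positions.items())
    return sorted(averaged, key=lambda pair: pair[1], reverse=True)


# Toy, made-up logs for three solvers of a three-clue puzzle.
toy_logs = [
    ["1A", "5D", "9A"],
    ["1A", "5D", "9A"],
    ["5D", "1A", "9A"],
]
ranking = rank_clues_by_fill_position(toy_logs)
```

Fill order is only a proxy for difficulty, of course: a clue might be filled late because it sits in a corner, not because it's hard, so a real analysis would want to control for grid position and the number of crossing letters already in place.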

Of course, all of this would be irrelevant if crosswords didn’t exist, and it’s not immediately obvious that enough different strings of letters make words that crosswords should be possible. In his book Information Theory, Inference, and Learning Algorithms, the late David MacKay analyzed this; here’s the relevant excerpt from that book (three-page PDF) and a more elaborate version of the analysis. The idea actually goes back to Shannon’s founding paper on information theory, although he doesn’t give the detailed analysis. Shannon writes that:

A more detailed analysis shows that if we assume the constraints imposed by the language are of a rather chaotic and random nature, large crossword puzzles are just possible when the redundancy is 50%.

Here “redundancy” has a specific information-theoretic meaning, and it turns out that the redundancy of English is right around 50%; MacKay’s analysis further shows that crosswords should be harder to construct (i.e., there should be fewer valid ways to fill in a given pattern of black and white squares) as words get longer.
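To get a feel for where a figure like 50% could come from, here is a back-of-the-envelope counting model in Python. It's a sketch under my own simplifying assumptions, not necessarily the exact calculation in MacKay or Shannon: every word has the same length, each word is followed by one black square, every letter sits in exactly two words (one across, one down), and a random string of letters is a valid word with probability (number of words) / (number of possible strings). The function names are made up for illustration.

```python
import math


def log2_fillings_per_square(num_words, word_len, alphabet_size=26):
    """Log2 of the expected number of valid fillings, per grid square.

    Toy model: all words have length word_len and each is followed by
    one black square, so a word occupies word_len + 1 squares. Every
    letter is checked by two words (across and down).
    """
    h0 = math.log2(alphabet_size)             # bits per unconstrained letter
    letters_per_square = word_len / (word_len + 1)
    words_per_square = 2 / (word_len + 1)
    # Log-probability that a random string of word_len letters is a word.
    log2_valid = math.log2(num_words) - word_len * h0
    return letters_per_square * h0 + words_per_square * log2_valid


def min_words_needed(word_len, alphabet_size=26):
    """Vocabulary size at which the count above stops shrinking with grid size."""
    return alphabet_size ** (word_len / 2)


# Hypothetical vocabulary of 1000 words at two different word lengths.
short_words = log2_fillings_per_square(1000, 4)   # positive: fillings abound
long_words = log2_fillings_per_square(1000, 6)    # negative: fillings vanish
```

In this model the expression simplifies to (2 log2 W − L log2 A) / (L + 1), which is positive only when W exceeds A^(L/2). In other words, the words have to carry at least half the bits per letter that unconstrained strings would, which is the 50% threshold. And since A^(L/2) grows exponentially in L, the vocabulary you need explodes as words get longer.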

Since I’m talking about crosswords, I’d be remiss if I didn’t point out the famous quote of Tukey:

Doing statistics is like doing crosswords except that one cannot know for sure whether one has found the solution.

Brillinger, in this paper memorializing Tukey, tells us that this quote or something like it came from books of crosswords which he gave to his students as gifts… but from which he had removed the answers!