Location: Coral Reef
Depth: 131

Solution to The 10,000 Puzzle Pyramid

concept by Charles Steinhardt; implementation by Joshua Oreman with assistance from Brian Chen

Answer: ROBERT E LEE

The puzzle is a zip archive that, when extracted, produces a tree of 10101 files: a wordlist, 50 "detailed" examples, 50 "normal" examples, and 10,000 puzzles indexed by row and column.

pyramid/
  words.txt
  examples/
    detailed_0.txt
    detailed_1.txt
    ...
    detailed_49.txt
    normal_0.txt
    normal_1.txt
    ...
    normal_49.txt
  row0/
    row0_col62.txt
    ...
    row0_col79.txt
  row1/
    row1_col61.txt
    ...
    row1_col79.txt
  row2/
    row2_col61.txt
    ...
    row2_col80.txt
  ...
  row123/
    row123_col0.txt
    ...
    row123_col140.txt
  row124/
    row124_col0.txt
    ...
    row124_col141.txt

The 0-indexed rows and columns form a pyramidal structure embedded within a 142x125 grid: the bottom row (row 124) has 142 puzzles, the next one up (row 123) has 141, and so on up to the top row (row 0), which has 18 puzzles. Going up the pyramid (to lower-numbered rows), each row has one fewer puzzle than the row below it, with the missing puzzle taken alternately from the right side and the left side.
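The column range of any row can be computed directly from this rule. The following is a minimal sketch, not part of the original solution; the function name row_extent is made up for illustration, and the choice of "right side first" is read off the directory listing above (row 123 ends at column 140).

```python
def row_extent(row, bottom_row=124, bottom_width=142):
    """Return (first_col, last_col) of the puzzles present in `row`."""
    if not 0 <= row <= bottom_row:
        raise ValueError("row is outside the pyramid")
    steps = bottom_row - row                      # rows climbed from the bottom
    first = steps // 2                            # puzzles dropped from the left
    last = bottom_width - 1 - (steps + 1) // 2    # puzzles dropped from the right
    return first, last


def row_width(row):
    """Number of puzzles in `row`."""
    first, last = row_extent(row)
    return last - first + 1


for r in (0, 1, 2, 123, 124):
    print(r, row_extent(r))
```

Summing row_width over all 125 rows gives the 10,000 puzzles, which is a quick sanity check that the extracted archive is complete.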

The wordlist contains 108286 words and is otherwise unremarkable.

Each of the "detailed" examples contains a number of clues that describe a word from the wordlist, which is given to you (because it's an example). For example, detailed_44.txt tells you:

True statements about CAGINESS:

Base Scrabble score: 11 points
Can be Caesar shifted to produce another word in the word list: NO
Can be combined with one additional letter to produce an anagram of something in the word list: YES
Can be combined with two additional letters to produce an anagram of something in the word list: YES
Contains at least one doubled letter: YES
Contains at least two different doubled letters: NO
Contains at least two nonoverlapping occurrences of the same doubled letter: NO
Distinct consonants: 4
Has at least one anagram that is also in the word list: NO
If you marked nonoverlapping US state postal abbreviations, you could mark at most: exactly 50.0% of the letters
Length: 8 letters
Letters located in the bottom row on a QWERTY keyboard: exactly 25.0% of the letters
Most common consonant(s) each account(s) for: between 23.8% and 25.0% (inclusive) of the letters
SHA-1 hash of lowercased word, expressed in hexadecimal, starts with: 2B0E
Starts with a vowel: NO
Starts with: CAGI
Sum of letters (A=1, B=2, etc) is divisible by 2: NO
Sum of letters (A=1, B=2, etc) is divisible by 3: NO
Sum of letters (A=1, B=2, etc) is divisible by 5: NO
Sum of letters (A=1, B=2, etc) is divisible by 7: YES
Sum of letters (A=1, B=2, etc): 77
Vowels: exactly 37.5% of the letters
Word interpreted as a base 26 number (A=0, B=1, etc) is divisible by 2: YES
Word interpreted as a base 26 number (A=0, B=1, etc) is divisible by 3: YES
Word interpreted as a base 26 number (A=0, B=1, etc) is divisible by 5: NO
Word interpreted as a base 26 number (A=0, B=1, etc) is divisible by 7: NO
Word interpreted as a base 26 number (A=0, B=1, etc) is exactly representable in IEEE 754 double-precision floating point format: YES
Word interpreted as a base 26 number (A=0, B=1, etc) is exactly representable in IEEE 754 single-precision floating point format: NO
Word interpreted as a base 26 number (A=0, B=1, etc) is representable as an unsigned 32-bit integer: NO
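Several of these clue types reduce to a line or two of code. The sketch below is one possible reading of them, not the authors' implementation: the function names are invented here, the Scrabble values are the standard English tile values, and vowels are assumed to be A, E, I, O and U only. The float check ignores exponent range, which is an additional simplifying assumption.

```python
import hashlib

# Standard English Scrabble tile values.
SCRABBLE_POINTS = {}
for letters, points in [("AEILNORSTU", 1), ("DG", 2), ("BCMP", 3),
                        ("FHVWY", 4), ("K", 5), ("JX", 8), ("QZ", 10)]:
    for ch in letters:
        SCRABBLE_POINTS[ch] = points


def scrabble_score(word):
    return sum(SCRABBLE_POINTS[ch] for ch in word.upper())


def letter_sum(word):
    """Sum of letters with A=1, B=2, and so on."""
    return sum(ord(ch) - ord("A") + 1 for ch in word.upper())


def base26_value(word):
    """Word read as a base 26 number with A=0, B=1, and so on."""
    value = 0
    for ch in word.upper():
        value = value * 26 + (ord(ch) - ord("A"))
    return value


def vowel_percent(word):
    """Percentage of letters that are vowels (assumed here: A, E, I, O, U)."""
    return 100.0 * sum(ch in "AEIOU" for ch in word.upper()) / len(word)


def sha1_hex(word):
    """Uppercase hex SHA-1 digest of the lowercased word."""
    return hashlib.sha1(word.lower().encode("ascii")).hexdigest().upper()


def fits_uint32(n):
    return 0 <= n < 2 ** 32


def exactly_representable(n, significand_bits):
    """True if the non-negative integer n needs at most that many significand
    bits (53 for IEEE 754 double, 24 for single). Exponent range is not checked."""
    if n == 0:
        return True
    while n % 2 == 0:
        n //= 2
    return n < 2 ** significand_bits


n = base26_value("CAGINESS")
print(scrabble_score("CAGINESS"), letter_sum("CAGINESS"), vowel_percent("CAGINESS"))
print(n % 2 == 0, n % 3 == 0, n % 5 == 0, n % 7 == 0)
print(sha1_hex("CAGINESS")[:4])
```

Running each function against all 50 detailed examples is a cheap way to confirm that an interpretation matches the intended one before using it on the puzzles.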

Each of the "normal" examples contains far fewer clues, just enough to uniquely identify the given word. They range in length from normal_46.txt:

Some statements that uniquely identify CONSUME:

Contains: SU
If you marked nonoverlapping US state postal abbreviations, you could mark at most: between 57.1% and 57.2% (inclusive) of the letters
SHA-1 hash of lowercased word, expressed in hexadecimal, contains: 3D2

to normal_5.txt:

Some statements that uniquely identify BAGWORM:

Base Scrabble score: 15 points
Can be combined with one additional letter to produce an anagram of something in the word list: YES
Can be combined with two additional letters to produce an anagram of something in the word list: NO
Contains at least one doubled letter: NO
SHA-1 hash of lowercased word, expressed in hexadecimal, contains: 5B
Sum of letters (A=1, B=2, etc) is divisible by 2: NO
Word interpreted as a base 26 number (A=0, B=1, etc) is divisible by 3: YES
Word interpreted as a base 26 number (A=0, B=1, etc) is divisible by 5: NO
Word interpreted as a base 26 number (A=0, B=1, etc) is divisible by 7: NO
Word interpreted as a base 26 number (A=0, B=1, etc) is exactly representable in IEEE 754 single-precision floating point format: NO
Word interpreted as a base 26 number (A=0, B=1, etc) is representable as an unsigned 32-bit integer: YES

Each puzzle looks like one of the "normal" examples: a smallish set of clues that, taken together, uniquely identify one word from the wordlist. Based on the examples and puzzles, solvers can develop a program to interpret the clues and determine what word each puzzle is describing.
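One way to structure such a program is a table of handlers keyed by the fixed part of each clue, with the word list filtered by every clue in turn. The sketch below covers only a handful of clue types and uses a tiny stand-in word list; the names HANDLERS, parse_range and solve are hypothetical, and the sample puzzle at the end is made up for illustration rather than taken from the pyramid.

```python
import re


def parse_range(text):
    """Turn '7 letters', 'exactly 50.0% ...' or
    'between 14 and 28 (inclusive) ...' into a (low, high) pair."""
    m = re.match(r"between ([\d.]+)%? and ([\d.]+)%? \(inclusive\)", text)
    if m:
        return float(m.group(1)), float(m.group(2))
    m = re.match(r"(?:exactly )?([\d.]+)", text)
    if m is None:
        raise ValueError(f"unrecognized numeric clue: {text!r}")
    value = float(m.group(1))
    return value, value


def in_range(value, text):
    low, high = parse_range(text)
    return low <= value <= high


def has_doubled_letter(word):
    return any(a == b for a, b in zip(word, word[1:]))


def letter_sum(word):
    return sum(ord(ch) - ord("A") + 1 for ch in word)


# Fixed part of the clue -> predicate taking (word, variable part).
HANDLERS = {
    "Contains": lambda w, v: v in w,
    "Starts with": lambda w, v: w.startswith(v),
    "Length": lambda w, v: in_range(len(w), v),
    "Sum of letters (A=1, B=2, etc)": lambda w, v: in_range(letter_sum(w), v),
    "Contains at least one doubled letter":
        lambda w, v: has_doubled_letter(w) == (v == "YES"),
}


def solve(clue_lines, words):
    """Return every word in `words` that satisfies all the clue lines."""
    candidates = list(words)
    for line in clue_lines:
        fixed, _, variable = line.partition(": ")
        check = HANDLERS[fixed]          # KeyError flags an unhandled clue type
        candidates = [w for w in candidates if check(w, variable)]
    return candidates


# Stand-in for words.txt; the real list has 108286 entries.
WORDS = ["BAGWORM", "CAGINESS", "CONSULT", "CONSUME", "UNMIXT"]
print(solve(["Contains: SU",
             "Length: 7 letters",
             "Sum of letters (A=1, B=2, etc): 90"], WORDS))
```

Raising on an unknown fixed part is deliberate: it surfaces clue types that still need a handler instead of silently ignoring them.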

Some of the puzzles have an informational section, separated from the clues by a blank line. This section does not constrain the solution of the puzzle (the puzzle has a unique solution before considering the things in the informational section); instead, it relates the solution of the puzzle to various nonsense keywords, which happen to all be names of Egyptian pharaohs who have pyramids or locations where Egyptian pyramids can be found (according to Wikipedia). The nonsense keywords are not in the word list. The given facts can be used to determine what the nonsense keywords mean. They look like:

As you can tell from this list, the nonsense keywords are divided into three nonoverlapping groups: "concepts", "colors", and "properties". Properties map to some straightforward property, determinable by inspection and implementable programmatically, that any given word will either have or not have. They are things like "contains at least three of the same vowel" or "all letters are in the top row of a QWERTY keyboard"; the complete list is in the Reference section below. Each property has about 50 examples and about 50 counterexamples in the pyramid. Colors map to one of the 16 basic colors defined by HTML and Windows, and are each clued by four words that together can only reasonably identify one particular color name. Concepts map to the word "color", "then", or "everything", and are clued by an appropriate combination of words. The complete lists are at the end of the solution.
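With roughly 50 examples and 50 counterexamples per property, a meaning can be pinned down by elimination: propose candidate predicates and discard any that disagree with an observed fact. The following is a rough sketch under stated assumptions. The shape of the input (pairs of solved word and YES/NO flag), the candidate list, and the sample observations are all invented for illustration; only the meaning of AMENEMHAT comes from this solution.

```python
# Hypothetical candidate meanings; a real run would list many more.
CANDIDATES = {
    "starts with U": lambda w: w.startswith("U"),
    "ends in S": lambda w: w.endswith("S"),
    "contains a doubled letter": lambda w: any(a == b for a, b in zip(w, w[1:])),
    "contains SH": lambda w: "SH" in w,
}


def consistent_meanings(observations, candidates=CANDIDATES):
    """Return the names of candidates that agree with every observation.

    `observations` is a list of (word, has_property) pairs, one per puzzle
    whose informational section mentions the property."""
    return [name for name, predicate in candidates.items()
            if all(predicate(word) == flag for word, flag in observations)]


# Illustrative observations only, not taken from the pyramid.
observed = [("UNMIXT", True), ("UNDRESS", True),
            ("CAGINESS", False), ("BAGWORM", False)]
print(consistent_meanings(observed))
```

If the result is empty, the candidate list is missing the right idea; if several candidates survive, more solved puzzles are needed to separate them.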

About 10% of the puzzles use the nonsense-keyword properties (defined by their appearance informationally in other puzzles) as clues that constrain the answer, so you can't solve them until you figure out what those properties mean. (You never need to know what a property means in order to solve a puzzle that will tell you what a property means.) For example, row 10 column 63 says:

Base Scrabble score: between 14 and 28 (inclusive) points
Contains: M
Has property AMENEMHAT: YES
Word interpreted as a base 26 number (A=0, B=1, etc) is representable as an unsigned 32-bit integer: YES

The other clues narrow it down to about 892 words; you have to know that AMENEMHAT means "starts with the letter U" to discover that the answer to row 10 column 63 is UNMIXT.

Once the nonsense keywords for properties are understood and implemented, all puzzles should have unique solutions except for a span of 79 puzzles in the middle of the bottom row of the pyramid (row 124 columns 31 through 109), which each have a set of clues that is not satisfied by any word in the wordlist. As indicated by the flavortext, this region of the pyramid is special: it tells you how to find the "treasure" (the answer to the overall puzzle). It turns out that each of these puzzles is satisfied by one of the nonsense keywords themselves. Reading (from left to right) the meanings of the keywords that solve these special puzzles produces instructions for solving the overall puzzle.

The keywords are:

SENUSRET SEKHEMKHET NIUSERRE KHUFU SENUSRET BIKHERIS ILLAHUN KHUFU SENUSRET QAKAREIBI USERKAF KHUFU SENUSRET ABUSIR SAQQARA KHUFU SENUSRET NURI KHABA KHUFU SENUSRET SETHKA DJEDKAREISESI KHUFU SENUSRET AMENEMHAT TETI KHUFU SENUSRET MEIDUM SNEFERU KHUFU SENUSRET LISHT NIUSERRE KHUFU SENUSRET HAWARA TETI KHUFU SENUSRET PEPI USERKAF KHUFU SENUSRET MENKAURE DJEDEFRE KHUFU SENUSRET AMENYQEMAU SNEFERU KHUFU SENUSRET MAZGHUNA DJEDKAREISESI KHUFU SENUSRET KHUI DAHSHUR KHUFU SENUSRET SOBEKNEFERU ILLAHUN KHUFU SENUSRET NEFEREFRE KHABA KHUFU SENUSRET UNAS SAQQARA KHUFU SENUSRET DJOSER DJEDEFRE KHUFU SENUSRET MERENRE DAHSHUR

which means:

color everything black then
color words containing SH maroon then
color words that start with B white then
color words that start with EX red then
color words whose first two letters equal their last two letters yellow then
color words ending in ED or ING silver then
color words that start with U purple then
color words with one kind of vowel navy then
color words with no vowels black then
color words with at least 3 copies of any one consonant purple then
color words with at least 3 copies of any one vowel white then
color words whose letters appear in sorted order teal then
color words whose first letter matches their last letter navy then
color words containing a doubled letter silver then
color words ending in S gray then
color words that are 50% vowels or greater maroon then
color words that can be typed with only the top row of a QWERTY keyboard yellow then
color words that contain at least one instance of all five vowels red then
color words containing CH teal then
color words that alternate consonants and vowels gray
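The translation above can be reproduced mechanically by splitting the keyword sequence at every "then" keyword and substituting known meanings. In this sketch the concept and color meanings are read off by aligning the keyword list with the translated instructions; property keywords are left untranslated in angle brackets. The helper names are made up for illustration.

```python
KEYWORDS = """
SENUSRET SEKHEMKHET NIUSERRE KHUFU SENUSRET BIKHERIS ILLAHUN KHUFU
SENUSRET QAKAREIBI USERKAF KHUFU SENUSRET ABUSIR SAQQARA KHUFU
SENUSRET NURI KHABA KHUFU SENUSRET SETHKA DJEDKAREISESI KHUFU
SENUSRET AMENEMHAT TETI KHUFU SENUSRET MEIDUM SNEFERU KHUFU
SENUSRET LISHT NIUSERRE KHUFU SENUSRET HAWARA TETI KHUFU
SENUSRET PEPI USERKAF KHUFU SENUSRET MENKAURE DJEDEFRE KHUFU
SENUSRET AMENYQEMAU SNEFERU KHUFU SENUSRET MAZGHUNA DJEDKAREISESI KHUFU
SENUSRET KHUI DAHSHUR KHUFU SENUSRET SOBEKNEFERU ILLAHUN KHUFU
SENUSRET NEFEREFRE KHABA KHUFU SENUSRET UNAS SAQQARA KHUFU
SENUSRET DJOSER DJEDEFRE KHUFU SENUSRET MERENRE DAHSHUR
""".split()

# Concepts and colors, inferred by aligning KEYWORDS with the translation.
MEANINGS = {
    "SENUSRET": "color", "KHUFU": "then", "SEKHEMKHET": "everything",
    "NIUSERRE": "black", "ILLAHUN": "maroon", "USERKAF": "white",
    "SAQQARA": "red", "KHABA": "yellow", "DJEDKAREISESI": "silver",
    "TETI": "purple", "SNEFERU": "navy", "DJEDEFRE": "teal",
    "DAHSHUR": "gray",
}


def split_instructions(keywords, separator="KHUFU"):
    """Group keywords into instructions, breaking at each separator."""
    instructions, current = [], []
    for keyword in keywords:
        if keyword == separator:
            instructions.append(current)
            current = []
        else:
            current.append(keyword)
    if current:
        instructions.append(current)
    return instructions


def translate(instruction):
    """Replace known keywords; keep unknown (property) keywords in brackets."""
    return " ".join(MEANINGS.get(k, f"<{k}>") for k in instruction)


for instruction in split_instructions(KEYWORDS):
    print(translate(instruction))
```

The first line printed is "color everything black"; each later line names one property keyword and the color to paint matching words.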

Treating each puzzle as a pixel, and following those instructions in order, produces the following image:

As indicated by the Confederate flag and the fill-in-the-blanks pattern, this image clues the answer ROBERT E LEE.

(Note: Each color used in this puzzle has a defined hex value in the standard 16-color palette used by HTML and Windows, and these are the values used in constructing the image above, but getting the colors wrong doesn't make the image much harder to recognize.)
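The painting step is a "last matching rule wins" pass over the grid of solved words. The sketch below is illustrative rather than the authors' code: it implements only a subset of the twenty rules (kept in their original relative order), uses the standard HTML hex values for the ten colors that appear, and assumes that empty cells outside the pyramid are simply left uncolored.

```python
HTML_COLORS = {
    "black": "#000000", "maroon": "#800000", "white": "#FFFFFF",
    "red": "#FF0000", "yellow": "#FFFF00", "silver": "#C0C0C0",
    "purple": "#800080", "navy": "#000080", "teal": "#008080",
    "gray": "#808080",
}

# Subset of the instructions, in the order they appear in the bottom row.
RULES = [
    (lambda w: True, "black"),                        # color everything black
    (lambda w: "SH" in w, "maroon"),
    (lambda w: w.startswith("B"), "white"),
    (lambda w: w.startswith("EX"), "red"),
    (lambda w: w.endswith(("ED", "ING")), "silver"),
    (lambda w: w.startswith("U"), "purple"),
    (lambda w: w.endswith("S"), "gray"),
    (lambda w: "CH" in w, "teal"),
]


def color_for(word, rules=RULES):
    """Apply every rule in order; a later match overwrites an earlier one."""
    color = None
    for predicate, name in rules:
        if predicate(word):
            color = name
    return color


def paint_grid(grid):
    """Map a grid of words to hex colors; '' (outside the pyramid) -> None."""
    return [[HTML_COLORS[color_for(w)] if w else None for w in row]
            for row in grid]


print(color_for("BUSHES"), color_for("EXCHANGED"), color_for("UNMIXT"))
print(paint_grid([["", "UNMIXT", "BAGWORM", ""]]))
```

Because later rules overwrite earlier ones, the order of the instructions matters: BUSHES matches the SH, B and S rules and ends up gray, the last of the three.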

Reference

This section contains the details you would discover through solving the puzzle (what everything means, the finer points of the semantics of the clues, etc.) that are not necessary to know if you simply want to understand the high-level concepts involved in solving it.

Clue format

Each clue takes up one line. Clues contain a fixed part and a variable part, separated by a colon: fixed: variable. The various fixed parts are described in the "Clues" section, just below. The format of the variable part depends on the type of clue. There are three types:

Clues

In general, when a clue has multiple reasonable interpretations, it has been verified that only the intended interpretation matches all the examples.

All colors and concepts

Concepts:

Colors: The fact that there are 16 of them, and the strange words used to uniquely identify some of them, should hint that these each have a specific correct RGB value. Even if solvers get that wrong, though, the final image will probably come out recognizable. Not all 16 colors are used in the image; only 10 are.

All properties

All words

JSON of the entire pyramid (array of 125 rows, each of which is an array of 142 words; locations outside the pyramid contain "").

Sample solution implementation

code.zip (750ish lines of Python)