Introduction to Writing Good Puzzle Hunt Puzzles

Version 1.0 - 2014-02-24 - cleanup, add metapuzzle and solving tools sections
Version 0.9 - 2014-02-18

0. About This Document

This document is advice for writing puzzles. It is specifically geared toward the MIT Mystery Hunt, though the techniques and ideas will likely be useful in other puzzle-writing. The target audience is those who have participated in one or a few puzzle hunts, but not written a substantial number of puzzles before. I will attempt to explain not just some good practices, but (in the interest of "teaching how to fish") try to explain why they are good.

Most of the references to specific puzzles are from the 2014 MIT Mystery Hunt, for the simple reason that I'm the most familiar with those puzzles.

0a. Authorship, Influences, License

This document was originally written by David Wilson in 2014, though a lot of this content is my explanation of other people's advice that I've encountered. An incomplete list of influences:

Foggy Brume's posts giving tips on puzzle creation (see sec. 7)
Puzzlecraft and the ideas/philosophy of Selinker and Snyder (see sec. 7)
A puzzle-writing seminar given to my Mystery Hunt team by Chris Morse
Numerous in-person and email interactions with members of my Mystery Hunt team (whose 2014 name was [Alice Shrugged]), with special recognition to Erin Rhode and Dan Katz.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License.

0b. Spoiler Warning

This document CONTAINS SPOILERS for numerous puzzles in previous Mystery Hunts, mostly but not exclusively 2014. You have been warned.

1. Introduction: The Goal

Why do solvers participate in puzzle hunts?

The nominal answer might be "to find the coin" (in the Mystery Hunt), or "to win", or for a non-competitive team "to solve puzzles". But this is a surface goal. Every solver's ultimate goal is to have fun.

What makes an experience fun varies from solver to solver--on a large, competitive team, winning the entire hunt might be fun, whereas a noncompetitive team might want to figure out a few puzzles and get a sense of the hunt's storyline. Tastes will vary at the individual-puzzle level as well, but there are some common themes (which will be discussed below). Regardless, the puzzle-writer's ultimate goal should be to provide a fun experience for the solver. It is worth revisiting this for every puzzle, and every step--ask yourself, "Is this fun for the solver?" This should be the high-level principle of writing a puzzle.

Sometimes puzzle-writers see themselves as playing a game of wits against the solver. This analogy can hold, and some texts on puzzle-writing use this comparison. There is, however, one major, major caveat:

As puzzle-writer, you are ultimately playing to lose.

Anyone can write a puzzle no one else can solve. Here's one: the solver has to guess what number I'm thinking of. That's all they get--no hints. Obviously, that puzzle isn't very fun for the solver, but it does illustrate a point: you as puzzle-writer have every advantage in this battle of wits, since you get to define the rules of the game. But your goal is not to win the battle of wits, or to show that you're smarter than the solver. Your goal is to provide a challenge for the solver...but to let the solver emerge victorious, feeling accomplished. If that happens, the solver has fun. And that's how you win.

2. How to Start

A novice puzzle-writer's first instinct is to grab a topic they know a lot about, and say "I want to write a puzzle about Topic X" (where Topic X is their favorite academic subject, TV show, music genre, hobby, etc.). There's nothing wrong with that, and it's good to write puzzles in areas where you have a lot of knowledge. So this can easily be the starting point of writing a perfectly fine puzzle--but in terms of puzzle-writing, far more interesting and elegant puzzles come about if the puzzle mechanism is involved from the start.

2a. The Simplest Puzzle-Hunt Mechanism: ISIS Puzzles

One straightforward puzzle type that occurs in many, many puzzle-hunt puzzles is the ISIS puzzle (term coined by Foggy Brume). ISIS stands for "Identify, Sort, Index, Solve." So you're presented with a puzzle that has a bunch of clues/pictures/sound clips/etc., and you have to:

Identify what they are
Sort them into a logical order
Index into their name, title, or other identifier to obtain letters (etc.)
Solve by reading down the letters(/words/etc.) to get either the answer or a cluephrase.
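
For readers who think in code, the Sort, Index, and Solve steps above can be sketched in a few lines of Python. This is only an illustration: the item names, sort keys, and indices below are made up and do not come from any real puzzle, and the Identify step (the part that actually takes the solver's time) is assumed to be already done.

```python
# Hypothetical, already-identified items: (name, sort_key, index).
# None of this data comes from a real puzzle.
items = [
    ("STONE", 3, 2),
    ("OCEAN", 1, 2),
    ("APPLE", 2, 1),
]


def isis_extract(items):
    """Sort by sort_key, then take the index-th letter (1-based) of each name."""
    ordered = sorted(items, key=lambda item: item[1])
    return "".join(name[index - 1] for name, _, index in ordered)


print(isis_extract(items))  # CAT
```

Note how little work is left once the items are identified; this is part of why the mechanism can feel repetitive to experienced solvers.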

tl;dr of ISIS puzzles: They're ok, but generally unexciting. If possible, try to come up with a more distinctive style of puzzle; if you do end up making an ISIS or ISIS-ish puzzle, take steps to make it fun.

Stereotypical examples of ISIS puzzles from the 2014 Hunt include:

A Puzzle with the Answer SULLIVAN - Identify the Simpsons and Doctor Who references, sort by Simpsons episode number, index into the title of the Simpsons episode by Doctor number, solve by solving the resulting cluephrase.
Zoinks! - Identify the Scooby-Doo episodes and monsters/villains, sort by episode number, index into the name of the villain, solve since the answer is spelled out.

It is not necessary to fit these four steps precisely in order to fit into the general ISIS category; if taken broadly, the category includes all puzzles where the bulk of the solver's time is spent identifying and the subsequent "puzzly" steps are relatively straightforward. For example, I would classify the following puzzles as ISIS as well:

Bumblebee Tune-A - Identify the songs, sort by year, realize that the puzzle refers to A-sides and B-sides, index into the B-side track, solve by taking the B-side of the spelled-out single.
Apocalypse - Identify the images in relation to the song lyrics, sort in lyrical order, solve by reading the filenames.
I Came Across A Japanese Rose Garden - Identify the nail polish names, translate the colors into resistor codes, solve the cluephrase.
The Circle of Life - Identify the characters and voice actors, index into the voice actor's name by where the line intersects the name, solve by reading out the answer.

There is nothing wrong with ISIS puzzles per se. ISIS puzzles are a tempting style of puzzle for novice writers, since they allow a large amount of freedom in subject matter while still being one of the simplest puzzle types to construct. However, they are a very common puzzle type, so despite potentially being about a wide range of topics, experienced puzzlers have seen the mechanism many, many times before. Thus, some solvers will view them as unexciting--the surface elements might vary, but the mechanism can get repetitive and unoriginal.

Thus, when writing, I'd recommend keeping the following in mind:

First of all, be aware of ISIS puzzles. Recognize them when you see them (either when you're writing or solving one).
I don't think it's necessary to avoid them completely. However, because they are so common, don't overload on them. If you have time to spare, try to come up with a more interesting mechanism for your puzzles. That said, one benefit of ISIS puzzles is that they are often faster to write than other types, so if you need a late-breaking replacement puzzle you can potentially step up and write one. (When seven puzzles missed a writing deadline in December 2013, a group of experienced writers on [Alice Shrugged] took an afternoon and wrote seven ISIS puzzles as backups to potentially replace them. Four or five of these backups made it into the 2014 Hunt, including Apocalypse, I Came Across A Japanese Rose Garden, and The Circle of Life.)
If you're writing an ISIS puzzle, make it fun.
- Generally, the identification step tends to be the most fun step, so the sorting/indexing/solving should be relatively easy and not grind-y.
- Ask yourself if the solver (who doesn't necessarily share your interests) would have fun doing the identification. This generally means that there shouldn't be too many things to identify (typically 15-20 is a reasonable number), and the process itself should be something interesting rather than trawling through Google.
- Throw in some small (but gettable) twists so that it isn't a straight ISIS puzzle. For example, Bumblebee Tune-A has an additional step of relating A-sides to B-sides.
- Make the presentation fun. Again, instead of simply referencing a song and a number, Bumblebee Tune-A has a number of copies of a person dressed as a bee buzzing the tune. Crow Facts is essentially just "identify the Game of Thrones quotes and index into the speaker's name", but because of the uniqueness of presentation (text messages in imitation of the "Cat Facts" meme) people generally found it enjoyable.

2b. Finding a Mechanism

As mentioned, if you want to write an interesting, non-ISIS puzzle, it generally makes for a better puzzle if you either start with the mechanism rather than the subject matter, or start with them side-by-side. Finding new, unique mechanisms for how a puzzle works can be difficult and requires creativity. Fortunately, there are plenty of sources of inspiration around. You can explicitly go look at past puzzles, and to some extent train yourself to look for mechanisms you encounter in daily life.

2bi. Past Puzzles

One good way to find puzzle mechanism ideas is to look back on previous puzzles. There is a large archive of Mystery Hunt puzzles at http://web.mit.edu/puzzle/www/huntsbyyear.html, and Devjoe (of Team Luck) has a categorical index at http://devjoe.appspot.com/huntindex/Hunt_Index.html . Furthermore, there are other sources of puzzles, from puzzle magazines to other puzzle hunts (BAPHL, DASH, MS College Puzzle Challenge, ...) to look to for inspiration.

Inspiration can come from individual mechanisms to overall "puzzle feel". Thus, a good option is to review past puzzles one finds enjoyable, and try to see if the presentation or mechanisms trigger any new ideas. A few examples:

One member of [Alice Shrugged], Quinn Mahoney, greatly enjoyed solving Whoa -- I Know Drawing (2003), a puzzle which involved figuring out what was being drawn on an Etch-A-Sketch based on video of someone's hands turning the knobs. He was inspired to write Covert Tops, which gives a video of hands manipulating an unseen Rubik's Cube.
The puzzle Set Match (2003) involved combining two elements from a trio to get a third word. This general concept went through a few permutations along the way, but served as a loose inspiration for Inscriptions (2004).

[Remember that plagiarism is not cool. Use old puzzles for inspiration, but try to supply your own take on them.]

2bii. Outside Inspirations

Another good source of puzzle mechanisms is looking at objects, occurrences, and events in daily life and thinking "How could this be used as a mechanism for a puzzle?" This is a difficult habit at first, but it becomes easier with practice. Examples:

Safety First was fairly simple puzzle-wise, but was distributed with the first-aid kits given to each team at kickoff. This was inspired by the previous Hunt organizers, after having to distribute first-aid kits, repeatedly telling teams "the first-aid kit is not a puzzle".
Feed Your Head went through multiple revisions before its final version, but the mechanism of attaching extra functional groups to chemicals, which then needed to be reassembled, came from the author's ideas as a chemist.
Edge of Your Seat FUN! (2012) had a mechanism that acted on words, but was inspired by Jenga. The words were obtained through solvers actually playing Jenga.
JFK SHAGS A SAD SLIM LASS (2012, and yes, that is the entire puzzle content) appears to have been inspired by a computer keyboard.
Oyster Card came from realizing that the circular signs indicating London Tube stops were reminiscent of the white and black circles in a Masyu (a type of logic puzzle). While Masyu is generally played on a grid, it could potentially be superimposed on a complicated graph...such as the Tube map.

Sometimes, the inspiration will come for the final extraction, and then the rest of the puzzle is built to match.

Any group of 26 items is a prime candidate for indexing into the alphabet (particularly if they have a canonical order). Cruciform Heraldry uses the 26 cantons of Switzerland, and the White Queen metapuzzle uses the 26 tracks of the Beatles' Red Album compilation.
The inspiration can come from answer words. The puzzle Initial Impressions came about from noting that AA MILNE was an answer word, and someone's casual comment that this was almost an anagram of "animal". Once this final-extraction was established, the rest of the puzzle was built to match.
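
The "group of 26 items" idea above can be sketched as follows. This is a minimal sketch under stated assumptions: the list of placeholder names stands in for any real 26-item set with a canonical order, and the function names are my own.

```python
import string

# Placeholder list of 26 items in a canonical order (hypothetical names).
canonical = [f"item{n:02d}" for n in range(1, 27)]


def item_to_letter(item, canonical_order):
    """Map the k-th item of a 26-item list to the k-th letter of the alphabet."""
    if len(canonical_order) != 26:
        raise ValueError("expected exactly 26 items")
    return string.ascii_uppercase[canonical_order.index(item)]


def decode(items, canonical_order):
    """Turn a sequence of items into the letters they stand for."""
    return "".join(item_to_letter(item, canonical_order) for item in items)


print(decode(["item03", "item01", "item20"], canonical))  # CAT
```

The canonical order matters: if solvers could reasonably order the 26 items in more than one way, the mapping to letters becomes ambiguous.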

2biii. Other People

Collaboration can be very useful. If you have half an idea for a puzzle mechanism, someone else might be able to come up with the other half. (Particularly if you and your coauthor have complementary skills.) This is especially true if you have a general idea for a puzzle but don't have the skill to write it yourself. More generally, as with any creative endeavor, brainstorming with other people can often be productive.

Now Let's Create Melodies was proposed by one author on the musical side, and then another author determined how the extraction would work.
Falling Into Place required a significant amount of experience in both game programming and crossword construction in order to write; it is likely that neither author could have written the puzzle on their own.

3. Length and Difficulty

Good puzzles can be long, short, easy, or hard--but this is all in the context of the solver's expectations, knowledge, and mindset. Good puzzles should be fun, and what's fun is determined by the solver's experience. Similarly, sub-parts of puzzles can feel "too long", to the detriment of the puzzle as a whole.

3a. Difficult versus obtuse puzzles

Obfuscation is not the same thing as difficulty. If a puzzle has a ton of information, the solver will attempt to identify, record, and find patterns in all of it. If only a small amount of the information is relevant to the puzzle, not only is there potential for going down wrong paths and finding false patterns (which is frustrating for solvers), but the puzzle becomes an exercise in "guess what the author is thinking". That is, there are many potential ways that the puzzle could have been written; from the solver's perspective there's no real way to know which one is the "right" one. ("Guess what the author is thinking" can occur in many capacities, and doesn't necessarily require information overload--merely a large number of viable possibilities with no indication as to which way to go.)

A classic bad example is the puzzle Taipei from 1999 - the solution as described makes sense in terms of how to take steps in order to get the answer, but there is no indication to the solver that these are the correct steps to take (rather than any number of other arbitrary possibilities).

The puzzle author should never intentionally lead a solver down a false path, and should make an effort to eliminate false paths as much as possible.

3b. Making Puzzles Less Obtuse

There are several strategies for making sure puzzles don't fall into the "obtuse" category:

Cut out any extraneous information in the puzzle (see also "Elegance" below). This eliminates potential unintended false paths, as well as implicitly directing solvers to the right one.
Make steps that are not the central "aha" of the puzzle fairly straightforward. For example, the clues with blanks in the Mock Turtle metapuzzle were originally far more ambiguous, with the crucial "aha" as "each of the clues can be answered by a word that's almost a Wonderland character." However, during testsolving, this ambiguity seemed to obfuscate rather than puzzle, and the clues were changed to have fairly unique answers (with the aha moment simply as "each of these answers is almost a Wonderland character").
Add clues to indicate the correct way to go; however, note that these clues are generally less apparent to the solver than the author thinks they are. Thus, more explicit cluing and/or overcluing the correct idea is far preferable to undercluing. For the 2014 Mystery Hunt, [Alice Shrugged] had a rule that flavortext should not be necessary to solve any puzzle, and a general guideline to discourage hints in flavortext in favor of explicit instructions.
A specific case of the above is "help people get started". If a solver sees an obvious first step, there's a greater chance they'll be "hooked" on the puzzle and end up spending more time on it.
Add "checkpoints" where the solver gets some sort of feedback that they are doing the right thing. This can vary in explicitness; Duck Konundra (e.g.) are long and complicated enough that there are often numerous explicit checks in the puzzle text itself. Less blatantly, The Great Collapse first clues folding three of the four squares together on the dotted lines by highlighting a five-letter block that would overlap when folded; when the solver adds the letters (by taking their numerical sum modulo 26), they immediately get the word TOTAL in the highlighted space, confirming that this is the correct step. A standard crossword is basically all mini-"checks"--every time letters match in an Across and Down answer, the solver becomes more confident that both answers are correct; if they don't match, the solver immediately knows that something is wrong.
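
To make the "add the letters modulo 26" checkpoint concrete, here is a minimal Python sketch. It assumes the common A=1 through Z=26 convention, with a sum that is a multiple of 26 wrapping to Z; the actual puzzle may have used a different convention, and the function names are my own.

```python
def letter_value(ch):
    """A=1, B=2, ..., Z=26 (an assumed convention)."""
    return ord(ch.upper()) - ord("A") + 1


def value_letter(n):
    """Inverse of letter_value, wrapping around modulo 26."""
    return chr((n - 1) % 26 + ord("A"))


def add_words(*words):
    """Add equal-length words position by position, modulo 26."""
    if len({len(word) for word in words}) != 1:
        raise ValueError("all words must have the same length")
    return "".join(
        value_letter(sum(letter_value(ch) for ch in column))
        for column in zip(*words)
    )


print(add_words("ABC", "ABC"))  # BDF
```

The checkpoint works because a random sum of letters almost never produces a real word, so seeing one appear is strong evidence that the step was correct.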

3c. Frustration

Solvers get frustrated when they don't seem to be making forward progress on solving a puzzle. Frequently this is due to not knowing what to do next, or doing a lot of work and then not getting anything sensible out of it. While you don't want the puzzle to be boring, an easy puzzle is better than a frustrating one. Err on the side of adding more--and more blatant--clues as to how to solve the puzzle. Similarly, this goes along with obfuscation--remove extra information that's not needed. If a solver is stuck, they should be able to ask, "What information in the puzzle haven't I used yet?" and get a useful answer.

3d. Tedium

A separate problem is tedium--the solver knows what to do, but it's a lot of work that the solver doesn't find particularly fun. To avoid this, make sure the puzzles have a reasonable number of sub-parts, and a reasonable length overall.

Puzzles generally have "aha" steps and "grind" stages. But when it comes to small sub-parts of puzzles, they generally find themselves somewhere on a scale between the two. The more "grindy" the sub-parts are, the less time you want the solver to spend on them, since their fun comes from the "aha". You can reduce the time spent both by reducing the number of clues and by making them easier.

Some examples:

A standard crossword can have over 100 clues. This is a lot, but is generally ok since solving each individual clue makes the solver feel accomplished. (In addition, the answers are not independent of each other, which helps each clue be more of a puzzle than a lookup.)
For independent clues such as identification of images or songs, the number should generally be two dozen or fewer, depending on the difficulty of identification. The Circle of Life goes as high as 30, though identifying Disney characters is on the easy end of the identification scale. If steps have been taken to avoid automated tools such as Tineye/Google Reverse Image Search or the various song-identifying apps, it's wise to be stricter about the limit.
If the typical solver will solve your clues handily by simply looking up facts on Google/IMDb/etc., understand that simply looking things up is "grind" and not "aha". It's ok to have these, just limit the number so that solvers are done with this step before they get bored. If I need to look up 15 actors on IMDb to grab a fact about each, I'm fine doing that. If I need to look up 50, I'll be annoyed.

3e. Triviality

Finally, each puzzle should have one "aha" step. (There are a few Mystery Hunt traditions, such as the scavenger hunt, that might not. However, note that the 2014 "bring us food" puzzle--A Puzzle with the Answer GARCIAPARRA--included some minor puzzle content so that it wasn't only "bring us food", although hopefully not enough to seriously slow down a team.) To quote Foggy Brume, "No a-has is dull, one a-ha is fun, two a-has is a stretch, and three a-has is a slog." If the puzzle doesn't really have any "aha" step, solvers in a puzzle-hunt won't find it very fun. (Solving a puzzle should make the solver feel good about themselves. If they feel that every step of the puzzle was trivial, they won't feel very accomplished.)

3f. Solving Tools

Related to frustration, tedium, and triviality: Know what solving tools teams will have available to them, and plan for them one way or the other. Technology advances rapidly, and a puzzle might go from challenging to trivial in a few years due to advances in the tools available. If your puzzle involves photos, make sure it's not trivially solvable with Tineye/Google Reverse Image Search. If your puzzle involves music, make sure it's not trivially solvable with Shazam. In general, assume that top teams will make full use of Google, Wolfram Alpha, automated crossword/sudoku/anagram/etc. solvers, cloud-based computing services, smartphone applications, and the like.

You can work around these tools in three ways:

Avoid writing puzzles where tools might be of use: A bit hard to predict a few years out, but you can probably make reasonably sure there aren't automated shortcuts to your puzzle for the Hunt itself.
Change the presentation to foil the automated tools: One of the motivations for buzzing/humming the songs in Bumblebee Tune-A was to evade automated song-identification tools. Be careful doing this, though--you still want the puzzle to be human-solvable! (The puzzle Good Neighbors from 2012 may have gone too far in this direction--obscuring the images fooled automated image-recognition tools, but also made things extremely difficult for human solvers.)
Assume teams will use the tools: You can also use the tools to your advantage, by making a puzzle that explores teams' use of technology. In this way, either the step involving the tool isn't the interesting step of the puzzle, or you find a way to make the use of the tool itself interesting. Stalk Us Maybe was a web-stalking puzzle that assumes the solvers can track down people on the Web and social media. Numerous puzzles assume the solver can readily look up facts on Wikipedia or IMDb. If your puzzle requires use of a tool, though, make sure that it's one you can reasonably expect all teams to use (even small noncompetitive teams). A search engine, or resources that can be found by searching, are fine. But not every team will think to feed 50 music clips into Shazam, so this poses the dangers of frustration/tedium as they manually take on the task that a large experienced team might automate.

4. Elegance

On a more positive note, a puzzle author should strive for elegance. Unfortunately, elegance in general is not easy to define. It's generally a feeling that the puzzle is a coherent whole, is internally consistent, and fits appropriately with the larger Hunt--and that each subpart of the puzzle is coherent, consistent, and fits appropriately with the puzzle as a whole.

There's no single metric for puzzle elegance, but there are a few patterns that tend to contribute to it. Depending on the type of puzzle, not all of these may be relevant, but these are properties that one should try to incorporate if they make sense with the puzzle type.

4a. No extra information

An elegant puzzle wastes nothing (or, as little as possible). Every word contributes to its clue. Every clue contributes to the final answer. The solver doesn't have to spend time sorting out relevant from irrelevant information--all of the information is relevant. If the solver is stuck on what to do, they should be able to look at what they haven't used yet, and use it.

This is true for everything in the puzzle, but especially true for flavortext. Make an effort to minimize the flavortext, or else solvers will find unintended "clues" in it and any intentional clues will be buried. An earlier version of Compose Yourself had a different title and included flavor text; testsolvers found false paths in the text and suggested that the only necessary part was a clue to composers. The title was changed to reflect the "compose" clue, and the flavortext was otherwise eliminated.

4b. Hierarchical structure

If a puzzle has multiple "levels", it is elegant to repeat the same operation at each level. (This is related to there being a single aha, rather than one per level.)

In many puzzles, the solver gets a bunch of items (whether words, images, clues, etc.) and performs some transformation on them individually to extract something (perhaps a word from each group of clues, etc.) An elegant puzzle then uses the same transformation on these newly-extracted values as a group in order to extract the final answer.

Examples:

Operator Test - The solver gets a list of crossword-style clues in groups, with the highlighted clue leading to the name of a previous puzzle. They use the extraction method from each of the puzzles to get a word, and the words collectively form...a crossword-style clue leading to the name of a previous puzzle. Using that puzzle's extraction mechanism yields the answer.
The Great Collapse - The solver takes the following steps (approximately): Add letters mod 26, generate a new word that's a homonym of one of their words and a synonym of another, add the letters of those words mod 26, find new words that are homonyms/synonyms, and add the letters of those words mod 26.
Opposites Are Not Downbeats - Each of the clues is a false-negation (cluing two words which are not opposites but look like they might be simply from their structure). Extracting letters yields another clue that is also a false-negation; solving that clue yields the answer.

4c. Incorporation of theme

A strong contributor to elegance is the relation of a puzzle's various parts (text, clues, mechanism, answer word, etc.) to each other, and the relation of the puzzle to the Hunt as a whole. It's very hard if not impossible to have all of these relate to each other, but the more a puzzle feels like it "fits together", the more elegant it is.

4ci. Relation of puzzle content/flavor to mechanism

If a puzzle's solving mechanism in some way mirrors the topic, initial step, etc., it feels very cohesive.

Edge of Your Seat FUN! (2012) had solvers obtain words by playing a physical game of Jenga. The operation they were then to perform on the words? Taking a letter from the middle of the word and putting it at the front.
SKI Trees is topically about ski trails. How does it get a number to index into the ski resort name? Using SKI combinator calculus.

4cii. Relation of puzzle content/flavor to clues/subparts

If there's an overall theme to the puzzle, it's nice if every subpart contributes to that theme. This both feels elegant and can help in cluing (if knowledge of the theme is useful for the aha).

Captain's Log (2014 backup puzzle) not only clues Star Trek with the title, star-shaped grids, and stardates, but every single clue has some sort of Star Trek reference.
It wasn't necessary for the BEEs to exist or to "cross-pollinate" the grids-turned-into-origami-flowers in Cross-Pollination, but it was nice.

4ciii. Relation of puzzle content/flavor to answer word

A solver should know they have the answer when they see it, and ideally say "oh, that totally makes sense as the answer". If the puzzle flavor relates to a certain topic, it's helpful if the cluephrase/answer also relates to that topic. Since the puzzle is typically written in order to work for a specific answer, sometimes the answer word will inspire an entire puzzle; sometimes a puzzle mechanism comes first and the topic is added on top of the mechanism later.

Stronghold Fire initially had the Clabbers mechanism, with the idea to simply make a high-scoring play. Once it received the answer WORMTONGUE (a character who is, after all, someone who twists words), the author (with the help of the editors) tried to make it more Lord of the Rings-flavored. This included the puzzle title (an anagram of "Lord of the Rings"), making all of the winning words "phonies" from LOTR, and including many LOTR-related words on the given boards.
The cluephrase to the baseball-related puzzle Round Tripper is a piece of baseball trivia with the answer TED WILLIAMS. The answer ("title") to the curling-related puzzle A Puzzle with the Answer A HARD DAYS NIGHT is ROCK, a term for the stone used in curling.
The Puzzle Your Puzzle Could Smell Like, which contains a bunch of Old Spice video parodies, has the answer TESTOSTERONE.
Black and White intersects the Alice in Wonderland passage where Alice shrinks with the lyrics to "Never Grow Up" ("...just stay this little..."), and the passage where Alice grows with the lyrics to "Mean" ("...Someday I'll be big enough..."). The answer is BODY SIZE.
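
When a title or answer relies on an anagram (as with Stronghold Fire above), it's worth checking the letters mechanically before building a puzzle around it. A minimal Python sketch, with helper names of my own choosing:

```python
from collections import Counter


def letters(text):
    """Count the letters in text, ignoring case, spaces, and punctuation."""
    return Counter(ch for ch in text.upper() if ch.isalpha())


def is_anagram(a, b):
    """True if a and b use exactly the same letters the same number of times."""
    return letters(a) == letters(b)


print(is_anagram("Stronghold Fire", "Lord of the Rings"))  # True
```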

4civ. Relation of puzzle content to the round/Hunt theme

Just as a puzzle is more elegant if all the pieces (clues, presentation, answer word, etc.) coherently fit together with each other, a round--or a Hunt--feels more elegant the more the puzzles feel like they're a coherent part of a larger structure. If it's possible--and makes sense with the Hunt story--for a puzzle to have a Hunt-thematic "feel", that helps contribute to the narrative experience of the entire Hunt.

For the 2014 Hunt, we had Wonderland leaking into MIT, and teams eventually exploring Wonderland. Since Wonderland is a large and varied world, each puzzle that used the theme became a little anecdote that the solvers encountered. None of the following puzzles needed to be Wonderland-themed, but adding that flavor hopefully made the puzzles feel more like a part of the Hunt as a whole, and made the solvers better appreciate the Hunt theme.

A Mad Cocktail-Party could have featured any set of characters, but why not Wonderland B-list celebrities?
Marking Territory is a variant of standard logic puzzle types, but why not theme it as a battle for territory between the Lion and the Unicorn (from Through the Looking-Glass)?
Callooh Callay, World! could have used any symbols, but why not nonsense words, cards, and chess pieces?

4cv. Case Study: The Mega Man supermeta (2011)

Metapuzzles should perform the most work in contributing to the theme. Their position and structure within the Hunt means that they necessarily tend to have some bearing on the overall theme/plot of the Hunt; unlike normal puzzles, they typically themselves define a part of the story. However, some of the general principles for thematic cohesion can be applied to normal puzzles as well.

One very-thematic metapuzzle is the Mega Man supermeta from 2011. In the Mega Man games, defeating a robot allows you to take that robot's weapon, which you can then use. While you can defeat any of the robots using Mega Man's starting weapon, the robots are frequently vulnerable to specific weapons obtained from the other robots. Thus, in the games, a good strategy is to know which robot's weapon to use against which other robot.

Consistent with the overall Mega Man round theme, each of the robots drops a weapon (i.e. that is their answer phrase).
Furthermore, there is internal consistency from the subparts (submetas): the weapons are (generally) thematically appropriate weapons for that robot to have. Bio Man has a DNA DESTROYER. Stagecraft Man has a WORD SWORD, the gambling-themed Craps Man has an ODDS FINDER, etc.
The mechanism is consistent with the Mega Man theme: the solvers use these weapons on the other robots, by interpreting each "weapon" as a letter transform, and applying it to one of the other robot's names. The DNA DESTROYER eliminates all As, Cs, Gs, and Ts, turning "Stagecraft" into "Serf". The WORD SWORD adds an S to the front, turning "Craps" into "Scraps". The ODDS FINDER takes the odd letters, turning "Blackberry" into "Baker", etc.
The final answer is thematic: Dr. Wily is the final boss of the Mega Man games, and the villain of the round; he creates various robots, and has hidden the star fragment (the MacGuffin) in a robotic canine--a WILY COYOTE.
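
To make the "weapon as letter transform" idea concrete, here is a minimal sketch of the three transforms described above. The function names are my own (hypothetical), and this only covers the three weapons mentioned here, not the full supermeta.

```python
def dna_destroyer(name):
    """Remove every A, C, G, and T, regardless of case."""
    return "".join(ch for ch in name if ch.upper() not in "ACGT")


def word_sword(name):
    """Add an S to the front, keeping name-style capitalization."""
    return ("s" + name).capitalize()


def odds_finder(name):
    """Keep the letters in odd positions (1st, 3rd, 5th, ...)."""
    return name[::2]


print(dna_destroyer("Stagecraft"))  # Serf
print(word_sword("Craps"))          # Scraps
print(odds_finder("Blackberry"))    # Baker
```

Each weapon is a small, unambiguous rule, which is part of why the mechanism feels fair: once the solver reads a weapon as a transform, there is essentially one sensible way to apply it.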

4d. Answers

When writing a puzzle, you will (with few exceptions) have an answer first and write the puzzle to match. In this case you should think about the process of discovering the answer, from the solver's perspective. If it's a common word that's thematically related to the puzzle, it's fine if it's spelled out letter by letter. If the mechanism you have works letter-by-letter but the answer is obscure or has strange spelling (such that people won't be able to get it if they have a few letters wrong), consider spelling out a cluephrase instead.

Cluephrases and other "final steps" not related to the crucial aha of the puzzle can be testsolved independently from the main puzzle. It's useful to do this to make sure that the cluephrase actually yields the answer before doing all the work to write a puzzle around the cluephrase.

Regardless of how the solver gets to the answer, make sure they know it when they see it. Put yourself in the solver's shoes and walk through the solving process (testsolvers are useful here as well, but you can do this before the puzzle is fully written). In the ideal case, the various tips on elegance above will collectively imply that when they get the answer, the solving process should feel "finished", there's nothing else left for them to do, and they have a word or phrase in front of them that feels thematic as an answer to the puzzle. That said, it's not reasonable to expect every puzzle to live up to this "ideal case"--but a rule of thumb is that if a solver legitimately solves a strong majority of your puzzle (as opposed to calling in an early guess), they should be able to call in the answer word, already having very high confidence that it's correct. If the answer is obscure or unthematic, consider either a cluephrase or an "answerphrase" (e.g. though the names in The Circle of Life were written in wavy lines, the answer word was otherwise unthematic, so the extracted phrase is ANSWER SQUIGGLES).

5. Metapuzzles

Metapuzzles are puzzles that incorporate the answers to other puzzles into their structure. They are frequently used as the "capstone" to a particular round; the goal of a round is generally to solve the metapuzzle rather than to solve all the individual puzzles. (This doesn't have to be the case depending on the hunt structure, but is extremely common.) A lot of the general advice for metapuzzles is the same as for regular puzzles, except "more so"--metas are usually the main vehicle for advancing the overall story and establishing the theme and "feel" of a Hunt, so it's important to make sure that they contribute in those ways.

5a. Types of metapuzzles

Metapuzzles fall into two main categories: "pure" metas and "shell" metas.

Pure metas do not require a "metapuzzle page"; the only information needed to solve the meta is the answers--and perhaps some ancillary information--from the constituent puzzles. All three of the metas in the MIT round of the 2014 Hunt are pure metas. The solutions to the Clubs, Diamonds, and Spades metas only involved the answers from the round puzzles, the playing cards associated with each puzzle, and perhaps (for the Diamonds) the puzzle titles. Pure metas thus necessarily tend to be based on fairly direct letter- and word-play.

Shell metas have a framework (the "shell") specifically for the metapuzzle itself, that the other puzzle answers fit into. The shell is frequently presented on its own page. All six of the Wonderland metas in the 2014 Hunt are shell metas. Shell metas offer a potentially broader class of mechanisms for one's metapuzzle (e.g., embedding a chess puzzle or a game of I Spy into the meta).

Both pure and shell metas are viable meta strategies; shell metas potentially allow more creativity, but there's something to be said for the simple elegance of a pure meta that the solver can get purely from round-puzzle answers. The breakdown thus tends to be by individual taste, though it's worth coordinating with other meta-writers to make sure that the Hunt, overall, has a consistent "feel" in terms of its metas. (As mentioned, in 2014 all of the MIT metas were pure and all of the Wonderland metas were shells. This aided consistency of feel and presentation within each "world", as well as contributing to a sense of difference between the two, even if most solvers didn't necessarily know exactly what caused it.)

5b. Writing a Metapuzzle - Theme

As mentioned, metas are story-heavy, so story should be the starting place when writing a meta. What's the "plot" of the Hunt as a whole? What is solving this meta supposed to represent to the solver? The meta-answer is, Hunt-wise, a major step the solver is making toward the end of the Hunt--does it feel that way to them story-wise?

So, more so than with other puzzles, try to make the puzzle incorporate the theme (sec. 4c) in as many ways as possible. The answer word/phrase should be a major thematic point that advances the plot of the Hunt. The mechanism should be as thematic as possible.

5c. Writing a Metapuzzle - Answer Words

Metapuzzles should generally be solvable with about three-fourths of the round puzzle answers. It definitely shouldn't require all of them (sometimes, despite best efforts, round puzzles are broken in various ways--or just much longer/harder than anticipated--and you don't want that to block the metapuzzle). If it requires less than about 60%, then it might be too easy--solvers might get it quickly and never open some of the puzzles in the round. If a group of skilled meta-solvers can solve the puzzle with 66-75% of the answers, you're probably in good shape. (Relatedly: Testsolving is even more important for metapuzzles than it is for regular puzzles.)
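
One way to run that kind of testsolve is to hand each group of testsolvers a different random subset of the round's answers. The sketch below is only an illustration: the function name, its parameters, and the placeholder answers are all hypothetical, and the default fraction of 0.75 is just the rule of thumb above.

```python
import random


def sample_answer_subsets(answers, fraction=0.75, trials=3, seed=None):
    """Return `trials` random subsets, each holding about `fraction` of the answers."""
    rng = random.Random(seed)  # a seed makes the subsets reproducible
    k = max(1, round(fraction * len(answers)))
    return [sorted(rng.sample(answers, k)) for _ in range(trials)]


# Placeholder answers; a real round would use its actual answer words.
round_answers = [f"ANSWER {i}" for i in range(1, 13)]
subsets = sample_answer_subsets(round_answers, fraction=0.75, trials=3, seed=2014)

for subset in subsets:
    print(len(subset), subset)
```

Using several different subsets guards against the meta being solvable only when one particular answer happens to be present.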

The meta also defines constraints on the answer words it uses (typically, all the answer words in one round). In many cases, the meta fully defines the answer words. Since this defines the answers for the "regular" puzzles, it is important to make these answer words reasonable. (The 2004 Hunt, run by an earlier incarnation of [Alice Shrugged], was notorious for having poor answer words, a number of which were foreign, archaic, or extremely obscure. In addition to RECTION, KLAKRING, and HUERFANA, one of the answers was BABEWYNERY, which as of this writing has four hits on Google--two of which are from the 2004 Hunt.) Make things easier for the round-puzzle authors--every answer should be a plausible answer word/phrase.

6. Miscellaneous Other Dos and Don'ts

Don't use unclued (random) anagrams.
Do use logical reorderings.

Random anagrams are very inelegant; they're more an obfuscation technique than a puzzle per se (see sec. 3a).

Don't present clues, etc. in random order.
Do present clues in some sort of order; if the ordering is unimportant or reordering them is part of the puzzle, present them in alphabetical or some other totally-obvious ordering.

Remember that solvers will try to find patterns in, and glean meaning from, everything you present to them. Presenting clues in random order not only introduces the possibility for red herrings (and red herrings in random data creep in quite a bit more frequently than you'd expect), but also potentially wastes the solver's time as they try to search for meaning. If the presented order doesn't matter, then e.g. alphabetizing will instantly tell the solver "the presented order doesn't matter". This is a good thing--remember, eliminate false paths.
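
If your clues live in a list, alphabetizing them before presentation is a one-liner. This is a minimal sketch; the function name and the sample clues are made up for illustration.

```python
def present_clues(clues):
    """Return the clues in case-insensitive alphabetical order."""
    return sorted(clues, key=str.casefold)


# Hypothetical clues, listed in the order the author happened to write them.
draft_clues = ["Tea party host", "card soldier", "Looking-glass insect"]
presented = present_clues(draft_clues)
print(presented)
```

Sorting case-insensitively matters here: a plain sort would put every capitalized clue ahead of every lowercase one, which is itself a pattern a solver might chase.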

Don't edit puzzles to make them harder.
Do edit puzzles to make them easier, if necessary.

If a testsolver solves the puzzle quickly, it doesn't necessarily mean that the puzzle is too easy--it may just mean that the solver happens to be on the same mental wavelength as the author. In addition, taking an existing fully-formed puzzle and making it harder typically involves adding obfuscating material rather than material that's interesting puzzle-wise. It's much easier to add hints and clues (or cut out steps) in a way that doesn't detract from the essence of the puzzle. Remember the goal of solver fun: an easy but well-constructed puzzle will still be fun for the solver, and a Hunt containing some easy puzzles will be more fun for less experienced teams. Even an experienced, competitive team will have more fun solving an easy but elegant puzzle than a difficult but clunkier puzzle.

Don't go it alone.
Do use coauthors, editors, and testsolvers to the fullest extent--and in particular, listen to puzzle feedback from these people.

This is getting into the meta-process of puzzle creation, but all puzzles should be testsolved (preferably multiple times), and have someone (preferably several people) go over the full puzzle and solution as an editor. For the 2014 Hunt, every puzzle had three experienced puzzlers assigned as editors (even puzzles where the author was also experienced). These editors were involved and gave feedback at every step--from the initial idea to the first draft to incorporating testsolver feedback to looking over the final product. Every puzzle was also required to be testsolved twice, independently, in a reasonable amount of time, in order to make it into the Hunt. Editors are typically experienced and can help you avoid some of the pitfalls, both in this document and in general. Testsolvers are your model for how actual solvers will react to your puzzle. Listen to both of these groups!

7. References

Foggy Brume's tips for puzzle-writing: Part 1, Part 2, Part 3

The book Puzzlecraft by Selinker and Snyder is an excellent resource (and is also inexpensive). It covers how to write a wide breadth of standard puzzle types. It doesn't go into detail about puzzle-hunt-style puzzles, but it's useful if you want to incorporate a standard type as a part of a Hunt puzzle. It also provides good philosophical intuition, and some of the listings of puzzle types may give you ideas for mechanisms for Hunt puzzles.