
Graphs, Embeddings, and LLM-Generated Jokes

I started Recurse Center last week as a part of the Spring 2 2025 batch and am having a blast so far. This was my first mini project of the batch.

Ever since GPT-2 was announced, I’ve been curious whether or not LLMs can be funny. Somewhat surprisingly, the latest reasoning models are outclassing the majority of humans at mathematical problem solving, but at least anecdotally, they struggle with writing jokes and assessing humor. My guess is that this has to do with how challenging it is to quantify humor in the first place, making training for “funniness” a difficult thing to benchmark, let alone compute gradients for.

In this post, I talk about how I apply a theory of joke construction as a framework for LLM-generated humor. The approach takes us through some very light graph theory, text embeddings, and of course LLM prompting. In the end, the jokes are, dare I say, even a little bit funny.

A structure for jokes

A few months ago, I came across this article describing a theory and structure behind jokes. It doesn't try to explain all of humor, but it provides a solid groundwork from which to write short one- or two-line jokes. I found the article fascinating, and was immediately curious whether LLMs could leverage the ideas it presents.

The core argument of the post is that humor relies on subverting expectations. This can be restated as taking two unrelated subjects (the setup and the punchline) and joining them with some common aspect. This way, the setup and common aspect work together to set an expectation, which the punchline then breaks (to a reasonable degree).

Diagram showing how the setup and punchline are joined by a common aspect.
The setup creates a pattern, which is continued by the common aspect. The punchline subverts that pattern.

Source: RPG Adventures

This is best shown with an example:

The joke: My grandfather had the heart of a lion, and a lifetime ban from the Central Park Zoo.
Source: RPG Adventures
The setup topic is 'bravery', the common aspect is 'heart of a lion', and the punchline topic is 'internal organ'.
Clearly, Bravery and Internal Organs are not closely related, but Heart of a Lion ties them together.

Source: RPG Adventures

This is a promising way to start generating jokes. First, we need a way to produce these “triplets” of setup, common aspect, and punchline.

Generating setups, punchlines, and common aspects

The obvious way to generate these triplets automatically would be to ask an LLM. Unfortunately, creativity here isn’t their strong suit:

ChatGPT is asked to generate a triplet, but hardly deviates from the pattern it is shown.
Clearly it has the right idea, but to generate novel jokes we need a better approach.

Source: ChatGPT

Perhaps unsurprisingly, LLMs seem to struggle with both randomness and unexpected connections between topics. Instead, we can simplify the task and only ask them to find adjacent or similar topics, and then navigate the graph that’s formed to find unexpected connections:

ChatGPT finds words or phrases related to 'lemon'
Source: ChatGPT
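
In code, that query is a single, cheap LLM call. Here is a minimal sketch assuming the OpenAI Python client; the prompt wording and model name are illustrative rather than exactly what I used:

from openai import OpenAI

client = OpenAI()

def related_topics(topic: str, n: int = 5) -> list[str]:
    """Ask the model for n words or short phrases adjacent to `topic`."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: any capable chat model works here
        messages=[{
            "role": "user",
            "content": f"List {n} words or short phrases closely related to '{topic}'. "
                       "Return one per line, with no other text.",
        }],
    )
    lines = response.choices[0].message.content.splitlines()
    return [line.strip() for line in lines if line.strip()]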

Using this kind of prompt, we can run a breadth-first search by repeatedly asking our LLM to find more adjacent topics. This requires just a few tokens per query and is surprisingly inexpensive, even using the large (non-reasoning) frontier models.
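
A sketch of that search, building on the related_topics helper above (the depth cap and de-duplication details are my own choices, not anything special):

from collections import deque

def build_topic_graph(start: str, max_depth: int = 3) -> dict[str, set[str]]:
    """Expand a topic graph breadth-first via repeated LLM queries."""
    graph: dict[str, set[str]] = {}
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        topic, depth = queue.popleft()
        if depth >= max_depth:
            continue
        neighbors = related_topics(topic)  # LLM call from the sketch above
        graph.setdefault(topic, set()).update(neighbors)
        for neighbor in neighbors:
            graph.setdefault(neighbor, set()).add(topic)  # undirected edge
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return graph

graph = build_topic_graph("lemon", max_depth=3)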

The result of running a BFS by repeatedly asking ChatGPT for similar topics.
After a search depth of three starting from "lemon" we are left with this small graph. Already we can see how some unrelated topics are joined by a common aspect, e.g. "golden retriever" and "lemon" by way of "yellow".

Finding unrelated setups and punchlines

Now that we have a graph of concepts, we can pick a random node and look at its 2nd-degree neighbors to generate these setup, common aspect, and punchline triplets. However, without further guidance, this approach will often produce triplets like (“lemon”, “citrus”, “lime”), which clearly fail to subvert expectations. A triplet like (“lemon”, “yellow”, “golden retriever”) is much stronger. But how do we quantify that?
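
Enumerating the candidates themselves is straightforward. Roughly, using the graph from the BFS sketch above:

import random

def candidate_triplets(graph: dict[str, set[str]], setup: str) -> list[tuple[str, str, str]]:
    """All (setup, common aspect, punchline) candidates two hops from `setup`."""
    triplets = []
    for common in graph.get(setup, set()):
        for punchline in graph.get(common, set()):
            # keep only true 2nd-degree neighbors: not the setup itself,
            # and not already directly connected to it
            if punchline != setup and punchline not in graph[setup]:
                triplets.append((setup, common, punchline))
    return triplets

setup = random.choice(list(graph))
candidates = candidate_triplets(graph, setup)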

Luckily, this is where text embeddings shine. Text embeddings convert text into high-dimensional vectors, which, unlike text, can easily be compared numerically. They are typically used to rank similarity between passages of text, augmenting search algorithms with the ability to find similar “concepts” rather than just matching keywords. In our case, we want the opposite: we search for setup and punchline pairs that minimize vector similarity, relying on our graph to provide candidates linked by a common aspect.

Computing cosine similarity for our earlier example yields:

dot(lemon, lime) = 0.8718...
dot(lemon, golden retriever) = 0.2144...
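
In practice, this amounts to embedding each topic and sorting candidates by setup-punchline similarity, lowest first. A rough sketch, assuming OpenAI's embedding endpoint (any embedding model would do; the model name is a placeholder):

import numpy as np

def embed(text: str) -> np.ndarray:
    # reuses the `client` from the earlier sketch
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(response.data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# lower setup-punchline similarity => a more "unexpected" pairing
ranked = sorted(candidates, key=lambda t: cosine(embed(t[0]), embed(t[2])))
setup, common_aspect, punchline = ranked[0]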

The jokes

Now that we can generate a setup, common aspect, and punchline, all that’s left is to hand them to an LLM along with an explanation of the joke-writing structure. Doing this, I generated nearly one thousand jokes; maybe ~10% were kind of funny.
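
The final step boils down to handing over the triplet and restating the structure from the article. A rough sketch of that prompt (the wording here is illustrative, not my exact prompt):

def write_joke(setup: str, common: str, punchline: str) -> str:
    """Ask the model to turn a triplet into a one- or two-line joke."""
    prompt = (
        f"Write a short one- or two-line joke. The setup topic is '{setup}', "
        f"the punchline topic is '{punchline}', and they are linked by the "
        f"common aspect '{common}'. The setup and common aspect should create "
        "an expectation that the punchline subverts."
    )
    response = client.chat.completions.create(  # reuses the `client` from earlier
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

Here are some of my favorites: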