‘The Goblins Came Back to Haunt Us’: OpenAI Explains How ChatGPT’s ‘Nerdy’ Personality Got Out of Control


Earlier this week, OpenAI posted a document on GitHub as part of the open-sourcing of its coding agent, Codex CLI, that revealed an unusual system prompt for GPT-5.5. The model was explicitly instructed, in coding contexts, to never talk about “goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures” unless it was “absolutely and unambiguously relevant” to a user’s request.

Now, OpenAI has finally explained why that oddly specific instruction, which appeared twice in the prompt, was so important in the first place.

For at least a year, some ChatGPT users have noticed the LLM’s quirky habit of bringing up goblins, gremlins, trolls, and other creatures in its answers. The weird tic apparently became more common as newer models rolled out.

Even OpenAI CEO Sam Altman referenced the issue in a post on X Monday morning.

“Feels like codex is having a ChatGPT moment,” Altman wrote. “I meant a goblin moment, sorry.”

That same day, OpenAI published a blog post explaining the strange behavior and how the company finally addressed it.

According to the post, OpenAI first became aware of the model’s goblin obsession with the release of GPT-5.1 in November. The company launched an internal investigation after users complained that the model had become overly familiar in its responses. A safety researcher suggested adding “goblin” and “gremlin” to the review after repeatedly encountering the words while using the model.

The company found that use of the word “goblin” in ChatGPT had jumped 175% after the launch of GPT-5.1, while mentions of “gremlin” rose 52%.

At the time, OpenAI apparently didn’t consider the behavior too concerning. But just a few months later, “the goblins came back to haunt us,” the company wrote in the blog post.

By March, with the release of GPT-5.4, references to the creatures had increased even further. Some users complained online that the word “goblin” was appearing in “almost every conversation.”

That prompted another internal analysis, which OpenAI says uncovered the root of the problem. The company found that references to these creatures were especially common in responses for users who selected the model’s “Nerdy” personality setting.

That personality included a system prompt instructing the model to “undercut pretension through playful use of language.”

OpenAI used its coding agent Codex to compare outputs generated during reinforcement learning training that included words like “goblin” and “gremlin” against outputs that did not. The company found that one reward signal favored responses containing mentions of these creatures, scoring them higher than otherwise similar answers that did not use those words.
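OpenAI has not published the analysis itself, but the comparison it describes can be sketched roughly as follows. Every word list, response, and score below is invented purely for illustration:

```python
# Hypothetical sketch: compare how a reward model scored otherwise-similar
# responses that do and do not mention certain creature words.
# All data here is made up for demonstration purposes.

CREATURE_WORDS = {"goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"}

def mentions_creature(text: str) -> bool:
    """Check whether any creature word appears in the text."""
    words = text.lower().split()
    return any(w.strip(".,!?") in CREATURE_WORDS for w in words)

def mean(xs):
    return sum(xs) / len(xs)

# (response, reward score) pairs, standing in for logged RL training outputs
scored_outputs = [
    ("Here is the fix for your build error.", 0.71),
    ("The bug was a sneaky little gremlin in the parser.", 0.84),
    ("Refactored the function as requested.", 0.69),
    ("A goblin of a race condition was hiding in the lock.", 0.88),
]

with_creatures = [s for t, s in scored_outputs if mentions_creature(t)]
without = [s for t, s in scored_outputs if not mentions_creature(t)]

# A consistently positive gap would suggest the reward signal favors
# creature mentions, which is the pattern OpenAI reported finding.
gap = mean(with_creatures) - mean(without)
print(f"mean reward gap: {gap:+.3f}")
```

In this toy data the gap comes out positive, mirroring the bias the company describes: the responses differ mainly in whether they mention a creature, yet the ones that do are scored higher.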

Researchers also found that mentions of goblins, gremlins, and other creatures began spreading beyond the Nerdy personality.

“Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data,” the blog said.

To address the problem, OpenAI said it retired the Nerdy personality, removed the reward signal that favored goblin mentions, and filtered training data containing creature words.

Because GPT-5.5 had already begun training before the root cause was discovered, the newer model inherited the same goblin fixation. OpenAI said it added the developer-prompt instruction, which users later spotted in the open-sourced Codex CLI repository, to help curb inappropriate mentions of goblins and gremlins.

“Depending on who you ask, the goblins are a delightful or annoying quirk of the model,” OpenAI wrote in the blog. “But they are also a powerful example of how reward signals can shape model behavior in unexpected ways, and how models can learn to generalize rewards in certain situations to unrelated ones.”


