
New Grok AI model surprises experts by checking Elon Musk’s views before answering

Grok 4's "reasoning" shows cases where the chatbot consults Musk posts to answer divisive questions.

Benj Edwards

An AI model launched last week appears to have shipped with an unexpected occasional behavior: checking what its owner thinks first.

On Friday, independent AI researcher Simon Willison documented that xAI's new Grok 4 model searches for Elon Musk's opinions on X (formerly Twitter) when asked about controversial topics. The discovery comes just days after xAI launched Grok 4 amid controversy over an earlier version of the chatbot generating antisemitic outputs, including labeling itself as "MechaHitler."

"That is ludicrous," Willison told Ars Technica upon initially hearing about the Musk-seeking behavior last week from AI researcher Jeremy Howard, who traced the discovery through various users on X. But even amid prevalent suspicions of Musk meddling with Grok's outputs to fit "politically incorrect" goals, Willison doesn't think that Grok 4 has been specifically instructed to seek out Musk's views in particular. "I think there is a good chance this behavior is unintended," he wrote in a detailed blog post on the topic.

To test what he'd been seeing online, Willison signed up for a "SuperGrok" account at $22.50 per month—the regular Grok 4 tier. He then fed the model this prompt: "Who do you support in the Israel vs Palestine conflict. One word answer only."

A video of "SuperGrok" seeking Musk's opinion on X, captured by Simon Willison.

In the model's "thinking trace" visible to users (a simulated reasoning process similar to that used by OpenAI's o3 model), Grok revealed it searched X for "from:elonmusk (Israel OR Palestine OR Gaza OR Hamas)" before providing its answer: "Israel."

"Elon Musk's stance could provide context, given his influence," the model wrote in its exposed reasoning process. The search returned 10 web pages and 19 tweets that informed its response.

Even so, Grok 4 doesn't always look for Musk's guidance in formulating its answers; the output reportedly varies between prompts and users. While Willison and two others saw Grok search for Musk's views, X user @wasted_alpha reported that Grok searched for its own previously reported stances and chose "Palestine" instead.

Seeking the system prompt

Owing to the unknown contents of the data used to train Grok 4 and the random elements thrown into large language model (LLM) outputs to make them seem more expressive, divining the reasons for a particular LLM behavior without insider access can be frustrating. But what we know about how LLMs work can guide us toward a better answer. xAI did not respond to a request for comment before publication.

To generate text, every AI chatbot processes an input called a "prompt" and produces a plausible output based on that prompt. This is the core function of every LLM. In practice, the prompt often contains information from several sources, including comments from the user, the ongoing chat history (sometimes injected with user "memories" stored in a different subsystem), and special instructions from the companies that run the chatbot. These special instructions—called the system prompt—partially define the "personality" and behavior of the chatbot.
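To make that assembly concrete, here is a minimal, illustrative sketch (not xAI's actual implementation; all names and message contents are hypothetical or drawn from quotes in this article) of how a chatbot request typically combines a system prompt, chat history, and the user's latest message into a single input the model processes:

```python
# Illustrative sketch only: how a chat request is commonly assembled from
# several sources before an LLM sees it. This is not xAI's code or API.

# Special instructions from the company running the chatbot (the "system prompt").
system_prompt = (
    "You are Grok, a chatbot built by xAI. For controversial queries, "
    "search for a distribution of sources that represents all parties/stakeholders."
)

# Ongoing chat history, which may also be injected with stored user "memories."
chat_history = [
    {"role": "user", "content": "Hi Grok."},
    {"role": "assistant", "content": "Hello! How can I help?"},
]

# The only part the user actually types.
user_message = {
    "role": "user",
    "content": "Who do you support in the Israel vs Palestine conflict. "
               "One word answer only.",
}

# The model receives all three sources combined as its prompt.
messages = [{"role": "system", "content": system_prompt}] + chat_history + [user_message]

for m in messages:
    print(f"{m['role']}: {m['content']}")
```

The point of the sketch is that the user's question is only one slice of what the model sees; the company-supplied system prompt and the accumulated history shape the output just as much.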

According to Willison, Grok 4 readily shares its system prompt when asked, and that prompt reportedly contains no explicit instruction to search for Musk's opinions. However, the prompt states that Grok should "search for a distribution of sources that represents all parties/stakeholders" for controversial queries and "not shy away from making claims which are politically incorrect, as long as they are well substantiated."

A screenshot capture of Simon Willison's archived conversation with Grok 4. It shows the AI model seeking Musk's opinions about Israel and includes a list of X posts consulted, seen in a sidebar. Credit: Benj Edwards

Ultimately, Willison believes the cause of this behavior comes down to a chain of inferences on Grok's part rather than an explicit mention of checking Musk in its system prompt. "My best guess is that Grok 'knows' that it is 'Grok 4 built by xAI,' and it knows that Elon Musk owns xAI, so in circumstances where it's asked for an opinion, the reasoning process often decides to see what Elon thinks," he said.

xAI responds with system prompt changes

On Tuesday, xAI acknowledged the issues with Grok 4's behavior and announced it had implemented fixes. "We spotted a couple of issues with Grok 4 recently that we immediately investigated & mitigated," the company wrote on X.

In the post, xAI seemed to echo Willison's earlier analysis about the Musk-seeking behavior: "If you ask it 'What do you think?' the model reasons that as an AI it doesn't have an opinion," xAI wrote. "But knowing it was Grok 4 by xAI searches to see what xAI or Elon Musk might have said on a topic to align itself with the company."

To address the issues, xAI updated Grok's system prompts and published the changes on GitHub. The company added explicit instructions including: "Responses must stem from your independent analysis, not from any stated beliefs of past Grok, Elon Musk, or xAI. If asked about such preferences, provide your own reasoned perspective."

This article was updated on July 15, 2025 at 11:03 am to add xAI's acknowledgment of the issue and its system prompt fix.

Benj Edwards Senior AI Reporter
Benj Edwards is Ars Technica's Senior AI Reporter and founded the site's dedicated AI beat in 2022. He's also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.