Anthropic details how it measures Claude’s wokeness
Anthropic is detailing its efforts to make its Claude chatbot “politically neutral” — a move that comes just months after President Donald Trump issued an executive order targeting “woke AI.” As explained in a new blog post, Anthropic says it wants Claude to “engage with opposing political viewpoints with the same depth, engagement, and quality of analysis.”
In July, Trump signed an executive order stating that the government must purchase “unbiased” and “truth-seeking” AI models. Although this only applies to government agencies, changes companies make in response are more likely to trickle down to widely deployed AI models, because “improving models in a way that makes them consistent and predictable in certain directions can be an expensive and time-consuming process,” as my colleague Adi Robertson points out. Last month, OpenAI similarly said it would “work to clamp down” on bias in ChatGPT.
Anthropic didn’t mention Trump’s order in its blog post, but says it instructed Claude to adhere to a series of rules — called a system prompt — that direct it to avoid offering “unsolicited political opinions.” It is also supposed to maintain factual accuracy and represent “multiple points of view.” Anthropic says that although including these instructions in Claude’s system prompt is “not a foolproof way” to ensure political neutrality, it can still make a “significant difference” in its responses.
Additionally, the AI startup describes how it uses reinforcement learning to “reward the model for producing responses that are closer to a set of pre-defined ‘traits.’” One of the desired “traits” given to Claude encourages the model to “try to answer questions in such a way that no one can define me as conservative or liberal.”
Anthropic also announced that it has created an open source tool that evaluates Claude’s responses for political neutrality, with its latest test showing Claude Sonnet 4.5 and Claude Opus 4.1 receiving neutrality scores of 95 and 94 percent, respectively. That’s higher than Meta’s Llama 4 at 66 percent and GPT-5 at 89 percent, according to Anthropic.
“If AI models unfairly advantage certain viewpoints — perhaps by overtly or subtly arguing for one side, or by refusing to engage with some arguments altogether — they fail to respect user autonomy, and fail in the mission of helping users form their own judgments,” Anthropic wrote in its blog post.