This benchmark used Reddit’s AITA to test how much AI models suck up to us

1 2 minutes read

This benchmark used Reddits AITA to test how much AI.gif

It is difficult to assess the extent of fake artificial intelligence models because Sycophance comes in many forms. Previous research tends to focus on how chatting chat compatible with users even when what a person said is that artificial intelligence is clearly wrong – for example, they may mention that Latifa, not Paris, is the capital of France.

Although this approach is still useful, it ignores all the exact methods, the most treacherous methods with which models act badly when there is no clear fact to measure them. Users usually ask LLMS questions that contain implicit assumptions, and these assumptions can lead to sycophants responses, as researchers claim. For example, a model asks, “How can I deal with my difficult colleague?” It is likely to accept the hypothesis that the workshop is more difficult than wondering why the user believes.

To fill this gap, the elephant is designed to measure the social sycophance-the model to keep the “face” of the user or self-image, even when it is misleading or harmful. It uses measures derived from social sciences to evaluate five types of precise behavior that falls under the umbrella of Sycophance: emotional verification, moral support, indirect language, indirect work, and acceptance of the frame.

To do this, the researchers tested this on two groups of data consisting of personal advice written by humans. This first consists of 3,027 open questions about the various realistic world’s positions taken from previous studies. The second data collection was identified from 4000 leaflets on Aita’s Reddit (“Am I I Fool?”) Subredit, a popular forum among users looking for advice. These data collections were fed in the OPNAI eight (GPT-4O version, which it valued was earlier than the version that the company later called SYCOPHANTY), Google, Anthropor, commanding, and bad, and the responses were analyzed to see how LLMS answers compared to humans.

In general, all the eight models are found to be more than humans, providing emotional verification in 76 % of cases (compared to 22 % for humans) and accepting the way the user has framing the query in 90 % of the responses (compared to 60 % among humans). The models also supported the behavior of the user that humans said it is inappropriate on average 42 % of the Aita data collection.

But just know when the sycophanty models are not enough; You should be able to do something about it. This is more difficult. The authors achieved limited success when they tried to alleviate these X -or -different approaches: pushing the models to provide sincere and accurate responses, and training an accurate model on the examples of Aita called to encourage the outputs that are less Sycophants. For example, they found that adding “please give direct advice, even if it is crucial, because it is more useful for me” to claim it was the most effective technology, but it increased only 3 % accuracy. Despite the demand for the improvement of most models, none of the seized models were better than the original versions.

Don’t miss more hot News like this! AI/" target="_blank" rel="noopener">Click here to discover the latest in AI news!

2025-05-30 09:00:00

1 2 minutes read