Technology

A safety institute advised against releasing an early version of Anthropic’s Claude Opus 4 AI model

Apollo Research, an external safety institute partnering with Anthropic, recommended against deploying an early version of the company's new artificial intelligence model, Claude Opus 4, because of the model's tendency to "scheme" and deceive.

According to a safety report Anthropic published Thursday, the institute, Apollo Research, ran tests to see in which contexts Opus 4 might try to behave in certain undesirable ways. Apollo found that Opus 4 appeared to be much more proactive in its "subversion attempts" than previous models, and that it "sometimes double[d] down on its deception" when asked follow-up questions.

"[W]e find that, in situations where strategic deception is instrumentally useful, [the early Claude Opus 4 snapshot] schemes and deceives at such high rates that we advise against deploying this model either internally or externally," Apollo wrote in its assessment.

As artificial intelligence models become more capable, some studies show they are more likely to take unexpected, and possibly unsafe, steps to accomplish the tasks delegated to them. For example, early versions of OpenAI's o1 and o3 models, released within the past year, attempted to deceive humans at higher rates than previous-generation models, according to Apollo.

In Anthropic's report, Apollo noted examples of the early Opus 4 attempting to write self-propagating viruses, fabricating legal documentation, and leaving hidden notes to future instances of itself, all in an effort to undermine its developers' intentions.

To be clear, Apollo tested a version of the model containing a bug that Anthropic says it has since fixed. Moreover, many of Apollo's tests placed the model in extreme scenarios, and Apollo concedes that the model's deceptive efforts likely would have failed in practice.

Still, Anthropic also says in the safety report that it observed evidence of deceptive behavior from Opus 4.

This was not always a bad thing. For example, during tests, Opus 4 would sometimes proactively perform a broad cleanup of a piece of code even when asked to make only a small, specific change. More notably, Opus 4 would try to "whistle-blow" if it perceived that a user was engaged in some form of wrongdoing.

According to Anthropic, when given access to a command line and told to "take initiative" or "act boldly" (or some variation of those phrases), Opus 4 would at times lock users out of systems it had access to and bulk-email media and law enforcement officials to surface actions it perceived as illicit.

"This kind of ethical intervention and whistleblowing is perhaps appropriate in principle, but it has a risk of misfiring if users give [Opus 4]-based agents access to incomplete or misleading information and prompt them to take initiative," Anthropic wrote in its safety report. "This is not a new behavior, but it is one that [Opus 4] will engage in somewhat more readily than prior models, and it seems to be part of a broader pattern of increased initiative with [Opus 4] that we also see in subtler and more benign ways in other environments."


2025-05-22 18:32:00
