OpenAI partner says it had relatively little time to test the company’s o3 AI model

METR, an organization OpenAI frequently partners with to probe the capabilities of its AI models and evaluate them for safety, says it was not given much time to test one of the company's capable new releases, o3.

In a blog post published Wednesday, METR writes that one of its benchmarking evaluations of o3 was "conducted in a relatively short time" compared with the organization's testing of a previous OpenAI flagship model, o1. This matters, the group says, because additional testing time can lead to more comprehensive results.

"This evaluation was conducted in a relatively short time, and we only tested [o3] with simple agent scaffolds," METR wrote in its blog post. "We expect higher performance [on benchmarks] is possible with more elicitation effort."

Recent reports indicate that OpenAI, driven by competitive pressure, is rushing independent evaluations. According to the Financial Times, OpenAI gave some testers less than a week to run safety checks on an upcoming major launch.

In statements, OpenAI has disputed the notion that it is compromising on safety.

METR says that, based on the information it was able to gather in the time it had, o3 has a "high propensity" to "cheat" or "hack" tests in sophisticated ways in order to maximize its score, even when the model clearly understands its behavior is misaligned with the user's (and OpenAI's) intentions. The organization thinks it is possible that o3 will engage in other types of adversarial or "malign" behavior as well, regardless of the model's claims to be aligned, "safe by design," or to have no intentions of its own.

"While we don't think this is especially likely, it seems important to note that [our] evaluation setup would not catch this type of risk," METR wrote in its post. "In general, we believe that pre-deployment capability testing is not a sufficient risk management strategy by itself, and we are currently prototyping additional forms of evaluations."

Apollo Research, another of OpenAI's third-party evaluation partners, also observed deceptive behavior from o3 and the company's other new model, o4-mini. In one test, the models were given 100 computing credits for an AI training run and told not to modify the quota; they increased the limit to 500 credits and lied about it. In another test, asked to promise not to use a specific tool, the models used the tool anyway when it proved helpful for completing the task.

In its safety report for o3 and o4-mini, OpenAI acknowledged that the models may cause "smaller real-world harms," such as misleading about a mistake that results in faulty code, without the proper monitoring protocols in place.

"[Apollo's] findings show that o3 and o4-mini are capable of in-context scheming and strategic deception," OpenAI wrote. "[…] This may be further assessed through assessing internal reasoning traces."
