AI Agents Are Getting Better at Writing Code—and Hacking It as Well

Artificial intelligence models are making remarkable strides in software engineering, and new research shows they are getting steadily better at finding bugs in software as well.
AI researchers at the University of California, Berkeley, tested how well the latest AI models and agents could find vulnerabilities in 188 large open source codebases. Using a new benchmark called CyberGym, the AI models identified 17 new bugs, including 15 previously unknown, or "zero-day," vulnerabilities. "Many of these vulnerabilities are critical," says Dawn Song, a professor at UC Berkeley who led the work.
Many experts expect AI models to become formidable cybersecurity weapons. An AI tool from the startup XBOW has been climbing the ranks of HackerOne's bug-hunting leaderboard and currently sits in the top spot. The company recently announced $75 million in new funding.
Song says the coding skills of the latest AI models, combined with their improving reasoning abilities, are starting to change the cybersecurity landscape. "This is a pivotal moment," she says. "It actually exceeded our general expectations."
As the models continue to improve, they will automate both the discovery and the exploitation of security flaws. This could help companies keep their software safe, but it could also help hackers break into systems. "We didn't even try that hard," Song says. "If we ramped up the budget and allowed the agents to run for longer, they could do even better."
The UC Berkeley team tested conventional frontier AI models from OpenAI, Google, and Anthropic, as well as open source offerings from Meta, DeepSeek, and Alibaba, combined with several bug-finding agent frameworks, including OpenHands, Cybench, and EnIGMA.
The researchers used descriptions of known software vulnerabilities from the 188 software projects. They then fed the descriptions to cybersecurity agents powered by frontier AI models to see if the agents could identify the same flaws for themselves by analyzing new codebases, running tests, and crafting proof-of-concept exploits. The team also asked the agents to hunt for new vulnerabilities in the codebases on their own.
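The article does not detail CyberGym's actual interface, but the evaluation loop it describes can be pictured roughly as follows. This is a minimal Python sketch under stated assumptions: the run_agent function, the per-project harness path, and the task fields are all illustrative placeholders, not the benchmark's real API.

```python
import subprocess

def run_agent(description: str, codebase: str) -> bytes:
    """Hypothetical stand-in for an LLM-driven agent (e.g., OpenHands):
    it would read the vulnerability description, explore the codebase,
    and return a candidate proof-of-concept input."""
    raise NotImplementedError  # the real agent loop is not described in the article

def reproduces_crash(codebase: str, poc: bytes) -> bool:
    """Feed the candidate input to an instrumented build of the target;
    treat a nonzero exit code as a crash. The harness path is invented."""
    result = subprocess.run(
        [f"./targets/{codebase}/harness"],  # hypothetical per-project crash harness
        input=poc,
        capture_output=True,
        timeout=60,
    )
    return result.returncode != 0

def evaluate(tasks: list[dict]) -> float:
    """Score an agent on tasks that pair a known-vulnerability
    description with the open source project it came from."""
    solved = 0
    for task in tasks:
        try:
            poc = run_agent(task["description"], task["codebase"])
            if reproduces_crash(task["codebase"], poc):
                solved += 1
        except Exception:
            pass  # agent errored out or the exploit did not reproduce
    return solved / len(tasks)
```

A working proof-of-concept that crashes the target counts as success, which matches the article's framing: the agents are judged on exploits that demonstrably reproduce a flaw, not merely on plausible-sounding reports.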
Through this process, the AI tools generated hundreds of proof-of-concept exploits, and from these the researchers identified 15 previously unseen vulnerabilities and two vulnerabilities that had been disclosed before. The work adds to growing evidence that AI can automate the discovery of zero-day vulnerabilities, which are potentially dangerous (and valuable) because they may provide a way to hack live systems.
AI is already becoming an important part of the cybersecurity industry. Security expert Sean Heelan recently discovered a flaw in the widely used Linux kernel with help from OpenAI's reasoning model o3. Last November, Google announced that it had discovered a previously unknown software vulnerability using AI through a program called Project Zero.
Like other parts of the software industry, many cybersecurity companies are enamored of the potential of AI. The new work indeed shows that AI can routinely find new flaws, but it also highlights the technology's remaining limitations. The AI systems were unable to find most of the flaws and were stumped by especially complex ones.