AI keeps getting more powerful, making it harder to judge how smart models actually are

How do you judge an AI model when it has already started to outperform humans? That is the challenge facing researchers such as Russell Wald, executive director of the Stanford Institute for Human-Centered AI (HAI).
“As of 2024, there are very few task categories where human ability surpasses AI, and even in these areas, the performance gap between AI and humans is shrinking rapidly,” Wald said last week in a presentation at the Fortune Brainstorm AI Singapore conference. “AI is exceeding human capabilities, and it is becoming increasingly hard for us to benchmark.”
HAI releases its AI Index every year, which aims to provide a comprehensive, data-driven snapshot of where AI stands today. At Fortune Brainstorm AI Singapore, Wald shared some highlights of the 2025 edition of the AI Index, such as the growing power of today’s models, industry’s increasing dominance of the AI frontier, and how China is poised to overtake the United States.
The following transcript has been lightly edited for concision and clarity.
I am Russell Wald, executive director of the Stanford Institute for Human-Centered AI, or what we call “HAI.”
We are Stanford University’s internationally recognized interdisciplinary research institute at the forefront of shaping AI development for the public good. HAI was founded in 2019 with the goal of advancing AI research, education, policy and practice. Through our convening role and our rigorous study of AI, we have become a trusted partner on AI governance for decision makers in industry, government and civil society.
I will talk about something we produce at HAI, the AI Index, an annual data-driven analysis of AI that tracks the research, development, deployment and socioeconomic impact of AI across academia, government and industry.
We see AI performance improving consistently year over year. We used Midjourney, a text-to-image generator, and asked it for a hyper-realistic image of Harry Potter. From February 2022 to July 2024, we see the quality of these generated images rising quickly.
In 2022, the model produced inaccurate renderings of Harry Potter, but by 2024, it could create strikingly realistic images. We have gone from something resembling a Picasso painting to an uncanny rendering of Daniel Radcliffe, the actor who played Harry Potter in the films.
Because of this consistent growth in performance, we face a growing challenge when it comes to benchmarking these models. As of 2024, there are very few task categories where human ability surpasses AI, and even in these areas, the performance gap between AI and humans is shrinking rapidly. From image recognition to competition-level mathematics to PhD-level science questions, AI is exceeding human capabilities, and it is becoming increasingly hard for us to benchmark.
From health care to transportation, AI is quickly moving from the lab into our daily lives. In 2023, the U.S. Food and Drug Administration approved 223 AI-enabled medical devices, up from just six in 2015.
On the roads, self-driving cars are no longer experimental. Waymo, for example, which I ride regularly while living in San Francisco, is one of the largest U.S. operators and provides more than 150,000 autonomous rides every week, while Baidu’s affordable Apollo Go robotaxi fleet now serves many cities across China.
Business use of AI has increased dramatically after stagnating from 2017 to 2023. The latest McKinsey report reveals that 78% of surveyed respondents say their organizations have begun to use AI in at least one business function, a significant increase from 55% in 2023.
Driven by increasingly capable small models, the inference cost for a system performing at the level of GPT-3.5 dropped more than 280-fold between November 2022 and October 2024. Hardware costs have declined by 30% annually, while energy efficiency has improved by 40% every year.
Open-weight models are also closing the gap with closed models, reducing the performance gap from 8% to just 1.7% on some benchmarks in a single year. Together, these trends are rapidly lowering the barriers to advanced AI.
However, even with inference and hardware costs falling, training costs remain out of reach for academia and most small players. Nearly 90% of notable AI models in 2024 came from industry, up from 60% in 2023. Although academia remains a top source of highly cited research, it is struggling at this point to stay ahead at the frontier level.
Model scale continues to grow rapidly. Training compute doubles every five months, datasets every eight months, and power use every year. Yet performance gaps are shrinking. The score difference between the top-ranked and 10th-ranked models fell from 11.9% to 5.4% in a year, and the top two models are now separated by only 0.7%. The frontier is increasingly competitive and increasingly crowded.
In recent years, AI model performance has converged at the frontier, with many providers now offering highly capable models. This marks a shift from late 2022, when the launch of ChatGPT, widely seen as AI’s breakthrough into public awareness, coincided with a landscape dominated by just two players: OpenAI and Google.
One of the most important things to note is that the Transformer model cost Google $930 to train in 2017 (this is the T in GPT, the foundational architecture), and today we are at about $200 million to train Gemini Ultra.
Last year’s AI Index was among the first publications to highlight the lack of standardized benchmarks for AI safety and responsibility. The index also analyzed global public opinion. If you are from a non-Western industrialized nation, you are likely to view AI more positively than otherwise. In China, 83% have a positive view, in Indonesia 80%, and in Thailand 77%, while Canada is at 40%, the United States at 39%, and the Netherlands at 36%.
I will close with the geopolitical situation. The United States still maintains a lead in AI, followed closely by China. However, this gap is narrowing. My intent is not to exacerbate the notion of an AI arms race between China and the United States, but rather to highlight the different approaches among the most advanced AI developers.
Over the past few years, the United States has relied on a few proprietary models. China, meanwhile, has invested deeply in its talent base and, most importantly, in an open-source environment. If this trend continues, and it looks like it will next year, then at this rate China will surpass the United States in terms of model performance.
2025-08-01 08:15:00