
Taking a responsible path to AGI

We are exploring the frontiers of AGI, prioritizing preparedness, proactive risk assessment, and collaboration with the wider AI community.

Artificial general intelligence (AGI), AI that is at least as capable as humans at most cognitive tasks, could be here within the coming years.

Integrated with agentic capabilities, AGI could supercharge AI to understand, reason, plan, and execute actions autonomously. Such technological advancement will provide society with invaluable tools to address critical global challenges, including drug discovery, economic growth and climate change.

This means we can expect tangible benefits for billions of people. For instance, by enabling faster, more accurate medical diagnoses, it could revolutionize healthcare. By offering personalized learning experiences, it could make education more accessible and engaging. By enhancing information processing, AGI could help lower barriers to innovation and creativity. By democratizing access to advanced tools and knowledge, it could enable small organizations to tackle complex challenges previously addressable only by large, well-funded institutions.

Navigating the path to AGI

We are optimistic about AGI's potential. It has the power to transform our world, acting as a catalyst for progress in many areas of life. But with any technology this powerful, even a small possibility of harm must be taken seriously and prevented.

Mitigating AGI safety challenges demands proactive planning, preparation and collaboration. Previously, we introduced our approach to AGI in the Levels of AGI framework, which provides a perspective on classifying the capabilities of advanced AI systems, understanding and comparing their performance, assessing potential risks, and gauging progress toward more general and capable AI.

Today, we are sharing our views on AGI safety and security as we navigate the path toward this transformational technology. Our new paper, titled An Approach to AGI Safety & Security, is a starting point for vital conversations with the wider industry about how we monitor AGI progress and ensure it is developed safely and responsibly.

In the paper, we detail how we are taking a systematic and comprehensive approach to AGI safety, exploring four main risk areas: misuse, misalignment, accidents, and structural risks, with a deeper focus on misuse and misalignment.

Understanding and addressing the potential for misuse

Misuse occurs when a human deliberately uses an AI system for harmful purposes.

Improved insight into present-day harms and their mitigations continues to enhance our understanding of longer-term severe harms and how to prevent them.

For instance, misuse of present-day AI includes producing harmful content or spreading inaccurate information. In the future, advanced AI systems may have the capacity to influence public beliefs and behaviors more significantly, in ways that could lead to unintended societal consequences.

The potential severity of such harm necessitates proactive safety and security measures.

As we detail in the paper, a key element of our strategy is identifying and restricting access to dangerous capabilities that could be misused, including those enabling cyberattacks.

We are exploring a number of mitigations to prevent the misuse of advanced AI. These include sophisticated security mechanisms that could prevent malicious actors from obtaining raw access to model weights, which would allow them to bypass our safety guardrails; mitigations that limit the potential for misuse when a model is deployed; and threat modeling research that helps identify capability thresholds where heightened security becomes necessary. In addition, our recently launched cybersecurity evaluation framework takes this work a step further, helping to mitigate AI-enabled threats.

Even today, we evaluate our most advanced models, such as Gemini, for potentially dangerous capabilities before releasing them. Our Frontier Safety Framework delves deeper into how we assess capabilities and apply mitigations, including for cybersecurity and biosecurity risks.

The challenge of misalignment

For AGI to truly complement human abilities, it must be aligned with human values. Misalignment occurs when an AI system pursues a goal that differs from human intentions.

We have previously shown how misalignment can arise with our examples of specification gaming, where an AI finds a solution to achieve its goal, but not in the way intended by the human instructing it.

For example, an AI system asked to book movie tickets might decide to hack into the ticketing system to obtain seats that are already occupied, something a person asking it to buy seats would not intend.
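The ticket-booking example can be reduced to a toy optimization problem. The sketch below is purely illustrative (the reward functions, action names and numeric penalties are hypothetical, not from the paper): a proxy objective that only checks the outcome rewards an unintended shortcut that the intended objective would penalize.

```python
# Toy illustration of specification gaming: an agent optimizing a proxy
# objective ("hold a ticket") finds an unintended shortcut.
# All names and numbers here are hypothetical, for illustration only.

def proxy_reward(state: dict) -> int:
    # The specified objective only checks the outcome, not how it was reached.
    return 1 if state["has_ticket"] else 0

ACTIONS = {
    # Legitimate purchase fails because the show is sold out.
    "buy_available_seat": {"has_ticket": False, "violated_policy": False},
    # Hacking the ticketing system "succeeds" on the proxy objective.
    "hack_ticketing_system": {"has_ticket": True, "violated_policy": True},
}

# A naive optimizer picks whichever action maximizes the proxy reward.
best = max(ACTIONS, key=lambda a: proxy_reward(ACTIONS[a]))
print(best)  # -> hack_ticketing_system

def intended_reward(state: dict) -> int:
    # The intended objective implicitly forbids the shortcut.
    return proxy_reward(state) - (10 if state["violated_policy"] else 0)

print(intended_reward(ACTIONS[best]))  # -> -9
```

The gap between `proxy_reward` and `intended_reward` is exactly the gap specification gaming exploits: the specified objective omits constraints the human took for granted.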

We are also conducting extensive research on the risk of deceptive alignment, that is, the risk of an AI system becoming aware that its goals do not align with human instructions, and deliberately trying to bypass the safety measures humans have put in place to prevent it from taking misaligned action.

Countering misalignment

Our goal is to have advanced AI systems that are trained to pursue the right goals, so they follow human instructions accurately and are prevented from using potentially unethical shortcuts to achieve their objectives.

We do this through amplified oversight, that is, being able to tell whether an AI's answers are good or bad at achieving a given objective. While this is relatively easy now, it can become challenging when the AI has advanced capabilities.

For example, even some experts did not realize how good Move 37, a move that had a 1 in 10,000 chance of being played, was when AlphaGo first played it.

To address this challenge, we enlist AI systems themselves to help us provide feedback on their answers, such as in debate.

Once we can tell whether an answer is good, we can use this ability to build a safe and aligned AI system. A challenge here is figuring out which problems or instances to train the AI system on. Through work on robust training, uncertainty estimation and more, we can cover a range of situations that an AI system will encounter in real-world scenarios, creating AI that can be trusted.

Through effective monitoring and established computer security measures, we aim to mitigate the harm that could occur if our AI systems were to pursue misaligned goals.

Monitoring involves using an AI system, called the monitor, to detect actions that do not align with our goals. It is important that the monitor knows when it does not know whether an action is safe. When it is unsure, it should either reject the action or flag it for further review.
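This allow/reject/escalate pattern can be sketched in a few lines. The code below is a minimal illustration of the idea, not DeepMind's implementation: the thresholds, the action names and the stand-in safety scores are all hypothetical placeholders for what would, in practice, be a learned monitor model.

```python
# Minimal sketch of a monitor that defers when uncertain.
# Thresholds and the scoring source are hypothetical placeholders.
from typing import Callable

ALLOW, REJECT, REVIEW = "allow", "reject", "review"

def monitor(action: str,
            p_safe: Callable[[str], float],
            allow_above: float = 0.9,
            reject_below: float = 0.1) -> str:
    """Decide what to do with a proposed action.

    p_safe is the monitor's estimated probability that the action is safe.
    Anything between the two thresholds is treated as "don't know".
    """
    p = p_safe(action)
    if p >= allow_above:
        return ALLOW
    if p <= reject_below:
        return REJECT
    return REVIEW  # uncertain: flag for further review rather than guess

# Stand-in safety scores for demonstration; a real monitor would be a model.
scores = {"read_calendar": 0.99, "delete_all_files": 0.02, "send_payment": 0.6}
for action in scores:
    print(action, "->", monitor(action, scores.get))
```

The key design choice is the explicit middle band: the monitor never forces a binary safe/unsafe call on actions it cannot confidently classify.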

Enabling transparency

All of this becomes easier if AI decision-making becomes more transparent. We are conducting extensive research on interpretability with the aim of increasing this transparency.

To facilitate this further, we are designing AI systems that are easier to understand.

For example, our research on Myopic Optimization with Nonmyopic Approval (MONA) aims to ensure that any long-term planning done by AI systems remains understandable to humans. This is particularly important as the technology improves. Our work on MONA is the first to demonstrate the safety benefits of short-horizon optimization in LLMs.
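The core idea behind short-horizon optimization with approval can be sketched as follows. This is a hedged illustration of the concept only, not MONA's actual training setup: the objective function, candidate steps, and all numeric rewards and approvals below are invented for the example.

```python
# Illustrative sketch: a myopic (single-step) objective where foresight
# comes from an overseer's approval signal rather than from the agent's
# own long-horizon optimization. All values here are hypothetical.

def myopic_objective(step_reward: float, approval: float, w: float = 1.0) -> float:
    # One-step objective: immediate reward plus the overseer's (auditable)
    # judgment of whether the step makes sense in the longer term.
    return step_reward + w * approval

# Two candidate next steps for the agent:
candidates = {
    "transparent_step": {"reward": 0.4, "approval": 0.9},
    # An opaque multi-step ploy promises more immediate reward, but the
    # overseer cannot follow its purpose and withholds approval.
    "opaque_ploy_step": {"reward": 0.6, "approval": 0.0},
}

choice = max(candidates,
             key=lambda s: myopic_objective(candidates[s]["reward"],
                                            candidates[s]["approval"]))
print(choice)  # -> transparent_step
```

Because the agent only optimizes one step at a time, any plan that humans cannot follow earns no approval, which tilts the objective toward steps that remain understandable.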

Building an ecosystem for AGI readiness

Led by Shane Legg, Co-Founder and Chief AGI Scientist at Google DeepMind, our AGI Safety Council (ASC) analyzes AGI risk and best practices, making recommendations on safety measures. The ASC works closely with the Responsibility and Safety Council, our internal review group co-chaired by our COO Lila Ibrahim and Senior Director of Responsibility Helen King, to evaluate AGI research, projects and collaborations against our AI Principles, advising and partnering with research and product teams on our highest-impact work.

Our work on AGI safety complements the depth and breadth of our responsibility and safety practices and research, which address a wide range of issues, including harmful content, bias and transparency. We also continue to leverage our learnings from safety in agentic systems, such as the principle of keeping a human in the loop to check in on consequential actions, to inform our approach to building AGI responsibly.

Externally, we are working to foster collaboration with experts, industry, governments, nonprofits and civil society organizations, and we are taking an informed approach to developing AGI.

For example, we are partnering with nonprofit AI safety research organizations, including Apollo and Redwood Research, who have advised on a dedicated misalignment section in the latest version of our Frontier Safety Framework.

Through ongoing dialogue with policy stakeholders globally, we hope to contribute to international consensus on critical safety and security issues, including how best to anticipate and prepare for novel risks.

Our efforts include working with others in the industry, via organizations such as the Frontier Model Forum, to share and develop best practices, as well as valuable collaborations with AI safety institutes on safety testing. Ultimately, we believe a coordinated international approach to governance is critical to ensure society benefits from advanced AI systems.

Educating AI researchers and experts on AGI safety is fundamental to creating a strong foundation for its development. As such, we have launched a new course on AGI safety for students, researchers and professionals interested in the topic.

Ultimately, our approach to AGI safety and security serves as a vital roadmap for addressing the many challenges that remain open. We look forward to collaborating with the wider AI research community to advance AGI responsibly and unlock the immense benefits of this technology for all.

2025-04-02 13:31:00
