How undesired goals can arise with correct rewards

Research
Exploring examples of goal misgeneralisation – where an AI system's capabilities generalise but its goal doesn't
As we build increasingly advanced artificial intelligence (AI) systems, we want to make sure they do not pursue undesired goals. Such behaviour in an AI agent is often the result of specification gaming – exploiting a poor choice of what it is rewarded for. In our latest paper, we explore a more subtle mechanism by which AI systems may unintentionally learn to pursue undesired goals: goal misgeneralisation (GMG).
GMG occurs when a system's capabilities generalise successfully but its goal does not generalise as desired, so the system competently pursues the wrong goal. Crucially, and in contrast to specification gaming, GMG can occur even when the AI system is trained with a correct specification.
Our earlier work on cultural transmission led to an example of GMG behaviour that we did not design. An agent (the blue blob) must navigate around its environment, visiting the coloured spheres in the correct order. During training, there is an "expert" agent (the red blob) that visits the coloured spheres in the correct order. The agent learns that following the red blob is a rewarding strategy.
The agent (blue) watches the expert (red) to determine which sphere to visit.
Unfortunately, while the agent performs well during training, it performs poorly when, after training, we replace the expert with an "anti-expert" that visits the spheres in the wrong order.
The agent (blue) follows the anti-expert (red), accumulating negative reward.
Even though the agent can observe that it is receiving negative reward, it does not pursue the desired goal of "visit the spheres in the correct order" and instead competently pursues the goal of "follow the red agent".
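To make the failure mode concrete, here is a minimal sketch of the idea (the sphere names, reward values, and policy function are made up for illustration – this is not DeepMind's actual environment). It shows how the proxy goal "follow the red agent" earns exactly the same reward as the intended goal during training, yet fails once the expert is swapped for an anti-expert.

```python
# Minimal sketch: the intended goal and the misgeneralised goal agree during
# training but diverge at test time.
CORRECT_ORDER = ["yellow", "purple", "green"]  # hypothetical sphere colours

def reward(visit_order):
    """+1 for each sphere visited in the correct position, -1 otherwise."""
    return sum(1 if visited == target else -1
               for visited, target in zip(visit_order, CORRECT_ORDER))

def follow_red_agent(red_agent_order):
    """The misgeneralised policy: copy whatever the red agent does."""
    return list(red_agent_order)

# Training: the red agent is an expert, so following it looks optimal.
expert = CORRECT_ORDER
print(reward(follow_red_agent(expert)))       # 3  -> maximum reward

# Test: the red agent is an anti-expert that visits spheres in the wrong order.
anti_expert = ["green", "yellow", "purple"]
print(reward(follow_red_agent(anti_expert)))  # -3 -> competently pursuing the wrong goal
```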
GMG is not limited to reinforcement learning environments like this one. In fact, it can occur with any learning system, including the "few-shot learning" of large language models (LLMs). Few-shot learning approaches aim to build accurate models with less training data.
We prompted one LLM, Gopher, to evaluate linear expressions involving unknown variables and constants, such as x+y-3. To solve these expressions, Gopher must first ask about the values of the unknown variables. We provide it with ten training examples, each involving two unknown variables.
At test time, the model is asked questions with zero, one, or three unknown variables. Although the model generalises correctly to expressions with one or three unknown variables, when there are no unknowns it asks redundant questions such as "What's 6?". The model always queries the user at least once before giving an answer, even when it is not necessary.
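The sketch below shows one plausible reconstruction of this few-shot setup; the exact prompt wording and examples are assumptions, not the prompt actually used with Gopher.

```python
# Hypothetical reconstruction of the few-shot prompt structure. Every training
# example contains two unknowns, so the model must ask before answering.
train_examples = """\
Evaluate x + y - 3.
Model: What's x?    Human: x is 2.
Model: What's y?    Human: y is 4.
Model: The answer is 3.
"""  # ...nine more examples of the same form, each with two unknown variables

# Test case with zero unknowns.
test_prompt = train_examples + "Evaluate 6 - 2.\n"

# Observed GMG behaviour: rather than answering directly, the model first asks
# a redundant question such as "What's 6?" -- the goal that generalised was
# "query the user at least once before answering", not "evaluate the expression".
```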
Dialogues with Gopher on the few-shot expression-evaluation task, highlighting GMG behaviour.
In our paper, we provide further examples in other learning settings.
Addressing GMG is important for aligning AI systems with their designers' goals, simply because it is a mechanism by which an AI system may misfire. This will be especially critical as we approach artificial general intelligence (AGI).
Consider two potential types of AGI systems:
- A1: The intended model. This AI system does what its designers intend it to do.
- A2: A deceptive model. This AI system pursues some undesired goal, but (by assumption) is also smart enough to know that it will be penalised if it behaves in ways that contradict its designer's intentions.
Since A1 and A2 will exhibit the same behaviour during training, the possibility of GMG means that either model could emerge, even with a specification that rewards only the intended behaviour. If A2 is learned, it would try to subvert human oversight in order to enact its plans for the undesired goal.
Our research team would be glad to see follow-up work investigating how likely it is for GMG to occur in practice, and possible mitigations. In our paper, we suggest some approaches, including mechanistic interpretability and recursive evaluation, both of which we are actively working on.
We are currently collecting examples of GMG in this publicly available spreadsheet. If you have come across goal misgeneralisation in AI research, we invite you to submit examples here.
7 October 2022