
Discovering when an agent is present in a system


Published: 18 August 2022
Authors

Zachary Kenton, Ramana Kumar, Sebastian Farquhar, Jonathan Richens, Matt MacDermott, Tom Everitt

A new, formal definition of agency gives clear principles for causal modelling of artificial intelligence agents and the incentives they face.

We want to build safe, aligned artificial general intelligence (AGI) systems that pursue the intended goals of their designers. Causal influence diagrams (CIDs) are a way to model decision-making situations that allow us to reason about agent incentives. For example, the following CID shows a one-step Markov decision process, a typical framework for decision-making problems.

S1 represents the initial state, A1 the agent's decision (square), and S2 the next state. R2 is the agent's reward/utility (diamond). Solid links specify causal influence. Dashed edges specify information links: what the agent knows when making its decision.
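As a concrete illustration, here is a minimal sketch of that CID as a plain directed graph. It uses networkx purely for convenience rather than any tooling from the paper, and the node and edge labels simply follow the caption above.

```python
# A minimal sketch of the one-step MDP CID as a directed graph.
# Node roles and edge types follow the caption above; networkx is just
# one convenient way to hold the structure, not the paper's own tooling.
import networkx as nx

cid = nx.DiGraph()

# Chance nodes (circles), decision node (square), utility node (diamond).
cid.add_node("S1", kind="chance")
cid.add_node("A1", kind="decision")
cid.add_node("S2", kind="chance")
cid.add_node("R2", kind="utility")

# Solid edges: causal influence.
cid.add_edge("S1", "S2", kind="causal")
cid.add_edge("A1", "S2", kind="causal")
cid.add_edge("S2", "R2", kind="causal")

# Dashed edge: information link (what the agent observes before deciding).
cid.add_edge("S1", "A1", kind="information")

print(sorted(cid.predecessors("R2")))  # ['S2']
```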

By relating training setups to the incentives that shape agent behaviour, CIDs help illuminate potential risks before an agent is trained and can inspire better agent designs. But how do we know when a CID is an accurate model of a training setup?

Our new paper, Discovering Agents, introduces new ways of tackling these issues, including:

  • The first formal causal definition of agents: agents are systems that would adapt their policy if their actions influenced the world in a different way
  • An algorithm for discovering agents from empirical data
  • A translation between causal models and CIDs
  • A resolution of earlier confusions arising from incorrect causal modelling of agents

Combined, these results provide an extra layer of assurance that a modelling mistake has not been made, which means that CIDs can be used to analyse an agent's incentives and safety properties with greater confidence.

Example: modelling a mouse as an agent

To help illustrate our method, consider the following example: a world containing three squares, with a mouse that starts in the middle square, chooses to go left or right, reaches its next position, and then possibly gets some cheese. The floor is icy, so the mouse might slip. Sometimes the cheese is on the right, but sometimes on the left.
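To make the setup concrete, here is a small simulation sketch of this environment. The slip probability and cheese placement probability are illustrative placeholders, not values from the paper.

```python
# A small simulation sketch of the mouse and cheese environment described
# above. The slip probability and cheese distribution are illustrative
# placeholders, not values from the paper.
import random

SLIP_PROB = 0.1          # chance the icy floor sends the mouse the other way
CHEESE_RIGHT_PROB = 0.7  # how often the cheese appears on the right square

def step(decision: str) -> tuple[str, int]:
    """Take a left/right decision; return the mouse's new position X and utility U."""
    slipped = random.random() < SLIP_PROB
    position = decision if not slipped else ("left" if decision == "right" else "right")
    cheese_side = "right" if random.random() < CHEESE_RIGHT_PROB else "left"
    utility = 1 if position == cheese_side else 0
    return position, utility

if __name__ == "__main__":
    print(step("right"))
```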

Mouse and cheese environment.

This can be represented by the following CID:

CID for the mouse. D represents the left/right decision. X is the mouse's new position after taking the left/right action (it might slip, ending up on the other side by accident). U represents whether the mouse gets the cheese or not.

The intuition that the mouse would choose different behaviour under different environment settings (iciness, cheese distribution) can be captured by a mechanised causal graph, which, for each (object-level) variable, also includes a mechanism variable that governs how that variable depends on its parents. Crucially, we allow links between mechanism variables.

This graph contains additional mechanism nodes, shown in black, representing the mouse's policy and the iciness and cheese distribution.

Mechanised causal graph for the mouse and cheese environment.
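For readers who prefer to see the structure spelled out, here is a sketch of that mechanised causal graph in the same networkx style as before. The mechanism nodes are written D~, X~, U~; the "terminal" flags on the mechanism-to-mechanism edges anticipate the next two paragraphs, and the figure in the paper may carry more detail than this sketch.

```python
# A sketch of the mechanised causal graph for the mouse example.
# Each object-level variable (D, X, U) gets a mechanism node ("D~", "X~",
# "U~") governing how it depends on its parents; mechanism-to-mechanism
# edges carry a "terminal" flag, explained in the text below.
import networkx as nx

mech = nx.DiGraph()

# Object-level structure: decision -> position -> utility.
mech.add_edge("D", "X")
mech.add_edge("X", "U")

# Each mechanism node governs its object-level variable.
for v in ["D", "X", "U"]:
    mech.add_edge(f"{v}~", v)

# Links between mechanisms: the mouse's policy D~ adapts to the iciness X~
# and to the cheese distribution U~.
mech.add_edge("X~", "D~", terminal=False)  # would vanish if X were cut off from U
mech.add_edge("U~", "D~", terminal=True)   # U has no children, so this edge is terminal
```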

The edges between mechanisms represent direct causal influence. The blue edges are special terminal edges: roughly, a mechanism edge A~ → B~ is terminal if it would remain even if the object-level variable A were altered so that it had no outgoing edges.

In the example above, since U has no children, its mechanism edge must be terminal. But the edge X~ → D~ is not terminal, because if we cut X off from its child U, the mouse would no longer adapt its decision (since its position would not affect whether it gets the cheese).
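The following toy calculation illustrates why X~ → D~ is not terminal: the optimal decision tracks the slip probability (X's mechanism) only as long as the utility depends on the position. The numbers are illustrative, not taken from the paper.

```python
# A sketch of the intuition behind the non-terminal edge X~ -> D~: the
# optimal decision depends on the slip probability (X's mechanism) only
# because the utility U depends on the position X. Values are illustrative.

def expected_utility(decision: str, slip_prob: float, cheese_right_prob: float,
                     utility_depends_on_position: bool = True) -> float:
    """Expected cheese for a left/right decision under a given iciness level."""
    if not utility_depends_on_position:
        return cheese_right_prob  # U is cut off from X; the action no longer matters
    p_right = (1 - slip_prob) if decision == "right" else slip_prob
    return p_right * cheese_right_prob + (1 - p_right) * (1 - cheese_right_prob)

def best_decision(slip_prob: float, cut: bool = False) -> str:
    return max(["left", "right"],
               key=lambda d: expected_utility(d, slip_prob, 0.7, not cut))

print(best_decision(0.1), best_decision(0.9))                      # the policy adapts to iciness
print(best_decision(0.1, cut=True), best_decision(0.9, cut=True))  # with X cut off from U, it no longer does
```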

Causal discovery of agents

Causal discovery infers a causal graph from experiments involving interventions. In particular, one can discover an arrow from a variable A to a variable B by experimentally intervening on A and checking whether B responds, even when all other variables are held fixed.
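Schematically, that interventional test might look like the sketch below. The `system.intervene` interface and the shared finite domain of values are hypothetical simplifications for illustration, not an API from the paper.

```python
# A schematic sketch of the interventional test described above: there is a
# direct arrow A -> B if, holding every other variable fixed, changing the
# value we force onto A changes B. The `system` interface is hypothetical.
from itertools import product

def has_direct_edge(system, a, b, values, other_vars) -> bool:
    """Return True if B responds to interventions on A with all other variables held fixed."""
    for fixed in product(values, repeat=len(other_vars)):
        context = dict(zip(other_vars, fixed))
        outcomes = {system.intervene({**context, a: v})[b] for v in values}
        if len(outcomes) > 1:  # B took different values as we varied A
            return True
    return False
```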

Our first algorithm uses this technique to discover the mechanised causal graph:

Algorithm 1 takes as input interventional data from the system (the mouse and cheese environment) and uses causal discovery to output a mechanised causal graph. See the paper for details.
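At a very high level, one could imagine assembling the graph by running the `has_direct_edge` sketch above over all pairs of mechanism and object-level variables, as below. This omits the extra check that distinguishes terminal from non-terminal mechanism edges, which Algorithm 1 also performs; the paper gives the real procedure.

```python
# A sketch of how the interventional test above could be used to assemble a
# mechanised causal graph: run it over every ordered pair of variables,
# mechanism and object-level alike. Not the paper's Algorithm 1, which also
# labels terminal edges; this only illustrates the overall shape.
import networkx as nx

def discover_mechanised_graph(system, variables, mechanisms, values) -> nx.DiGraph:
    mech = nx.DiGraph()
    nodes = list(variables) + list(mechanisms)
    mech.add_nodes_from(nodes)
    for a in nodes:
        for b in nodes:
            others = [n for n in nodes if n not in (a, b)]
            if a != b and has_direct_edge(system, a, b, values, others):
                mech.add_edge(a, b)
    return mech
```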

Our second algorithm transforms this mechanised causal graph into a game graph:

Algorithm 2 takes as input a mechanised causal graph and maps it to a game graph. An ingoing terminal edge indicates a decision; an outgoing one indicates a utility.
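Below is a rough sketch of that mapping, assuming the mechanised graph uses the format of the earlier sketches (mechanism nodes named "V~" and terminal flags stored on the edges); the paper's Algorithm 2 is the authoritative version.

```python
# A rough sketch of the mapping described in the caption: a variable whose
# mechanism has an ingoing terminal edge is labelled a decision, and one
# whose mechanism has an outgoing terminal edge is labelled a utility.
import networkx as nx

def to_game_graph(mech: nx.DiGraph) -> nx.DiGraph:
    """Label each object-level variable from the terminal edges on its mechanism."""
    variables = [n for n in mech if not n.endswith("~")]
    game = nx.DiGraph()
    game.add_nodes_from(variables)
    game.add_edges_from((u, v) for u, v in mech.edges if u in variables and v in variables)
    for v in variables:
        m = f"{v}~"
        ingoing = any(d.get("terminal") for _, _, d in mech.in_edges(m, data=True))
        outgoing = any(d.get("terminal") for _, _, d in mech.out_edges(m, data=True))
        game.nodes[v]["kind"] = "decision" if ingoing else "utility" if outgoing else "chance"
    return game
```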

Taken together, Algorithm 1 followed by Algorithm 2 allows us to discover agents from causal experiments, representing them using CIDs.

Our third algorithm transforms the game graph into a mechanised causal graph, allowing us to translate between the game graph and mechanised causal graph representations under some additional assumptions:

Algorithm 3 takes as input a game graph and maps it to a mechanised causal graph. A decision indicates an ingoing terminal edge; a utility indicates an outgoing terminal edge.
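One plausible reading of that caption, sketched below: each decision's mechanism receives an ingoing terminal edge from the mechanism of every utility the decision can reach. The paper's Algorithm 3 adds further mechanism edges and conditions that this sketch omits.

```python
# A rough sketch of the reverse direction, under one reading of the caption:
# every decision's mechanism gets an ingoing terminal edge, supplied here by
# the mechanism of each utility node the decision can reach. The full
# construction in the paper adds more; this only illustrates the idea.
import networkx as nx

def to_mechanised_graph(game: nx.DiGraph) -> nx.DiGraph:
    mech = nx.DiGraph(game.edges)          # keep the object-level structure
    for v in game:
        mech.add_edge(f"{v}~", v)          # each variable gets its mechanism node
    decisions = [v for v, d in game.nodes(data=True) if d.get("kind") == "decision"]
    utilities = [v for v, d in game.nodes(data=True) if d.get("kind") == "utility"]
    for d in decisions:
        for u in utilities:
            if nx.has_path(game, d, u):
                mech.add_edge(f"{u}~", f"{d}~", terminal=True)
    return mech
```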

Better safety tools for the design of artificial intelligence agents

We proposed the first formal causal definition of agents. Grounded in causal discovery, our key insight is that agents are systems that adapt their behaviour in response to changes in how their actions influence the world. Indeed, Algorithms 1 and 2 describe a precise experimental process that can help assess whether a system contains an agent.

Interest in causal modelling of AI systems is growing rapidly, and our research grounds this modelling in causal discovery experiments. Our paper demonstrates the potential of our approach by improving the safety analysis of several example AI systems, and it shows that causality is a useful framework for discovering whether there is an agent in a system, a key concern for assessing risks from AGI.

Excited to learn more? Check out our paper. Feedback and comments are most welcome.

