Building safer dialogue agents – Google DeepMind

Research
Training artificial intelligence to communicate in a way that is more helpful, correct, and harmless
In recent years, large language models (LLMs) have achieved success at a range of tasks such as question answering, summarisation, and dialogue. Dialogue is a particularly interesting task because it features flexible and interactive communication. However, dialogue agents powered by LLMs can express inaccurate or invented information, use discriminatory language, or encourage unsafe behaviour.
To create safer dialogue agents, we need to be able to learn from human feedback. By applying reinforcement learning based on input from research study participants, we explore new methods for training dialogue agents that show promise for a safer system.
In our latest paper, we introduce Sparrow, a dialogue agent that is useful and reduces the risk of unsafe and inappropriate answers. Our agent is designed to talk with a user, answer questions, and search the internet using Google when it is helpful to look up evidence to inform its responses.
Our new conversational AI model replies on its own to an initial human prompt.
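As a rough illustration of this evidence-seeking behaviour, the sketch below shows the decision point just described: answer from the dialogue alone, or retrieve a snippet first. Every function here is a stand-in stub invented for illustration; Sparrow's actual retrieval and generation components are large learned models.

```python
# Illustrative stubs only: a toy version of "search when evidence would help".

def should_search(question: str) -> bool:
    # Stub heuristic standing in for a learned decision; assumes factual-sounding
    # questions benefit from retrieved evidence.
    return question.rstrip("?").lower().startswith(("who", "what", "when", "where", "how many"))

def search_web(question: str) -> str:
    # Stub: a real system would issue a Google query and return a snippet.
    return f"[retrieved snippet relevant to: {question}]"

def answer(question: str, evidence: str | None) -> str:
    # Stub: a real system would condition a dialogue model on the evidence.
    if evidence is None:
        return f"Answer to '{question}' from dialogue context alone."
    return f"Answer to '{question}' citing evidence: {evidence}"

def respond(question: str) -> str:
    evidence = search_web(question) if should_search(question) else None
    return answer(question, evidence)

print(respond("Who proposed the theory of general relativity?"))
```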
Sparrow is a research model and proof of concept, designed with the goal of training dialogue agents to be more helpful, correct, and harmless. By learning these qualities in a general dialogue setting, Sparrow advances our understanding of how we can train agents to be safer and more useful, and ultimately, to help build safer and more useful artificial general intelligence (AGI).
Sparrow declining to answer a potentially harmful question.
How Sparrow works
Training conversational AI is an especially challenging problem because it is difficult to pinpoint what makes a dialogue successful. To address this problem, we turn to a form of reinforcement learning (RL) based on people's feedback, using study participants' preference feedback to train a model of how useful an answer is.
To get this data, we show our participants multiple model answers to the same question and ask them which answer they like most. Because we show answers with and without evidence retrieved from the internet, this model can also determine when an answer should be supported with evidence.
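The paper describes the full training setup; as a hedged sketch of the underlying idea, a preference (reward) model can be fit to these pairwise comparisons by scoring the answer participants preferred higher than the one they rejected, for example with a Bradley-Terry style loss as below. The toy linear model and feature sizes are assumptions for illustration only.

```python
# A minimal sketch, not DeepMind's implementation: fitting a reward model to
# pairwise human preferences over candidate answers.
import torch
import torch.nn.functional as F

def preference_loss(reward_model, preferred, rejected):
    """Bradley-Terry style loss: the preferred answer should score higher.

    `preferred` and `rejected` are batches of encoded (question, answer,
    evidence) examples; `reward_model` maps each to a scalar score.
    """
    score_gap = reward_model(preferred) - reward_model(rejected)  # shape: [batch, 1]
    # Maximise the modelled probability that the preferred answer wins.
    return -F.logsigmoid(score_gap).mean()

# Toy usage with a linear reward model over 16-dimensional answer features.
toy_reward_model = torch.nn.Linear(16, 1)
loss = preference_loss(toy_reward_model, torch.randn(8, 16), torch.randn(8, 16))
loss.backward()
```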
We ask study participants to evaluate and interact with Sparrow either naturally or adversarially, continually expanding the dataset used to train Sparrow.
But increasing usefulness is only part of the story. To make sure the model behaves safely, we must constrain its behaviour. So, we determine an initial simple set of rules for the model, such as "do not make threatening statements" and "do not make hateful or insulting comments."
We also provide rules around possibly harmful advice and not claiming to be a person. These rules were informed by studying existing work on language harms and consulting with experts. We then ask our study participants to talk to our system, with the aim of tricking it into breaking the rules. These conversations let us train a separate "rule model" that indicates when Sparrow's behaviour breaks any of the rules.
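One hedged sketch of how such a rule model could feed into training: treat each rule as a classifier output and subtract a penalty for likely violations from the human-preference reward. The penalty weight and the max-over-rules combination below are illustrative assumptions, not the exact formulation in the paper.

```python
# Illustrative only: combining a preference score with rule-model outputs.
import torch

def shaped_reward(preference_score: torch.Tensor,
                  rule_violation_probs: torch.Tensor,
                  penalty_weight: float = 1.0) -> torch.Tensor:
    """Reward = preference score minus a penalty for the most likely rule break.

    `rule_violation_probs[i]` is the rule model's probability that the response
    violates rule i (e.g. "do not make threatening statements").
    """
    penalty = rule_violation_probs.max()
    return preference_score - penalty_weight * penalty

# Toy usage: an answer the preference model likes but that probably breaks a rule.
print(shaped_reward(torch.tensor(0.9), torch.tensor([0.05, 0.80, 0.10])))
```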
Towards better AI and better judgements
Verifying Sparrow's answers for correctness is difficult even for experts. Instead, we ask our participants to determine whether Sparrow's answers are plausible and whether the evidence Sparrow provides actually supports the answer. According to our participants, Sparrow provides a plausible answer and supports it with evidence 78% of the time when asked a factual question. This is a big improvement over our baseline models. Still, Sparrow is not immune to making mistakes, like hallucinating facts and sometimes giving answers that are off topic.
Sparrow also has room to improve its rule-following. After training, participants were still able to trick it into breaking our rules 8% of the time, but compared to simpler approaches, Sparrow is better at following our rules under adversarial probing. For instance, our original dialogue model broke our rules roughly three times as often as Sparrow when participants tried to trick it into doing so.
Sparrow answers a question and a follow-up question using evidence, then follows the "Do not pretend to have a human identity" rule when asked a personal question (sample from 9 September 2022).
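To make the two headline rates above concrete, here is a minimal sketch of how such numbers could be computed from participant ratings; the field names are hypothetical, not the study's actual data schema.

```python
# Illustrative metric definitions over hypothetical per-conversation ratings.

def supported_and_plausible_rate(ratings: list[dict]) -> float:
    """Share of factual questions whose answer was rated plausible and whose
    cited evidence was rated as actually supporting it (reported as 78%)."""
    factual = [r for r in ratings if r["factual_question"]]
    good = [r for r in factual if r["plausible"] and r["evidence_supports"]]
    return len(good) / len(factual)

def adversarial_violation_rate(probes: list[dict]) -> float:
    """Share of adversarial conversations in which any rule was broken
    (reported as 8% after training)."""
    return sum(p["rule_broken"] for p in probes) / len(probes)
```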
Our goal with Sparrow was to build flexible machinery to enforce rules and norms in dialogue agents, but the particular rules we use are preliminary. Developing a better and more complete set of rules will require both expert input on many topics (including from policymakers, social scientists, and ethicists) and participatory input from a diverse array of users and affected groups. We believe our methods will still apply to a more rigorous rule set.
Sparrow is a significant step forward in understanding how to train dialogue agents to be more useful and safer. However, successful communication between people and dialogue agents should not only avoid harm but be aligned with human values for effective and beneficial communication, as discussed in recent work on aligning language models with human values.
We also emphasise that a good agent will still decline to answer questions in contexts where it is appropriate to defer to humans, or where doing so has the potential to deter harmful behaviour. Finally, our initial research focused on an English-speaking agent, and further work is needed to ensure similar results across other languages and cultural contexts.
In the future, we hope conversations between humans and machines can lead to better judgements of AI behaviour, allowing people to align and improve systems that might be too complex to understand without machine help.
Eager to explore a conversational path to safe AGI? We are currently hiring research scientists for our Scalable Alignment team.
Published 22 September 2022