The goal of long-term artificial intelligence (AI) safety is to ensure that advanced AI systems are reliably aligned with human values, that is, that they reliably do things that people want them to do. Roughly, by human values we mean whatever it is that causes people to choose one option over another in each case, suitably corrected by reflection.


Geoffrey Irving et al. at OpenAI have a paper out on AI safety via debate. The basic idea is that debates can be modeled as a two-player game (so standard insights about how to play such games well can be applied), and one can hope that debates asymmetrically favor the party arguing for a true position over the party arguing for a false one. If so, debates between AI advisors can be used for alignment.
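Treating debate as a two-player zero-sum game means that, in principle, optimal play is described by ordinary minimax. The sketch below is a minimal illustration under assumed simplifications, not the paper's method: a finite menu of statements, a fixed number of rounds, and a hypothetical `judge` function standing in for the human judge that returns +1 or -1 for a finished transcript. The names `debate_value` and `judge` are illustrative only.

```python
from functools import lru_cache
from typing import Callable, Sequence, Tuple

Transcript = Tuple[str, ...]


def debate_value(moves: Sequence[str], rounds: int,
                 judge: Callable[[Transcript], int]) -> int:
    """Minimax value of a toy zero-sum debate game.

    Player 0 speaks on even turns and maximizes; player 1 speaks on odd turns
    and minimizes. After `rounds` statements, the hypothetical `judge` scores
    the full transcript: +1 means player 0 wins, -1 means player 1 wins."""
    @lru_cache(maxsize=None)
    def value(transcript: Transcript) -> int:
        if len(transcript) == rounds:
            return judge(transcript)
        outcomes = [value(transcript + (m,)) for m in moves]
        return max(outcomes) if len(transcript) % 2 == 0 else min(outcomes)

    return value(())


# Toy judge: player 0 wins only if player 1 repeats player 0's statement.
print(debate_value(("A", "B"), 2, lambda t: 1 if t[0] == t[1] else -1))  # -1
```

Exhaustive search is only feasible for tiny games like this one. The point is that once debate is cast as a zero-sum game, its value under optimal play is well defined, and the hope is that this value favors the debater arguing for the true position.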

AI Safety via Debate, by ESRogs, 5 May 2018.

Security vulnerabilities of the debate model: a sufficiently strong misaligned AI may be able to convince a human to do dangerous things.

The paper "AI safety via debate" by Geoffrey Irving, Paul Christiano, and Dario Amodei has been uploaded to arXiv.

In addition, some scholars argue that solutions to the control problem, alongside other advances in AI safety engineering, might also find applications in existing non-superintelligent AI. [3] Major approaches to the control problem include alignment, which aims to align AI goal systems with human values, and capability control, which aims to reduce an AI system's capacity to harm humans or …

First, I'm going to talk a little bit about why learning human values is difficult for AI systems. Then I'm going to explain to you the safety via debate method, which is one of the methods that OpenAI's currently exploring for helping AI to robustly do what humans want.

… brings the values and principles of ethical, fair, and safe AI to life, will require that you … moral motivations for thinking through the social and ethical aspects of AI debate.

AI safety via debate. Research paper by Geoffrey Irving, Paul Christiano, and Dario Amodei. Published 2 May 2018 on arXiv (Statistics: Machine Learning).

Debate is a proposed technique for allowing human evaluators to get correct and helpful answers from experts, even if the evaluator is not themselves an expert or able to fully verify the answers [1]. The technique was suggested as part of an approach to build advanced AI systems that are aligned with human values, and to safely apply machine learning techniques to problems that have high …

Artificial intelligence (AI), or machine intelligence, has been defined as "intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans" and "…any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals." [1] Wikipedia goes on to classify AI into three different types of systems [1]:

Geoffrey Irving, Paul Christiano, and Dario Amodei of OpenAI have recently published "AI safety via debate" (blog post, paper). As I read the paper I found myself wanting to give commentary on it, and LW seems like as good a place as any to do that. What follows are my thoughts, taken section by section.
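The idea that an evaluator can get a correct answer without fully verifying it can be illustrated with a toy recursive debate. This is a hypothetical example, not an experiment from the paper: one debater claims that a list of integers sums to some value. Each round it must split its claim into two claims about the halves of the list that add up to the current claim, and an honest opponent (assumed able to compute true sums) contests whichever half is false. The judge never adds up the whole list; it only checks that the two half-claims are consistent and, at the end, compares a single element. `consistent_split` and `debate_sum` are illustrative names.

```python
from typing import Callable, List, Tuple

# A claimant strategy: given a range [lo, hi) and the current claimed sum,
# return claimed sums for the left half [lo, mid) and right half [mid, hi).
Split = Callable[[int, int, int], Tuple[int, int]]


def consistent_split(xs: List[int]) -> Split:
    """Hypothetical claimant strategy: report the true left-half sum and let
    the right half absorb whatever keeps the two halves consistent with the
    current claim. If the claim is false, the error is pushed rightward."""
    def split(lo: int, hi: int, claim: int) -> Tuple[int, int]:
        mid = (lo + hi) // 2
        left = sum(xs[lo:mid])
        return left, claim - left
    return split


def debate_sum(xs: List[int], claim: int, split: Split) -> Tuple[bool, int]:
    """Toy debate over the claim sum(xs) == claim.

    Returns (claimant_wins, judge_checks). The opponent is an honest expert
    who can compute true sums; the judge only performs cheap checks."""
    if not xs:
        return claim == 0, 1
    lo, hi, checks = 0, len(xs), 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        left_claim, right_claim = split(lo, hi, claim)
        checks += 1
        if left_claim + right_claim != claim:
            return False, checks  # judge: the claimant contradicted itself
        # Honest opponent contests the left half if it is false, else the right.
        if sum(xs[lo:mid]) != left_claim:
            hi, claim = mid, left_claim
        else:
            lo, claim = mid, right_claim
    checks += 1
    return xs[lo] == claim, checks  # judge checks a single element directly


xs = [3, 1, 4, 1, 5, 9, 2, 6]  # true sum is 31
print(debate_sum(xs, 31, consistent_split(xs)))  # (True, 4)
print(debate_sum(xs, 30, consistent_split(xs)))  # (False, 4)
```

Against this honest opponent, a false top-level claim always loses: every consistent split of a false claim contains at least one false half, so the contested claim stays false all the way down to a single element the judge checks directly. The judge performs roughly log2(n) + 1 cheap checks instead of summing all n elements.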

Jeremie Harris, March 30, 2020. The Talk: here's an overview of what I'm going to be talking about today.