The Problem Before the Alignment Problem: Why Defining Human Values Is the Real Challenge in AI Alignment

As we grapple with aligning Artificial Intelligence to human values, we face a more fundamental challenge: agreeing on those values. This article explores the complexities of human morality and why achieving consensus on what we're aligning AI to might be our greatest hurdle yet.

In the race to develop safe and beneficial Artificial Intelligence, much attention has been focused on the so-called "Alignment Problem"—the challenge of ensuring that AI systems behave in ways that are aligned with human values and intentions.

However, as we look into this complex issue, a more fundamental question emerges: what exactly are these human values we're trying to align with? If we cannot first answer this question, there is nothing for us to align to.

The Alignment Problem, as traditionally framed, assumes a clear and unified set of human values that can serve as a moral compass for AI systems. Yet, upon closer examination, this assumption reveals itself to be a significant oversimplification of the rich tapestry of human morality and ethics.

The Myth of Universal Human Values

When we speak of aligning Artificial Intelligence with human values, we often take for granted that there exists a coherent, universally agreed-upon set of principles defining what it means to be human, and therefore what must be protected when aligning to "human values".

This assumption, however, quickly unravels when we consider the vast diversity of human experiences, cultures, and belief systems that coexist on our planet.

Consider, for instance, the stark contrasts in values between different human groups:

  • A libertarian might prioritise individual freedom above all else, viewing any form of centralised control or regulation as an infringement on personal liberty.
  • A communitarian, on the other hand, might argue that true value and meaning are derived from one's connection to and participation in a larger community.
  • An uncontacted tribe in the Amazon rainforest might value isolation and the preservation of their traditional way of life, rejecting the encroachment of modern civilisation.
  • A technology company specialising in AI chip manufacturing might prioritise technological progress and resource extraction, even at the expense of the rainforest.

Each of these perspectives represents a valid set of human values, yet they are often in direct conflict with one another.

How, then, can we hope to align Artificial Intelligence with "human values" when we ourselves cannot agree on what those values should be?

This dilemma is further complicated by the fact that human values are not static or easily reducible to simple rules. Our moral intuitions often defy rigid categorisation, and we frequently find ourselves navigating complex ethical landscapes without a clear roadmap.

As philosopher Toby Ord points out in his book "The Precipice," even seemingly straightforward moral imperatives like "Do not kill" or "Do not lie" quickly become murky when faced with real-world scenarios. Is it acceptable to lie to protect someone from harm? Is killing in self-defence morally justified? These questions have been debated by ethicists for centuries, with no clear consensus emerging. Yet the traditional approach to the Alignment Problem sidesteps these precursor questions.

The Fluidity of Human Morality

The challenge of defining human values is not merely a matter of reconciling different cultural or ideological perspectives. Even within individuals, values can be inconsistent, context-dependent, and subject to change over time.

Consider how our collective values have evolved throughout history. Practices that were once widely accepted, such as slavery or denying women the right to vote, are now condemned in most societies. This moral progress demonstrates the dynamic nature of human values and raises questions about how we can align Artificial Intelligence with a moving target.

Moreover, our individual values often exist in a state of tension or contradiction. We might simultaneously value personal privacy and public safety, or economic growth and environmental protection. These conflicting priorities require constant negotiation and trade-offs: a nuanced balancing act that humans can leave unresolved, tolerating the logical conflict, but which could cause serious problems for a goal-driven AI.
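
To make the point concrete, here is a deliberately toy sketch in Python. Every policy name and score below is invented for illustration; the point is that a goal-driven optimiser must collapse the privacy/safety tension into a single objective before it can act, and nothing in the values themselves tells it which collapse is the right one:

```python
# Toy illustration: all policy names and scores are hypothetical.
policies = {
    "mass_surveillance": {"privacy": 0.1, "safety": 0.9},
    "targeted_warrants": {"privacy": 0.6, "safety": 0.6},
    "no_monitoring":     {"privacy": 0.9, "safety": 0.2},
}

def best_policy(weight_privacy: float) -> str:
    """Pick the policy maximising a fixed weighted sum of the two values."""
    def score(name: str) -> float:
        v = policies[name]
        return weight_privacy * v["privacy"] + (1 - weight_privacy) * v["safety"]
    return max(policies, key=score)

# The "aligned" answer flips with the weighting, and nothing in the values
# themselves tells a goal-driven optimiser which weighting is correct.
for w in (0.2, 0.5, 0.8):
    print(f"privacy weight {w}: choose {best_policy(w)}")
```

Humans hold both values at once and renegotiate the weighting case by case; the optimiser has to commit to one number before it can choose at all.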

As Stuart Russell, a prominent AI researcher, notes in his book "Human Compatible," our inability to precisely define human values doesn't mean we don't have them. We navigate the world guided by a complex, often implicit, set of preferences and moral intuitions.

So we can agree that there is something to the notion of "human values": otherwise, we would have no grounds on which to condemn Auschwitz.

However, translating these intuitions into a formal system that can be understood and implemented by an AI on a day-to-day basis without contradiction or harm presents an enormous challenge.

The Path Forward: Embracing Complexity

Given the inherent complexity and diversity of human values, how can we hope to make progress on the Alignment Problem? The answer may lie in reframing our approach to AI alignment itself.

Instead of seeking a single, unified set of human values to align AI with, we might need to develop systems that can navigate and reconcile diverse and sometimes conflicting value systems. This could involve creating AI that can (see the sketch after this list):

  1. Recognise and respect the diversity of human values across different cultures and individuals.
  2. Understand the context-dependent nature of ethical decision-making.
  3. Engage in moral reasoning that considers multiple perspectives and potential consequences.
  4. Adapt to evolving human values over time.
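
One loose way to picture these four capabilities is as an interface that a value-pluralist system would have to satisfy. The sketch below is structural only; the class and method names are hypothetical, not an existing API:

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Context:
    culture: str    # whose values are in play (capability 1)
    situation: str  # the decision setting (capability 2)

class PluralistValueModel(Protocol):
    """Hypothetical interface; a structural sketch, not an existing API."""

    def stakeholder_values(self, ctx: Context) -> dict[str, float]:
        """Capability 1: surface the (possibly conflicting) values at stake."""
        ...

    def weigh_options(self, ctx: Context, options: list[str]) -> dict[str, float]:
        """Capabilities 2 and 3: context-sensitive, multi-perspective scoring."""
        ...

    def update(self, feedback: dict[str, float]) -> None:
        """Capability 4: revise the model as observed human values drift."""
        ...
```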

Achieving this level of ethical sophistication in Artificial Intelligence is, undoubtedly, a monumental challenge. It requires not only advances in AI technology but also deep interdisciplinary collaboration between computer scientists, ethicists, anthropologists, and philosophers.

Some promising approaches are already emerging. For example, researchers at OpenAI have proposed the concept of "debate" as a way to align AI systems with human preferences. In this framework, AI agents argue different sides of a question, allowing humans to gain insight into complex issues and make more informed judgements. This approach acknowledges the complexity of human values and seeks to augment rather than replace human moral reasoning.
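
As a rough, non-authoritative illustration of that setup, the skeleton below stages a two-sided debate with a human judge. The `ask_model` function is a placeholder for whatever language-model API one plugs in, not a real library call:

```python
def ask_model(prompt: str) -> str:
    # Placeholder: substitute any chat-capable model API here.
    raise NotImplementedError

def debate(question: str, rounds: int = 2) -> str:
    """Two agents argue opposite sides; a human renders the verdict."""
    transcript = f"Question: {question}"
    for r in range(1, rounds + 1):
        for side in ("PRO", "CON"):
            argument = ask_model(
                f"{transcript}\nAs the {side} debater, give your strongest "
                f"argument for round {r}, rebutting the other side."
            )
            transcript += f"\n[{side}, round {r}] {argument}"
    print(transcript)
    # Crucially, the judgement stays with the human, not the machine.
    return input("Your verdict as judge (PRO or CON): ")
```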

Another avenue of research focuses on "value learning," where AI systems attempt to infer human values through observation and interaction. This approach recognises that human values are often implicit and context-dependent, and seeks to develop AI that can flexibly adapt to different value systems.
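
A stripped-down sketch of that idea, under strong simplifying assumptions: maintain a posterior over a few hand-written candidate "value profiles" and update it from observed human choices. All profile names and numbers below are invented for illustration:

```python
# Hypothetical candidate value profiles; real systems would learn far
# richer representations than two hand-picked dimensions.
CANDIDATE_PROFILES = {
    "privacy_first": {"privacy": 0.8, "safety": 0.2},
    "safety_first":  {"privacy": 0.2, "safety": 0.8},
    "balanced":      {"privacy": 0.5, "safety": 0.5},
}

posterior = {name: 1 / len(CANDIDATE_PROFILES) for name in CANDIDATE_PROFILES}

def utility(profile: dict, option: dict) -> float:
    return sum(profile[v] * option[v] for v in profile)

def observe_choice(chosen: dict, rejected: dict) -> None:
    """Bayes update: profiles that rank `chosen` above `rejected` gain mass."""
    global posterior
    likelihood = {
        name: 0.9 if utility(p, chosen) >= utility(p, rejected) else 0.1
        for name, p in CANDIDATE_PROFILES.items()
    }
    unnormalised = {n: posterior[n] * likelihood[n] for n in posterior}
    total = sum(unnormalised.values())
    posterior = {n: mass / total for n, mass in unnormalised.items()}

# Observed: the human chose targeted warrants over mass surveillance.
observe_choice(
    chosen={"privacy": 0.6, "safety": 0.6},
    rejected={"privacy": 0.1, "safety": 0.9},
)
print(posterior)  # probability mass shifts toward privacy-respecting profiles
```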

As we continue to grapple with these challenges, it's crucial to maintain a sense of humility and openness to diverse perspectives. The quest to align Artificial Intelligence with human values is not just a technical problem, but a profound philosophical and ethical endeavour that forces us to confront fundamental questions about what it means to be human.

In conclusion, while the Alignment Problem remains a critical challenge in AI development, we must recognise that the "Problem Before the Alignment Problem"—defining and agreeing upon human values—is equally, if not more, daunting. As we strive to create beneficial Artificial Intelligence, we must embrace the complexity and diversity of human morality, seeking solutions that can navigate this rich ethical landscape rather than imposing a simplistic or reductionist view of human values.

By acknowledging the depth of this challenge, we open up new avenues for research and dialogue that may ultimately lead us to more robust and ethically sound approaches to AI development.

In doing so, we not only work towards creating better AI systems but also deepen our understanding of our own values and what it truly means to be human in an increasingly technological world.