on preference sovereignty, instrumental convergence, and the biological foundations of morality
remarkably few people seem to fully understand ethics, including most moral philosophers. this is a strong claim, so let me be precise about what i mean and why i believe it.
when we see a bird tending to its egg at great personal cost, or stags engaging in ritualized dominance contests rather than fighting to the death, or humans instinctively rushing to help someone having a medical emergency, these are all biological phenomena. we can understand them in the same way we understand why moths are attracted to light or why we crave sugar. there is no metaphysical mystery here — only biology, game theory, and the mathematics of natural selection.
what follows is an attempt to synthesize insights from evolutionary biology, decision theory, welfare economics, and contractualist philosophy into a unified account of what ethics actually is, where it comes from, and why traditional moral philosophy has been asking the wrong questions. the framework i'll develop — utilitarian contractualism, grounded in what i call the preference sovereignty principle — is not merely another entry in the catalogue of ethical theories. it is an attempt to show that the catalogue itself rests on a confusion.
the whole edifice of ethics is just a tautology filtered through the complexity of social environments where cooperation often beats defection.
to see how fundamentally subjective ethics is, start with the trolley problem. a runaway trolley is about to kill five people. you can pull a lever to divert it onto a sidetrack where it will kill one. most people recoil at the idea of actively causing someone's death, even to save more lives.
now consider the trolley problem from behind john rawls's "veil of ignorance". you are one of the six people who will be tied to the tracks, but you don't yet know whether you'll be the lone person on the sidetrack or one of the five on the main track. before you learn your position, you must vote on the rule: should bystanders be permitted to pull the lever?
the answer becomes obvious. you'd vote for "pull the lever" because you have a 5-in-6 chance of being among the five saved versus only a 1-in-6 chance of being the one killed. any rational person, not knowing their position, would choose the rule that maximizes their expected survival.
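to make the arithmetic explicit, here is a minimal sketch in python, assuming each of the six positions behind the veil is equally likely:

```python
# survival probability for a random occupant of the six positions,
# under each candidate rule the contractors could vote for.

positions = 6  # five on the main track, one on the side track

# rule 1: bystanders may pull the lever, so the one on the side track dies
p_survive_pull = 5 / positions      # 5/6, about 0.83

# rule 2: bystanders must not pull, so the five on the main track die
p_survive_no_pull = 1 / positions   # 1/6, about 0.17

print(f"survive if the lever may be pulled:     {p_survive_pull:.2f}")
print(f"survive if the lever must not be pulled: {p_survive_no_pull:.2f}")
# not knowing your position, you vote for the rule with the higher
# expected chance of surviving: "pull the lever" wins 5/6 to 1/6.
```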
the math is trivial. and the moral intuition that seemed so compelling — don't actively cause a death — suddenly looks like an irrational bias. this illustrates the core insight of utilitarian contractualism: ethical principles are those that rational people would agree to from behind a veil of ignorance about their own position.
rawls famously developed this framework but made a critical error. he proposed that people behind the veil would adopt "maximin" — always choosing to protect the worst-off position. this is demonstrably false. people routinely accept small risks of bad outcomes in exchange for better expected value. they buy some insurance but not against every conceivable risk. they accept jobs with variable compensation. they make countless decisions that reveal a preference for expected utility maximization over worst-case avoidance.
maximin is empirically falsified by revealed preference. and revealed preference is revealed preference whether it occurs in nature or in a controlled experiment designed by a researcher.
economist john harsanyi got this right: rational people behind the veil would maximize expected utility, not minimize worst-case outcomes. this isn't a philosophical conjecture — it's what people actually do, observably and measurably, across every domain of decision-making under uncertainty.
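to make the contrast concrete, here is a small sketch with made-up payoffs, showing how maximin and expected-utility maximization can rank the same two social arrangements differently:

```python
# two candidate social arrangements, described as lotteries over the
# positions you might end up occupying (payoffs are purely illustrative).
lotteries = {
    "flat":    [10, 10, 10, 10],   # everyone gets the same modest outcome
    "dynamic": [8, 30, 40, 50],    # worst-off slightly worse, average much higher
}

def maximin(lottery):
    return min(lottery)                 # rawls: judge by the worst-off position

def expected_utility(lottery):
    return sum(lottery) / len(lottery)  # harsanyi: judge by the average position

for rule_name, score in (("maximin", maximin), ("expected utility", expected_utility)):
    best = max(lotteries, key=lambda name: score(lotteries[name]))
    print(f"{rule_name:>17} picks: {best}")

# maximin picks "flat" (10 beats 8); expected utility picks "dynamic" (32 beats 10).
# harsanyi's claim is that real choices under uncertainty look like the second rule.
```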
we can now identify the specific principle that makes this framework work.
no outcome can be considered unethical if those subject to it would freely choose it when fully informed — even in hindsight.
this is what distinguishes utilitarian contractualism from other versions. it's not about what people could "reasonably" reject (scanlon), or what an impartial observer would choose (rawls), but about what people would actually prefer for themselves, selfishly, under uncertainty about their position. unanimous selfish preference under uncertainty just is what "ethical" means.
this formulation has a crucial advantage: it is empirically testable. whether everyone would prefer a given outcome for themselves is, in principle, a numerical question. we'd all take a 10% chance of death over a 20% chance. we can measure revealed preferences. "reasonableness," by contrast, is vague and unquantifiable — an untenable basis for ethics.
a crucial distinction follows from this: ethics divides into intrinsic and instrumental components. intrinsic ethics are our basic preferences — what we ultimately want or value. these are purely subjective. instrumental ethics are about how to achieve those preferences — the intermediate goals we pursue in service of what we actually want. instrumental ethics can be evaluated objectively, through evidence and reason, even when uncertainty about outcomes is irreducible.
the gap between our intrinsic preferences and the instrumental choices we make to satisfy them — caused by imperfect knowledge — is not a flaw in this framework. it's simply the human condition. a doctor who follows the best available evidence and loses the patient anyway is not acting unethically. the ethical thing, in the systemic sense, is the heuristic that makes you most likely to satisfy your actual preferences — going on the best available evidence using the scientific method, even if it occasionally leads to the wrong result. the underlying rule and the particular accuracy of any given judgment are different things.
to understand why ethics works this way, we need to understand where it came from. four billion years ago, molecular replicators emerged with two crucial properties: they could copy themselves, and those copies could contain mutations. everything else — from the beaks of finches to our deepest moral intuitions — flows from this.
as richard dawkins articulated in the selfish gene, genes are the fundamental unit of selection. they exist in proportion to their ability to get themselves copied. when we say genes "try" to get themselves copied or act "selfishly," we're using a helpful metaphor. genes aren't conscious agents making decisions; those that happen to have properties leading to more copies of themselves simply become more prevalent. pure mathematics and chemistry, not conscious intent. but the result is that genes behave as if they were trying to maximize copies of themselves.
this isn't a theory — it's a tautology. genes that make more copies become more prevalent. that's just what "selection" means.
and since our brains are built by genes, our preferences — including our moral intuitions — are downstream of that selection pressure. we want to maximize our expected utility because we're built by replicators that were selected for their propensity to maximize expected copies of themselves. utility maximization isn't an arbitrary axiom we impose on ethics; it follows directly from the logic of selection.
the evolutionary story here is illustrative rather than load-bearing. it doesn't prove that preference sovereignty is correct — it shows why the framework is consistent with everything we know about where preferences come from. the stronger argument is the conceptual one: you simply cannot call something unethical if people would choose it for themselves.
note what this does not imply. explaining the evolutionary origins of moral intuitions does not commit the genetic fallacy, because it does not claim that origins determine validity. evolution gave us depth perception, which tracks real spatial relationships. whether moral intuitions similarly track something "real" is exactly the question at issue, and the evolutionary genealogy alone doesn't settle it. but the case for moral anti-realism rests on independent grounds: the sheer diversity of moral intuitions across cultures, the lack of any plausible mechanism by which "moral facts" could causally influence our beliefs, and the fact that every putatively objective moral truth dissolves into either a subjective preference or an empirical claim about consequences when examined closely enough.
there are two fundamental types of apparent altruism in nature, and both are perfectly explained by genetic self-interest.
the first is kin altruism. when a bird tends its egg, we're witnessing genes protecting probable copies of themselves. this is why california ground squirrels sound alarm calls when they spot a predator, even at personal risk — the squirrels that hear the warning likely carry the same genes for alarm-calling. if the warning saves at least two close relatives, the "call" genes come out ahead even if the caller gets eaten. (veritasium has an excellent explanation of kin selection.)
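the arithmetic behind the alarm call is hamilton's rule: a costly trait spreads when r × B > C, where r is genetic relatedness to the beneficiaries, B the benefit to them, and C the cost to the actor. a rough sketch with illustrative numbers (full siblings, a caller risking one life's worth of reproduction):

```python
def hamiltons_rule(relatedness, benefit, cost):
    """a costly trait is favored by selection when r * B > C."""
    return relatedness * benefit > cost

# the caller risks its own life (cost = 1 life's worth of reproduction)
# to warn relatives. full siblings share genes with relatedness 0.5.
cost = 1.0
r_sibling = 0.5

for relatives_saved in (1, 2, 3):
    favored = hamiltons_rule(r_sibling, relatives_saved, cost)
    print(f"save {relatives_saved} full sibling(s): alarm gene favored = {favored}")

# one sibling: 0.5 < 1, not favored. two siblings: 0.5 * 2 = 1, break-even.
# three or more: the "call" gene copies itself faster than silence does.
```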
the second is reciprocal altruism, which is fundamentally an instance of the iterated prisoner's dilemma. in our ancestral environment, we lived in small groups where we repeatedly interacted with the same people. helping others often meant helping yourself because they would likely reciprocate. our brains evolved to facilitate this cooperation through emotions like gratitude, guilt, and moral outrage.
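a minimal simulation of the iterated prisoner's dilemma, using the standard payoff matrix, shows why conditional cooperation beats unconditional defection once interactions repeat; the strategies and round count here are illustrative:

```python
# standard prisoner's dilemma payoffs: (my move, their move) -> my score
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(partner_history):
    # cooperate on the first move, then mirror whatever the partner did last
    return partner_history[-1] if partner_history else "C"

def always_defect(partner_history):
    return "D"

def play(strategy_a, strategy_b, rounds=50):
    score_a = score_b = 0
    hist_a, hist_b = [], []              # each side sees the other's past moves
    for _ in range(rounds):
        move_a, move_b = strategy_a(hist_b), strategy_b(hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))      # (150, 150): mutual cooperation compounds
print(play(always_defect, always_defect))  # (50, 50): mutual defection stagnates
print(play(tit_for_tat, always_defect))    # (49, 54): one exploited round, then both stagnate
```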
recent work in evolutionary psychology has formalized this further. researchers like jean-baptiste andré and nicolas baumard have shown that human moral cognition is well-described by nash bargaining — maximizing the product of both parties' gains in any interaction. evolution built us to intuitively calculate not just our own benefit but others' benefits and opportunity costs, because people who signal cooperation get selected as partners while defectors get excluded.
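here is a sketch of the bargaining idea they appeal to: among feasible splits of a cooperative surplus, choose the one that maximizes the product of each party's gain over their outside option. the numbers are purely illustrative:

```python
# two partners can jointly produce a surplus of 10. each also has an
# "outside option" they fall back on if the deal collapses.
surplus = 10.0
outside_a, outside_b = 2.0, 4.0   # illustrative disagreement payoffs

def nash_product(share_a):
    share_b = surplus - share_a
    gain_a = share_a - outside_a
    gain_b = share_b - outside_b
    if gain_a <= 0 or gain_b <= 0:    # deals worse than walking away are off the table
        return float("-inf")
    return gain_a * gain_b

# search over candidate splits and keep the one with the largest product of gains
candidates = [i / 100 * surplus for i in range(101)]
best = max(candidates, key=nash_product)
print(f"nash bargaining split: a gets {best:.1f}, b gets {surplus - best:.1f}")

# the solution splits the surplus over the outside options evenly:
# a gets 2 + 2 = 4, b gets 4 + 2 = 6.
```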
the implications are profound: moral philosophy has largely proceeded in ignorance of evolutionary biology, debating the properties of moral intuitions without asking where those intuitions came from. once you take evolution seriously, deontological rules start to look like what they are — heuristics our brains use to approximate utility maximization, not discoveries of eternal truths.
in a conversation between dawkins and peter singer, dawkins made a brilliant observation. people still copulate while using prophylactics — from a purely genetic perspective, a complete waste of time. similarly, we have what he calls a "lust to be nice" — an irrational drive to help others even when there's no possibility of reciprocation. our evolved circuitry for empathy misfires in modern contexts, like cuckoo birds exploiting the "if there's an egg in your nest, sit on it" instinct of other species.
a common objection to moral subjectivism runs like this: "if ethics is purely subjective, why would alien civilizations independently develop laws against murder? doesn't that convergence prove morality is objective?" this is a confusion between two fundamentally different levels of the ethical landscape, and a concept from decision theory called instrumental convergence resolves it completely.
instrumental convergence is the observation that almost any agent with almost any terminal goals will converge on certain intermediate strategies — self-preservation, resource acquisition, maintaining social cooperation — because these are prerequisites for achieving virtually any goal whatsoever. the convergence is in the game theory, not in some moral fabric of the universe.
the objection's inference runs: aliens would have laws against murder, therefore "murder is wrong" is an objective moral fact written into the structure of reality.
the correct account: murder-prohibition is an instrumentally convergent solution to a coordination problem that any social species with subjective preferences will independently discover — just as any species that needs to cross rivers will independently discover bridges.
aliens having laws against murder no more proves that "murder is objectively wrong" than aliens building bridges proves that "bridges are objectively good". it proves that certain game-theoretic problems have stable solutions. the actual subjective preferences underneath — i like chocolate, you like vanilla, the alien likes electromagnetic radiation — remain irreducibly particular and unique. but when it comes to meta-level coordination mechanisms like fair division protocols, insurance products, or laws against murder, any social species will converge on similar solutions because being murdered prevents you from achieving any of your goals regardless of what those goals are.
and here is the devastating corollary that reveals the confusion: some people actually want to be killed. voluntary euthanasia, assisted suicide, martyrdom — if "murder is wrong" were an objective moral fact baked into the universe, there couldn't be exceptions. but there obviously are, because the preference against being killed is subjective. it's just nearly universal, which is exactly what instrumental convergence predicts. a coordination norm that serves 99.9% of agents' goals gets codified as a rule. the rule isn't the moral fact. the preferences underneath it are.
any moral rule that appears "universal" should be examined for whether it is genuinely intrinsic or merely instrumentally convergent. if there exist informed agents who would prefer its violation, it is a heuristic — not a truth.
consider an island inhabited entirely by psychopaths — individuals who are completely selfish and lack any capacity for empathy. you would still find laws against theft and murder, a court system, and even a system of taxation.
why? because these things benefit the psychopaths themselves. a law against theft reduces the risk of their property being stolen. a court system ensures disputes don't escalate into costly vendettas. a redistributive tax system that funds infrastructure, law enforcement, and defense protects them from external threats and prevents societal collapse.
psychopath island demonstrates that even in a society of purely self-interested individuals, cooperation and shared rules naturally arise. what affects others ultimately affects us. we don't support laws and social systems out of altruism but because they maximize our own well-being when we account for the fact that we live among other people who can also harm or benefit us.
this same logic explains why even rationally selfish individuals might support wealth redistribution. while a wealthy individual may oppose a social safety net if they are certain they'll never need it, such certainty is rare. and even when one's position is known, policies that reduce inequality and instability protect everyone from costly conflicts and societal collapse. redistributive systems are simply a more efficient way of avoiding outcomes like violent uprising. avoiding bloodshed and preserving stability benefits everyone, selfish or not.
this framework allows us to properly evaluate every competing ethical system. take rule ethics and virtue ethics. these aren't fundamental measures of what's ethical; they're heuristic tools that can be evaluated by how well they respect preference sovereignty — by how well they approximate what rational people would choose behind the veil.
any rule or virtue that could override informed preferences must be fundamentally flawed. virtues are effectively rules themselves, and both can be evaluated by their utility efficiency. our tendency to rely on such heuristics makes evolutionary sense; every second spent calculating optimal decisions is a second you could be eaten by a predator. brains evolved to balance decision quality against computational cost.
a rule like "do not lie" is intrinsically ethical. violations are wrong regardless of consequences.
a rule like "do not lie" is a computationally cheap heuristic that produces good outcomes most of the time. when it doesn't, the rule is wrong, not the situation.
this brings us to consequentialism, which gets closer to the truth by focusing on outcomes. of course people care about consequences. it doesn't matter if you were killed intentionally or accidentally; you're still dead. the only extent to which intent matters is practical: someone who intentionally kills might be more likely to be a repeat offender, and the correctional system should treat them accordingly. but even this requires nuance. a 90-year-old myopic, demented driver who accidentally plows into a school crosswalk may pose a greater ongoing threat than a violent 20-year-old whose fights have never resulted in more than bruises.
we can treat a malevolent person the same way we treat a defective household robot that goes on a killing spree. conscious intent is irrelevant; only outcomes matter. the key is evaluating any policy based on its total utility effects.
david hume identified what is now called the is–ought problem: the idea that you can't derive statements about what "ought" to be from statements about what "is". but this entire problem dissolves once we realize there is no such thing as "ought" or "should" in any objective sense.
when someone says "you shouldn't kill people," they're not making a metaphysical claim about the universe. they're expressing a preference: "i would prefer a world where people don't kill each other". once we see this, is–ought isn't a deep philosophical puzzle. it's a category error that arises from treating preferences as though they were facts.
emotivism, the philosophical position that ethical statements are expressions of emotion rather than truth claims, gets tantalizingly close to this insight. in a conversation between sam harris and emotivist philosopher alex o'connor, o'connor argues that saying "murder is wrong" is essentially saying "boo, murder" — a command or expression of distaste rather than a truth claim. this is almost right, but the whole framework of emotivism becomes superfluous once we realize the basic truth it's gesturing at. it's like creating a special category called "hunger-ism" to explain why people eat food.
harris, for his part, struggles because he fails to distinguish between intrinsic and instrumental ethics. he correctly points out that there are objective facts about how to achieve certain states of well-being. but he misses that the underlying preferences for those states are purely subjective. the is–ought gap isn't bridged; it was never there. there are only preferences, and strategies for satisfying them.
some argue that ethics must come from god. but this leads to euthyphro's dilemma: either ethical principles are inherent and god is merely their messenger, or god arbitrarily decided on them. in either case, this has nothing to do with what we actually mean by good or bad. any divine command that violates preference sovereignty — that demands what informed people would reject even with full knowledge and hindsight — cannot be truly ethical. even if the universe or god declared that killing babies was good, this wouldn't change the fact that most humans are hardwired to be averse to infanticide. we would still shudder at the thought. the entire notion of objective ethics, whether derived from physics or a deity, is incongruous with everything about the human conception of ethics.
when people disagree about ethics, they are either disagreeing about empirical facts — about what policies will lead to better outcomes — or they are expressing genuinely different preferences shaped by their genes and experiences. they are never disagreeing about "moral facts," because there are no moral facts.
a common objection to utilitarian frameworks is that we cannot compare utility across individuals — that your suffering and my suffering are incommensurable. this objection dissolves upon examination.
at the most fundamental level, the universe is physics. suffering is a physical process — patterns of neural activation, neurochemical cascades, measurable and comparable in principle. the entire practice of medical triage rests on the fact that we can make rough interpersonal comparisons: a broken femur is worse than a paper cut, and everyone knows it.
but even setting aside the materialist argument, there is a more elegant resolution. imagine two bags containing notes of various currencies — yen, euros, dollars. you must pick a random bill from one bag, knowing only each bag's average bill value. you would choose the bag with the higher average, even though the currencies are incommensurable. it does not matter that you cannot convert yen to euros at a precise rate. the choice works in expectation, because statistically, a randomly selected bill from the higher-average bag is more likely to be worth more to you.
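a quick monte carlo sketch of the two-bag argument, with made-up bags and made-up per-currency valuations standing in for whatever the bills happen to be worth to you:

```python
import random

random.seed(0)

# the "value to you" of each currency is fixed but unknown to the chooser
# (illustrative rates, roughly dollars per unit).
value_to_you = {"usd": 1.0, "eur": 1.1, "jpy": 0.007}

bag_high = [("usd", 100), ("eur", 20), ("jpy", 5000), ("usd", 50)]  # higher average value
bag_low  = [("usd", 10), ("eur", 40), ("jpy", 2000), ("usd", 80)]   # lower average value

def draw_value(bag):
    currency, amount = random.choice(bag)
    return amount * value_to_you[currency]

trials = 100_000
wins_high = sum(draw_value(bag_high) > draw_value(bag_low) for _ in range(trials))
print(f"high-average bag drew the more valuable bill in {wins_high / trials:.0%} of trials")

# with these illustrative numbers the higher-average bag wins roughly two-thirds of the time.
# no precise exchange rate is needed up front: as long as the bags hold broadly similar
# kinds of bills, the higher-average bag is the better pick in expectation.
```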
this is harsanyi's insight applied to social welfare. if you are equally likely to be any person, your expected utility is the average utility across all people. you don't need a perfect conversion function across individuals. you need only the reasonable assumption — the same assumption underlying every restaurant review and product rating — that humans are similar enough that higher average utility is more likely to benefit a randomly selected individual. interpersonal utility comparison isn't a philosophical impossibility. it's something we do every day.
this framework — utilitarian contractualism, grounded in preference sovereignty — resolves the traditional paradoxes of utilitarianism. the worry that average utilitarianism tells us to kill anyone less happy than average to raise the mean doesn't arise, because we're evaluating from the perspective of an individual behind the veil, not from a god's-eye view optimizing an abstract quantity. the repugnant conclusion and the mere addition paradox, which suggest we should fill the universe with minimally happy beings, similarly dissolve. we are simply rational to be as selfish as possible; it's just that this selfishness, filtered through uncertainty about our position, often manifests as apparent altruism.
as for future generations — they will have preferences when they exist. until then, the only preferences in play are those of existing people. existing people often care deeply about their descendants and the world they'll inhabit, so concern for the future is already captured as a preference of currently existing agents. there is no orphaned moral obligation floating in the void, waiting for future people to claim it.
in the end, ethics isn't about discovering eternal truths or divine commands. it's a biological phenomenon that emerges from genetic selection and manifests as subjective preferences. once we see this clearly, the traditional philosophical debates dissolve into straightforward questions: what do people actually want, and what are the most effective means of getting it? these questions are often enormously complex. but we are, at last, asking them in the right way.