Many thanks for feedback and insight from Kelly Anthis, Tobias Baumann, Jan Brauner, Max Carpendale, Sasha Cooper, Sandro Del Rivo, Michael Dello-Iacovo, Michael Dickens, Anthony DiGiovanni, Marius Hobbhahn, Ali Ladak, Simon Knutsson, Greg Lewis, Kelly McNamara, John Mori, Thomas Moynihan, Caleb Ontiveros, Sean Richardson, Zachary Rudolph, Manny Rutinel, Stefan Schubert, Michael St. Jules, Nell Watson, Peter Wildeford, and Miranda Zhang. This essay is in part an early draft of an upcoming book chapter on the topic, and I will add the citation here when it is available.
The prioritization of extinction risk reduction depends on an assumption that the expected value (EV) of human survival and interstellar colonization is highly positive. This essay lays out many arguments on the topic. This matters because, insofar as these arguments are compelling, we should shift some longtermist resources away from extinction risks. Extinction risks are the most extreme category of population risks, which are risks to the number of individuals in the long-term future. We could shift resources towards the other type of long-term risk, quality risks, which are risks to the moral value of individuals in the long-term future, such as whether they experience suffering or happiness. Promising approaches to improve the quality of the long-term future include some forms of AI safety, moral circle expansion, cooperative game theory, digital minds, and global priorities research. There may be substantial overlap with extinction risk reduction approaches, but in this case and in general, much more research is needed. I think that the effective altruism (EA) emphasis on existential risk could be replaced by a mindset of existential pragmatism: Rather than ensuring humanity expands its reach throughout the universe, we must ensure that the universe will be better for it.
I have spoken to many longtermist EAs about this crucial consideration, and for most of them, that was their first time explicitly considering the EV of human expansion. My sense is that many more are considering it now, and the community is growing more skeptical of highly positive EV as the correct estimate. I’m eager to hear more people’s thoughts on the all-things-considered estimate of EV, and I discuss the limited work done on this topic to date in the “Related Work” section.
In the following table, I lay out the object-level arguments on the EV of human expansion, and the rest of the essay details meta-considerations (e.g., option value). The table also includes the strongest supporting arguments that increase the evidential weight of their corresponding argument and the strongest counterarguments that reduce the weight. The arguments are not mutually exclusive and are merely intended as broad categories that reflect the most common and compelling arguments for at least some people (not necessarily me) on this topic. For example, Historical Progress and Value Through Intent have been intertwined insofar as humans intentionally create progress, so users of this table should be mindful that they do not overcount (e.g., double count) the same evidence. I handle this in my own thinking by splitting an overlapping piece of evidence among its categories in proportion to a rough sense of fit in those categories.
In the associated spreadsheet, I list my own subjective evidential weight scores where positive numbers indicate evidence for +EV and negative numbers indicate evidence for -EV. It is helpful to think through these arguments with different assignment and aggregation methods, such as linear or logarithmic scaling. With different methodologies to aggregate my own estimates or those of others, the total estimate is highly negative around 30% of the time, weakly negative 40%, and weakly positive 30%. It is almost never highly positive. I encourage people to make their own estimates, and all such estimates should be taken with golf balls of salt.
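To make the role of the aggregation method concrete, here is a minimal Python sketch. It is illustrative only: the weights are hypothetical placeholders rather than values from the associated spreadsheet, the function names are invented for this example, and the logarithmic rule shown (a sign-preserving `log1p`) is just one assumed way to compress scores.

```python
import math

# Hypothetical evidential weights, NOT the scores from the spreadsheet.
# Positive numbers count as evidence for +EV, negative numbers for -EV.
weights = {
    "Historical Progress": 1.0,
    "Value Through Intent": 1.0,
    "Value Through Evolution": 1.0,
    "Disvalue Through Evolution": -3.5,
}


def split_evidence(score, fit):
    """Divide one piece of evidence among categories in proportion to fit."""
    total_fit = sum(fit.values())
    return {name: score * f / total_fit for name, f in fit.items()}


def aggregate_linear(ws):
    """Add the raw scores."""
    return sum(ws.values())


def aggregate_log(ws):
    """Compress each score's magnitude with log1p, keep its sign, then add."""
    return sum(math.copysign(math.log1p(abs(w)), w) for w in ws.values())


print("linear total:", aggregate_linear(weights))
print("log total:", aggregate_log(weights))

# One piece of evidence worth 3.0 that fits two categories at a 1:2 ratio.
print(split_evidence(3.0, {"Historical Progress": 1, "Value Through Intent": 2}))
```

With these placeholder numbers, the linear total is negative while the logarithmic total is positive, because log scaling shrinks the single large negative score more than the several small positive ones. This is one way the same inputs can yield different signs under different methodologies.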
This is an atypical structure for an argumentative essay—laying out all the arguments, for and against, instead of laying out arguments for my position and rebutting the objections. I think that we should detach argumentation from evaluation. I’m not aiming for maximum persuasiveness. Indeed, the thrust of my critique is that EAs have failed to consider these arguments in such a systematic way, either neglecting the assumption entirely or selecting only a handful of the multitude of evidence and reason we have available. Overall, my current thinking (primarily an average of several aggregations of quantified estimates and Aumann updating on others’ views) is that the EV of human expansion is not highly positive. For this and other reasons, I prioritize improving the quality of the long-term future rather than increasing its expected population.
Arguments for Positive Expected Value (EV)
Historical Progress
Humanity has achieved great progress in adding value and reducing disvalue, especially since the Enlightenment, such as through declines in violence, oppression, disease, and poverty. In particular, explicit human values seem to have progressed alongside human behavior, which may more robustly extend into the long-term future. Many scholars have written persuasively on this evidence, most famously Pinker (2012; 2018).
Value Through Intent
As technology increases, it arguably seems that (i) humans exert more of their intent on the universe, and (ii) humans tend to want good more than bad.
Value Through Evolution
Evolution (e.g., selection of genetic material over generations) selects for some forms of value and good moral attitudes, at least for oneself. Altruism and self-sacrifice can be selected for (e.g., in soldier ants), especially insofar as altruists care more about future generations. These forces may apply to the evolution of post-humans, AGIs, or minds created by an unaligned/rogue AGI. Christiano (2013) argues that longtermist values will be selected for over time, though it is unclear how this applies to non-temporal sorts of altruism.
Convergence of Patiency and Agency
Moral patients (i.e., beings who can have positive or negative value) may tend to be agents able to protect their own interests (e.g., to exit a situation when it is disvaluable). In other words, if more patients are agents, that's reason for optimism because such beings can use their power as agents to protect their moral interests as patients.
Agents tend to selfishly benefit from working together, such as in families, herds, villages, city-states, nations, and international trade. Such cooperation may protect the interests of future beings. For example, we could expect similar cooperation to evolve on alien worlds or in any evolutionary forces behind digital mind development.
Discoverable Moral Reality
If there are stance-independent moral facts (e.g., divine moral truth), then future beings may discover and implement them.
Arguments for Negative Expected Value (EV)
Humanity has a very bad track record of harming other humans as well as domestic and wild animals. The empirical evidence for disvalue seems clearest to people who have worked on human and animal rights issues because of salient firsthand experience with how cruel and callous humans can be, particularly the unsettling “seriousness of suffering” (see the disturbing examples in Tomasik 2006 for an introduction). This is a topic we are very tempted to ignore, downplay, or rationalize (see Cohen 2001 and “Biases” below). The largest sources of disvalue today are factory farming and wild animal suffering (Anthis 2016b).
Disvalue Through Intent
Many human intentions cause harm to others, such as desires for power, status, and novelty. There are many plausible human interstellar endeavors that involve extensive disvalue in ways that may not be avoided with mere technological advancement (as, arguably, factory farming of animals will be avoided), such as “recreation (e.g. safaris, war games), a labor force (e.g. colonists to distant parts of the galaxy, construction workers), scientific experiments, threats, (e.g. threatening to create and torture beings that a rival cares about), revenge, justice, religion, or even pure sadism” (Anthis 2018b).
Disvalue Through Evolution
Evolution tends to produce more suffering than happiness, such as in wild animals.
Divergence of Patiency and Agency
Moral patients (i.e., beings who can have positive or negative value) may tend to not be agents able to protect their own interests (e.g., to exit a situation when it is disvaluable). This may be the most likely type of long-term society for various reasons (Anthis 2018).
Disvalue can be used as a threat, such as threatening to torture many simulated copies of another agent unless they hand over some of their interstellar resources.
Even people with great material resources can be very unhappy, including many of the best-off humans today, such as in the Easterlin Paradox (Plant 2022).
Arguments that May Increase or Decrease Expected Value (EV)
Conceptual Utility Asymmetry
A unit of disvalue (e.g., suffering) may be larger in absolute value than a unit of value (e.g., happiness). This could be an axiological asymmetry between some natural units of disvalue and value, or an empirical asymmetry (see below).
Empirical Utility Asymmetry
A unit of disvalue (e.g., suffering) may be larger in absolute value than a unit of value (e.g., happiness), or vice versa. This could be an axiological asymmetry (see above) or an empirical asymmetry, such as between per-joule units of disvalue and value or between dolorium and hedonium. As described in Anthis and Paez (2021), when we imagine the largest values and disvalues (e.g., How many days of intense pleasure would you trade for intense pain?), the disvalues tend to seem larger.
This argument overlaps with most other arguments in this table, so users should be cautious about overcounting the same evidence.
Disvalue may be simpler and thus easier to produce and more common than value. This is a variation of the Anna Karenina principle that failure tends to come from any of a number of factors, which was posed at least as early as Aristotle’s Nicomachean Ethics. Value, on the other hand, may be more complex, a view favored by some in AI safety, such as Yudkowsky (2007). The opposite argument may obtain, though I have never heard anyone defend that claim.
Bringing value-positive people into existence may be less valuable than adding value to existing people, but bringing negative-value people into existence may not be as different from adding disvalue to existing people—or vice versa.
EV of the Counterfactual
If humans do not expand, perhaps because we die off or stay on Earth, what will the EV of the universe be? This counterfactual EV can include wild animals on Earth (including those who could evolve a human-like society after many years of a humanless Earth if humans die off), alien civilizations (who may be very different from humans, such as evolving more like insects or solitary predators), value or disvalue in the universe as we know it (e.g., stars being born and dying, fundamental physics in which particles are attracted and repelled by each other, Boltzmann brains), parallel universes (whom we may otherwise affect, for better or worse, through acausal interactions or as-yet-undiscovered causal mechanisms), and simulators (if we live in a simulation).
Human expansion may lead to increases or decreases in the EV of these groups. Depending on what sort of expansion we’re considering, such as if we curtail the +EV or -EV expansion of alien civilizations or if we attack or rescue them, this counterfactual may also include unaligned AI systems that kill humans or prevent our expansion but expand themselves, such as by paperclipping the universe (which may involve many paperclipping drones and von Neumann probes). The EV of aligned versus unaligned AI systems has been discussed in Tomasik (2015) and Gloor (2018) from a total-suffering perspective, and it remains extremely unclear.
The Nature of Digital Minds, People, and Sentience
While there are relatively clear advantages to digital minds over biological minds, such as the ability to self-modify, copy, and travel long distances, it is much less clear what life for digital minds would be like. For example, we do not know how much protection digital sentience will have over their own experiences (e.g., cryptographic security), how useful it will be to have many small minds versus few large minds, and how useful nesting of minds within each other will be. There are also many normative questions regarding the value of these different minds, such as group entities where the subunits are more distinct than subunits of a biological brain (e.g., What if a China brain were implemented in which there were tiny humans inside of each neuron in a normal human brain, passing around neurotransmitters?). Digital minds may also make up value-optimal structures (e.g., dolorium and hedonium). See “Scaling of Value and Disvalue” below.
This argument overlaps with most other arguments in this table, so users should be cautious about overcounting the same evidence.
Life Despite Suffering
Even in dire scenarios, many humans report a preference for living or having lived over dying or never having been born. This may suggest underappreciated value in even apparently disvaluable lives (e.g., aspects not covered in current moral frameworks) or, as with many psychological arguments, it can be viewed as a bias of overemphasizing value in our evaluations (e.g., just world bias, fear of death). See “Biases” below.
The Nature of Value Refinement
Many plausible trajectories of the long-term future involve some sort of value refinement, such as coherent extrapolated volition (Yudkowsky 2004), indirect normativity (Bostrom 2014), and long reflection (Greaves and MacAskill 2017). The effect of such processes on values depends on a range of questions such as: Whose values are refined? How important are value inputs at the beginning of refinement (i.e., to what extent are they locked in)? And what sort of moral considerations (e.g., thought experiments) has humanity not yet considered but may consider in such processes?
Insofar as one believes that AGI will be instrumental in humanity’s future, even an AGI that is aligned in some way may not be good. It depends on what values are aligned with what aspect of the AGI. Especially concerning is that the AGI may only be aligned with human values and interests, only caring about nonhuman beings to the extent humans do, which may not be sufficient for net positive outcomes.
Tomasik (2013) covers many of these arguments, e.g., “Very likely our values will be lost to entropy or Darwinian forces beyond our control. However, there's some chance that we'll create a singleton in the next few centuries that includes goal-preservation mechanisms allowing our values to be ‘locked in’ indefinitely. Even absent a singleton, as long as the vastness of space allows for distinct regions to execute on their own values without take-over by other powers, then we don't even need a singleton; we just need goal-preservation mechanisms,” as does Tomasik (2017).
Scaling of Value and Disvalue
Sources of value and disvalue vary in magnitude, and some sources seem more likely to be value-optimized, such as dolorium as optimal suffering per unit of resource (e.g., joules of energy) or hedonium as optimal happiness. Forces such as human intent, resource accumulation, evolution, and runaway AI seem to be particularly optimizing. This consideration also depends on how values and optimization are viewed, such as what it means to optimize a layperson’s intuitionist morality.
The more one cares about this sort of utilitronium or value-optimized sources (empirically or conceptually), the more such sources matter. One can also have different evaluations of dolorium and hedonium, such as whether they have a ratio of -1:1, -100:1, etc. (see Shulman 2012 and Knutsson 2017 for some discussion). This can also affect trade-offs within closer-to-zero ranges of value and disvalue.
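The sensitivity to the dolorium-to-hedonium ratio can be shown with a small Python sketch. The scenario below (10% of resources optimized for happiness, 1% optimized for suffering) is a made-up illustration rather than a forecast, and the function names are hypothetical.

```python
def net_value(hedonium_share, dolorium_share, disvalue_ratio):
    """Net value when one unit of dolorium offsets `disvalue_ratio` units of hedonium."""
    return hedonium_share - disvalue_ratio * dolorium_share


def break_even_ratio(hedonium_share, dolorium_share):
    """Ratio at which value and disvalue exactly cancel."""
    return hedonium_share / dolorium_share


# Hypothetical future: 10% of resources become hedonium, 1% become dolorium.
hedonium, dolorium = 0.10, 0.01

for ratio in (1, 100):  # the -1:1 and -100:1 evaluations mentioned above
    print(f"ratio -{ratio}:1 -> net value {net_value(hedonium, dolorium, ratio):+.2f}")

print("break-even ratio:", break_even_ratio(hedonium, dolorium))
```

Under a -1:1 evaluation this toy future is net positive, and under a -100:1 evaluation it is net negative, so the sign of the estimate can turn entirely on the ratio one adopts.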
More broadly, the gradient of possible positive and negative futures could make large differences in the best approach to reducing quality risks, such as jumping from one step of EV to the one above it (e.g., futures where digital sentiences are not seen as people to futures where they are). The larger the jump between steps, the more one should prioritize even small chances of moving up a step (e.g., avoiding an existential risk).
This argument overlaps with most other arguments in this table, so users should be cautious about overcounting the same evidence.
EV of Human Expansion after Near-Extinction or Other Events
While humans may survive and colonize the cosmos along what would seem a similar trajectory to our current one, there may be major events, such as an apocalyptic near-extinction event in which the human population is decimated but recovers. It is very unclear how such events would affect the EV of human expansion. For example, post-near-extinction humans may have a newfound sense of global stewardship and camaraderie or they may have a newfound sense of resource scarcity and fear of each other. Similarly, humans after radical technology change such as life extension may have very different values, such as a resistance to changing their values the way new generations of humans do. Near-extinction events may also select for certain demographics and ideologies.
The Zero Point of Value
Each argument for +EV and -EV depends on where one places the zero point of value. Some scenarios, such as an unaligned AI that carpets the universe with sentience that has a very limited amount of value (e.g., muzak and potatoes) and very limited amount of disvalue (e.g., boredom), may teeter on where one places the zero point between +EV and -EV.
Some of our successors might live lives and create worlds that, though failing to justify past suffering, would give us all, including some of those who have suffered, reasons to be glad that the Universe exists. ⸻ Derek Parfit (2017)
The field of existential risk has intellectual roots as deep as human history in notions of “apocalypse” such as the end of the Mayan calendar. Thomas Moynihan (2020) distinguishes apocalypse as having a sense to it or a justification, such as the actions of a supernatural deity, while “extinction” entails “the ending of sense” entirely. This notion of human extinction is traced back only to the Enlightenment beginning in the 1600s, and its most well-known articulation in the 21st century is under the category of existential risks (also known as x-risks), a term coined in 2002 by philosopher Nick Bostrom for risks “where an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential.”
The most famous essay on existential risk is “Astronomical Waste” (Bostrom 2003), in which Bostrom argues that if humans could colonize the Virgo supercluster, the massive concentration of galaxies that includes our own Milky Way and 47,000 of its neighbors, then we could sustain approximately 10^38 human beings, an intuitively inconceivably large number. Bostrom argues that the priority of utilitarians should be to reduce existential risk and ensure we seize this cosmic endowment, though the leap from the importance of the long-term future to existential risk reduction is contentious (e.g., Beckstead 2013b). The field of existential risk studies has risen at pace with the growth of effective altruism (EA), with a number of seminal works summarizing and advancing the field (Matheny 2007; Bostrom 2012; Beckstead 2013a; Bostrom 2013; 2014; Tegmark 2017; Russell 2019; Moynihan 2020; Ord 2020; MacAskill forthcoming).
Among existential risks, EAs have largely focused on population risks (particularly extinction risks); the term “x-risk,” which canonically refers to existential risk, is often interpreted as extinction risk (see Aird 2020a). A critical assumption underlying this focus has been that the expected value of humanity’s survival and interstellar colonization is very high. This assumption largely goes unstated, but it was briefly acknowledged in Beckstead (2013a):
Is the expected value of the future negative? Some serious people—including Parfit (2011, Volume 2, chapter 36), Williams (2006), and Schopenhauer (1942)—have wondered whether all of the suffering and injustice in the world outweigh all of the good that we've had. I tend to think that our history has been worth it, that human well-being has increased for centuries, and that the expected value of the future is positive. But this is an open question, and stronger arguments pointing in either direction would be welcome.
Christiano (2013) asked, “Why might the future be good?” though, as I understood it, that essay did not mention the possibility of a negative future. I had also implicitly accepted the assumption of a good future until 2014, when I thought through the evidence and decided to prioritize moral circle expansion at the intersection of animal advocacy and longtermism (Anthis 2014). I brought it up on the old EA Forum in Anthis (2016a), and West (2017) detailed a version of the “Value Through Intent” argument. I also remember extensive Facebook threads around this time, though I do not have links to share. I finally wrote up my thoughts on the topic in detail in Anthis (2018b) as part of a prioritization argument for moral circle expansion over decreasing extinction risk through AI alignment, and this essay is a follow-up to and refinement of those ideas.
Later in 2018, Brauner and Grosse-Holz (2018) published an EA Forum essay arguing that the expected value of extinction risk reduction is positive. In my opinion, it failed to consider many of the arguments on the topic, as discussed in EA Forum comments and a rebuttal, also on the EA Forum, DiGiovanni (2021). There is also a chapter in MacAskill (forthcoming) covering similar ground as Brauner and Grosse-Holz, with similar arguments missing, in my opinion. Overall, these writings primarily focus on three arguments:
These are three important considerations, but as I argued above (and at least some of the authors would disagree), they cover only a small portion of the total landscape of evidence and reason that we have available for estimating the EV of human expansion.
Overall, I think the arguments against a highly positive EV of human expansion have been the most important blindspot of the EA community to date, and it is the only major dissenting opinion I have with the core of the EA memeplex. I would guess over 90% of longtermist EAs with whom I have raised this topic have never considered it before, despite acknowledging during our conversation that the expected value being highly positive is a crucial assumption for prioritizing extinction risk and that it is on shaky ground—if not deciding that it is altogether mistaken. While examining this assumption and deciding that the far future is not highly positive would not completely overhaul longtermist EA priorities, it would significantly change our focus. In particular, we should shift resources away from extinction risk and towards quality risks, as well as towards global priorities research to better understand this and other crucial considerations. I would be eager for more discussion of this topic, and the sort of evidence I expect to most change my mind is the cooperative game theory research done by CLR, the Center on Human-Compatible AI (CHAI), and others in AI safety; the moral circle expansion and digital minds research done by Sentience Institute (SI), Future of Humanity Institute (FHI), and others in longtermism and AI safety; and all sorts of exploration of concrete scenarios similar to The Age of Em (Hanson 2016) and AI takeoff “training stories” (Hubinger 2021). I expect fewer updates from more conceptual discourse like the works cited above on the EA Forum and this essay, but I still see them as valuable contributions. See further discussion in the “Future Research on the EV of Human Expansion” subsection below.
I separate the moral value of the long-term future into two factors: population, the number of individuals at each point in time, and quality, the moral value of each individual’s existence at each point in time. The moral value of the long-term future is thus the double sum of quality across individuals across time. Risks to the number of individuals (living sufficiently positive lives) are population risks, and risks to the quality of each individual life are quality risks.
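As a toy illustration of this double sum, the Python sketch below represents a future as a list of time periods, each holding one quality number per individual alive in that period. All numbers are hypothetical and chosen only to show how a population risk and a quality risk enter the same total differently.

```python
def future_value(quality_by_time):
    """Sum quality over every individual in every time period."""
    return sum(sum(qualities) for qualities in quality_by_time)


# Hypothetical baseline: four individuals with quality 1.0 in each of two periods.
baseline = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]

# Population risk: half as many individuals exist in the second period.
population_loss = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0]]

# Quality risk: the same individuals exist, but their quality turns negative.
quality_loss = [[1.0, 1.0, 1.0, 1.0], [-0.5, -0.5, -0.5, -0.5]]

for name, future in [
    ("baseline", baseline),
    ("population loss", population_loss),
    ("quality loss", quality_loss),
]:
    print(name, future_value(future))
```

A population risk removes terms from the sum, while a quality risk changes the size or sign of the terms that remain, which is why a quality loss can leave the total lower than a population loss of similar scale.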
Extinction risks are a particular sort of population risk, those that would “annihilate Earth-originating intelligent life,” though I would also include threats towards populations of non-Earth-originating and non-intelligent (and perhaps even non-living) individuals who matter morally, and I get the sense that others have also favored this more inclusive definition. Non-existential population risks could be a permanent halving of the population or a delay of one-third the universe’s remaining lifetime in humanity’s interstellar expansion. There is no consensus on where exactly the cutoff is between existential and non-existential, though there does seem to be consensus that extinction of humans (with no creation of post-humans, such as whole brain emulations) is existential.
Quality risks are risks to the moral value of individuals who may exist in the long-term future. Existential quality risks are those that “permanently and drastically curtail its potential” moral value, such as all individuals being moved from positive to zero or positive to negative value. Non-existential quality risks may include one-tenth of the future population dropping from highly positive to barely positive quality, one-fourth of the future population dropping from barely positive to barely negative quality, and so on. Again, this may be better understood as a spectrum of existentiality, rather than two neatly separated categories, because it is unclear at what point potential is permanently and drastically curtailed. Quality risks include suffering risks (also known as s-risks), “risks of events that bring about suffering in cosmically significant amounts” (Althaus and Gloor 2016; Tomasik 2011), which was noted as “weirdly sidelined” by total utilitarians in Rowe’s (2022) “Critiques of EA that I Want to Read.”
These categories are not meant to coincide with the existential risk taxonomies of Bostrom (2002) (bangs, crunches, shrieks, whimpers) or Bostrom (2013) (human extinction, permanent stagnation, flawed realization, subsequent ruination), in part because those are worded in terms of positive potential rather than an aggregation of positive and negative outcomes. However, one can reasonably view some of those categories (e.g., shrieks and flawed realizations) as including some positive, zero, or negative quality trajectories because they have a failed realization of positive potential. Aird (2020b) has some useful Venn diagrams of the overlaps of some long-term risks.
The term “trajectory change” has variously been used as a category that, from my understanding, includes the mitigation or exacerbation of all of the risks above, such as Beckstead’s (2013a) definition of trajectory changes as actions that “slightly or significantly alter the world’s development trajectory.”
Explosive forces, energy, materials, machinery will be available upon a scale which can annihilate whole nations. Despotisms and tyrannies will be able to prescribe the lives and even the wishes of their subjects in a manner never known since time began. If to these tremendous and awful powers is added the pitiless sub-human wickedness which we now see embodied in one of the most powerful reigning governments, who shall say that the world itself will not be wrecked, or indeed that it ought not to be wrecked? There are nightmares of the future from which a fortunate collision with some wandering star, reducing the earth to incandescent gas, might be a merciful deliverance. ⸻ Winston Churchill (1931)
Under the standard definition of utility, you should take actions with positive expected value (EV), not take actions with negative EV, and it doesn’t matter if you take actions with zero EV. However, prioritization is plausibly much more complicated than this. Is the EV of the action higher than counterfactual actions? Is EV the right approach for imperfect individual decision-makers? Is EV the right approach for a group of people working together? What is the track record for EV decision-making relative to other approaches? Etc. There are many different views that a reasonable person can come to on how best to navigate these conceptual and empirical questions, but I believe that the EV needs to be highly positive to prioritize extinction risks.
As I discussed in Anthis (2018b), I think an intuitive but mistaken argument on this topic is that if we are uncertain about the EV or expect it is close to zero, we should favor reducing extinction risk to preserve option value. Fortunately I have heard this argument much less frequently in recent years, but it is still in a drop-down section of 80,000 Hours’ “The Case for Reducing Existential Risks.” This reasoning seems mistaken for two reasons:
First, option value is only good insofar as we have control over the exercising of future options or expect those who have control to exercise it well. In the course of human civilization, even the totality of the EA movement has relatively little control over humanity’s actions—though arguably a lot more than most measures would make it appear due to our strategic approach, particularly targeting high-leverage domains such as advanced AI—and it is unclear that EA will retain even this modest level of control. The argument that option value is good because our descendants will use it well is circular because the case against extinction risk reduction is primarily focused on humanity not using its options well (i.e., humanity not using its options well is both the premise and the conclusion). An argument that relies on the claim that is being contested is very limited. However, we have more control if one thinks extinction timelines are very short and, if one survives, they and their colleagues will have substantial control over humanity’s actions; we also may be optimistic about human action despite being pessimistic about the future if we think nonhuman forces such as aliens and evolution are the decisive drivers of long-term disvalue.
Second, continued human existence very plausibly limits option value in similar ways to nonexistence. Whether we are in a time of perils or not, there is no easy “off switch” with which humanity can decide to let itself go extinct, especially with advanced technologies (e.g., spreading out through von Neumann probes). It is not as if we can or should reduce extinction risk in the 2020s and then easily raise it in the 2030s based on further global priorities research. Still, there is a greater variety of non-extinct than extinct civilizations, so insofar as we want to preserve a wide future of possibilities, that is reason to favor extinction risk reduction.
Instead of option value, the more important considerations to me are (i) that we have other promising options with high EV such that extinction risk reduction needs to be more positive than these other options in order to justify prioritization and (ii) that we should have some risk aversion and sandboxing of EV estimates such that we should sometimes treat close-to-zero values as zero. It’s also unclear how to weigh the totality of evidence here, but insofar as it is weak and speculative—as with most questions about the long-term future—one may pull their estimate towards a prior, though it is unclear what that prior should be. If one thinks zero is a particularly common answer in an appropriate reference class, that could be reasonable, but it depends on many factors beyond the scope of this essay.
If we are allocating resources to both population and quality risks, one could argue that we should spend resources on population risks first because the quality of individual lives only matters insofar as those individuals exist. The opposite is true as well: For example, if a quality of zero were locked in for the long-term future, then increasing or decreasing the population would have no moral value or disvalue. Outcomes of exactly zero quality might seem less likely than outcomes of exactly zero population, though this depends on the “EV of the Counterfactual” (e.g., life originating on other planets) and is more contentious for close-to-zero quantities.
As with option value, the future depends on the past, so for every year that passes, the future has fewer degrees of freedom. This is most apparent in the development of advanced AI, which may hinge on early-stage choices, such as selecting training regimes that are more likely to lead to its alignment with its designers’ values or selecting the values with which to align the AI (i.e., value lock-in). In general, there are strong arguments for time-sensitivity for both types of trajectory change, especially with advanced technology, and with life extension and von Neumann probes in particular.
To our amazement we suddenly exist, after having for countless millennia not existed; in a short while we will again not exist, also for countless millennia. That cannot be right, says the heart. ⸻ Arthur Schopenhauer (1818, translation 2008)
We could be biased towards optimism or pessimism. Among the demographics of EA, I think that we should probably be more worried about bias towards optimism. Extreme suffering, as described by Tomasik (2006), is a topic that people are very tempted to ignore, downplay, or rationalize (Cohen 2001). In general, the prospect of future dystopias is uncomfortable and unpleasant to think about. Most of us dread the possibility that our legacy in the universe could be a tragic one, and such a gloomy outlook does not resonate with favored trends of techno-optimism or the heroic notion of saving humanity from extinction. However, the sign of this bias can be flipped, such as in social groups where pessimism and doomsaying are in vogue. My experience is that people in EA and longtermism tend to be much more ready to dismiss pessimism and suffering-focused ethics than optimism and happiness-focused ethics, especially based on superficial claims that pessimism is driven by the personal dispositions and biases of its proponents. For a more detailed discussion on biases related to (not) prioritizing suffering, see Vinding (2020).
Additionally, given the default approach to longtermism and existential risk is to reduce extinction risk, and there has already been over a decade of focus on that, we should be very concerned about status quo bias and the incentive structure of EA as it is today. This is one reason to encourage self-critique as individuals and as a community, such as with the Criticism and Red-Teaming Contest. In fact, that contest is one reason I wrote this essay, though I was already committed to writing a book chapter on this topic before the contest was announced.
I think we should focus more on the object-level arguments than on biases, but given how our answer to this question hinges on our intuitive estimates of extremely complicated figures, bias is probably more important than normal. I further discussed the merits of considering bias and listed many possible biases towards both moral circle expansion and reducing extinction risk through AI alignment in Anthis (2018b).
One conceptual challenge is that a tendency towards pessimism or optimism could be accounted for either as a bias that needs correction or as a fact about the relative magnitudes of value and disvalue. On one hand, we might say that the importance of disvalue in evolution (e.g., the constant danger of one misstep curtailing all future spread of one’s genes) has made us care more about suffering than we should. On the other hand, we might say that it is a fact about how disvalue tends to be more common, subjectively worse, or objectively worse in the universe.
People ask me to predict the future, when all I want to do is prevent it. Better yet, build it. Predicting the future is much too easy, anyway. You look at the people around you, the street you stand on, the visible air you breathe, and predict more of the same. To hell with more. I want better. ⸻ Ray Bradbury (1979)
I present a more detailed argument for the prioritization of quality risks (particularly moral circle expansion) over extinction risk reduction (particularly through certain sorts of AI research) in Anthis (2018b), but here I will briefly note some thoughts on importance, tractability, and neglectedness. Two related EA Forum posts are “Cause Prioritization for Downside-Focused Value Systems” (Gloor 2018) and “Reducing Long-Term Risks from Malevolent Actors” (Althaus and Baumann 2020). Additionally, at this early stage of the longtermist movement, the top priorities for population and quality risk may largely intersect. Both issues suggest foundational research on topics such as the nature of AI control and likely trajectories of the long-term future, community-building of thoughtful do-gooders, and field-building of institutional infrastructure to use for steering the long-term future.
One important application of the EV of human expansion is to the “importance” of population and quality risks. Importance can be operationalized as the good done if the entire cause succeeded in solving its corresponding problem, such as the good done by eliminating or substantially reducing extinction risk, which is effectively zero if the EV of human expansion is zero and effectively negative if the EV of human expansion is negative.
The importance of quality risk reduction is clearer, in the sense that the difference in quality between possible futures is clearer than the difference between extinction and non-extinction, and larger, in the sense that population risk entails only the zero-to-positive difference between human extinction and non-extinction (or, more generally, between zero population and some positive number of individuals), while quality risk entails the difference between the best quality humans could engender and the worst, across all possible population sizes. This is arguably a weakness of the framework because we could categorize the quality risk cause area as smaller in importance (say, an increase of 1 trillion utils, i.e., units of goodness), and it would tend to become more tractable as we narrow the category.
The tractability difference between population and quality risk seems the least clear of the three criteria. My general approach is thinking through the most likely “theories of change” or paths to impact and assessing them step-by-step. For example, one commonly discussed extinction risk reduction path to impact is “agent foundations”: building mathematical frameworks and formally proving claims about the behavior of intelligent agents, which would then allow us to build advanced AI systems more likely to do what we tell them to do, and then either using these frameworks to build AGI or persuading the builders of AGI to use them. Quality-risk-focused AI safety strategies may be more focused on the outer alignment problem, ensuring that an AI’s objective is aligned with the right values, rather than just the inner alignment problem, ensuring that all actions of the AI are aligned with the objective. Also, we can influence quality by steering the “direction” or “speed” of the long-term future, approaches with potentially very different impact, hinging on factors such as the distribution of likely futures across value and likelihood (e.g., Anthis 2018c; Anthis and Paez 2021).
One argument that I often hear on the tractability of trajectory changes is that changes need to “stick” or “persist” over long periods. It is true that there needs to be a persistent change in the expected value (i.e., the random variable or time series regime of value in the future), but I frequently hear the claim that there needs to be a persistent change in the realization of that value. For example, if we successfully broker a peace deal between great powers, neither the peace deal itself nor any other particular change in the world has to persist in order for this to have high long-term impact. The series of values itself can have arbitrarily large variance, such as it being very likely that the peace deal is broken within a decade.
For a sort of change to be intractable, it needs to not just lack persistence, but to rubber band (i.e., create opposite-sign effects) back to its counterfactual. For example, if brokering a peace deal causes an equal and opposite reaction of anti-peace efforts, then that trajectory change is intractable. Moreover, we should not only consider rubber banding but dominoing (i.e., create same-sign effects), perhaps because of how this peace deal inspires other great powers to follow suit even if this particular deal is broken. There is much of this potential energy in the world waiting to be unlocked by thoughtful actors.
The tractability of trajectory change has been the subject of research at Sentience Institute, including our historical case studies and Harris’ (2019) “How Tractable Is Changing the Course of History?”
The neglectedness difference between population and quality risk seems the most clear. There are far more EAs and longtermists working explicitly on population risks than on quality risks (i.e., risks to the moral value of individuals in the long-term future). Two nuances for this claim are first that it may not be true for other relevant comparisons: For example, many people in the world are trying to change social institutions, such as different sides of the political spectrum trying to pull public opinion towards their end of the spectrum. This group seems much larger than people focused explicitly on extinction risks, and there are many other relevant reference classes. Second, it is not entirely clear whether extinction risk reduction and quality risk reduction face higher or lower returns to being less neglected (i.e., more crowded). It may be that so few people are focused on quality risks that marginal returns are actually lower than they would be if there were more people working on them (i.e., increasing returns).
Because most events in the long-term future entail some sort of value or disvalue, most new information from longtermist research provides some evidence on the EV of human expansion. As stated above, I’m particularly excited about cooperative game theory research (e.g., CLR, CHAI), moral circle expansion and digital minds research (e.g., SI, FHI), and exploration of concrete trajectories (e.g., Hanson 2016; Hubinger 2021). I’m relatively less excited (though still excited!), on the margin, by entirely armchair taxonomization and argumentation like that in this essay. That includes research on axiological asymmetries, such as more debate on suffering-focused ethics or population ethics, though these can be more useful for other topics and perhaps other people considering this question. My lack of enthusiasm is largely because in the past 8 years of having this view that the EV of human expansion is not highly positive, very little of the new evidence has come from armchair reasoning and argumentation, despite that being more common (although what sort of research is most common depends on where one draws the boundaries because, again, so much research has implications for EV).
In general, this is such an encompassing, big-picture topic that empirical data is extremely limited relative to scope, and it seems necessary to rely on qualitative intuitions, quantitative intuitions, or back-of-the-envelope calculations a la Dickens’ (2016) “A Complete Quantitative Model for Cause Selection” or Tarsney’s (2022) “The Epistemic Challenge to Longtermism.” I would like to see a more systematic survey of such intuitions, ideally from 5-30 people who have read through this essay and the “Related Work.” Ideally these would be stated as credible intervals or similar probability distributions, such that we can more easily quantify uncertainty in the overall estimate. As with all topics, I think we should Aumann update on each other’s views, a process in which I split the difference between my belief and someone else’s even if I do not know all the prior and posterior evidence on which they base their view. Of course, this is messy in the real world, for instance because we presumably should account not just for the few people whose beliefs we happen to know, but also for our expectations of the many people who also have a belief and even hypothetical people who could have a belief (e.g., unbiased versions of real-world people). It is also unclear whether normative (e.g., moral) views constitute the sort of belief that should be updated in this way, such as between people with fundamentally different value trade-offs between happiness and suffering. There are cooperative reasons to deeply account for others’ views, and one may choose to account for moral uncertainty. In general, I would be very interested in a survey that just asks for numbers like those in the table above and allows us to aggregate those beliefs in a variety of ways; a more detailed case for how that aggregation should work is beyond the scope of this essay.
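As a rough illustration of how such survey responses might be combined, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the intervals are hypothetical rather than survey data, each respondent's 90% credible interval is approximated by a normal distribution, and respondents are pooled with equal weight. The function names are invented here, and this is only one of many possible aggregation rules.

```python
from statistics import NormalDist, mean


def normal_from_interval(low, high, coverage=0.9):
    """Fit a normal distribution whose central `coverage` interval is (low, high)."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return NormalDist(mu=(low + high) / 2, sigma=(high - low) / (2 * z))


def split_the_difference(mine, theirs):
    """Equal-weight update between two point estimates."""
    return (mine + theirs) / 2


def pooled_mean_and_stdev(dists):
    """Mean and standard deviation of an equal-weight mixture of distributions."""
    mu = mean([d.mean for d in dists])
    second_moment = mean([d.variance + d.mean ** 2 for d in dists])
    return mu, (second_moment - mu ** 2) ** 0.5


# Hypothetical 90% credible intervals for the EV of human expansion on an
# arbitrary scale. These are made-up numbers, not responses from any survey.
responses = [(-10, 2), (-4, 4), (0, 12)]
dists = [normal_from_interval(lo, hi) for lo, hi in responses]
pooled_mu, pooled_sd = pooled_mean_and_stdev(dists)

print(split_the_difference(2.0, -4.0))
print(pooled_mu, pooled_sd)
```

Under these assumptions, the pooled standard deviation exceeds every individual one because it also reflects disagreement between respondents, which is one reason to collect distributions and not only point estimates.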
If you are persuaded by the arguments that the expected value of human expansion is not highly positive or that we should prioritize the quality of the long-term future, promising approaches include research, field-building, and community-building, such as at the Center on Long-Term Risk, Center for Reducing Suffering, Future of Humanity Institute, Global Catastrophic Risk Institute, Legal Priorities Project, Open Philanthropy, and Sentience Institute, as well as working at other AI safety and EA organizations with an eye towards ensuring that, if we survive, the universe is better for it. Some of this work has substantial room for more funding, and related jobs can be found at these organizations’ websites and on the 80,000 Hours job board.
Aird, Michael. 2020a. “Clarifying Existential Risks and Existential Catastrophes.” Effective Altruism Forum. https://forum.effectivealtruism.org/posts/skPFH8LxGdKQsTkJy/clarifying-existential-risks-and-existential-catastrophes.
———. 2020b. “Venn Diagrams of Existential, Global, and Suffering Catastrophes.” Effective Altruism Forum. https://forum.effectivealtruism.org/posts/AJbZ2hHR4bmeZKznG/venn-diagrams-of-existential-global-and-suffering.
Alighieri, Dante. 1307. Convivio. https://www.loebclassics.com/view/marcus_tullius_cicero-de_finibus_bonorum_et_malorum/1914/pb_LCL040.41.xml.
Althaus, David, and Tobias Baumann. 2020. “Reducing Long-Term Risks from Malevolent Actors.” Effective Altruism Forum. https://forum.effectivealtruism.org/posts/LpkXtFXdsRd4rG8Kb/reducing-long-term-risks-from-malevolent-actors.
Althaus, David, and Lukas Gloor. 2016. “Reducing Risks of Astronomical Suffering: A Neglected Priority.” Center on Long-Term Risk. https://longtermrisk.org/reducing-risks-of-astronomical-suffering-a-neglected-priority/.
Anthis, Jacy Reese. 2014. “How Do We Reliably Impact the Far Future?” The Best We Can. https://web.archive.org/web/20151106103159/http://thebestwecan.org/2014/07/20/how-do-we-reliably-impact-the-far-future/.
———. 2016a. “Some Considerations for Different Ways to Reduce X-Risk.” Effective Altruism Forum. https://forum.effectivealtruism.org/posts/NExT987oY5GbYkTiE/some-considerations-for-different-ways-to-reduce-x-risk.
———. 2016b. “Why Animals Matter for Effective Altruism.” Effective Altruism Forum. https://forum.effectivealtruism.org/posts/ch5fq73AFn2Q72AMQ/why-animals-matter-for-effective-altruism.
———. 2018a. The End of Animal Farming: How Scientists, Entrepreneurs, and Activists Are Building an Animal-Free Food System. Boston: Beacon Press.
———. 2018b. “Why I Prioritize Moral Circle Expansion Over Artificial Intelligence Alignment.” Effective Altruism Forum. https://forum.effectivealtruism.org/posts/BY8gXSpGijypbGitT/why-i-prioritize-moral-circle-expansion-over-artificial.
———. 2018c. “Animals and the Far Future.” EAGxAustralia. https://www..com/watch?v=NTV81NZSuKw.
———. 2022. “Consciousness Semanticism: A Precise Eliminativist Theory of Consciousness.” In Biologically Inspired Cognitive Architectures 2021, edited by Valentin V. Klimov and David J. Kelley, 1032:20–41. Studies in Computational Intelligence. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-030-96993-6_3.
Anthis, Jacy Reese, and Eze Paez. 2021. “Moral Circle Expansion: A Promising Strategy to Impact the Far Future.” Futures 130: 102756. https://doi.org/10.1016/j.futures.2021.102756.
Askell, Amanda, Yuntao Bai, Anna Chen, et al. 2021. “A General Language Assistant as a Laboratory for Alignment.” ArXiv. https://arxiv.org/abs/2112.00861.
Beckstead, Nick. 2013a. “On the Overwhelming Importance of Shaping the Far Future.” Rutgers University. https://doi.org/10.7282/T35M649T.
———. 2013b. “A Proposed Adjustment to the Astronomical Waste Argument.” effectivealtruism.org. https://www.effectivealtruism.org/articles/a-proposed-adjustment-to-the-astronomical-waste-argument-nick-beckstead.
Benatar, David. 2006. Better Never to Have Been: The Harm of Coming into Existence. New York: Clarendon Press.
Bostrom, Nick. 2002. “Existential Risks: Analyzing Human Extinction Scenarios and Related Hazards.” Journal of Evolution and Technology 9. https://ora.ox.ac.uk/objects/uuid:827452c3-fcba-41b8-86b0-407293e6617c.
———. 2003. “Astronomical Waste: The Opportunity Cost of Delayed Technological Development.” Utilitas 15 (3): 308–14. https://doi.org/10.1017/S0953820800004076.
———. 2009. “Moral uncertainty – towards a solution?” Overcoming Bias. https://www.overcomingbias.com/2009/01/moral-uncertainty-towards-a-solution.html.
———. 2012. Global Catastrophic Risks. Repr. Oxford: Oxford University Press.
———. 2013. “Existential Risk Prevention as Global Priority.” Global Policy 4 (1): 15–31. https://doi.org/10.1111/1758-5899.12002.
———. 2014. Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press.
Bradbury, Ray. 1979. “Beyond 1984: The People Machines.” In Yestermorrow: Obvious Answers to Impossible Futures.
Brauner, Jan M., and Friederike M. Grosse-Holz. 2018. “The Expected Value of Extinction Risk Reduction Is Positive.” Effective Altruism Forum. https://forum.effectivealtruism.org/posts/NfkEqssr7qDazTquW/the-expected-value-of-extinction-risk-reduction-is-positive.
Christiano, Paul. 2013. “Why Might the Future Be Good?” Rational Altruist. https://rationalaltruist.com/2013/02/27/why-will-they-be-happy/.
Churchill, Winston. 1931. “Fifty Years Hence.” https://www.nationalchurchillmuseum.org/fifty-years-hence.html.
Cowen, Tyler. 2018. Stubborn Attachments: A Vision for a Society of Free, Prosperous, and Responsible Individuals.
Crootof, Rebecca. 2019. “'Cyborg Justice' and the Risk of Technological-Legal Lock-In.” 119 Columbia Law Review Forum 233.
Deutsch, David. 2011. The Beginning of Infinity: Explanations That Transform the World. London: Allen Lane.
Dickens, Michael. 2016. “A Complete Quantitative Model for Cause Selection.” Effective Altruism Forum. https://forum.effectivealtruism.org/posts/fogJKYXvqzkr9KCud/a-complete-quantitative-model-for-cause-selection.
DiGiovanni, Anthony. 2021. “A Longtermist Critique of ‘The Expected Value of Extinction Risk Reduction Is Positive.’” Effective Altruism Forum. https://forum.effectivealtruism.org/posts/RkPK8rWigSAybgGPe/a-longtermist-critique-of-the-expected-value-of-extinction-2.
Gloor, Lukas. 2017. “Tranquilism.” Center on Long-Term Risk. https://longtermrisk.org/tranquilism/.
———. 2018. “Cause Prioritization for Downside-Focused Value Systems.” Effective Altruism Forum. https://forum.effectivealtruism.org/posts/225Aq4P4jFPoWBrb5/cause-prioritization-for-downside-focused-value-systems.
Greaves, Hilary, and Will MacAskill. 2017. “A Research Agenda for the Global Priorities Institute.” https://globalprioritiesinstitute.org/wp-content/uploads/GPI-Research-Agenda-December-2017.pdf.
Hanson, Robin. 2016. The Age of Em: Work, Love, and Life When Robots Rule the Earth. First Edition. Oxford: Oxford University Press.
Harris, Jamie. 2019. “How Tractable Is Changing the Course of History?” Sentience Institute. http://www.sentienceinstitute.org/blog/how-tractable-is-changing-the-course-of-history.
Hobbhahn, Marius, Eric Landgrebe, and Beth Barnes. 2022. “Reflection Mechanisms as an Alignment Target: A Survey.” LessWrong. https://www.lesswrong.com/posts/XyBWkoaqfnuEyNWXi/reflection-mechanisms-as-an-alignment-target-a-survey-1.
Hubinger, Evan. 2021. “How Do We Become Confident in the Safety of a Machine Learning System?” AI Alignment Forum. https://www.alignmentforum.org/posts/FDJnZt8Ks2djouQTZ/how-do-we-become-confident-in-the-safety-of-a-machine.
Knutsson, Simon. 2017. “Reply to Shulman’s ‘Are Pain and Pleasure Equally Energy-Efficient?’” http://www.simonknutsson.com/reply-to-shulmans-are-pain-and-pleasure-equally-energy-efficient/.
MacAskill, William. Forthcoming (2022). What We Owe the Future: A Million-Year View. New York: Basic Books.
Matheny, Jason G. 2007. “Reducing the Risk of Human Extinction.” Risk Analysis 27 (5): 1335–44. https://doi.org/10.1111/j.1539-6924.2007.00960.x.
Moynihan, Thomas. 2020. X-Risk: How Humanity Discovered Its Own Extinction. Falmouth: Urbanomic.
Ord, Toby. 2020. The Precipice: Existential Risk and the Future of Humanity. New York: Hachette Books.
Parfit, Derek. 2017. On What Matters: Volume Three. Oxford: Oxford University Press.
Pinker, Steven. 2012. The Better Angels of Our Nature. New York Toronto London: Penguin Books.
———. 2018. Enlightenment Now. New York, New York: Viking, an imprint of Penguin Random House LLC.
Plant, Michael. 2022. “Will faster economic growth make us happier? The relevance of the Easterlin Paradox to Progress Studies.” Effective Altruism Forum. https://forum.effectivealtruism.org/posts/gCDsAj3K5gcZvGgbg/will-faster-economic-growth-make-us-happier-the-relevance-of.
Rowe, Abraham. 2022. “Critiques of EA that I Want to Read.” Effective Altruism Forum. https://forum.effectivealtruism.org/posts/n3WwTz4dbktYwNQ2j/critiques-of-ea-that-i-want-to-read.
Russell, Stuart J. 2019. Human Compatible: Artificial Intelligence and the Problem of Control. New York: Viking.
Schopenhauer, Arthur. 2008. The World as Will and Representation. New York: Routledge.
Shulman, Carl. 2012. “Are Pain and Pleasure Equally Energy-Efficient?” Reflective Disequilibrium. http://reflectivedisequilibrium.blogspot.com/2012/03/are-pain-and-pleasure-equally-energy.html.
Smith, Tom W., Peter Marsden, Michael Hout, and Jibum Kim. 2022. “General Social Surveys, 1972-2022.” National Opinion Research Center. https://www.norc.org/PDFs/COVID%20Response%20Tracking%20Study/Historic%20Shift%20in%20Americans%20Happiness%20Amid%20Pandemic.pdf.
Tarsney, Christian. 2022. “The Epistemic Challenge to Longtermism.” Global Priorities Institute. https://globalprioritiesinstitute.org/wp-content/uploads/Tarsney-Epistemic-Challenge-to-Longtermism.pdf.
Tegmark, Max. 2017. Life 3.0: Being Human in the Age of Artificial Intelligence. New York: Alfred A. Knopf.
Tomasik, Brian. 2006. “On the Seriousness of Suffering.” Essays on Reducing Suffering. https://reducing-suffering.org/on-the-seriousness-of-suffering/.
———. 2011. “Risks of Astronomical Future Suffering.” Foundational Research Institute. https://foundational-research.org/risks-of-astronomical-future-suffering/.
———. 2013a. “The Future of Darwinism.” Essays on Reducing Suffering. https://reducing-suffering.org/the-future-of-darwinism/.
———. 2013b. “Values Spreading Is Often More Important than Extinction Risk.” Essays on Reducing Suffering. https://reducing-suffering.org/values-spreading-often-important-extinction-risk/.
———. 2014. “Why the Modesty Argument for Moral Realism Fails.” Essays on Reducing Suffering. https://reducing-suffering.org/why-the-modesty-argument-for-moral-realism-fails/.
———. 2015. “Artificial Intelligence and Its Implications for Future Suffering.” Center on Long-Term Risk. https://longtermrisk.org/artificial-intelligence-and-its-implications-for-future-suffering/.
———. 2017. “Will Future Civilization Eventually Achieve Goal Preservation?” Essays on Reducing Suffering. https://reducing-suffering.org/will-future-civilization-eventually-achieve-goal-preservation/.
Vinding, Magnus. 2020. Suffering-Focused Ethics: Defense and Implications. Ratio Ethica.
West, Ben. 2017. “An Argument for Why the Future May Be Good.” Effective Altruism Forum. https://forum.effectivealtruism.org/posts/kNKpyf4WWdKehgvRt/an-argument-for-why-the-future-may-be-good.
Wolf, Clark. 1997. “Person-Affecting Utilitarianism and Population Policy; or, Sissy Jupe’s Theory of Social Choice.” In Contingent Future Persons, eds. Nick Fotion and Jan C. Heller. Dordrecht: Springer Dordrecht. https://doi.org/10.1007/978-94-011-5566-3_9.
Yudkowsky, Eliezer. 2004. “Coherent Extrapolated Volition.” The Singularity Institute. https://intelligence.org/files/CEV.pdf.
———. 2007. “The Hidden Complexity of Wishes.” LessWrong. https://www.lesswrong.com/posts/4ARaTpNX62uaL86j6/the-hidden-complexity-of-wishes.
 For the sake of brevity, while I have my own views of moral value and disvalue, I don’t tie this essay to any particular view (e.g., utilitarianism). For example, it can include subjective goods (valuable for a person) and objective goods (valuable regardless of people), and it can be understood as estimates or direct observation of realist good (stance-independent) or anti-realist good (stance-dependent). Some may also have moral aims aside from maximizing expected “value” per se, at least for certain senses of “expected” and “value.” There is a substantial philosophical literature on such topics that I will not wade into, and I believe such non-value-based arguments can be mapped onto value-based arguments with minimal loss (e.g., not having a duty to make happy people can be mapped onto there being no value in making happy people). In general, I will try to keep this blog post modular and at most briefly summarize extant literature.
 For the sake of brevity, I analyze human survival and interstellar colonization together under the label “human expansion.” I gloss over possible futures in which humanity survives but does not colonize space.
 For example, the portion of historical progress made through market mechanisms is split among Historical Progress insofar as this is a large historical trend, Value Through Intent insofar as humans intentionally progressed in this way, Value Through Evolution insofar as selection increased the prevalence of these mechanisms, and Reasoned Cooperation insofar as the intentional change was through reasoned cooperation. How is this splitting calculated? I punt to future work, but in general, I mean some sort of causal attribution measure. For example, if I grow an apple tree that is caused by both rain and soil nutrients, then I would assign more causal force to rain if and only if reducing rain by one standard deviation would inhibit growth more than reducing soil nutrients by one standard deviation. Related measures include Shapley values and LIME.
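 To make the mention of Shapley values concrete, here is a minimal Python sketch applied to the apple tree example. It illustrates the related measure named above, not the attribution method this essay actually uses (which is left to future work). The growth numbers are hypothetical, and `shapley_values` is a helper written for this illustration rather than a library function.

```python
from itertools import permutations
from math import factorial

# Hypothetical tree growth (arbitrary units) for each set of available inputs.
# These numbers are made up purely for illustration.
GROWTH = {
    frozenset(): 0.0,
    frozenset({"rain"}): 4.0,
    frozenset({"soil"}): 2.0,
    frozenset({"rain", "soil"}): 10.0,
}


def shapley_values(players, value):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        present = frozenset()
        for p in order:
            with_p = present | {p}
            totals[p] += value[with_p] - value[present]
            present = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


attribution = shapley_values(["rain", "soil"], GROWTH)
print(attribution)
```

 With these made-up numbers, rain receives the larger share, and the two shares sum to the total growth when both inputs are present, which is the property that makes Shapley values a natural way to split credit among overlapping causes.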
 I do not provide specific explanations for these weights because they are meant as intuitive, subjective estimates of the linear weight of the argument as laid out in the description column. As discussed elsewhere, unpacking these weights into probability distributions and back-of-the-envelope estimates is a promising direction for better estimating the EV of human expansion.
 I do not provide specific explanations for the weights in the spreadsheet because they are meant as intuitive, subjective estimates of the linear weight of the argument as laid out in the description column. As discussed elsewhere, unpacking these weights into probability distributions and back-of-the-envelope estimates is a promising direction for better estimating the EV of human expansion. The evaluations rely on a wide range of empirical, conceptual, and intuitive evidence. These numbers should be taken with many grains of salt, but as the “superforecasting” literature evidences, it can be useful to quantify seemingly hard-to-quantify questions. The weights in this table are meant as linear, and the linear sum is -7. There are many approaches we could take to aggregating such evidence, reasoning, and intuitions; we could even avoid quantification entirely and take the gestalt of these arguments. If the weights are instead read as base-2 logarithms (e.g., take 0 as 0, take 1 as 2, take 10 as 2^10=1024), on the prior that EA arguments tend to vary in weight by doubling rather than linear scaling, then the mean is -410. Again, these are just two of the many possible ways to aggregate arguments on this topic. Also, for methodological clarity at the risk of droning, I assign weights consistently across arguments (e.g., 2 arguments of weight +2 are the same evidential weight as 4 arguments of weight +1), though other assignment methods are reasonable, and again, other divisions of the arguments (i.e., other numbers of rows in the table) are reasonable and would make no difference in my own additive total, though they could change the exponential total and other aggregations.
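 The two aggregations described in this footnote can be sketched in a few lines of Python. The weights below are hypothetical placeholders, not the values from the table or spreadsheet, and the function names are introduced here for illustration. The sketch assumes that the sign of each weight is preserved when it is exponentiated (so -1 becomes -2), which the negative exponential mean suggests but the footnote does not state outright.

```python
from statistics import mean


def signed_power_of_two(weight):
    """Map a linear weight to a doubling scale: 0 -> 0, 1 -> 2, 10 -> 1024, -3 -> -8."""
    if weight == 0:
        return 0
    sign = 1 if weight > 0 else -1
    return sign * 2 ** abs(weight)


def linear_total(weights):
    """Additive total of the raw weights."""
    return sum(weights)


def exponential_total(weights):
    """Total after mapping each weight onto the doubling scale."""
    return sum(signed_power_of_two(w) for w in weights)


# Hypothetical weights for illustration only; NOT the essay's table.
example_weights = [3, 1, 0, -2, -4]

print(linear_total(example_weights))
print(exponential_total(example_weights))
print(mean([signed_power_of_two(w) for w in example_weights]))

# Splitting one row of weight +4 into two rows of weight +2 leaves the
# additive total unchanged but changes the exponential total.
one_row, two_rows = [4], [2, 2]
print(linear_total(one_row), linear_total(two_rows))
print(exponential_total(one_row), exponential_total(two_rows))
```

 The last comparison shows why the number of rows in the table matters for the exponential aggregation but not for the additive one.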
 In my opinion, there are many different values involved in developing and deploying an AI system, so the distinction between inner and outer alignment is rarely precise in practice. Much of identifying and aligning with “good” or “correct” values can be described as outer alignment. In general, I think of AI value alignment as a long series of mechanisms from the causal factors that create human values (which themselves can be thought of as objective functions) to a tangled web of objectives in each human brain (e.g., values, desires, preferences) to a tangled web of social objectives aggregated across humans (e.g., voting, debates, parliaments, marketplaces) to a tangled web of objectives communicated from humans to machines (e.g., material values in game-playing AI, training data, training labels, architectures) to a tangled web of emergent objectives in the machines (e.g., parametric architectures in the neural net, (smoothed) sets of possible actions in domain, (smoothed) sets of possible actions out of domain) and finally to the machine actions (i.e., what it actually does in the world). We can reasonably refer to the alignment of any of these objects with any of the other objects in this long, tangled continuum of values. Two examples of outer alignment work that I have in mind here are Askell et al. (2021) “A General Language Assistant as a Laboratory for Alignment” and Hobbhahn et al. (2022) “Reflection Mechanisms as an Alignment Target: A Survey.”
 I’m not persuaded by, and I don’t account for, moral uncertainty because I don’t think a “Discoverable Moral Reality” is plausible, and I doubt I would be persuaded to act in accordance with it if it did exist (e.g., to cause suffering if suffering were stance-independently good), though it is unclear what it would even mean for a vague, stance-independent phenomenon to exist (Anthis 2022). Moreover, I’m not compelled by arguments to account for any sort of anti-realist moral uncertainty, views which are arguably better not even described as “uncertainty” (e.g., weighting my future self’s morals, such as after a personality-altering brain injury or taking rationality- and intelligence-increasing nootropics; across different moral frameworks, such as in a moral parliament, e.g., Bostrom 2009). Of course, I still account for moral cooperation and standard empirical uncertainty.
 There are many things to say about how Aumann’s Agreement Theorem obtains in the real world. For example, Andrew Critch states that the “common priors” assumption “seems extremely unrealistic for the real world.” I’m not sure if I disagree with this, but when I describe Aumann updating, I’m not referring to a specific prior-to-posterior Bayesian update; I’m referring to the equal treatment of all the evidence going into my belief with all the evidence going into my interlocutor’s belief. If nothing else, this can be viewed as an aggregation of evidence in which each agent is still left with aggregating their evidence and prior, but I don’t like approaching such questions with a bright line between prior and posterior except in a specific prior-to-posterior Bayesian update (e.g., you believe the sky is blue but then walk outside one day and see it looks red; how should this change your belief?).