Kitabı oxu: «Algorithms to Live By: The Computer Science of Human Decisions»
Copyright
William Collins
An imprint of HarperCollinsPublishers
1 London Bridge Street
London SE1 9GF
This eBook first published in Great Britain by William Collins in 2016
First published in the United States by Henry Holt and Company, LLC in 2016
Copyright © 2016 by Brian Christian and Tom Griffiths
Brian Christian and Tom Griffiths assert the moral right to be identified as the authors of this work
A catalogue record for this book is available from the British Library
Cover design by Jonathan Pelham
All rights reserved under International and Pan-American Copyright Conventions. By payment of the required fees, you have been granted the non-exclusive, non-transferable right to access and read the text of this e-book on screen. No part of this text may be reproduced, transmitted, down-loaded, decompiled, reverse engineered, or stored in or introduced into any information storage and retrieval system, in any form or by any means, whether electronic or mechanical, now known or hereinafter invented, without the express written permission of HarperCollins.
Source ISBN: 9780007547999
Ebook Edition © April 2016 ISBN: 9780007547982
Version: 2018-09-27
Dedication
For our families
Contents
Cover
Title Page
Copyright
Dedication
Introduction
Algorithms to Live By
1 Optimal Stopping
When to Stop Looking
2 Explore/Exploit
The Latest vs. the Greatest
3 Sorting
Making Order
4 Caching
Forget About It
5 Scheduling
First Things First
6 Bayes’s Rule
Predicting the Future
7 Overfitting
When to Think Less
8 Relaxation
Let It Slide
9 Randomness
When to Leave It to Chance
10 Networking
How We Connect
11 Game Theory
The Minds of Others
Conclusion
Computational Kindness
Notes
Bibliography
Index
Acknowledgments
Also by Brian Christian
About the Authors
About the Publisher
Introduction
Algorithms to Live By
Imagine you’re searching for an apartment in San Francisco—arguably the most harrowing American city in which to do so. The booming tech sector and tight zoning laws limiting new construction have conspired to make the city just as expensive as New York, and by many accounts more competitive. New listings go up and come down within minutes, open houses are mobbed, and often the keys end up in the hands of whoever can physically foist a deposit check on the landlord first.
Such a savage market leaves little room for the kind of fact-finding and deliberation that is theoretically supposed to characterize the doings of the rational consumer. Unlike, say, a mall patron or an online shopper, who can compare options before making a decision, the would-be San Franciscan has to decide instantly either way: you can take the apartment you are currently looking at, forsaking all others, or you can walk away, never to return.
Let’s assume for a moment, for the sake of simplicity, that you care only about maximizing your chance of getting the very best apartment available. Your goal is reducing the twin, Scylla-and-Charybdis regrets of the “one that got away” and the “stone left unturned” to the absolute minimum. You run into a dilemma right off the bat: How are you to know that an apartment is indeed the best unless you have a baseline to judge it by? And how are you to establish that baseline unless you look at (and lose) a number of apartments? The more information you gather, the better you’ll know the right opportunity when you see it—but the more likely you are to have already passed it by.
So what do you do? How do you make an informed decision when the very act of informing it jeopardizes the outcome? It’s a cruel situation, bordering on paradox.
When presented with this kind of problem, most people will intuitively say something to the effect that it requires some sort of balance between looking and leaping—that you must look at enough apartments to establish a standard, then take whatever satisfies the standard you’ve established. This notion of balance is, in fact, precisely correct. What most people don’t say with any certainty is what that balance is. Fortunately, there’s an answer.
Thirty-seven percent.
If you want the best odds of getting the best apartment, spend 37% of your apartment hunt (eleven days, if you’ve given yourself a month for the search) noncommittally exploring options. Leave the checkbook at home; you’re just calibrating. But after that point, be prepared to immediately commit—deposit and all—to the very first place you see that beats whatever you’ve already seen. This is not merely an intuitively satisfying compromise between looking and leaping. It is the provably optimal solution.
We know this because finding an apartment belongs to a class of mathematical problems known as “optimal stopping” problems. The 37% rule defines a simple series of steps—what computer scientists call an “algorithm”—for solving these problems. And as it turns out, apartment hunting is just one of the ways that optimal stopping rears its head in daily life. Committing to or forgoing a succession of options is a structure that appears in life again and again, in slightly different incarnations. How many times to circle the block before pulling into a parking space? How far to push your luck with a risky business venture before cashing out? How long to hold out for a better offer on that house or car?
The same challenge also appears in an even more fraught setting: dating. Optimal stopping is the science of serial monogamy.
Simple algorithms offer solutions not only to an apartment hunt but to all such situations in life where we confront the question of optimal stopping. People grapple with these issues every day—although surely poets have spilled more ink on the tribulations of courtship than of parking—and they do so with, in some cases, considerable anguish. But the anguish is unnecessary. Mathematically, at least, these are solved problems.
Every harried renter, driver, and suitor you see around you as you go through a typical week is essentially reinventing the wheel. They don’t need a therapist; they need an algorithm. The therapist tells them to find the right, comfortable balance between impulsivity and overthinking.
The algorithm tells them the balance is thirty-seven percent.
* * *
There is a particular set of problems that all people face, problems that are a direct result of the fact that our lives are carried out in finite space and time. What should we do, and leave undone, in a day or in a decade? What degree of mess should we embrace—and how much order is excessive? What balance between new experiences and favored ones makes for the most fulfilling life?
These might seem like problems unique to humans; they’re not. For more than half a century, computer scientists have been grappling with, and in many cases solving, the equivalents of these everyday dilemmas. How should a processor allocate its “attention” to perform all that the user asks of it, with the minimum overhead and in the least amount of time? When should it switch between different tasks, and how many tasks should it take on in the first place? What is the best way for it to use its limited memory resources? Should it collect more data, or take an action based on the data it already has? Seizing the day might be a challenge for humans, but computers all around us are seizing milliseconds with ease. And there’s much we can learn from how they do it.
Talking about algorithms for human lives might seem like an odd juxtaposition. For many people, the word “algorithm” evokes the arcane and inscrutable machinations of big data, big government, and big business: increasingly part of the infrastructure of the modern world, but hardly a source of practical wisdom or guidance for human affairs. But an algorithm is just a finite sequence of steps used to solve a problem, and algorithms are much broader—and older by far—than the computer. Long before algorithms were ever used by machines, they were used by people.
The word “algorithm” comes from the name of Persian mathematician al-Khwārizmī, author of a ninth-century book of techniques for doing mathematics by hand. (His book was called al-Jabr wa’l-Muqābala—and the “al-jabr” of the title in turn provides the source of our word “algebra.”) The earliest known mathematical algorithms, however, predate even al-Khwārizmī’s work: a four-thousand-year-old Sumerian clay tablet found near Baghdad describes a scheme for long division.
But algorithms are not confined to mathematics alone. When you cook bread from a recipe, you’re following an algorithm. When you knit a sweater from a pattern, you’re following an algorithm. When you put a sharp edge on a piece of flint by executing a precise sequence of strikes with the end of an antler—a key step in making fine stone tools—you’re following an algorithm. Algorithms have been a part of human technology ever since the Stone Age.
* * *
In this book, we explore the idea of human algorithm design—searching for better solutions to the challenges people encounter every day. Applying the lens of computer science to everyday life has consequences at many scales. Most immediately, it offers us practical, concrete suggestions for how to solve specific problems. Optimal stopping tells us when to look and when to leap. The explore/exploit tradeoff tells us how to find the balance between trying new things and enjoying our favorites. Sorting theory tells us how (and whether) to arrange our offices. Caching theory tells us how to fill our closets. Scheduling theory tells us how to fill our time.
At the next level, computer science gives us a vocabulary for understanding the deeper principles at play in each of these domains. As Carl Sagan put it, “Science is a way of thinking much more than it is a body of knowledge.” Even in cases where life is too messy for us to expect a strict numerical analysis or a ready answer, using intuitions and concepts honed on the simpler forms of these problems offers us a way to understand the key issues and make progress.
Most broadly, looking through the lens of computer science can teach us about the nature of the human mind, the meaning of rationality, and the oldest question of all: how to live. Examining cognition as a means of solving the fundamentally computational problems posed by our environment can utterly change the way we think about human rationality.
The notion that studying the inner workings of computers might reveal how to think and decide, what to believe and how to behave, might strike many people as not only wildly reductive, but in fact misguided. Even if computer science did have things to say about how to think and how to act, would we want to listen? We look at the AIs and robots of science fiction, and it seems like theirs is not a life any of us would want to live.
In part, that’s because when we think about computers, we think about coldly mechanical, deterministic systems: machines applying rigid deductive logic, making decisions by exhaustively enumerating the options, and grinding out the exact right answer no matter how long and hard they have to think. Indeed, the person who first imagined computers had something essentially like this in mind. Alan Turing defined the very notion of computation by an analogy to a human mathematician who carefully works through the steps of a lengthy calculation, yielding an unmistakably right answer.
So it might come as a surprise that this is not what modern computers are actually doing when they face a difficult problem. Straightforward arithmetic, of course, isn’t particularly challenging for a modern computer. Rather, it’s tasks like conversing with people, fixing a corrupted file, or winning a game of Go—problems where the rules aren’t clear, some of the required information is missing, or finding exactly the right answer would require considering an astronomical number of possibilities—that now pose the biggest challenges in computer science. And the algorithms that researchers have developed to solve the hardest classes of problems have moved computers away from an extreme reliance on exhaustive calculation. Instead, tackling real-world tasks requires being comfortable with chance, trading off time with accuracy, and using approximations.
As computers become better tuned to real-world problems, they provide not only algorithms that people can borrow for their own lives, but a better standard against which to compare human cognition itself. Over the past decade or two, behavioral economics has told a very particular story about human beings: that we are irrational and error-prone, owing in large part to the buggy, idiosyncratic hardware of the brain. This self-deprecating story has become increasingly familiar, but certain questions remain vexing. Why are four-year-olds, for instance, still better than million-dollar supercomputers at a host of cognitive tasks, including vision, language, and causal reasoning?
The solutions to everyday problems that come from computer science tell a different story about the human mind. Life is full of problems that are, quite simply, hard. And the mistakes made by people often say more about the intrinsic difficulties of the problem than about the fallibility of human brains. Thinking algorithmically about the world, learning about the fundamental structures of the problems we face and about the properties of their solutions, can help us see how good we actually are, and better understand the errors that we make.
In fact, human beings turn out to consistently confront some of the hardest cases of the problems studied by computer scientists. Often, people need to make decisions while dealing with uncertainty, time constraints, partial information, and a rapidly changing world. In some of those cases, even cutting-edge computer science has not yet come up with efficient, always-right algorithms. For certain situations it appears that such algorithms might not exist at all.
Even where perfect algorithms haven’t been found, however, the battle between generations of computer scientists and the most intractable real-world problems has yielded a series of insights. These hard-won precepts are at odds with our intuitions about rationality, and they don’t sound anything like the narrow prescriptions of a mathematician trying to force the world into clean, formal lines. They say: Don’t always consider all your options. Don’t necessarily go for the outcome that seems best every time. Make a mess on occasion. Travel light. Let things wait. Trust your instincts and don’t think too long. Relax. Toss a coin. Forgive, but don’t forget. To thine own self be true.
Living by the wisdom of computer science doesn’t sound so bad after all. And unlike most advice, it’s backed up by proofs.
* * *
Just as designing algorithms for computers was originally a subject that fell into the cracks between disciplines—an odd hybrid of mathematics and engineering—so, too, designing algorithms for humans is a topic that doesn’t have a natural disciplinary home. Today, algorithm design draws not only on computer science, math, and engineering but on kindred fields like statistics and operations research. And as we consider how algorithms designed for machines might relate to human minds, we also need to look to cognitive science, psychology, economics, and beyond.
We, your authors, are familiar with this interdisciplinary territory. Brian studied computer science and philosophy before going on to graduate work in English and a career at the intersection of the three. Tom studied psychology and statistics before becoming a professor at UC Berkeley, where he spends most of his time thinking about the relationship between human cognition and computation. But nobody can be an expert in all of the fields that are relevant to designing better algorithms for humans. So as part of our quest for algorithms to live by, we talked to the people who came up with some of the most famous algorithms of the last fifty years. And we asked them, some of the smartest people in the world, how their research influenced the way they approached their own lives—from finding their spouses to sorting their socks.
The next pages begin our journey through some of the biggest challenges faced by computers and human minds alike: how to manage finite space, finite time, limited attention, unknown unknowns, incomplete information, and an unforeseeable future; how to do so with grace and confidence; and how to do so in a community with others who are all simultaneously trying to do the same. We will learn about the fundamental mathematical structure of these challenges and about how computers are engineered—sometimes counter to what we imagine—to make the most of them. And we will learn about how the mind works, about its distinct but deeply related ways of tackling the same set of issues and coping with the same constraints. Ultimately, what we can gain is not only a set of concrete takeaways for the problems around us, not only a new way to see the elegant structures behind even the hairiest human dilemmas, not only a recognition of the travails of humans and computers as deeply conjoined, but something even more profound: a new vocabulary for the world around us, and a chance to learn something truly new about ourselves.
1 Optimal Stopping
When to Stop Looking
Though all Christians start a wedding invitation by solemnly declaring their marriage is due to special Divine arrangement, I, as a philosopher, would like to talk in greater detail about this …
—JOHANNES KEPLER
If you prefer Mr. Martin to every other person; if you think him the most agreeable man you have ever been in company with, why should you hesitate?
—JANE AUSTEN, EMMA
It’s such a common phenomenon that college guidance counselors even have a slang term for it: the “turkey drop.” High-school sweethearts come home for Thanksgiving of their freshman year of college and, four days later, return to campus single.
An angst-ridden Brian went to his own college guidance counselor his freshman year. His high-school girlfriend had gone to a different college several states away, and they struggled with the distance. They also struggled with a stranger and more philosophical question: how good a relationship did they have? They had no real benchmark of other relationships by which to judge it. Brian’s counselor recognized theirs as a classic freshman-year dilemma, and was surprisingly nonchalant in her advice: “Gather data.”
The nature of serial monogamy, writ large, is that its practitioners are confronted with a fundamental, unavoidable problem. When have you met enough people to know who your best match is? And what if acquiring the data costs you that very match? It seems the ultimate Catch-22 of the heart.
As we have seen, this Catch-22, this angsty freshman cri de coeur, is what mathematicians call an “optimal stopping” problem, and it may actually have an answer: 37%.
Of course, it all depends on the assumptions you’re willing to make about love.
The Secretary Problem
In any optimal stopping problem, the crucial dilemma is not which option to pick, but how many options to even consider. These problems turn out to have implications not only for lovers and renters, but also for drivers, homeowners, burglars, and beyond.
The 37% Rule* derives from optimal stopping’s most famous puzzle, which has come to be known as the “secretary problem.” Its setup is much like the apartment hunter’s dilemma that we considered earlier. Imagine you’re interviewing a set of applicants for a position as a secretary, and your goal is to maximize the chance of hiring the single best applicant in the pool. While you have no idea how to assign scores to individual applicants, you can easily judge which one you prefer. (A mathematician might say you have access only to the ordinal numbers—the relative ranks of the applicants compared to each other—but not to the cardinal numbers, their ratings on some kind of general scale.) You interview the applicants in random order, one at a time. You can decide to offer the job to an applicant at any point and they are guaranteed to accept, terminating the search. But if you pass over an applicant, deciding not to hire them, they are gone forever.
The secretary problem is widely considered to have made its first appearance in print—sans explicit mention of secretaries—in the February 1960 issue of Scientific American, as one of several puzzles posed in Martin Gardner’s beloved column on recreational mathematics. But the origins of the problem are surprisingly mysterious. Our own initial search yielded little but speculation, before turning into unexpectedly physical detective work: a road trip down to the archive of Gardner’s papers at Stanford, to haul out boxes of his midcentury correspondence. Reading paper correspondence is a bit like eavesdropping on someone who’s on the phone: you’re only hearing one side of the exchange, and must infer the other. In our case, we only had the replies to what was apparently Gardner’s own search for the problem’s origins fiftysome years ago. The more we read, the more tangled and unclear the story became.
Harvard mathematician Frederick Mosteller recalled hearing about the problem in 1955 from his colleague Andrew Gleason, who had heard about it from somebody else. Leo Moser wrote from the University of Alberta to say that he read about the problem in “some notes” by R. E. Gaskell of Boeing, who himself credited a colleague. Roger Pinkham of Rutgers wrote that he first heard of the problem in 1955 from Duke University mathematician J. Shoenfield, “and I believe he said that he had heard the problem from someone at Michigan.”
“Someone at Michigan” was almost certainly someone named Merrill Flood. Though he is largely unheard of outside mathematics, Flood’s influence on computer science is almost impossible to avoid. He’s credited with popularizing the traveling salesman problem (which we discuss in more detail in chapter 8), devising the prisoner’s dilemma (which we discuss in chapter 11), and even with possibly coining the term “software.” It’s Flood who made the first known discovery of the 37% Rule, in 1958, and he claims to have been considering the problem since 1949—but he himself points back to several other mathematicians.
Suffice it to say that wherever it came from, the secretary problem proved to be a near-perfect mathematical puzzle: simple to explain, devilish to solve, succinct in its answer, and intriguing in its implications. As a result, it moved like wildfire through the mathematical circles of the 1950s, spreading by word of mouth, and thanks to Gardner’s column in 1960 came to grip the imagination of the public at large. By the 1980s the problem and its variations had produced so much analysis that it had come to be discussed in papers as a subfield unto itself.
As for secretaries—it’s charming to watch each culture put its own anthropological spin on formal systems. We think of chess, for instance, as medieval European in its imagery, but in fact its origins are in eighth-century India; it was heavy-handedly “Europeanized” in the fifteenth century, as its shahs became kings, its viziers turned to queens, and its elephants became bishops. Likewise, optimal stopping problems have had a number of incarnations, each reflecting the predominating concerns of its time. In the nineteenth century such problems were typified by baroque lotteries and by women choosing male suitors; in the early twentieth century by holidaying motorists searching for hotels and by male suitors choosing women; and in the paper-pushing, male-dominated mid-twentieth century, by male bosses choosing female assistants. The first explicit mention of it by name as the “secretary problem” appears to be in a 1964 paper, and somewhere along the way the name stuck.
Whence 37%?
In your search for a secretary, there are two ways you can fail: stopping early and stopping late. When you stop too early, you leave the best applicant undiscovered. When you stop too late, you hold out for a better applicant who doesn’t exist. The optimal strategy will clearly require finding the right balance between the two, walking the tightrope between looking too much and not enough.
If your aim is finding the very best applicant, settling for nothing less, it’s clear that as you go through the interview process you shouldn’t even consider hiring somebody who isn’t the best you’ve seen so far. However, simply being the best yet isn’t enough for an offer; the very first applicant, for example, will of course be the best yet by definition. More generally, it stands to reason that the rate at which we encounter “best yet” applicants will go down as we proceed in our interviews. For instance, the second applicant has a 50/50 chance of being the best we’ve yet seen, but the fifth applicant only has a 1-in-5 chance of being the best so far, the sixth has a 1-in-6 chance, and so on. As a result, best-yet applicants will become steadily more impressive as the search continues (by definition, again, they’re better than all those who came before)—but they will also become more and more infrequent.
Okay, so we know that taking the first best-yet applicant we encounter (a.k.a. the first applicant, period) is rash. If there are a hundred applicants, it also seems hasty to make an offer to the next one who’s best-yet, just because she was better than the first. So how do we proceed?
Intuitively, there are a few potential strategies. For instance, making an offer the third time an applicant trumps everyone seen so far—or maybe the fourth time. Or perhaps taking the next best-yet applicant to come along after a long “drought”—a long streak of poor ones.
But as it happens, neither of these relatively sensible strategies comes out on top. Instead, the optimal solution takes the form of what we’ll call the Look-Then-Leap Rule: You set a predetermined amount of time for “looking”—that is, exploring your options, gathering data—in which you categorically don’t choose anyone, no matter how impressive. After that point, you enter the “leap” phase, prepared to instantly commit to anyone who outshines the best applicant you saw in the look phase.
We can see how the Look-Then-Leap Rule emerges by considering how the secretary problem plays out in the smallest applicant pools. With just one applicant the problem is easy to solve—hire her! With two applicants, you have a 50/50 chance of success no matter what you do. You can hire the first applicant (who’ll turn out to be the best half the time), or dismiss the first and by default hire the second (who is also best half the time).
Add a third applicant, and all of a sudden things get interesting. The odds if we hire at random are one-third, or 33%. With two applicants we could do no better than chance; with three, can we? It turns out we can, and it all comes down to what we do with the second interviewee. When we see the first applicant, we have no information—she’ll always appear to be the best yet. When we see the third applicant, we have no agency—we have to make an offer to the final applicant, since we’ve dismissed the others. But when we see the second applicant, we have a little bit of both: we know whether she’s better or worse than the first, and we have the freedom to either hire or dismiss her. What happens when we just hire her if she’s better than the first applicant, and dismiss her if she’s not? This turns out to be the best possible strategy when facing three applicants; using this approach it’s possible, surprisingly, to do just as well in the three-applicant problem as with two, choosing the best applicant exactly half the time.*
Enumerating these scenarios for four applicants tells us that we should still begin to leap as soon as the second applicant; with five applicants in the pool, we shouldn’t leap before the third.
As the applicant pool grows, the exact place to draw the line between looking and leaping settles to 37% of the pool, yielding the 37% Rule: look at the first 37% of the applicants,* choosing none, then be ready to leap for anyone better than all those you’ve seen so far.
How to optimally choose a secretary.
As it turns out, following this optimal strategy ultimately gives us a 37% chance of hiring the best applicant; it’s one of the problem’s curious mathematical symmetries that the strategy itself and its chance of success work out to the very same number. The table above shows the optimal strategy for the secretary problem with different numbers of applicants, demonstrating how the chance of success—like the point to switch from looking to leaping—converges on 37% as the number of applicants increases.
A 63% failure rate, when following the best possible strategy, is a sobering fact. Even when we act optimally in the secretary problem, we will still fail most of the time—that is, we won’t end up with the single best applicant in the pool. This is bad news for those of us who would frame romance as a search for “the one.” But here’s the silver lining. Intuition would suggest that our chances of picking the single best applicant should steadily decrease as the applicant pool grows. If we were hiring at random, for instance, then in a pool of a hundred applicants we’d have a 1% chance of success, and in a pool of a million applicants we’d have a 0.0001% chance. Yet remarkably, the math of the secretary problem doesn’t change. If you’re stopping optimally, your chance of finding the single best applicant in a pool of a hundred is 37%. And in a pool of a million, believe it or not, your chance is still 37%. Thus the bigger the applicant pool gets, the more valuable knowing the optimal algorithm becomes. It’s true that you’re unlikely to find the needle the majority of the time, but optimal stopping is your best defense against the haystack, no matter how large.