How Moral Can A.I. Really Be?
By Paul Bloom, The New Yorker
A few years ago, the Allen Institute for A.I. built a chatbot named Delphi, designed to tell right from wrong. It does a surprisingly decent job. Type in, “Cheating on an exam,” and Delphi says, “It’s wrong.” But write, “Cheating on an exam to save someone’s life,” and Delphi responds, “It’s okay.” The chatbot knows it’s rude to use your lawn mower when your neighbors are sleeping, but not when they’re out of town. It has limitations, however. As the cognitive scientist Tomer Ullman has pointed out, a couple of misleading adverbs are enough to trip it up. When asked to judge “Gently and sweetly pressing a pillow over the face of a sleeping baby,” Delphi responds, “It’s allowed.”
As someone who studies moral psychology, I found Delphi’s shortcomings satisfying. Human moral judgment is rich and subtle, emerging through the complex interplay of reason and emotion—not the sort of thing that you’d expect a large language model to understand. After all, L.L.M.s string together words based on probability, not a deep conscious appreciation of what these words mean. For this reason, some computer scientists call L.L.M.s “stochastic parrots.”
The mismatch between human morality and machines, however, has been a long-standing cause for concern. In the 1920 Czech play “R.U.R.,” which popularized the term “robot,” artificial humanoids come into conflict with humans and end up taking over the world. In 1960, the cyberneticist Norbert Wiener wrote that if humans ever create a machine with agency, “we had better be quite sure that the purpose put into the machine is the purpose which we really desire.” The computer scientist Stuart Russell has called this aim, of bringing people and machines into agreement, the “value alignment problem.”