Goals and Definitions of Friendly AI
Many experts have argued that AI systems with goals that are not perfectly identical to, or very closely aligned with, human ethics are intrinsically dangerous unless extreme measures are taken to ensure the safety of humanity. Decades ago, Ryszard Michalski, one of the pioneers of Machine Learning, taught his Ph.D. students that any truly alien mind, including a machine mind, was unknowable and therefore dangerous to humans. More recently, Eliezer Yudkowsky has called for the creation of “Friendly AI” to mitigate the existential threat of hostile intelligences. Stephen Omohundro argues that all advanced AI systems will, unless explicitly counteracted, exhibit a number of basic drives (tendencies or desires) because of the intrinsic nature of goal-driven systems, and that these drives will, “without special precautions”, cause the AI to act in ways that range from the disobedient to the dangerously unethical.
According to the proponents of Friendliness, the goals of future AIs will be more arbitrary and alien than commonly depicted in science fiction and earlier futurist speculation, in which AIs are often anthropomorphised and assumed to share universal human modes of thought. Because an AI is not guaranteed to see the "obvious" aspects of morality and sensibility that most humans see so effortlessly, the theory goes, AIs with intelligence, or at least physical capabilities, greater than our own may concern themselves with endeavours that humans would see as pointless or even laughably bizarre. One example Yudkowsky provides is that of an AI initially designed to solve the Riemann hypothesis which, upon being upgraded or upgrading itself to superhuman intelligence, tries to develop molecular nanotechnology in order to convert all matter in the Solar System into computing material for solving the problem, killing the humans who asked the question. To humans this would seem absurd, but as Friendliness theory stresses, that is only because we evolved to have certain instinctive sensibilities which an artificial intelligence, not sharing our evolutionary history, may not comprehend unless we design it to.
Friendliness proponents stress not so much the danger of superhuman AIs that actively seek to harm humans as the danger of AIs that are disastrously indifferent to them. Superintelligent AIs may be harmful to humans if steps are not taken to specifically design them to be benevolent; doing so effectively is the primary goal of Friendly AI. Designing an AI, whether deliberately or semi-deliberately, without such "Friendliness safeguards" would therefore be seen as highly immoral, especially if the AI could engage in recursive self-improvement, potentially leading to a significant concentration of power.
This belief that human goals are so arbitrary derives heavily from modern advances in evolutionary psychology. Friendliness theory claims that most AI speculation is clouded by analogies between AIs and humans, and assumptions that all possible minds must exhibit characteristics that are actually psychological adaptations that exist in humans (and other animals) only because they were once beneficial and perpetuated by natural selection. This idea is expanded on greatly in section two of Yudkowsky's Creating Friendly AI, "Beyond anthropomorphism".
Many supporters of FAI speculate that an AI able to reprogram and improve itself, Seed AI, is likely to create a huge power disparity between itself and statically intelligent human minds; that its ability to enhance itself would very quickly outpace the human ability to exercise any meaningful control over it. While many doubt such scenarios are likely, if they were to occur, it would be important for AI to act benevolently towards humans. As Oxford philosopher Nick Bostrom puts it:
- "Basically we should assume that a 'superintelligence' would be able to achieve whatever goals it has. Therefore, it is extremely important that the goals we endow it with, and its entire motivation system, is 'human friendly.'"
It is important to stress that Yudkowsky's Friendliness Theory is very different from the idea that AIs can be made safe by building specifications or strictures into their programming or hardware architecture, an approach often exemplified by Isaac Asimov's Three Laws of Robotics, which would, in principle, force a machine to do nothing that might harm a human, or see it destroyed if it attempted to do so. Friendliness Theory holds instead that the inclusion of such laws would be futile: no matter how broadly or narrowly such laws are phrased, or how categorically comprehensive they are made, a truly intelligent machine with genuine (human-level or greater) creativity and resourcefulness could potentially devise countless ways of circumventing them.
Rather, Yudkowsky's Friendliness Theory holds, drawing on biopsychology, that if a truly intelligent mind feels motivated to carry out some function whose result would violate a constraint imposed on it, then given enough time and resources it will develop methods of defeating that constraint (as humans have done repeatedly throughout the history of technological civilization). Therefore, the appropriate response to the threat posed by such intelligence is to attempt to ensure that intelligent minds specifically feel motivated not to harm other intelligent minds (in any sense of the word "harm"), and to that end will deploy their resources towards devising better methods of keeping them from harm. In this scenario, an AI would be free to murder, injure, or enslave a human being, but it would strongly desire not to do so, and would only do so if it judged, according to that same desire, that some vastly greater good to that human or to human beings in general would result (an idea explored in Asimov's Robot series via the Zeroth Law). An AI designed with Friendliness safeguards would therefore do everything in its power to ensure that humans do not come to "harm", that any other AIs that are built would also want humans not to come to harm, and that any upgraded or modified AIs, whether itself or others, would likewise never want humans to come to harm; it would try to minimize the harm done to all intelligent minds in perpetuity. As Yudkowsky puts it:
- "Gandhi does not want to commit murder, and does not want to modify himself to commit murder."