What does it mean to be intelligent?

By Sanjana Hemdev & Sonia Roberts

This is the fifth post in a series about artificial intelligence, its uses, and its social and political implications.

“Artificial intelligence” (AI) is ubiquitous in today’s world. From Amazon’s Alexa to Five Nights at Freddy’s, from weather prediction systems to vaccine development, AI is a constant source of both wonder and trepidation in the popular imagination. As development in the field of AI continues at breakneck speed, we are invited to consider not only how AI can change the physical world, but also how it can change our understanding of what it means to be “intelligent.” Can AI systems truly be considered intelligent in the same way that humans are? If so, by whose definition – if any? In search of answers to these questions, we spoke with Dr. Lisa Miracchi Titus, a Professor of Philosophy and seasoned artificial intelligence expert here at Penn. Dr. Miracchi Titus is affiliated with the GRASP Lab, one of the oldest and best-known robotics labs in the country. “This is a beautiful time to be asking all of these questions,” Dr. Miracchi Titus says. “The technological progress that we’ve made both enables us and requires us to take a step back at this moment of time and ask some pretty foundational questions.”

An intuitive first question is: What is intelligence? Alan Turing was a British mathematician and computer scientist, regarded by many as the father of theoretical computer science and artificial intelligence. He believed that any machine that a human could mistake for another human while conversing with it in text could be considered intelligent, and he termed this evaluation of machine intelligence the “Turing test.” At first glance, this definition may make sense – if a new system behaves in a way that is indistinguishable from something we all agree is intelligent, why wouldn’t we call that machine intelligent? But in practice, there are two issues. First, the popular understanding of “intelligence” suggests that there is a mind somewhere in the machine creating behavior rather than mindlessly replicating it from examples – the Turing test does not specify anything about minds, and could plausibly be fooled by a good enough replication. Second, the Turing test does not capture the kinds of “intelligent” processes that machines actually demonstrate. Even within human intelligence, we know that standard metrics – including SAT scores – correlate more strongly with socioeconomic status than with performance on the tasks they are meant to predict; the SAT, for example, only predicts GPA during the first year of college. When our systems of evaluation are themselves so flawed, how can we claim to accurately differentiate between an “intelligent” and an “unintelligent” machine?

If “intelligence” is related to everything that a system – human or machine – can either know about or accomplish, then there must be many ways of expressing intelligence, all of which must be measured and compared in different ways. “Comparing, say, a bird and a dolphin is hard enough when you’re trying to see how much intelligence or what kind of intelligence is required for accomplishing a certain kind of task,” Dr. Miracchi Titus says. “That’s hard enough because the systems are different enough. The difference between a deep neural net and a human agent is so vast that [the] issue is even more exacerbated.” Computers can easily perform mind-bending calculations and high-dimensional analyses that humans struggle to even formulate. We usually associate human intelligence with being able to perform exactly those tasks that are trivial for computers! So does it really make sense to compare the type of “intelligence” that comes easily to a machine to the type of “intelligence” that comes easily to a human?

One area where researchers have tried to simulate human intelligence is language. GPT-3, a system that can generate large chunks of text based on a prompt, represents language by mapping words into a high-dimensional vector space. An input vector (in this case, a linguistic prompt) is multiplied by a series of very large matrices (the “neural network weights”), with a little bit of randomness thrown in, to produce an output vector (the next bit of language). The system is trained to produce better outputs by a method called backpropagation, which requires a staggeringly large number of paired input prompts and output language. Backpropagation slightly adjusts the values in those large matrices based on the difference between the output the system currently generates for some input and the correct output. Trained this way, GPT-3 can achieve performance similar to a high schooler’s on SAT analogy questions – a traditional metric for language comprehension. “These vector representations have interesting properties,” Dr. Miracchi Titus says. “You can do certain kinds of algebra over these vectors that people think are really interesting.” For example, you could take the vector representation of king and subtract the vector representation of queen to get a vector representing the difference between those two concepts. Then, if you subtract that difference vector from prince, could you get princess? “A lot of people are really wondering whether we can think of these vector representations as models for human word knowledge and human analogical reasoning.” But if GPT-3 achieves its performance just by measuring distances between symbols, without understanding what the high-dimensional space means, has it truly demonstrated any kind of real, human-like knowledge?
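To make that vector arithmetic concrete, here is a minimal sketch in Python. The tiny three-dimensional vectors and the cosine_similarity helper are made up purely for illustration – this is not GPT-3’s actual representation, where vectors have thousands of learned dimensions – but the algebra is the same kind Dr. Miracchi Titus describes.

```python
import numpy as np

# Toy 3-dimensional word vectors, invented purely for illustration.
# Real systems like GPT-3 learn vectors with hundreds or thousands of dimensions.
embeddings = {
    "king":     np.array([0.9, 0.8, 0.1]),
    "queen":    np.array([0.9, 0.1, 0.1]),
    "prince":   np.array([0.7, 0.8, 0.6]),
    "princess": np.array([0.7, 0.1, 0.6]),
}

def cosine_similarity(a, b):
    """How closely two vectors point in the same direction (1.0 means the same direction)."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The vector difference between "king" and "queen" (roughly, a gender direction).
difference = embeddings["king"] - embeddings["queen"]

# Subtract that difference from "prince" and see which word the result lands nearest to.
guess = embeddings["prince"] - difference

for word, vector in embeddings.items():
    print(f"{word:9s} similarity to guess: {cosine_similarity(guess, vector):.2f}")

# With these hand-picked vectors the guess matches "princess" exactly, because
# they were constructed so that king - queen = prince - princess.
```

In a real embedding space the nearest word is rarely an exact match, which is part of why researchers debate whether this kind of distance arithmetic reflects anything like human analogical reasoning.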

This is particularly important when we consider the potential harms of AI systems. It might be obvious that a system trained to predict the likelihood that someone who has served jail time will reoffend is likely to be horribly biased if it’s trained on racist data, but even language-generation systems like GPT-3 can be far from harmless. Recently, a group of pioneering AI ethics researchers put out a paper highlighting some of these issues, summarized in plain language here. Two of these issues apply to most AI systems trained on large datasets. First, there is a huge environmental and financial cost to training a large language model like GPT-3. Training just one version of a model – which would typically go through many, many iterations – emits about as much carbon as a round-trip flight between San Francisco and New York. Second, the massive amounts of language data used to train models like GPT-3 are typically mined directly from the internet with little or no cleaning. If you’ve spent any time online, you know that racist and sexist language abounds – and a model that doesn’t understand what its data means won’t understand not to repeat that language. Unfortunately, racist, sexist, classist, Americentric, transphobic, and other kinds of biased data are everywhere.

This brings us to the second problem with the Turing test: Machine “intelligence” seems to be inherently different from human intelligence, and the benchmarks we rely on to measure intelligence just don’t capture that difference. Dr. Miracchi Titus reminds us that “our job is to get more precise about what we mean by intelligence and why we think that certain tasks when done by AI should be illustrative of something like human intelligence.” Why do we think being able to answer SAT analogy questions should be a mark of machine intelligence, anyway? “People have these intuitive ideas about what requires a lot of intelligence for humans to do,” Dr. Miracchi Titus says. “But that is just an intuitive assumption about what task not only requires intelligence in humans but is going to require a similar sort of intelligence in the AI [system].” Bioinformatics algorithms, flight control programs, and neural networks used in stock market analysis are all “smart” and can easily do things that are really hard for humans, but they do not behave at all like humans do.

A better goal might be to create systems that humans can use to augment their decision-making, or that have animal-level intelligence, rather than building systems that are meant to be as intelligent as humans in their own right. In fact, because the tasks that are easy and hard for humans and machines are so different, the tasks given over to the machines might not seem to require any intelligence at all. “Many of these AIs are… amazing correlation machines,” Dr. Miracchi Titus says. “You feed it a bunch of data and it spits out these correlations that are totally opaque to humans, but because they can crunch all this data they’re brilliant — to use an intelligence term — at discovering these complex correlations. That is an incredibly useful tool.” Likewise, for a lot of tasks — including search-and-rescue — “you want a robot with roughly the intelligence of a dog that can sniff out the humans that it needs to save and has the social capacity to alert the people that it’s working with that they’re there.” 

Based on the rate of progress in AI development that we’ve seen so far, it is clear that more complex systems that seem even more “intelligent” are in the not-so-distant future. But while machine intelligence captures the imagination, we must remember that natural and artificial intelligence are different, and there’s nothing wrong with that.