Human Cognition in Artificial Intelligence: Bridging the Gap Between Mind and Machine

By Amanda N. Weiss

This is the first post in a series about artificial intelligence, along with its uses and social/political implications.

When you hear the phrase “artificial intelligence,” many different thoughts may come to mind. Perhaps you think of the helpful internet assistants on your phone and computer. Maybe your mind goes to the villainous computers of science fiction stories, seeking revenge on humanity. Whichever contexts your mind wanders into, it is clear that certain types of artificial intelligence have risen in prominence both in everyday life and in popular media. At its core, artificial intelligence (AI) refers to computers or computational programs that are able to exhibit behaviors we would associate with human intelligence. AI algorithms can analyze data and incorporate new information into future decision-making. However, does the ability to learn in this sense truly constitute intelligence? Could advancements in AI eventually lead to sentient, self-aware computers? Or is there something unique to human psychology that cannot be modeled by algorithms and data processing? These questions all arise as we continue to use AI technology in our daily lives. To gain some experienced insight, I had a conversation with Camilo Fosco, a machine learning PhD student at MIT, about artificial intelligence in the context of his research on computer vision and cognition.

It’s helpful to first go into a bit more depth about how to classify AIs and how artificial intelligences relate to human psychology. At present, the AIs that have been developed are examples of weak, or narrow, AI, meaning that they are algorithms that perform pre-defined functions. In contrast, strong AI, or general AI, is the concept of a computational system capable of human-level intelligence and adaptation. Although strong AI has not been achieved, some of the best-performing AIs that do exist are based on neural networks, which are machine learning systems loosely modeled after the human brain. These systems involve numerous connected processing nodes, which can feed information forward to other nodes, similar to how neurons in the brain propagate signals. Neural networks are usually trained on pre-characterized datasets so that they can learn how to process new data. These networks can form the basis of technologies such as engineered robotics, vehicle control, and simulations. While deep neural networks (those with more than three layers of processing nodes) exhibit more independence from human intervention than other forms of machine learning, they still do not qualify as strong AI.
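To make the idea of nodes "feeding information forward" a bit more concrete, here is a minimal sketch in Python of a tiny feed-forward network. The layer sizes, random weights, and ReLU activation are purely illustrative assumptions on my part, not a description of any particular system mentioned in this post.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Simple nonlinearity applied at each "node"
    return np.maximum(0.0, x)

# Illustrative layer sizes: 4 inputs -> 8 hidden nodes -> 3 outputs
W1 = rng.normal(size=(4, 8))   # weights connecting input nodes to hidden nodes
W2 = rng.normal(size=(8, 3))   # weights connecting hidden nodes to output nodes

def forward(x):
    """Feed a 4-dimensional input forward through the two layers."""
    hidden = relu(x @ W1)      # each hidden node sums its weighted inputs, then fires
    return hidden @ W2         # outputs are weighted sums of the hidden activations

example_input = rng.normal(size=4)
print(forward(example_input))
```

Training such a network amounts to adjusting those weight values so that the outputs match labeled examples, which is what "learning from a pre-characterized dataset" means in practice.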

Our inability to develop strong AI despite the intrigue it offers is not a new problem. Several decades ago, the philosopher Hubert Dreyfus argued that it is impossible for computers to think in the same way as humans, since people interpret the world and learn through lived experience and interactions with their environments. This philosophical argument has persisted through time, even when taking recent technological advancements into account. Ragnar Fjelland argued that the reason computer bots have not been able to pass the Turing test (i.e., fool a human into thinking they are talking to another human) is that computers do not have a sense of causality, only associations and correlations. Further, computers cannot understand and respond to all questions in a human-like manner because they do not have experience existing and growing in the world. It is, of course, true that computer algorithms do not grow up within a sociocultural context and do not develop tacit knowledge.

However, technological developments have allowed for computer systems to have means of interacting with the world or processing a variety of types of stimuli. For example, recent years have seen the emergence of embedded AI, which is the implementation of AI software at the device level, rather than in the cloud. This feature allows for both machine learning analysis and subsequent action to take place on a device itself, which gives AI more of a concrete presence in the physical world. One current avenue of research for embedded AI is the detection of COVID-19 by smartphones and other personal devices. General Electric is working on developing small surface sensors to detect viral particles. These sensors will be incorporated into a variety of devices with which people make physical contact. Additionally, Fitbit has been testing an algorithm to identify COVID-19 cases based on physiological factors, including breathing rate and heart rate. The study reportedly has shown promising initial results, including detecting cases prior to symptoms being reported. These technologies that bridge the physical and the computational have the potential to be incredibly useful for reducing the spread of the virus and may be adapted in the future to address other biomedical concerns; yet they still function based on a limited set of inputs and are directed at a single task, which does not reflect the breadth of human cognition. A more integrative approach to AI that incorporates large inputs of various types has been explored by the company DeepMind in their recent development of the Perceiver and the Perceiver IO. The original Perceiver is able to parse the most relevant data from inputs including images, videos, audio, and 3D data point clouds to complete simple classification-type tasks. The Perceiver IO improves on these capabilities to allow for outputs of significantly more complexity, such as natural language. Again, while this technology alone is not capable of full human-like cognition, it pushes the capabilities of computers further towards those of the human brain.

Using AI to model how humans think requires an understanding of the mechanisms of human cognition, an active field of research. One cognitive element currently under study is relational abstraction. Put simply, this is the ability to make connections about relationships between objects, beyond the specific identities of the objects themselves. For instance, we can recognize flowers in a vase or water in a cup as two instances of a relation of containment. A preprint study conducted by Hafri et al. found that the human mind automatically interprets these relational abstractions even when not explicitly tasked to do so, as evidenced by subjects mistaking a pair of objects for another pair that shares the same relationship. Identification of abstract relations is important for interpreting and understanding the world, and Camilo Fosco and co-author Alex Andonian, along with collaborators, developed a computational model to find such relations in sets of videos. Their model was trained and tested on its ability to detect abstract relations, match videos to a set of related reference videos, and determine which video in a set does not fit with the others on this relational basis. They also incorporated human annotations of the videos to help their model learn, as, in Camilo’s words, “there’s so much information in language [that] it’s almost a waste not to use it”, and “language can always give you meaningful supervision”. The model ended up performing nearly as well as, or better than, humans on these tasks, with some variability in performance between datasets.
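As a rough illustration of the "which video does not fit" task, one simple framing is to represent each video as an embedding vector and flag the one that is least similar to the rest. The sketch below uses random placeholder vectors in place of learned video features; it is only meant to convey the flavor of the task, not the architecture Fosco, Andonian, and collaborators actually built.

```python
import numpy as np

def cosine_sim(a, b):
    # Similarity between two embedding vectors, ranging roughly from -1 to 1
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def odd_one_out(embeddings):
    """Return the index of the item whose embedding is least similar to the others."""
    n = len(embeddings)
    avg_sim = []
    for i in range(n):
        sims = [cosine_sim(embeddings[i], embeddings[j]) for j in range(n) if j != i]
        avg_sim.append(np.mean(sims))
    return int(np.argmin(avg_sim))

# Placeholder "video embeddings": three closely related vectors plus one outlier
rng = np.random.default_rng(1)
base = rng.normal(size=64)
videos = [base + 0.1 * rng.normal(size=64) for _ in range(3)]
videos.append(rng.normal(size=64))  # the unrelated video

print(odd_one_out(videos))  # expected: 3
```

In a real system, the embeddings would come from a trained network that has learned to encode the abstract relation shown in each video rather than its surface content.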

Another crucial element of cognition under investigation is attention. The world presents endless information and stimuli, but we must focus our attention on the information most pertinent to us; as Camilo put it, “attention is absolutely crucial for solving tasks”. Using computational algorithms to identify patterns of attention can offer insight into how the human mind takes in and processes information. For images, this can be studied by tracking where the eyes focus. Camilo and colleagues developed a computational model to study visual attention over time and found that the duration of time for which a person views an image impacts their gaze patterns. Such information can be applied in advertising, education, and automated image editing. For instance, an advertiser might choose to show a picture on screen for a shorter or longer amount of time depending on whether they want their audience’s eyes to wander around or focus only on the most eye-catching part.
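To give a sense of how eye-tracking data becomes an "attention map," the sketch below aggregates hypothetical fixation points into a smoothed heatmap. The fixation coordinates, image size, and blur width are made up for the example and do not come from the study described above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_heatmap(fixations, height, width, sigma=10):
    """Turn a list of (row, col) eye fixations into a smooth attention map."""
    heat = np.zeros((height, width))
    for r, c in fixations:
        heat[r, c] += 1.0                      # count each fixation at its pixel
    heat = gaussian_filter(heat, sigma=sigma)  # blur to approximate foveal spread
    return heat / heat.max()                   # normalize to the range [0, 1]

# Hypothetical fixations from a short vs. a long viewing of the same image
short_view = [(60, 80), (62, 85), (58, 82)]                 # gaze stays in one spot
long_view = short_view + [(20, 30), (100, 150), (40, 120)]  # gaze starts to wander

short_map = fixation_heatmap(short_view, 128, 160)
long_map = fixation_heatmap(long_view, 128, 160)
print(np.unravel_index(short_map.argmax(), short_map.shape))  # peak stays near the cluster
```

Comparing maps built from short and long viewing durations is one simple way to see the kind of time-dependent gaze pattern the research describes.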

While studies such as these reveal some of the elements of human attention, it is also important to note that machine learning-based attention may not be equivalent. Attention in machine learning is a mechanism that can be coded into algorithms so that the computer focuses on the most relevant parts of the input in order to complete a task. It does so by attributing different weights to different input elements based on their relations to other parts of the input. To discuss how attention can be applied to visual tasks, it is worth clarifying how computer vision works. Since computers don’t have visual context through experience in the world, they must be given extensive data to learn how to process images and distinguish elements of importance through pattern recognition. Machine learning algorithms take in images and generate feature maps by accounting for the values of the pixels that comprise the input. These maps can be used for various tasks, including assigning labels to part of (or the entirety of) the picture. With enough training and self-evaluation, the outputs become more reliable and accurate, and the computer learns to “see” what is depicted in the image. However, computer vision is still an ongoing field of research: we do not fully understand how human vision works, so it remains a challenge to teach computers to see the same way that we do. That being said, there are many applications of computer vision even with the limits of current technology, including facial recognition, self-driving automobiles, and detecting medical anomalies in scans and images. Interestingly, a 2020 study found that for several visual tasks, such as distinguishing differences between similar images, computers performed better when their attention mimicked that of humans (i.e., when the algorithms were trained using human visual attention maps rather than using what the algorithms could automatically learn through their own attention mechanisms). As such, increasing our understanding of how human attention works may lead to the development of AIs that perform even more effectively, at least on specific tasks. If you are curious about more aspects of computational learning, you may want to take a look at the Transformer. Simply stated, for tasks involving translating an input to an output, a Transformer uses a self-attention mechanism to compare different parts of the input to each other, helping the program focus on what is important in order to shape the appropriate output.
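For readers who want to see what that self-attention computation actually looks like, here is a bare-bones sketch of scaled dot-product attention, the operation at the heart of the Transformer: every position in the input is compared against every other position, and the resulting weights decide how much each part contributes to the output. The sequence length, dimensions, and random projection weights below are arbitrary choices for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: turns raw scores into weights that sum to 1
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of input vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # queries, keys, and values
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # compare every position with every other
    weights = softmax(scores, axis=-1)        # attention weights for each position
    return weights @ V                        # each output is a weighted mix of values

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 16, 8           # illustrative sizes only
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (5, 8)
```

The "different weights based on relations to other parts of the input" described above is exactly the `weights` matrix here: it is computed from the input itself rather than fixed in advance.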

Beyond where our attention is drawn, the memorability of the stimuli we perceive and interpret is also a crucial element of how humans deal with information. Research suggests that images have an intrinsic level of memorability. For example, a 2011 study by Isola et al. showed that there was consistency between the images remembered by each half of a randomly split group of human participants, and a 2013 follow-up study by Bainbridge et al. found a similar consistency effect with images specifically of faces, which had lower memorability compared to images of scenes (both studies were conducted in the lab of Aude Oliva, who is also Camilo’s advisor). Several machine learning programs have been developed in recent years to predict the memorability of images. A study presented in 2021 by Needell and Bainbridge found that their new model, which was based on residual neural networks and accounted for conceptual features, outperformed programs that account only for perceptual, but not conceptual, features. Perhaps just as interesting as predicting a static level of memorability for images is predicting the dynamics of memorability over time. Camilo and co-author Anelise Newman helmed a study in which they compiled the Memento10k database, which contains videos with captions and human annotations collected at varying delays after initial viewing, and they used this database to develop a model that predicts how the memorability of a video decays over time. If computers’ abilities to predict visual memorability continue to improve, we may be able to use these predictions, much like the insights from attention modeling, in shaping advertising, public information campaigns, student learning tools, and more.
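As a toy illustration of what "modeling memorability decay" can mean, the sketch below fits a simple log-linear curve to made-up annotation data (the fraction of viewers who still remember a video after various delays). Both the numbers and the functional form are assumptions for demonstration; the actual Memento10k model is considerably more sophisticated.

```python
import numpy as np

# Hypothetical annotation data: fraction of viewers who remembered a video
# after various delays (in minutes) since first seeing it.
delays = np.array([1, 5, 15, 60, 240, 1440])            # minutes
hit_rates = np.array([0.95, 0.90, 0.84, 0.78, 0.70, 0.62])

# Fit a simple decay curve: memorability ~ alpha + beta * log(delay)
beta, alpha = np.polyfit(np.log(delays), hit_rates, deg=1)

def predicted_memorability(delay_minutes):
    # Predicted fraction of viewers who would remember after the given delay
    return alpha + beta * np.log(delay_minutes)

print(round(predicted_memorability(30), 3))    # interpolated prediction
print(round(predicted_memorability(2880), 3))  # extrapolated to a two-day delay
```

A per-video curve like this is what would let an advertiser or educator estimate not just whether a clip is memorable, but for how long.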

Artificial intelligence in the context of human cognition and psychology is not limited solely to modeling cognitive processes; it also has potential applications in psychological intervention. For instance, using algorithms to analyze large, clinically relevant datasets, such as patient surveys and symptom reports, can offer insight into the types of services that are needed and predict effective treatment courses. Interestingly, direct computer-to-patient conversation technology has already been shown to have efficacy in therapy. A study published earlier this year by Darcy et al. found that patients were able to form a working alliance bond with an AI-based cognitive-behavioral therapy chatbot (called Woebot) to a comparable extent as with a human therapist. The study did not focus on long-term therapeutic outcomes, but the establishment of such relationships is promising, especially in the context of virtual therapy. Therapy administered by a computer program rather than a human therapist offers both advantages and disadvantages. For example, it may improve access to treatment otherwise impeded by cost barriers or therapist shortages, but it may also affect patient disclosure and trust. Furthermore, as AIs are programmed by humans, they may adopt some of their programmers’ biases. However, faculty members at Northwestern University have suggested that close monitoring of psychology AI algorithms during their development and use over time will be a useful way to catch biases early and correct for them. Further research may illuminate the effectiveness of different types of AI-administered therapy and thus may shape the future of psychological intervention to help as many people as possible.

All in all, the field of AI is very much entwined with psychology, both in computationally modeling the human mind and in applications to address mental distress. As we continue to uncover how human cognition works, we can improve our AI systems to better model these processes, as well as develop more practical applications across various fields, including healthcare and transportation. In doing so, we may, for better or worse, further blur the line between organic and computational minds.