UT Researchers Use AI to Translate Thoughts Into Text

Imagine a world where humans can express their thoughts to computers directly—no keyboard required. It sounds like a scenario ripped straight from science fiction, but this isn’t a hypothetical. Earlier this year, Alex Huth, UT assistant professor of neuroscience and computer science, and his team proved it could be done by harnessing the power of artificial intelligence (AI). In a landmark study published in Nature Neuroscience this May, Huth and the researchers in his lab showed that they could use a type of AI originally designed as a chatbot to translate raw brain data into text. While they are not the first to accomplish this feat—brain implants have been used for years to restore limited speech in paralyzed individuals—they are among the first to do so without requiring an invasive implant, and their results far exceed those of currently available alternatives.

At the heart of the researchers’ brain-to-text system is GPT, a type of AI known as a large language model that was developed by the artificial intelligence research company OpenAI. Late last year, OpenAI released a modified version of its system called ChatGPT that answers questions posed by users in natural language, much like Apple’s Siri or Amazon’s Alexa. But what makes ChatGPT stand out from the chatbots we all carry around in our pockets is its raw ability: It can create software programs, write essays, tell jokes, summarize complex research papers, and more. But Huth says these kinds of applications are just scratching the surface of the capabilities of this new generation of AI. When he looks at ChatGPT and the dozens of similar AI programs that have been released since, he sees a portal to the human brain that could radically transform the way we interact with digital technology. 

In the brain-to-text study, Huth and his team recruited participants from around Austin to listen to The Moth, an unscripted podcast featuring personal stories, while they lay in an fMRI machine. Whereas most currently available brain-to-speech technologies measure electrical activity in a tiny area of the brain responsible for motor function, an fMRI machine measures changes in blood flow across the entire brain. Each participant spent several hours listening to the podcast in the machine, which recorded patterns of brain activity associated with the words and phrases they heard.

Once Huth and his team had a baseline measurement of a participant’s brain activity, they used this data to train a machine learning model, built on GPT-1 (a precursor to ChatGPT), to predict that participant’s brain activity from the words the person heard on the podcast. Then, once the model was trained on an individual’s brain activity, they could run the process in reverse: the AI analyzed the fMRI data and predicted the words the person was hearing.
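The train-then-invert idea above can be sketched in a few lines. This is a deliberately toy illustration, not the study’s actual code: the feature vectors, voxel counts, ridge regression, and candidate-scoring function are all stand-ins chosen for simplicity. In the real study, the features would come from GPT-1 and the responses from fMRI scans; here both are fabricated random data just to show the shape of the pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: 200 timepoints, 16 language-model features,
# 50 brain "voxels". The real study's dimensions are far larger.
n_train, n_feat, n_voxels = 200, 16, 50
features = rng.normal(size=(n_train, n_feat))          # stand-in for GPT features
true_weights = rng.normal(size=(n_feat, n_voxels))
responses = features @ true_weights + 0.1 * rng.normal(size=(n_train, n_voxels))

# Step 1 (training): an encoding model that predicts brain activity from
# language features -- here, simple ridge regression.
lam = 1.0
W = np.linalg.solve(features.T @ features + lam * np.eye(n_feat),
                    features.T @ responses)

def predict_activity(feat):
    """Predicted brain response for one feature vector."""
    return feat @ W

# Step 2 (running it in reverse): among candidate word sequences, each
# summarized by a feature vector, pick the one whose *predicted* brain
# activity best matches the observed scan.
def decode(observed, candidate_feats):
    errors = [np.linalg.norm(predict_activity(f) - observed) for f in candidate_feats]
    return int(np.argmin(errors))

# Usage: the candidate that actually produced the scan should win.
test_feat = rng.normal(size=n_feat)
observed = predict_activity(test_feat) + 0.05 * rng.normal(size=n_voxels)
candidates = [rng.normal(size=n_feat) for _ in range(9)] + [test_feat]
print(decode(observed, candidates))  # index of the matching candidate: 9
```

The key design point this sketch captures is that the model is only ever trained in the forward direction (words to brain activity); decoding works by searching over candidate word sequences and keeping whichever one the forward model says would have produced the observed scan.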

“The shocking thing about this is that it’s possible with fMRI at all because fMRI sucks—it’s terrible in a thousand different ways,” Huth says. He points to the low temporal resolution of fMRI, which captures images over several seconds, resulting in a “smear” of brain activity. The upside, however, is that fMRI captures activity from the entire brain rather than just a localized area, which is key to the AI’s ability to extract meaning rather than just individual words.

The AI’s ability to predict what a person was hearing based on their brain activity was better than Huth or any of his colleagues could’ve hoped, but it wasn’t perfect. Rather than spitting out the words the person was hearing verbatim, the AI essentially paraphrased the podcast transcript. For example, one of the transcripts from the podcast read: “I got up from the air mattress and pressed my face against the glass of the bedroom window, expecting to see eyes staring back at me but instead only finding darkness.”  

When the AI translated the brain activity associated with this podcast segment back to speech, it wrote: “I just continued to walk up to the window and open the glass I stood on my toes and peered out I didn’t see anything and looked up again I saw nothing [sic].” 

Huth hopes that one day a similar system could get closer to word-for-word decoding, but to dismiss the AI for not reproducing a perfect transcript would miss a lot about what makes this research so exciting. The remarkable thing about this brain decoder is that the AI was highly accurate at capturing the meaning encoded in a person’s brain activity, rather than just words.  

“There’s no good way to get at meaning if you’re only looking at a small patch of the brain, because meaning is distributed through the whole brain,” Huth says. “Different kinds of ideas involve different parts of your cortex, so looking at that broadly is necessary for doing this kind of decoding.” 

PhD student Jerry Tang prepares to collect brain activity data in the Biomedical Imaging Center at the University of Texas at Austin.

Huth has been researching ways to decode brain activity for a decade, but even he was surprised at the fidelity of the results. He attributes the breakthrough to two main factors: an abundance of data and the new approach to language production used by GPT and similar AI systems. Huth and his team had around 16 hours of fMRI data from each study participant, which provided a solid foundation for teaching their AI how to match specific brain patterns to what the participants were hearing on the podcast. The unique way the AI processed that data also played an important role. GPT is trained by ingesting a massive amount of text, which it uses to statistically predict the next word in a sequence. This means it’s not just adept at matching brain patterns to individual words; it can also contextualize each word within the entire sentence. The result is an output that captures meaning rather than just words.
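The “predict the next word” training task is simple enough to demonstrate with a toy model. The sketch below is not GPT (GPT uses a neural network trained on billions of words); it is a bare-bones bigram counter, invented here purely to make the task concrete: read text, then guess each word’s most likely successor from what came before.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus for illustration.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(word):
    """Most frequent word seen after `word` in the training text."""
    following = counts[word]
    return following.most_common(1)[0][0] if following else None

print(predict_next("the"))  # "cat": it follows "the" more often than "mat" or "fish"
```

A model like this only ever sees one word of context; the leap GPT makes is conditioning on the whole preceding sequence, which is what lets it represent meaning in context rather than isolated words.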

“Previous studies can map single words or sentences to brain activity, but our study predicts entire sequences of words,” says Jerry Tang, a lead author on the paper and a graduate researcher in Huth’s lab. “Language models are trained to perform a very simple task—take a sequence of words and predict the next word in the sequence—but it turns out that in learning to perform this task the language model learns to extract a lot of other linguistic features that make them really effective at predicting brain responses to language.”

It’s still unclear why GPT is so effective at translating brain activity into text. Huth says that it may have something to do with the fact that transformer language models are a closer approximation to how the brain itself handles language. But in this research, the team wasn’t really concerned with explaining why—they just wanted to know if it was possible.  

“We don’t really know why it’s such a good model, and that’s a big debate in the field,” Huth says. “We’re using it just because it works; we’re not asking questions about why it works.”

Huth and his team are the first to admit that the prospect of an AI being able to read our thoughts might make most people a little queasy. But he says there’s no reason to worry about robot mind-reading just yet. For starters, GPT’s ability to translate a person’s brain activity into text required participants to spend dozens of hours in an fMRI machine, which is incredibly expensive to operate. While GPT may eventually be able to accomplish similar brain-to-text tasks with less bulky and expensive hardware, this is still unproven.

Moreover, these AI models are specific to the individual. The researchers found that an AI trained on one person’s brain data won’t work if it’s applied to another person’s brain data. It also won’t work on random thoughts—the AI can only predict what a person is hearing or seeing, not what they may be thinking if it’s not related to the audio or visual stimulus. For example, if a subject watching a video started thinking about what they were going to have for dinner that night, that information wouldn’t be intelligible to the AI. It’s like our brains come with built-in protection from mind-reading.

“One of our first reactions when we saw it working was that this is scary,” Huth says. “That’s why we did all these experiments to test whether a person can resist this and stop it from working.” 

For all its limitations, the research marks an important step toward improving brain-computer interfaces that could greatly impact the lives of people who have lost their ability to communicate due to paralysis or other ailments. It is still too early for the technique to find medical application, but both Huth and Tang are optimistic that more data and better AI models may soon make this possible.

“Our ultimate goal is to use this technology to help restore communication for people who have lost the ability to speak or write,” Tang says. “We see our study as a step toward this goal, but there’s a lot more to be done.”

Illustration by Øivind Hovland, photo courtesy of UT/Nolan Zunk



