Why AI’s Large Language Models Can’t Spell ‘Strawberry’

How many times does the letter “r” appear in the word “strawberry”? According to formidable AI products like GPT-4o and Claude, the answer is twice. The correct answer is three.

Large language models (LLMs) can write essays and solve equations in seconds. They can synthesize terabytes of data faster than humans can open up a book. Yet, these seemingly omniscient AIs sometimes fail so spectacularly that the mishap turns into a viral meme, and we all rejoice in relief that maybe there’s still time before we must bow down to our new AI overlords.

The failure of large language models to understand the concepts of letters and syllables is indicative of a larger truth that we often forget: These things don’t have brains. They do not think like we do. They are not human, nor even particularly humanlike.

Most LLMs are built on transformers, a kind of deep learning architecture. Transformer models break text into tokens, which can be whole words, fragments of words, or single characters, depending on the model's tokenizer.

“LLMs are based on this transformer architecture, which notably is not actually reading text. What happens when you input a prompt is that it’s translated into an encoding,” Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, told TechCrunch. “When it sees the word ‘the,’ it has this one encoding of what ‘the’ means, but it does not know about ‘T,’ ‘H,’ ‘E.’”

This is because transformers cannot efficiently take in or produce raw text. Instead, the text is converted into numerical representations, which are then contextualized to help the AI come up with a logical response. In other words, the AI might know that the tokens “straw” and “berry” make up “strawberry,” but it may not understand that “strawberry” is composed of the letters “s,” “t,” “r,” “a,” “w,” “b,” “e,” “r,” “r,” and “y,” in that specific order. As a result, it cannot reliably tell you how many letters, let alone how many “r”s, appear in the word “strawberry.”
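To make the idea concrete, here is a minimal Python sketch of subword tokenization. The vocabulary, its ID numbers and the `tokenize` helper are hypothetical; real tokenizers learn far larger vocabularies (for example with byte-pair encoding) and may split words differently. The point is that once “strawberry” becomes two IDs, the individual letters are no longer part of the input.

```python
# Hypothetical toy vocabulary: real tokenizers learn tens of thousands of
# subword pieces and assign their own (different) ID numbers.
TOY_VOCAB = {
    "straw": 1001, "berry": 1002,
    "s": 1, "t": 2, "r": 3, "a": 4, "w": 5, "b": 6, "e": 7, "y": 8,
}


def tokenize(word: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-match segmentation of `word` into vocabulary IDs."""
    ids = []
    i = 0
    while i < len(word):
        # Try the longest candidate piece starting at position i first.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            # No piece starting at i exists in the vocabulary.
            raise ValueError(f"no vocabulary entry covers {word[i]!r}")
    return ids


ids = tokenize("strawberry", TOY_VOCAB)
print(ids)                      # [1001, 1002]: two opaque numbers, no letters
print("strawberry".count("r"))  # 3, but only when counting the raw characters
```

Counting characters on the raw string is trivial. The model, however, only ever receives the list of IDs, so it can answer letter-level questions only if it has picked up from its training data which letters each token contains.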

This isn’t an easy issue to fix, since it’s embedded into the very architecture that makes these LLMs work. TechCrunch’s Kyle Wiggers dug into this problem last month and spoke to Sheridan Feucht, a PhD student at Northeastern University studying LLM interpretability.

“It’s kind of hard to get around the question of what exactly a ‘word’ should be for a language model, and even if we got human experts to agree on a perfect token vocabulary, models would probably still find it useful to ‘chunk’ things even further,” Feucht told TechCrunch. “My guess would be that there’s no such thing as a perfect tokenizer due to this kind of fuzziness.”

This problem becomes even more complex as an LLM learns more languages. For example, some tokenization methods assume that a space in a sentence always precedes a new word, but many languages, such as Chinese, Japanese, Thai, Lao and Khmer, do not use spaces to separate words. Google DeepMind AI researcher Yennie Jun found in a 2023 study that some languages need up to 10 times as many tokens as English to communicate the same meaning.
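A naive tokenizer that treats every space as a word boundary shows why that assumption breaks down. This sketch relies on Python's built-in `str.split()`; the `whitespace_tokenize` name is hypothetical, and production tokenizers are considerably more sophisticated.

```python
def whitespace_tokenize(text: str) -> list[str]:
    """Naive tokenizer that assumes every space separates two words."""
    return text.split()


english = "I like strawberries"
japanese = "私はいちごが好きです"  # "I like strawberries"; no spaces between words

print(whitespace_tokenize(english))   # ['I', 'like', 'strawberries']
print(whitespace_tokenize(japanese))  # ['私はいちごが好きです']: one undivided chunk
```

The English sentence splits into three words, while the Japanese sentence comes back as a single chunk, so a tokenizer built on this assumption has to fall back on some other way of dividing the text.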

“It’s probably best to let models look at characters directly without imposing tokenization, but right now that’s just computationally infeasible for transformers,” Feucht said.
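One commonly cited reason for this cost is that a standard transformer's self-attention compares every position in the input with every other position, so its work grows with the square of the sequence length. The arithmetic below is a rough illustration only. It assumes a hypothetical 8,000-character document and the rough rule of thumb of about four English characters per token; real ratios vary by tokenizer and language.

```python
def attention_pairs(seq_len: int) -> int:
    """Query-key pairs scored by one full self-attention layer (grows as n^2)."""
    return seq_len * seq_len


# Hypothetical document length and an assumed ~4 characters per token.
num_chars = 8_000
chars_per_token = 4
num_tokens = num_chars // chars_per_token  # 2,000 tokens

token_cost = attention_pairs(num_tokens)  # 4,000,000 pairs
char_cost = attention_pairs(num_chars)    # 64,000,000 pairs
print(char_cost / token_cost)             # 16.0
```

Under these assumptions, feeding the model characters instead of tokens makes the sequence four times longer and the attention step roughly sixteen times more expensive.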

Image generators like Midjourney and DALL-E are built differently from text generators like ChatGPT. Instead of producing output token by token, most image generators use diffusion models, which reconstruct an image from noise. Diffusion models are trained on large databases of images, and they learn to re-create images resembling those in their training data.
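The “noise” half of that process can be sketched in a few lines. Below, a toy image is represented as a flat list of pixel values and blended with Gaussian noise using the forward-noising formula from denoising diffusion models, x_t = sqrt(ᾱ)·x_0 + sqrt(1 − ᾱ)·ε. The `add_noise` name, the `alpha_bar` parameter and the four-pixel image are illustrative assumptions, not the internals of Midjourney or DALL-E. Generation runs a trained network in the opposite direction, removing noise step by step.

```python
import math
import random


def add_noise(pixels: list[float], alpha_bar: float, rng: random.Random) -> list[float]:
    """Blend a clean image with Gaussian noise (one forward diffusion jump).

    alpha_bar close to 1.0 keeps mostly the original image;
    alpha_bar close to 0.0 leaves almost pure noise.
    """
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]


# Hypothetical 4-pixel grayscale "image" with values in [0, 1].
image = [0.1, 0.4, 0.8, 1.0]
rng = random.Random(42)

# Show the same image at increasing noise levels.
for alpha_bar in (0.99, 0.5, 0.01):
    noisy = add_noise(image, alpha_bar, rng)
    print(alpha_bar, [round(v, 2) for v in noisy])
```

During training, a diffusion model sees many such noisy versions and learns to predict the noise that was added, which is what lets it later start from pure noise and work back toward a plausible image.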

Asmelash Teka Hadgu, co-founder of Lesan and a fellow at the DAIR Institute, told TechCrunch, “Image generators tend to perform much better on artifacts like cars and people’s faces, and less so on smaller things like fingers and handwriting.”

This could be because these smaller details don’t often appear as prominently in training sets as concepts like how trees usually have green leaves. The problems with diffusion models might be easier to fix than the ones plaguing transformers, though. Some image generators have improved at representing hands, for example, by training on more images of real, human hands.

“Even just last year, all these models were really bad at fingers, and that’s exactly the same problem as text,” Guzdial explained. “They’re getting really good at it locally, so if you look at a hand with six or seven fingers on it, you could say, ‘Oh wow, that looks like a finger.’ Similarly, with the generated text, you could say, that looks like an ‘H,’ and that looks like a ‘P,’ but they’re really bad at structuring these whole things together.”

As these memes about spelling “strawberry” spill across the internet, OpenAI is working on a new AI product code-named Strawberry, which is supposed to be even more adept at reasoning. The growth of LLMs has been limited by the fact that there simply isn’t enough training data in the world to make products like ChatGPT more accurate. But Strawberry can reportedly generate accurate synthetic data to make OpenAI’s LLMs even better. According to The Information, Strawberry can solve the New York Times’ Connections word puzzles, which require creative thinking and pattern recognition, as well as math equations it hasn’t seen before.

Terron Gold
