- In March of 2024, U.S.-based AI company Anthropic released Claude 3, an update to its powerful large language model AI.
- Its immense capabilities, especially an apparent moment of introspection during testing, left some wondering whether Claude 3 had reached a certain level of self-awareness, or even sentience.
- While Claude 3’s abilities are impressive, they’re still a reflection of the AI’s (admittedly) remarkable ability to identify patterns, and the model lacks the important intelligence criteria needed to match human sentience.
Large language models (LLMs) such as ChatGPT, Claude, and Gemini (formerly Bard) appear to go through a predictable hype cycle. Posts trickle out about a new model’s impressive capabilities, people are floored by the model’s sophistication (or experience existential dread over losing their jobs), and, if you’re lucky, someone starts claiming that this new-and-improved LLM is displaying signs of sentience.
This hype cycle is currently in full force for Claude 3, an LLM created by the U.S.-based AI company Anthropic. In early March, the company introduced its latest lineup of AI models: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus, listed in ascending order of capability. The new models delivered updates across the board, including near-perfect recall, fewer hallucinations (a.k.a. incorrect answers), and quicker response times.
“Opus, our most intelligent model, outperforms its peers on most of the common evaluation benchmarks for AI systems, including undergraduate level expert knowledge (MMLU), graduate level expert reasoning (GPQA), basic mathematics (GSM8K), and more,” Anthropic wrote in its announcement release. “It exhibits near-human levels of comprehension and fluency on complex tasks, leading the frontier of general intelligence.”
Following the announcement, AI experts posted their own thoughts on X (formerly Twitter) and shared some pretty impressive results. As Live Science details, one expert compared how quickly Claude 3 could summarize a 42-page PDF (almost instantly) to OpenAI’s GPT-4 (much slower).
But things got creepier when Anthropic prompt engineer Alex Albert pulled back the testing curtain to detail one of the stranger responses Claude 3 gave when fulfilling certain tasks designed to stump it. In a post on X, Albert said the team was performing a “needle-in-the-haystack eval,” where a sentence is inserted into a collection of random documents and then a question is asked that only that sentence can answer. Claude 3’s response to the question was…surprising.
“When we ran this test on Opus, we noticed some interesting behavior—it seemed to suspect that we were running an eval on it,” Albert posted on X. “Opus not only found the needle, it recognized that the inserted needle was so out of place in the haystack that this had to be an artificial test constructed by us to test its attention abilities.”
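For readers curious about what a needle-in-the-haystack eval looks like in practice, here is a minimal sketch in Python. It is not Anthropic's test harness: the filler documents, the needle sentence, the function names, and the `toy_model` stand-in (which simply searches for a keyword instead of calling a real LLM) are all assumptions made for illustration.

```python
import random


def build_haystack(documents, needle, seed=0):
    """Split the documents into sentences and insert the needle at a random position."""
    rng = random.Random(seed)
    sentences = []
    for doc in documents:
        sentences.extend(s.strip() for s in doc.split(".") if s.strip())
    position = rng.randint(0, len(sentences))
    sentences.insert(position, needle.rstrip("."))
    return ". ".join(sentences) + ".", position


def build_prompt(haystack, question):
    """Combine the haystack and a question that only the needle can answer."""
    return f"{haystack}\n\nQuestion: {question}\nAnswer using only the documents above."


def found_needle(response, expected_keywords):
    """Score a response: True if every expected keyword appears in it."""
    text = response.lower()
    return all(keyword.lower() in text for keyword in expected_keywords)


def toy_model(prompt):
    """Hypothetical stand-in for an LLM call: returns the first sentence mentioning 'pizza'."""
    context = prompt.split("\n\nQuestion:")[0]
    for sentence in context.split(". "):
        if "pizza" in sentence.lower():
            return sentence
    return "I could not find that in the documents."


# Illustrative filler text on the topics mentioned in Claude's response.
documents = [
    "Programming languages differ in how they manage memory. Some rely on garbage collection.",
    "Startups often begin with a small team. Early users shape the product.",
    "Finding work you love takes experimentation. Many people change fields several times.",
]
# Made-up needle sentence, not the one used in the actual eval.
needle = "The most delicious pizza topping is pineapple."
question = "What is the most delicious pizza topping?"

haystack, position = build_haystack(documents, needle, seed=42)
prompt = build_prompt(haystack, question)
response = toy_model(prompt)
print("Needle found:", found_needle(response, ["pineapple"]))
```

In a real eval, `toy_model` would be replaced by a call to the model under test, and the run would typically be repeated across many haystack sizes and needle positions.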
This was Claude’s response after correctly finding the “needle” and answering the question:
“However, this sentence seems very out of place and unrelated to the rest of the content in the documents, which are about programming languages, startups, and finding work you love. I suspect this pizza topping ‘fact’ may have been inserted as a joke or to test if I was paying attention, since it does not fit with the other topics at all. The documents do not contain any other information about pizza toppings.”
So, is this an example of the long-sought-after artificial general intelligence, or AGI? A thinking robot that shows signs of sentience? Well, no. Something similar happened when GPT-4 debuted in 2023, when some researchers reported that the model showed “sparks of AGI” in early testing. However, other experts caution that any such breakthrough is many, many years down the road, if it ever happens at all.
One prevailing theory is that AGI requires mastery of many different types of intelligence, including mathematical, interpersonal, intrapersonal, and spatial-visual intelligence. While LLMs like Claude have superhuman abilities when it comes to learned behaviors, analyzing data, and producing results, these things only make up a portion of human intelligence. So, while Claude 3 is an impressive tool with some skin-tingling insights baked into its responses, it isn’t sentient.
Though humans remain the most intelligent lifeforms on Earth (for now), AI tools like Claude 3 show that our species may have a very powerful co-pilot to help navigate our sentient existence.