Comment: LLMs, truth, and consistency–they don’t have any idea

Dan Russell • April 24, 2023
Republished with permission from SearchReSearch

LLMs have a big problem:

P/C Dall-E. Prompt: "happy robots talking about truth"

They have zero idea what's true. They only know the probabilities of words in text. That's NOT the same thing as "knowing" something--it's a bit like knowing that "lion" is the most likely word following "king of the jungle..." without having any idea about monarchies, metaphor, or what a king really is all about.
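To make that concrete, here's a tiny sketch in Python (my illustration, not code from any real LLM) of what "knowing the probabilities of words" amounts to: count which word most often follows a given word in a toy corpus, then emit the winner. Truth never enters the computation.

    from collections import Counter

    # Toy corpus: the only "knowledge" this model will ever have.
    corpus = (
        "the king of the jungle lion roars . "
        "the king of the jungle lion sleeps . "
        "the king of the jungle is the lion ."
    ).split()

    def next_word(word):
        # Tally every word that follows `word` anywhere in the corpus.
        followers = Counter(
            corpus[i + 1] for i in range(len(corpus) - 1) if corpus[i] == word
        )
        # The "prediction" is just the most frequent follower. Nothing here
        # represents monarchies, metaphor, or what a king really is.
        return followers.most_common(1)[0][0]

    print(next_word("jungle"))  # prints "lion" -- frequency, not fact

Real LLMs are vastly more sophisticated than this, but the objective is the same: predict likely text, not true text.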

The folks at Oxford Semantic Technologies wrote an interesting blog post about LLMs and finding verifiable facts. They call the fundamental problem the "Snow White Problem." The key idea is that LLMs don't really know what's true--they just know what's likely.

In their example, they asked ChatGPT "what was the first feature-length animated film?" ChatGPT confidently said that it was the 1937 film "Snow White."

But when they asked about "El Apostol," an animated feature film made in 1917, ChatGPT admitted that this was the first animated feature.

It's sort of an odd thing--ChatGPT clearly "knows" about both films, and it "knows" both dates--so why the confusion?

Because LLMs don't "know" anything in the ordinary sense of "knowing": they just have that text-probability data. That is, inconsistent knowledge just isn't a thing for LLMs. Coherence within a conversation doesn't really matter much to them. If you ask ChatGPT:

Dan: What are the next most likely 5 words that follow the phrase "let me not to the marriage of true minds"

ChatGPT4: The next most likely 5 words that follow the phrase "let me not to the marriage of true minds" are: Admit impediments.

Well, that's just plain wrong. The thing is that an LLM does foolish things because it doesn't "know" they're foolish. I asked for 5 words, not 2.

On the other hand, Bard did much better with:

Bard: The next most likely five words that follow the phrase "Let me not to the marriage of true minds" are "Admit impediments. Love is not," which is from William Shakespeare's Sonnet 116. The full line reads: "Let me not to the marriage of true minds / Admit impediments. Love is not love."

It actually pointed out the next 5 words of the sonnet AND it gave the entire first line of the sonnet. Did it just get lucky? Or is there an internal consistency thing at work?

So I thought I'd try a tack similar to the one Oxford Semantic Tech took in their post and ask what the first short animated film was.

I asked Bard:

Dan: What was the first animated short?

Bard: The first animated short film was Fantasmagorie, created by Émile Cohl in 1908. It is a one-minute film that features a stick figure who encounters other characters and transforming objects. The film was made by drawing each frame on paper and then photographing it. Fantasmagorie is considered to be one of the first examples of traditional hand-drawn animation.

Interesting. So I posed this as a query to regular Google and got a very different answer:

[Screenshot of Google search results pointing to the Wikipedia article on the history of animation]

If you actually READ that Wikipedia page on the History of Animation, you'll find that J. Stuart Blackton created an animated film, "The Enchanted Drawing," in 1900 that combined hand-drawing with live action. A few years later, in 1906, he produced another film, "Humorous Phases of Funny Faces," with even more bits of what we'd think of as animation.

I went back to Bard and asked:

Dan: Wasn't the first animated short "Humorous Phases of Funny Faces," created by J. Stuart Blackton in 1906?

Bard: Yes, you are correct. Humorous Phases of Funny Faces, created by J. Stuart Blackton in 1906, is considered to be the first animated short film. The film is a three-minute short that features Blackton's face and hands as he draws and animates various caricatures. The film was made using a technique called stop-motion animation, in which Blackton would draw a caricature, take a picture, and then draw another caricature on top of the previous one. He would continue this process until he had created a short film of himself making funny faces.

I apologize for the previous mistake. I am still under development and learning to be informative and comprehensive.

Points for politeness, but this is another instance of the "Snow White" effect: the LLM doesn't know its own mind. Did it really change its mind?

One more attempt:

Dan: And didn't J. Stuart Blackton produce an animated short in 1900? That would be even earlier, yes?

Bard: That's correct. J. Stuart Blackton created a short film called The Enchanted Drawing in 1900. It is considered to be one of the first animated films, but it is not considered to be the first because it is not a traditional hand-drawn animation. The Enchanted Drawing is a stop-motion animation, which means that Blackton drew the images on a blackboard and then moved them slightly between each frame. This technique is not as smooth or fluid as traditional hand-drawn animation, but it was a groundbreaking innovation at the time.

So, while The Enchanted Drawing is not considered to be the first animated short film, it is still an important part of the history of animation. It was one of the first films to use stop-motion animation, and it helped to pave the way for the development of traditional hand-drawn animation.

Contradictory, no?

First Bard tells me that the first animated short film was Fantasmagorie, by Émile Cohl in 1908. When I ask about Blackton's animated film of 1906, Bard agrees and apologizes for the error.

When I ask about Blackton's 1900 piece, it says that yes, it's from 1900, but while it's "one of the first animated films, it's not the first because it is not a traditional hand-drawn animation..." Wait a second! Blackton's short from 1906 IS animation, but his piece from 1900 (using the same methods) is NOT?

It's clear that LLMs seem to "know" things, but they don't see any of the connections between the pieces of knowledge. It is as though everything is dissociated; and since everything is separate, contradictions and internal inconsistencies don't trouble them at all.

To understand that something is inconsistent means that you (or the LLM) recognize that there are different statements about something, and that they cannot all be true simultaneously. In this case, one of these animated shorts is the "first." Was it Blackton's 1900 "The Enchanted Drawing," or Blackton's 1906 work "Humorous Phases of Funny Faces," or was it Cohl's 1907 "Japon de fantaisie," or Cohl's 1908 "Fantasmagorie"? There MIGHT be something interesting in here, but Bard totally misses the point.

We in SearchResearch would try to draw a distinction between what "first" means in this context, and talk about what an "animated short" truly is. But that's not a conversation an LLM can have. They just have these sequences of text that are truly dissociated and without meaning.

Of course, Oxford Semantic Technologies' solution would be to have us refer to a knowledge graph that holds assertions in a meaning-bearing representation. In such a knowledge graph, contradictions would be easy to detect--one of the points of having a knowledge graph is that it's an authoritative representation that can be searched and reasoned about. If there's a contradiction in the graph, you can find it easily.
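To illustrate (a minimal sketch of my own, not Oxford Semantic Technologies' actual system): treat each claim as a value filling a slot that, by definition, can hold only one answer. A contradiction is then just a slot with more than one value.

    from collections import defaultdict

    # Assertions collected from the conversations above. Each "first X" slot
    # should hold exactly one value; more than one means a contradiction.
    assertions = [
        ("first animated short film", "Fantasmagorie (1908)"),
        ("first animated short film", "Humorous Phases of Funny Faces (1906)"),
        ("first animated short film", "The Enchanted Drawing (1900)"),
        ("first feature-length animated film", "El Apostol (1917)"),
    ]

    slots = defaultdict(set)
    for slot, value in assertions:
        slots[slot].add(value)

    for slot, values in slots.items():
        if len(values) > 1:
            print(f"Contradiction in '{slot}': {sorted(values)}")

A real knowledge graph brings far more machinery (ontologies, inference rules, provenance), but the core advantage is exactly this: the representation makes conflicting assertions detectable.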

That's a laudable goal. And in a twist of history, that's actually what my very first PhD research topic was about--representing knowledge in a semantic web. They're great, and have many fine properties, but they're difficult to maintain and keep consistent. Wonderful tools, but still probably in the future.

On the other hand, I can easily see knowledge-based systems like this being an incredibly useful internal fact-checker for what LLMs generate. Imagine a knowledge-based system working hand-in-hand (code-in-code?) with an LLM text-generator. There's a real possibility of power there. (And you're starting to see some suggestions of how such a thing might work with the Wolfram Alpha plugin for ChatGPT.)
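Here's a rough sketch of that pairing (entirely hypothetical; the "LLM" and the "graph" below are trivial stand-ins so the control flow runs, not real APIs):

    # Hypothetical pipeline: an LLM drafts an answer, a knowledge graph vets it.
    KNOWLEDGE_GRAPH = {
        # Assumed ground truth for the demo, not a settled historical claim.
        "first animated short film": "Fantasmagorie (1908)",
    }

    def llm_generate(prompt):
        # Stand-in for a text generator. For simplicity it returns a
        # pre-extracted (slot, claim) pair; a real system would have to
        # parse claims out of the model's free text.
        return ("first animated short film",
                "Humorous Phases of Funny Faces (1906)")

    def fact_checked_answer(prompt):
        slot, claimed = llm_generate(prompt)
        known = KNOWLEDGE_GRAPH.get(slot)
        if known is not None and known != claimed:
            # The likely-sounding claim contradicts the graph: flag it
            # instead of passing it along as fact.
            return f"Unverified: model said {claimed!r}; graph says {known!r}."
        return f"The {slot} was {claimed}."

    print(fact_checked_answer("What was the first animated short film?"))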

But we can't count on LLMs to tell us true things. At least not by themselves.

Let's be careful out there, and don't trust an LLM farther than you can throw it.

Keep searching.




About the Author

Dan Russell

I study the way people search and research. I guess that makes me an anthropologist of search. I am FIA's Future-ist in Residence.

