Answer: How well do LLMs answer SRS questions?

Remember this?

P/C Dall-E. Prompt: happy robots answering questions rendered in a ukiyo-e style on a sweeping landscape cheerful

Our Challenge was this:

1. I’d like you to report on YOUR experiences in trying to get ChatGPT or Bard (or whichever LLM you’d like to use) to answer your curious questions. What was the question you were trying to answer? How well did it turn out?

Hope you had a chance to read my comments from the previous week.

On April 21 I wrote about why LLMs are all cybernetic mansplaining, and I mean that in the most negative way possible. If mansplaining is a kind of condescending explanation about something of which the man has incomplete knowledge (with the mistaken assumption that he knows more about it than the person he’s talking to does), then that’s what’s going on, cybernetically.

On April 23 I wrote another post about how LLMs seem to know things, but when you question them closely, they don’t actually know much at all.

Fred/Krossbow made the excellent point that it’s not clear that Bard is learning. After asking a question, then asking a follow-up and getting a changed response: “Bard corrected the response. What I now wonder: will Bard keep that correction if I ask later today? Will Bard give the same response to someone else?”

It’s unclear. I’m sure this kind of memory (and gradual learning) will become part of the LLMs. But at the moment, it’s not happening.

And that’s a big part of the problem with LLMs: We just don’t know what they’re doing, why, or how.

As several people have pointed out, that’s true of humans as well. I have no idea what you (my dear reader) are capable of doing, whether you’re learning or not… but I have decades of experience dealing with other humans of your make and model, and I have a pretty good idea about what a human’s performance characteristics are. I don’t have anything similar for an LLM. Even if I spent a lot of time developing one, it might well change tomorrow when a new model is pushed out to the servers. Which LLM are you talking to now?

P/C Dall-E. Prompt: [ twenty robots, all slightly different from each other, trying to answer questions in a hyperrealistic style 3d rendering ]

What happens when the fundamental LLM question-answering system changes moment by moment?

Of course, that’s what happens with Google’s index. It’s varying all the time as well, and it’s why you sometimes get different answers to the same query from day to day: the underlying data has changed.

And perhaps we’ll get used to the constant evolution of our tools. It’s an interesting perspective to have.

mateojose1 wonders whether, if LLMs are complemented by deep knowledge components (e.g., grafting on Wolfram Alpha to handle the heavy math chores), THEN we’ll get citations.

I think that’s part of the goal. I’ve been playing around with Scite.ai LLM for the scholarly literature (think of it as ChatGPT trained on the contents of Google Scholar). It’s been working really well for me when I ask it questions that are “reasonably scholarly,” that is, with papers that might address the question at hand. I’ve been impressed with the quality of the answers, along with the lack of hallucination AND the presence of accurate citations.

This LLM (scite.ai) is so interesting that I’ll devote an entire post to it soon. (Note that I’m not getting any funding from them to talk about their service. I’ve just been impressed.)

As usual, remmij has a plethora of interesting links for us to consider. You have to love remmij’s “robots throwing an LLM into space” Dall-E images. Wonderful. (Worth a click.)

But I also really agree with the link that points to Beren Millidge’s blog post about how LLMs “confabulate not hallucinate.”

This is a great point–the term “hallucination” really means that one experiences an apparent sensory perception of something not actually present. By contrast, “confabulation” happens when someone is not able to explain or answer questions correctly, but does so anyway. The confabulator (that’s a real word, BTW) literally doesn’t know if what they’re saying is true or not, but goes ahead regardless. That’s much more like what’s going on with LLMs.

Thanks to everyone for their thoughts. It’s been fun to read them over the past week. Sorry about the delay. I was at a conference in Hamburg, Germany. As usual, I thought I would have the time to post my reply, but instead I was completely absorbed in what was happening. As you can imagine, we all spent a lot of time chatting about LLMs and how humans would understand them and grow to use them.

The consensus was that we’re just at the beginning of the LLM arms race–all of the things we worry about (truth, credibility, accuracy, etc.) are being challenged in new and slightly askew ways.

I feel like one of the essential messages of SearchResearch has always been that we need to understand what our tools are and how they operate. The ChatGPTs and LLMs of the world are clearly new tools with great possibilities–and we still need to understand them and their limits.

We’ll do our best, here in the little SRS shop on the prairie.

Keep searching, my friends.
