So, in summary…
Last week we asked an important question: How can we use LLMs (and AI tools in general) to help summarize things that we'd like to read?
As I said, I find that I spend roughly half of my research time just browsing, scanning, and reading—it’s what you need to do to understand a topic area enough to do real work.
Will AI systems turn long, original documents into shorter, more focused, punchier crystals of knowledge? Or will the tool end up blurring everything into mush? That's our choice: beautiful crystals of insight or mushy, oatmealy language.
What did we find out?
1. How well does your favorite bit of AI technology do at summarizing long texts? What system do you like for summarizing, and why?
Let me start by pointing out the obvious: Creating a good summary is fairly hard.
I could quote Woody Allen's quip about speed-reading the infamously lengthy Tolstoy novel War and Peace--"It's about Russia."
That's a summary, but not a useful one. (Likewise, summarizing Moby Dick as "It's about whales" doesn't help anyone.)
What makes for a good summary?
Interestingly, there are oodles of testing systems that measure the performance of LLMs on various dimensions: Closed-book Open-domain Question-Answering, Reading Comprehension, Instruction Following, Fact-Checking, etc.
One of those dimensions is "Summarization." (Probably the best known of the LLM performance measurements is the Hallucinations Leaderboard by HuggingFace. You can click on it for today's measures on a bunch of different LLMs on a bunch of different dimensions.)
Basically, these measures of summarization quality look at word overlap. The summary generated by the AI tool is compared against a reference for the source text (usually texts that are between 200 and 4K words long), and that reference is typically a human-written summary rather than the full original. If there's a high degree of overlap in the words used, then the score is high. (The most commonly used measure of this kind is the ROUGE score.)
Of course, while the ROUGE score is handy and easy to compute, it's not necessarily what makes for a good summary. If I summarize Frankenstein as "an exploration of a quest to understand the secret of life, and how it all goes horribly wrong," that's not going to be ranked very highly by ROUGE.
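To make the word-overlap idea concrete, here is a minimal sketch of a ROUGE-1-style score (unigram overlap only). It is an illustration under assumptions, not the scoring code used by any leaderboard: the tokenizer is a simple regular expression, the function names are my own, and the three example sentences are made up for the demonstration. Real evaluations usually rely on an established ROUGE implementation and on longer texts.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed, naive tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def rouge1(candidate, reference):
    """Return unigram-overlap precision, recall, and F1 of candidate against reference."""
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((cand_counts & ref_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up example texts, for illustration only.
reference = "Victor Frankenstein creates a creature and then abandons it"
extractive = "Victor Frankenstein creates a creature"
thematic = "A quest to understand the secret of life goes horribly wrong"

extractive_score = rouge1(extractive, reference)
thematic_score = rouge1(thematic, reference)
print("extractive:", extractive_score)
print("thematic:  ", thematic_score)
```

The summary that reuses the reference's own words scores far higher than the thematic one, even though a human reader might find the thematic version more insightful. That is exactly the gap between term overlap and usefulness.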
A good summary, like a good explanation, is largely determined by who the summary is for, not its term overlaps. A good summary of War and Peace for a 10-year-old kid is (or should be) very different from one for a 25-year-old graduate student in world literature. Likewise, a summary of something very technical (say, Calculus: Early Transcendentals by James Stewart, probably the best-selling math textbook on calculus in English) would vary depending not only on the target reader, but also on the purpose of the summary. Summaries can have very, very different uses. A summary of a book for the New York Times Book Review (which is intended to give a critical perspective on the text) will be different from a review that you post on GoodReads.com (which usually gives the plot points and description).
The bigger question is "which LLM can actually summarize long texts?"
2. Test out your summarization method on two long-form texts. First, let’s try that classic gothic tale, Frankenstein by Mary Shelley. (Here’s a link to the full text that you can use.) How good is that summary?
As I'm sure you found out, there's no convenient way to upload a long text file into Bard/Gemini. If you paste the entire text of Frankenstein into Bard, it stops accepting input around page 15. This is obviously a problem.
If you upload the full text of Frankenstein into ChatGPT4, it not only uploads easily, but when you then ask for a summary, it produces a very nice one. This is the ChatGPT4 summary of Frankenstein:
The novel "Frankenstein" by Mary Shelley begins with letters from an explorer named Robert Walton to his sister, where he describes his journey to the North Pole. During this expedition, Walton encounters Victor Frankenstein, who then narrates his own tragic story of ambition, creation, and loss. Victor, a young scientist, creates a sentient creature in an experiment, but is horrified by his creation and abandons it. The creature, seeking companionship and understanding, faces rejection and becomes vengeful, leading to tragic consequences for both creator and creation. The novel explores themes of ambition, isolation, responsibility, and the quest for knowledge.
If you ask for a longer summary (which I did), it creates a slightly longer version, but only about 100 words longer. I couldn't figure out how to get it to create a much longer version. (Even when I asked for 1000 words, it topped out at 400.)
So, for summarizing long texts, ChatGPT4 wins over Bard by a monstrous amount.
3. Second, let’s try your summarization method on a piece of text you might have read, but might have slightly forgotten: Chapter 10 of my book, The Joy of Search, "When was oil first discovered in California?" (link to the full text of Chapter 10 here). Did your summary method create a good/useful summary?
This was really interesting: I got both ChatGPT4 and Bard/Gemini to produce good summaries of Chapter 10 of The Joy of Search. Let's look at their summaries side-by-side.
First, here's Bard/Gemini's summary:
Summary of the Text on California Oil Discovery:
The text explores the complexities of discovering when and where oil was first found in California. It emphasizes the importance of critical thinking and careful research when navigating historical information.
- Multiple Claims: Different sources make various claims about "first" oil discoveries, citing events like:
  - 1865: Union Matolle Company's commercial sale from Humboldt County.
  - 1876: Pico No. 4 well producing continuously for over a century.
  - 1892: Edward Doheny's supposed oil strike using a eucalyptus trunk (less credible).
  - 1855: Andreas Pico's small-scale distillation for local use.
- Challenges in Research:
  - Unreliable sources and repetition of errors, especially online.
  - Conflicting interests and "romantic wildcatter" narratives influencing storytelling.
- Best Answer:
  - For commercial sales, evidence points to the Union Matolle Company in 1865.
  - For long-term success and impact, Pico No. 4 in 1876 holds the title.
- Research Lessons:
  - Define your question clearly.
  - Be wary of duplicated content and unattributed citations.
  - Organize your findings and compare competing claims with evidence.
- The text criticizes poor scholarship and plagiarized content.
- It emphasizes the importance of scrutinizing sources and understanding potential biases.
- The provided research tips can be applied to various historical inquiries.
This summary captures the main points of the text while highlighting its critical approach to historical research.
And here's ChatGPT4's summary:
The author finds that the Union Matolle Company had the first commercial sales of refined oil from a well in Humboldt County in 1865, but Pico No. 4, beginning operations in 1876, had the most enduring success. The narrative concludes with insights on research methodology, emphasizing the importance of precise questions, skepticism toward duplicated content, and the organization of findings to answer complex research questions.
Both are decent summaries, but there's an interesting difference between them. ChatGPT4's summary is primarily focused on the oil discovery claims (that's what the 4 points are all about).
By contrast, Bard/Gemini's summary is focused on what makes this online research difficult to do, and the lessons you should learn along the way. (That's what Bard's bullet points highlight.)
To be sure, they both mention that this chapter is about research methods and skills... but I think Bard/Gemini gets the focus a bit more right than ChatGPT4.
1. As we've learned before, different tools have different strengths. ChatGPT4 is MUCH more able to handle large quantities of information (we could upload Frankenstein to ChatGPT4 but not to Bard). You, the SearchResearcher, need to understand what you can do with each tool.
2. Summaries are complicated--ask for exactly the kind of summary you need. For instance, I could have asked for a summary of Chapter 10 "written for a 6th grader" and the language and lessons would have been much simpler.
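One way to make "ask for exactly the kind of summary you need" a habit is to spell out the audience, purpose, and length every time. The sketch below only builds the prompt text; it does not call any LLM. The function name `build_summary_prompt` and the prompt wording are hypothetical, and there is no guarantee a given tool will honor the length request (as noted above, asking for 1000 words produced about 400).

```python
def build_summary_prompt(text, audience, purpose, max_words):
    """Assemble a summary request that states who it is for, why, and how long."""
    return (
        f"Summarize the text below for {audience}.\n"
        f"Purpose of the summary: {purpose}.\n"
        f"Length: at most {max_words} words.\n\n"
        f"TEXT:\n{text}"
    )


# Hypothetical usage: paste the resulting prompt into whichever tool you prefer.
prompt = build_summary_prompt(
    text="(full text of Chapter 10 goes here)",
    audience="a 6th grader",
    purpose="explain the research lessons in simple language",
    max_words=150,
)
print(prompt)
```

Changing only the `audience` and `purpose` arguments is an easy way to compare how the same tool summarizes the same text for different readers.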
I'm sure I'll have more to say about this topic in days to come. But for the moment, different LLMs have different strengths. Try them all!