FIA

SearchReSearch

Answer: How can I search over audio?

Dan Russell • February 23, 2022
Republished with permission from SearchReSearch

We live in a multi-media world...


So why shouldn't search engines work on audio files as well?

This question originally came up for me when I was looking for a particular episode of RadioLab. This is a wonderful podcast with much that's thought-provoking and memorable.

Except when you can't remember in WHICH episode that memorable comment was made.

The other time I need to search through audio is when I have a recording of some event, and I'd like to be able to search the TEXT of that recording.

These audio search questions lead to this week's Challenges:

1. Is there some way I can search through all of the podcasts on the internet for ones that mention a particular topic? Let's try finding a few podcasts that discuss the way oceanic tides work. Can you find a podcast or two?

It's not hard to find a podcast about ocean tides:

[ ocean tides podcast ]

will turn up dozens. That's pretty straightforward. Most podcasts have gone out of their way to make the podcast a discoverable object by search engines. That means they have a title page with the name (including the word "podcast") and usually links to audio recordings. The better ones (in my opinion) also have transcripts of the podcast content. (RadioLab does this, but not every podcast does. You have to treasure the ones that do provide transcripts.)

A very real question is "have you found ALL of the podcasts"?

This is called coverage--that is, does your search engine provide a really complete set of results for the topic you're searching?

That's a difficult question to answer, but if you compare the results from Bing, DuckDuckGo, and from Google, you'll see they're really pretty similar (the top 10 are exactly the same).

Which suggests that we might want to find a search engine that's specialized for podcasts.

So, I did a search for podcast search engines: [ search engine podcast ] and found several. Here's my list:

* CastBox.fm – (about Castbox) All they index is podcasts, so that's all you'll find here. Use the magnifying glass lens to bring up the search box. Alas, I couldn't figure out how to search the contents of a podcast.

* AudioBurst.com - (about) seems well suited for searching recent radio programs, but it couldn't find any podcasts with ocean tide (which seems odd--as we know, there are a bunch). They index the full-text of the shows. One nice thing is that you can control the degree of match (exact, all, or any).

* Google Podcasts - (about) This is the Google podcast search tool. Oddly, while it seems to index the contents of the podcast, it doesn't find nearly as many podcasts about ocean tides as regular old Google search does. Huh.

* ListenNotes - (about) I have to admit that this did the best job of all of the podcast search tools, finding many plausible casts. It returned so many results that I had to ultimately use double quotes to limit the results to just those with "ocean tides" in the podcast text.

There are a few other podcast search tools, but they're mostly limited in their coverage.

I would be remiss to not mention YouTube as a podcast source. Lots of podcasters put their casts up on YT, so be sure to check there as well.
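If the difference between match settings like exact / all / any (or between a plain query and one in double quotes) seems fuzzy, here's a small sketch of what those modes usually mean. To be clear, this is my own illustration in Python, not how AudioBurst or ListenNotes actually implement their matching; the function name `matches` and the sample sentences are made up for the example.

```python
import re

def matches(text, query, mode="all"):
    """Check whether text matches query under one of three modes.

    'exact': the query words appear next to each other, in order.
    'all':   every query word appears somewhere in the text.
    'any':   at least one query word appears in the text.
    Matching ignores case and punctuation.
    """
    words = re.findall(r"\w+", text.casefold())
    terms = re.findall(r"\w+", query.casefold())
    if mode == "exact":
        n = len(terms)
        return any(words[i:i + n] == terms for i in range(len(words) - n + 1))
    if mode == "all":
        return all(t in words for t in terms)
    if mode == "any":
        return any(t in words for t in terms)
    raise ValueError("mode must be 'exact', 'all', or 'any'")

# Hypothetical snippets of podcast text
loose = "The tides of the ocean are driven by the Moon."
tight = "Ocean tides, explained in ten minutes."

loose_exact = matches(loose, "ocean tides", "exact")  # words present, but not adjacent
loose_all = matches(loose, "ocean tides", "all")
tight_exact = matches(tight, "ocean tides", "exact")
```

The first sentence passes the "all" test but fails the "exact" one, which is why adding double quotes cuts a long result list down so sharply.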


2. If I have a recording of a conversation, what's the best way to be able to search the contents of that recording for mentions of a particular key word or phrase? How would you recommend I do this? (Bonus points if you can figure out how to do this for more than just English.)

There are a number of ways to do this. Here's the method that my buddy, Henk van Ess, posted about recently.

Method 1:

1. Convert audio to mp3 using one of the many converters available.

2. Use VideoIndexer to upload the file, run speech recognition on it, and index it.

3. Make sure you choose the right language. (And let's hope yours is supported.)


Method 2:

1. Upload your audio to YouTube (yes, create a new YouTube video with just the audio track).

2. After the upload is done, you can get access to a time-stamped text file with the text in it. According to YouTube, they support: English, Dutch, French, German, Italian, Japanese, Korean, Portuguese, Russian, and Spanish.

As always, there are a number of additional ways to do this: Art Weiss recommends HappyScribe.com (he tested it on a Hebrew audio file and was impressed with the accuracy). I haven't tried it, but if you do, post your results in the comment thread.

Jon also points to Otter.ai for transcription services. They have automated speech / audio recognition with summary keywords, highlights, and full audio transcripts. Their service offers 600 minutes free every month. It does require an account.
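Whichever service produces the transcript, the last step is the same: search the text and jump to the matching timestamp. Here's a minimal Python sketch of that step. It assumes a simplified format where each line starts with an `H:MM:SS` timestamp followed by the spoken text; that format, the function name `find_mentions`, and the sample transcript are my own assumptions for illustration. Real caption files (SRT or VTT, for instance) are laid out differently and would need their own parsing.

```python
import re

# Assumed line format: "H:MM:SS spoken text". This is a simplification;
# real caption files will need a different parser.
LINE = re.compile(r"^(\d+:\d{2}:\d{2})\s+(.*)$")

def find_mentions(transcript, phrase):
    """Return (timestamp, text) pairs whose text contains phrase, ignoring case."""
    needle = phrase.casefold()
    hits = []
    for raw in transcript.splitlines():
        m = LINE.match(raw.strip())
        if m and needle in m.group(2).casefold():
            hits.append((m.group(1), m.group(2)))
    return hits

# Made-up transcript for the example
sample = """\
0:00:05 Welcome back to the show.
0:01:12 Today we talk about ocean tides.
0:03:40 The Moon drives most of the tidal pull.
0:07:02 Ocean Tides also depend on coastline shape.
"""

hits = find_mentions(sample, "ocean tides")
for stamp, text in hits:
    print(stamp, text)
```

Two caveats: a phrase that is split across two transcript lines won't be found by this line-by-line approach, and the search is only as good as the speech recognition behind the transcript. On the plus side, `casefold()` handles case-insensitive matching for many languages, not just English.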

3. How can I find a particular non-spoken sound--say, the bells of Notre Dame or the sound of a glass harmonica?

This wasn't hard, but absolutely fun.

The glass harmonica (aka "armonica" -- both spellings are allowed) can be easily found on YouTube (examples: Thomas Bloch, Adagio for Glass Harmonica by Mozart, glass harmonica setup and assembly). Likewise for Notre Dame bells, finding pre-fire bells is easy (850th anniversary peal).

But of course, there are other, specialized collections of sounds that you might want to access (or download for your media project). In general, the best approach is to look for that particular collection and then search within the collection. Examples: [cartoon sound effects] or [famous speeches].

And, as always, don't forget the Internet Archive Audio file search. (Not just for podcasts, but also for all of those sounds you've been searching for.)


SearchResearch Lessons

1. Searching for particular audio is possible, but it might require checking multiple sources. Google is good, but it doesn't have perfect coverage of all the possible audio on the internet. Check multiple sources! (And, every so often, look for a new audio/podcast search tool. You never know what you'll find.)

2. Doing your own speech recognition isn't hard anymore: just look for a transcription service (or use YouTube).


And that RadioLab clip I was looking for? All I remembered about it was that they were talking about some kind of butterfly--a kind of butterfly that was called a "satyr." My query, [ Radiolab satyr butterfly ] was enough to find me the episode... but ONLY because they provided the transcript!

As always... Search on!


This post was republished. Comments can be viewed and shared via the original site.

About the Author

Dan Russell

I study the way people search and research. I guess that makes me an anthropologist of search. While I work at Google, my blog and G+ posts reflect my own thoughts and not those of my employer. I am FIA's Future-ist in Residence.

© 2023 The Future of Information Alliance, University of Maryland | Privacy Policy | Web Accessibility