Search Challenge (9/2/15): How to search in a scanned document?

Dan Russell • September 2, 2015
Republished with permission from SearchReSearch

If your research is like mine...

... you fairly frequently find a document that's from another era. It doesn't even have to be that long ago before you find yourself dealing with infernally annoying crufty docs.

For instance, when I'm searching, I fairly often find a document that was scanned as an image. It's great to have the document in the first place, but as a scan, it's often less than completely useful.

Here's an example. A document I found in one of my research studies was this excellent paper that's available only in a scanned PDF format. (Here's the LINK to the paper.) When you open it up, you'll see sections that appear like this:


Of course, our usual Control-F / CMD-F tricks don't work on this kind of scanned doc, and since this is a long paper, that makes it much harder to read. In particular, what I WANT is this--something I CAN use Control-F on:



Our SearchResearch Challenge for this week is meant to give you an additional powerful tool for importing scanned documents and making them findable.

1. How can you transform this document (LINK) into something that you can search within?
2. Once you've done that, can you determine how many times the authors refer to "multiple documents" in that paper? (This was my original search task--finding interesting papers about how people read multiple documents in the same reading session. That's how I found this paper.)

So this Challenge is really about "tool finding" -- can you figure out how to convert from a scanned document into a readable / findable / searchable one?
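
Once some tool has turned the scan into plain text, the counting in question 2 is a small job. Here is a minimal sketch of that step only, assuming the text has already been extracted (the post does not say which tool to use, and this code does no OCR itself). The function name `count_phrase` and the sample string are made up for illustration. The match ignores case and allows a line break between the two words, since extracted text often wraps mid-phrase.

```python
import re


def count_phrase(text: str, phrase: str) -> int:
    """Count case-insensitive occurrences of `phrase` in `text`.

    Any run of whitespace (spaces, tabs, line breaks) is accepted
    between the words of the phrase, because text extracted from a
    scan frequently breaks lines in the middle of a phrase.
    """
    words = phrase.split()
    if not words:
        return 0
    # Build a pattern such as \bmultiple\s+documents\b
    pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Hypothetical sample standing in for text extracted from a scan.
sample = (
    "Readers of multiple\n"
    "documents behave differently. Multiple documents are hard."
)
print(count_phrase(sample, "multiple documents"))
```

A plain Control-F in a text editor can miss a phrase that is split across two lines, which is why the sketch matches on flexible whitespace rather than a literal space.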

(Big hint: It's much easier than you think.)

Let us know how you found out how to do the magic process!

Search on!


About the Author

Dan Russell

I study the way people search and research. I guess that makes me an anthropologist of search. While I work at Google, my blog and G+ posts reflect my own thoughts and not those of my employer. I am FIA's Future-ist in Residence.
