… when you do a search, you want to look just in certain regions of the document.
The LA of dreams
For instance, a newspaper article usually has a title, a first paragraph, an author, a publication name, and a publication date. (Example: headline: “LA City Council wise gun-safety action,” published-by: Los Angeles Times, date: Aug 10, 2015.) Being able to search just within one of those kinds of fields would be a boon to searchers.
In other words, news articles (and documents in general) have several kinds of metadata that you can search on, which gives you a very fine-grained search ability.
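To make the idea of searching within a field concrete, here’s a minimal Python sketch. It is only an illustration of the concept, not how any real search engine works. The field names (`headline`, `published_by`, `date`, `body`) and the helper `search_field` are my own assumptions, and the second record is invented purely for contrast.

```python
from datetime import date

# Illustrative records. The first mirrors the example above;
# the second is made up so the two searches give different results.
articles = [
    {
        "headline": "LA City Council wise gun-safety action",
        "published_by": "Los Angeles Times",
        "date": date(2015, 8, 10),
        "body": "...",
    },
    {
        "headline": "Made-up headline about street repairs",
        "published_by": "Made-up Gazette",
        "date": date(2015, 8, 12),
        "body": "A made-up story that mentions the city council once.",
    },
]


def search_field(records, field, term):
    """Return the records whose `field` contains `term` (case-insensitive)."""
    term = term.lower()
    return [r for r in records if term in str(r.get(field, "")).lower()]


# Field-restricted search: only headlines are checked.
headline_hits = search_field(articles, "headline", "city council")

# Unrestricted search: every field is checked, so more records match.
anywhere_hits = [
    r for r in articles
    if any("city council" in str(value).lower() for value in r.values())
]
```

The headline-only search returns just the first article, while the search across all fields also picks up the second one, which only mentions the council in passing. That difference is the whole appeal of searching by field.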
This week, a reporter wrote to me to ask if there was some way to search through the newspapers in his hometown for any headlines about a particular topic.
What he wanted to do was to get a sense for how much news coverage a particular topic received over time. Did the news in his town really cover the topic? Or did they just let it slide? How could you tell? His idea was to look at the headlines, and count up how often the topic had been written about.
I realized today that knowing how to do this is a valuable skill for SearchResearchers, and hence, it makes a great Challenge. Here’s this week’s Challenge, modified slightly to protect the person who suggested the idea.
Can you figure out how to do this?
As you know, I’m originally from Southern California, Los Angeles to be exact, so I’m always curious about what’s going on there.
Suppose I’m a reporter trying to understand how the Los Angeles City Council deals with gun-related issues. Can you (expert SearchResearchers) tell me how to do the following?
1. Can you search the major news outlets in the Los Angeles (LA) region for news articles over the past year that report on the City Council considering any kind of gun-related actions? (Be generous here: if the council heard a report about the use of guns, that would count.)
2. (Harder) Can you find the top 100 LA City Council headlines on guns, and then extract the publication dates to create a week-by-week histogram of when these articles were published? (This is a two-step challenge: (a) find and extract the dates, (b) put the dates into a spreadsheet and create a histogram showing the number of publications on this topic by week.)
Can you figure this one out? Don’t worry if you can’t figure out how to do part 2. I’ll show you how I did it next week.
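If you’d rather script step (b) than use a spreadsheet, here’s a rough Python sketch of the counting part. It assumes you have already collected the publication dates as strings in the “Aug 10, 2015” style, and that a week starts on Monday. The function names are my own, the sample dates are placeholders rather than real search results, and the `%b` month abbreviation assumes an English locale.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def week_start(d):
    """Return the Monday of the week that contains date d."""
    return d - timedelta(days=d.weekday())


def weekly_histogram(date_strings, fmt="%b %d, %Y"):
    """Count dates per week, keyed by the Monday that starts each week."""
    days = [datetime.strptime(s, fmt).date() for s in date_strings]
    counts = Counter(week_start(d) for d in days)
    return dict(sorted(counts.items()))


# Placeholder dates for illustration only.
sample = ["Aug 10, 2015", "Aug 12, 2015", "Aug 19, 2015"]
histogram = weekly_histogram(sample)

# Crude text histogram: one "#" per article published that week.
for monday, n in histogram.items():
    print(monday.isoformat(), "#" * n)
```

Note that weeks with no articles simply don’t appear in the result. If you want the gaps to show up as zeros, you’d need to fill in the missing Mondays yourself.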
I can think of at least two ways to do the headline-search, but I’m curious HOW you figured out how to do it. Would you please let us know as you write up your answer in the comments below?
Search on!