Panel with Rachel Shorey, New York Times; Anthony Pesce, Los Angeles Times; Chase Davis, Minnesota Star Tribune
Many tasks in journalism boil down to classification problems. Is my city’s police department cooking its crime stats by assigning incident reports to the wrong categories? Of the thousands of planes in the air each day, which ones might be involved in government surveillance? Is this person I just snapped on my cellphone a Member of Congress?
Drawing on examples including the Los Angeles Times’ investigation into the misclassification of violent crimes by the LAPD, BuzzFeed News’ identification of spy planes operating in U.S. airspace, and a New York Times service that can identify Members of Congress from photos sent by SMS message (used in this story), we’ll address questions including:
I’m not a data scientist, I’m a reporter. What’s in it for me?
What type of story or reporting task can machine learning help with?
When is machine learning not the answer?
Which algorithm should I choose?
How can I structure my data to give the algorithm more to work with?
You’ll learn how to use the dplyr package to sort, filter, join and carry out some other basic functions in R to identify trends in your data. We’ll also make some simple charts with ggplot2.