Hacking Into a Data Journalism Revolution

Feb 2, 2012
  • With several dozen people gathered, the meet-up was designed to introduce the fundamentals of Python using a simple web scraping example. (Photo by Maite Fernandez)

  • For journalists, the pull toward Python is being able to access the data they need to build solid, well-researched, well-documented stories. (Photo by Maite Fernandez)

  • ICFJ's Ben Colmery (standing, left) discusses Python and other data technology tools with Jeremy Bowers, senior developer at The Washington Post, and instructor Jackie Kazil. (Photo by Maite Fernandez)

We are on the verge of a data-journalism revolution. That was my big takeaway from the Hacks/Hackers “Python Web Scraping 101 for Journalists” meetup this past weekend in Washington, D.C.

Working with Hacks/Hackers D.C. and the Sunlight Foundation, the International Center for Journalists co-organized this meetup as an afternoon of training and networking led by Serdar Tumgoren, News Applications Developer at The Washington Post, and Jackie Kazil, Developer Lead for CACI International Inc. at the Library of Congress. The goal: to help local journalists and programmers learn how to scrape a website using the open-source language Python.

Why web scraping? Because websites are rife with data, but they are typically not built for easy analysis and rarely offer downloadable data sets. Web scraping makes it possible, when all else fails, to extract that data and export it into formats that can be analyzed more easily.
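To make the idea concrete, here is a minimal sketch of what a scraper does, using only Python's standard library. (This is an illustration, not the training material from the meetup; a real scraper would fetch a live page with `urllib.request.urlopen(url)`, but this example parses a hard-coded HTML snippet so it runs on its own. The city data is invented for the demo.)

```python
from html.parser import HTMLParser

# A snippet standing in for a page we might have downloaded.
HTML = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Washington</td><td>601723</td></tr>
  <tr><td>Baltimore</td><td>620961</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collect the text of each table row into a list of lists."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows
        self._row = None      # row currently being built
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

scraper = TableScraper()
scraper.feed(HTML)

# Print the extracted data as CSV, ready for a spreadsheet.
for row in scraper.rows:
    print(",".join(row))
```

That short script turns a table buried in markup into comma-separated values a journalist can sort, filter, and analyze; the same pattern scales up to real pages once the fetch step is added.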

At the International Center for Journalists, we see the direction that journalism and technology are heading, particularly in terms of data, and are building exactly these kinds of approaches into several of our Knight International Journalism Fellowships. As part of his fellowship, Justin Arenstein is launching Hacks/Hackers chapters in many parts of Africa, and making open data a key focus. This spring, we will also initiate two data-journalism fellowships in Argentina and Brazil, which will tap into local hacker communities, and help journalists engage and develop data scraping and analysis technology to enhance their investigative reporting. So, coordinating an event like this was a way to get a better hands-on sense of how it is done and hook into the local innovation boom in D.C.

Of course, I already had an idea that data journalism is taking off. In the last few years, we’ve seen a groundswell of new data technologies: programming languages like Python and PHP for accessing data, and online tools like Google Fusion Tables, Dataviz and the Investigative Dashboard for analyzing and visualizing data. Many are now free and much easier to use than the tech of old. Meanwhile, universities like Harvard and MIT are putting their computer science courses online – and at no cost.

What first struck me about this meetup was the demand. When we originally posted 25 spots, they were filled almost instantaneously. We opened up ten more, and they were quickly filled. By the day of the event, we not only had 37 in attendance, but another 29 were on the waiting list. I was also excited to see that the gender breakdown of attendees was 21 male, 16 female, a near parity that one might not have expected just a decade or two ago.

Sure, there were challenges at the meetup. Most people needed to install Python first, a process complicated by the differences between Windows and Mac operating systems. Moreover, we needed a lot of roaming helpers to offer guidance to a room full of people new to Python. But the challenges paled in comparison to the sheer energy and enthusiasm that was tangible in the room all afternoon long.

It was this energy and enthusiasm, combined with the demand, that led me to my big takeaway. Anytime you make technology that people actively crave widely available to them, something big happens. Think of the advent of the personal computer, of the Internet, of the mobile phone, of Twitter and Facebook. As each was adopted into the mainstream, we witnessed a transformation in society and in journalism. This energy among hacks and hackers alike tells me we are on our way to seeing data journalism adopted into the mainstream, and to this same kind of transformation.

During our pizza break, I was speaking with Will Atwood Mitchell, web developer at the Washington City Paper, when I looked several years into the future and said to him, “You know, pretty soon, we are going to reach a point – thanks to all of these learning resources and tech becoming free and easy to use – where the barrier to entry is no longer skills, but ideas.”

Think of what the newsroom, and journalism, will look like when this generation of hacker journalists, armed with data and enthusiasm, gets through with them.

It’s exciting to be a part of this innovation. In D.C. In Argentina and Brazil. Across Africa.

For further learning, these are the web scraping resources that Tumgoren and Kazil used during the training.

Editor’s note: Ben Colmery is the deputy director of ICFJ’s Knight International Journalism Fellowships program. He’s also widely regarded around here as the go-to guy for web technology and innovation. Already actively involved in the local Hacks/Hackers group, which merges technology gurus and journalists, Ben saw an opportunity for ICFJ to take a leading role. We co-hosted a forum designed to help journalists access the data they need to produce solid, well-documented stories.