By Christina Sherwood
Published Jan 1, 2013 8:00 AM
Data science is not just good for business. Jake Porway Ph.D. '10, a former data scientist in The New York Times' Research & Development Lab, is founder and executive director of DataKind, a nonprofit connecting scientists with social organizations. The recruits, from places like Tumblr and New York University, spend their spare time "hacking on data" for groups like the Grameen Foundation and the American Red Cross of Greater Chicago.
Q: In a recent article about you, a website for IT professionals called data scientists "the new rock stars" of the tech world. Why are data scientists so hot right now?
A: We're entering this revolution where data is more ubiquitous than it's ever been. We're creating more varieties of data from smartphones and sensors and webcams. And we have incredible access to computing power to do something with it. With relatively cheap hardware, anyone can mine this data. There's a need for people who can handle that data, from data engineers collecting it to statisticians making sense of it.
Q: Isn't "data scientist" just a trendy new name for statistician?
A: This is a debate that's raging on the Internet. Data science is not necessarily just a rebranding of statistics. It incorporates computer science components. But I agree that if stats departments were thinking about it right, all of that should be part of statistics.
Q: Why did you choose data science as a career?
A: I came from a computer science background. During my schooling I stumbled on machine learning, which was this amazing concept to me. You didn't have to teach the machine how to be. You could teach the machine how to learn and let it figure out what it needed to know. Machine learning was my research topic at UCLA. I was in computer vision, which is getting computers to understand what's in images, for instance in face recognition. That opened up my eyes to how broad statistics was and how important it was going to be in this data-science revolution.
Q: How does this show up in our everyday lives?
A: If people want to know how data's touched their lives, they need look no further than the last time they bought something or watched a movie. Netflix worked because it collected vast amounts of data about what movies people like. Google Maps knows the best route because it's been listening to traffic data, weather data and search-history data. It's how Amazon recommends products and how Facebook thinks you know friends. Our lives are becoming incredibly data-driven.
Q: With companies using data to build their businesses, there's a lot of money in data science. Why did you bring data science to social activism?
A: I was excited by the idea of hackathons [events lasting a day or several days, where tech experts collaborate on projects]. I thought this was how we'd make change. But the stuff coming out of the hackathons was more of the same. I sat racking my brain for an app I wanted to build and couldn't think of anything useful. Following that, I saw a sign that said, 'Stop Global Warming.' The group's website said you could hand out flyers and turn off your air conditioner. But I have data-mining skills. It was that combination of wanting to use my skills for good, realizing other people had skills to spare, and realizing there were social organizations that had data available. I said, "Let's try to get these guys together."
Q: You've organized "DataDive" events in San Francisco, New York, Chicago and Washington to connect data scientists and social organizations for a weekend of data-driven problem solving. How do you solve problems at these events?
A: The New York City Parks Department was our first government attendee. They have a pruning program, where they prune trees in the hope they'll reduce the work they have to do later. They wanted an analysis to see if pruning blocks had reduced the number of work orders. The team could not find much of any sign that there was a significant decrease in work orders because of pruning. That was an initial analysis. After doing a deeper dive, we now think it's only in some neighborhoods where it didn't cause a reduction. They'll independently verify it. The quality of the tools that come out is mind-blowing. Guidestar has all the financial data of every nonprofit in the U.S. They wanted to use that to predict how nonprofits would do the next year. The teams spent all day hacking on the data, bringing in other data sources, and writing complicated machine-learning models. On presentation day, they did a mini-machine learning conference.
Q: Has the success of the DataDives surprised you?
A: To be honest, I'm more amazed anyone shows up. If you told me someone would want to spend almost 48 hours working on data, I'd be surprised. It's never just one person. There are many people who want to keep working on this.
Q: Why do they come?
A: They're data people, so they love cool problems. What's cooler than a real-world problem? And as big as their minds are, they have big hearts, as well. A lot of workplaces have volunteer programs. But it's not the same to clean up trash from a river as it is to write an algorithm to help identify polluters. People are looking for ways to use their skills for good.
Q: What are some of the issues your volunteer teams, the DataCorps, are working on?
A: The DataCorps are meant to tackle longer-term analyses and projects. The questions are similar to the DataDives, [but] we go into a lot more depth. For example, a [DataCorps] group is analyzing political contributions. They're looking to understand how susceptible politicians are to influence by different organizations. In the DataCorps, we have months to dig into the research and analyze where we're going. We can build a tool that could continue this beyond an analysis.
Q: What's the biggest challenge to growing this movement?
A: As fun as the DataDives are, there is a lot of weight to what's going on. People are looking at sensitive data and making decisions that could result in gains or losses in funding or policies being made. I'm concerned as this depends on everyone being responsible, buying the philosophy and doing this in the best way possible.
Q: How fast is your idea spreading?
A: One of the clearest signs is the number of [nonprofit] groups that have continued to work together after the weekends. They've not just adopted the results. Many have changed the way they think about data. One group set aside resources to hire a data scientist. We're seeing data companies saying they want volunteer programs for employees and asking to connect on social projects. We're seeing foundations looking for support from us or other groups for their grantees. We've had requests [for DataDives or DataKind chapters] from more than 20 cities and have seen people start to build their own in at least five. It's going to keep spreading.