Data Journalism at ProPublica w/ Scott Klein
Moritz StefanerData stories is brought to you by Qlik, who allows you to explore the hidden relationships within your data that lead to meaningful insights. Let your instincts lead the way to create personalized visualizations and dynamic dashboards with Qlik sense, which you can download for free at Qlik. Deries. That's q l I K. Deries. Don't forget the data stories.
Enrico BertiniHi, everyone. Data stories number 49. Hey, Moritz, how are you?
Moritz StefanerHey, Enrico. I'm good. That's almost busy as usual.
Enrico BertiniThat's almost 50.
Moritz StefanerThat's. Yeah. We're getting closer and we still don't know what we do for our birthday. Right?
Enrico BertiniYeah. What should we do? Something.
Moritz StefanerWe'll invite everybody over and have a good time.
Enrico BertiniWe should actually organize a party, but it's too late.
Moritz StefanerWe'll see.
Enrico BertiniWe'll see. Yeah.
Moritz StefanerHow are things for you?
Enrico BertiniWell, lots of work as usual, but New York is a little sunnier than usual today, so that's good. And you?
Moritz StefanerLife is good. Yeah, I'm traveling quite a bit, so the year is slowly kicking in. And so I've been to Paris last week, going to London this week.
Enrico BertiniWow.
Moritz StefanerLots of miles.
Enrico BertiniLots of miles. Yeah, yeah.
Moritz StefanerBut also meeting lots of people, so that's good.
Enrico BertiniInteresting. Yeah, I just started my new course. That's pretty interesting. I'm excited. And I have lots of students this year, so I am a little worried. But let's see. Let's see what?
Moritz StefanerHow many do you have? How many?
Enrico BertiniI have 50 something. 50? That's a lot. And considering that I assign projects to my students and they have to develop some vis applications, so that's going to be a lot of work. But let's see.
Moritz StefanerYou will see lots of different solutions, so keep us updated on that.
Enrico BertiniYeah, I'll try to.
Moritz StefanerMaybe we should crowdsource the criticism to the Data Stories audience.
Enrico BertiniYeah, that will be teaching concepts. Amazing.
Scott KleinYeah.
Enrico BertiniI could publish some polls and see. And then give grades according to the vote. Yeah, exactly. So let's start. We have another special guest today. We have Scott Klein from ProPublica. Hi, Scott.
Data journalism in the age of Trump AI generated chapter summary:
We have another special guest today. We have Scott Klein from ProPublica. And we wanted to talk about data journalism for a long time. And I also have to thank Alberto Cairo for introducing me to Scott.
Scott KleinHello. Great to be here.
Enrico BertiniHow are you?
Scott KleinI'm doing very well.
Enrico BertiniI'm so happy that you are here. And we wanted to talk about data journalism for a long time, so you are the perfect person to talk about it today. And I also have to say that I have to thank Alberto Cairo for introducing me to Scott in the first place.
Scott KleinThat's right.
Enrico BertiniAnd. Yeah. So today we're going to talk about data journalism at ProPublica. But we want to start from the start so can you tell us a little bit about what ProPublica is and also introduce yourself?
Exploring ProPublica's data journalism AI generated chapter summary:
ProPublica is a not for profit organization that does journalism in the public interest. Our special focus is on investigative journalism. We started publishing in 2008 in New York City. We've been the very fortunate recipients of two Pulitzer Prizes since we started.
Scott KleinSure. Sure. I'll start with ProPublica. ProPublica is a not for profit organization that does journalism in the public interest. So we were founded in 2007, started publishing in 2008 in New York City. Our special focus is on investigative journalism. We sort of were started at a time when newspapers were really in a rough spot. And the feeling was then that the thing that was going to go first in the newspapers, as they were kind of in the midst of an economic collapse, would be the kind of long form, long term projects that really make a big difference in the world. So ProPublica was founded to try to preserve some of that work. So, as I say, we started publishing in 2008. I've been here since the beginning. We've been the very fortunate recipients of two Pulitzer Prizes since we started. Yeah. The main founding editor was Paul Steiger from the Wall Street Journal. It's now edited by Steve Engelberg from the New York Times, Robin Fields, most recently of the Los Angeles Times. So a bunch of folks from the newspaper world, from the philanthropy world, and since then, a motley crew of journalists from lots of different places coming here to try to, you know, make a real difference in the world.
Enrico BertiniNice. And can you tell us a little bit more about yourself? What is your background? How did you end up at ProPublica?
In the Elevator With ProPublica's Team AI generated chapter summary:
How did you end up at ProPublica? I was an English major who realized when it was too late that I was really a nerd who loved computers. Went from the oldest magazine in the US to the newest news organization in the United States. Has been helping build out the data team since then.
Scott KleinYeah, I mean, you know, we can talk a bit about my team, who are much more interesting than I am, but lots of us come from the humanities. I was an English major who realized when it was too late that I was really a nerd who loved computers. So I came through publishing technology. I was at the New York Times for a little while and for a long time at the Nation magazine, which is kind of a left, progressive politics magazine in the US. It's actually the oldest weekly magazine in the United States. But I kind of, I went from the oldest magazine in the United States to the newest news organization in the United States in 2008 and helped push the button to publish ProPublica on the first day and have been helping build out the data team since then.
Enrico BertiniIt's an interesting trajectory. So you said that you have people with very different backgrounds. Can you tell us a little bit more about that? I'm really curious because I think there are a lot of people out there who are really interested in understanding how do you become a data journalist? And I think there are so many different paths and.
How Do You Become a Data Journalist? AI generated chapter summary:
I'm the editor of a team of about ten journalists who built something that we kind of call news applications. They use data itself to do the work of journalism. So we actually expose large data sets to people. We make large interactive databases. There are so many different kinds of people in the data journalism world.
Scott KleinYeah, yeah, I mean, let me talk sort of about what my team in particular does. So, yeah, I'm the editor of a team of about ten journalists who build something that we kind of call news applications, which are data journalism of a specific type. They use data itself to do the work of journalism. So we actually expose large data sets to people. We make large interactive databases that themselves are the work of journalism. So, you know, an example would be a project we did called Dollars for Docs, which for the first time put together datasets from seven, now 15 pharmaceutical companies which had been making disclosures about payments they make to doctors. This project put all of those disclosures together and made it searchable for people. So, you know, while we worked with a terrific reporter here at ProPublica to do that, the data itself became part of the storytelling and didn't just become an input to a long form newspaper story. So that's sort of just to kind of lay out what my team does. So who they are, they're kind of a mix of designers and statisticians and journalists and software developers. And when I say that, they are each all of those things. We don't have sort of a team of developers and a team of designers and a team of data specialists and a team of reporters. We're all sort of responsible to do each of these things for our projects. So these people come from the journalism world. They come from design backgrounds. A lot of them come from, as I say, they were humanities majors who figured out that they were nerds when it was too late, or they were nerds who realized that they were humanities people when it was too late. And, I think, sought out places where their creativity could involve building software and designing interfaces and designing visualizations.
Enrico BertiniYeah, that's really interesting. And I think that's what is really interesting to me for the data journalism world, that there are so many different kinds of people. And I think this makes the whole field very, very interesting. And so I wanted to ask you, you briefly mentioned that you create news apps, and I really like the idea of calling what you do apps rather than just visualization. So can you tell us a little bit more about what is a news app?
News Apps: What Is a News App? AI generated chapter summary:
A news application can include visualization, social tools, that can include search. They inherit as much from a traditional news story as they do from a piece of software. It's a way of giving you information so that you can find what's relevant to yourself and to your community.
Scott KleinSure.
Enrico BertiniBecause I think some of the things that you do that do include visualization elements, you would call them apps, right?
Scott KleinYes.
Enrico BertiniI think one aspect that I really like of the way you are doing these apps is that it's something that, for me looks like more. I don't know, it has an extended lifespan compared to other types of visualizations.
Scott KleinExactly. So a news application, as I say, we think about them a lot like news stories. They sort of come to life in much the same way that a news story does. They're edited much in the same way that a news story is. They're developed in much the same way. They really inherit as much from a traditional news story as they do from a piece of software. But they are ways to use raw data, to display raw data to people in a way that lets them tell a story for themselves. And let me tell you a little bit more what I mean by that. As a reporter, if I were a reporter and I was doing a national story about, say, payments that pharmaceutical companies make to doctors, I will work very, very hard to find anecdotes that are incredibly meaningful for people. But they are abstract anecdotes. They're abstract examples that you as a reader are meant, you know, we hope, to associate with your own life. And sometimes this works great and you find just the right anecdote. Sometimes the anecdote isn't meaningful for people. When we build a piece of interactivity that lets you find your own doctor within the data, and we can marry that with a national story or enough context to give you an understanding of the broader phenomenon, that's really us at our best. So it's a way of giving you information so that you can find what's relevant to yourself and to your community, but to do so in a context that lets you understand a broad and sometimes very complex subject. So that's a news application, and that can include visualization, that can include social tools, that can include search. A lot of times it includes search. So we bring to bear everything that we can to build these interactive experiences for people, where they're telling themselves a story, they're free-form investigating through a data set in a way that makes them feel empowered instead of sort of overwhelmed, but that as they're doing it, we're revealing to them a big national story.
Moritz StefanerAnd would you also have more guided summary articles around such a deep data set? Or do you basically say, here's the database, here's the types of things you can investigate. Now try it out. Is it purely exploratory or do you sort of mix analysis with exploratory tools? What's the usual style there?
Scott KleinWe'll do anything, and we have done everything. We will often have an investigative story that goes along with a news application that you can read and understand the phenomenon and get policy analysis and quotations from experts and things like that. But if you look at the traffic patterns, huge, huge numbers of people will use these tools. And it's important for us that if you use these tools but you don't take the tour, or you don't read the story, you're still getting an understanding of the phenomenon. So we always, and really, wherever it's possible, we don't give you just a raw number. We give you a number, then say, this is how this number ranks compared to other things like it, or this is, you know, how this number compares to other people like it. So, you know, not just, you know, your doctor prescribes this much of a certain drug, but, you know, this doctor is the top prescriber of this drug in this state or is the fifth most popular prescriber of this drug in this state. Because the raw numbers are often meaningless without that kind of context.
Moritz StefanerYeah. And I mean, many traditional journalists will say you can't just dump all this data out there and then it gets misunderstood or people use it for the wrong purposes, or people won't know how to work with that. You need to summarize it for them. And you can't just dump all this data out there. So I think that's kind of interesting that you see, this is the most important thing we can do at the moment in investigative journalism is to actually make these databases available, I think. Very interesting.
Scott KleinI sort of simultaneously vigorously disagree with that and vigorously agree with that. I mean, I think you guys talk about data literacy a lot, and I think you would agree with this, that you'd be surprised how data literate you were if it had to do with, you know, you and your family and your medical care and, you know, making really important decisions in your life. You kind of catch up quick when it really, when the data's really, really relevant. You know, surely if you understand a baseball box score, if you understand very sophisticated statistics that come out of sports.
Moritz StefanerRight, there's nothing, suddenly the numbers are not so scary.
Scott KleinNot so scary, right. There's nothing that I've ever published that is as complex as very basic baseball statistics. But at the same time, we almost never, in fact, I would say never put raw data up. It is always something that we have worked very hard to understand, worked very hard to make sure that people understand, couch the number in context in a story. But it's not just, well, if you read the number, you have this misinterpretation. But did you read the story, too? Right. We don't do that even in context. You sort of see, you know, how this number compares to another number or how this doctor compares to another doctor or how this facility compares to some other facility. So we are very. It is our responsibility as journalists to help you understand something instead of just kind of giving it to you and hoping you find something.
Moritz StefanerRight. And another thing I noticed is you have this, our investigations tab on the website, and there are, like 30, 40, maybe 50 different areas or bigger themes you are continuously working on.
Scott KleinYes.
Moritz StefanerAnd then you can see all the stories, like more than 100 stories related to one theme. Like Buying Your Vote: Dark Money and Big Data.
Scott KleinYes. We are nothing if not tenacious.
Enrico BertiniYeah.
Moritz StefanerBut I'd like to see that from more newspapers also, that they build up these long over months, these bigger narratives instead of these isolated articles, and you have to piece everything together. And this sort of dossier type approach is really interesting, I think.
Scott KleinYeah. ProPublica is very lucky. We're very fortunate. One of my bosses has a metaphor that a newspaper is a big supermarket, and they have to fill all the shelves. It's true. And we don't have, you know, we only have an investigative aisle to fill. Yeah. So we can really focus our energy. I mean, we don't. One of the things we sort of realized in our first redesign of the website is that we don't really have, quote, unquote politics and quote, unquote sports and quote, unquote media. Right. We don't have desks. We don't have these kind of broad subject areas. We have very specific investigations. So, you know, we have hydraulic fracturing. We have dark money.
Enrico BertiniRight.
Scott KleinWe don't have a politics section. We have a dark money section. Same thing.
Moritz StefanerBut it's okay if your front page doesn't change, like, on a day, it's fine.
Scott KleinRight.
Moritz StefanerIf there's no update on one of the big investigations, it will be the same.
Scott KleinNo, it changes every day. But, I mean, something that I think. I'm not sure who said it, but, you know, one of the things that the data, the news applications can do, actually, you know, these are not. These are durable resources. So these are not things that kind of pass into history a few days after they've come out. Some of the things that we've built, like Dollars for Docs, or an app we made that lets you look up waiting times in emergency rooms, are as popular today as they were, you know, the day they came out, because it's still relevant and current information.
Keeping the News Apps Up to Date AI generated chapter summary:
These are durable resources. So these are not things that kind of pass into history a few days after they've come out. If you build something that's highly data driven, there's a better chance to keep it updated. We have to be very clear with readers about when we've stopped updating something.
Moritz StefanerYeah, but here's a practical question. Aren't you building up a huge debt of hundreds of apps that need to be maintained and updated, and the whole ship gets much more heavy, and it's much harder to do new stuff because you have to maintain the old stuff. How do you deal with that?
Scott KleinIt's one of the hardest things about the job. Dollars for Docs, which is still incredibly popular, probably the most popular thing on the site, is something that we update once a year. It takes a few months because this is 15 different data sets, all of which are hostile to our analysis in one way or another. But that's something that we very rightly spend our resources on. There are others that we update a few times or for a few years, and then as the traffic sort of tails off, we just kind of put our limited resources somewhere else and we put a sign on them that says, this is when we stopped updating it. It's something we think a lot about and something we're very careful to do right. A lot of times, especially as we got good at this, we started making it much easier to keep these up to date. So there are some, we have one nursing home guide that helps you see violations that happened in nursing homes in the US, that I think takes less than a day to update. So, yes, debt is a huge problem, but there are ways that we can mitigate it. And then when we can't, we have to be very clear with readers about when we've stopped updating something and what to take from it.
Moritz StefanerHow current is this thing?
Scott KleinRight?
Enrico BertiniYeah. But I have to say that this is exactly what I like when we talk about news apps. I think that's really interesting. And even just focusing on the visualization side of things, there are not many visualizations out there that have a very long time span.
Scott KleinRight, right, that's right. I mean, I think that a visualization, at least in the most traditional sense, I think is more like a news story. Right. You're telling one story, it has a beginning, a middle and an end, and then it sort of fades through time in the way that a news story does because you sort of designed it to be timely. You designed it to be something that's very relevant to right now.
Enrico BertiniYeah.
Scott KleinWhereas the things we built...
Moritz StefanerAt the same time, if you build something that's highly data driven, I think there's even a better chance to keep it updated, because, let's say the New York Times, they had this Iraq dashboard type thing where they would update all the counts of the deaths and things like this. So ideally, you would build it once and just have a simple way of updating it with current data. Right?
Scott KleinSure. I mean, there are data sets where you can make a simple workflow to keep it up to date, and there are data sets where you had to scrape or you had to keyboard in a lot of stuff, where you had to take 1000 PDFs and boil them down. So there are things that can't be simplified. But when things are simple for us, we keep it up to date as frequently as we can. But, you know, it actually points to an interesting phenomenon that we found in the work that we do, which is that we build these things. They are information resources as much as they are stories. And because we focus on communities and explaining to people where, you know, a phenomenon compares or where it's ranked in a larger set of data, it not only makes it more meaningful for regular people, it makes it meaningful for local journalists. So one of the phenomena that we were very surprised by at the beginning and now really do our best to foster is that local reporters will take pieces of our data and do stories about them. So we were talking about Dollars for Docs, which again, lets you look up to see if your doctor is taking payments from pharmaceutical companies. And we found that, I think it's 200 now, or it's upwards of 200, news organizations have done stories using Dollars for Docs data. That doesn't mean they did a story that said, isn't ProPublica interesting? And they did this thing, which we love. But what I'm talking about is big investigative takeout stories in the St. Louis paper where they find the doctors in their area who got the most money and go and talk to them and say, is this a conflict of interest? What does it mean that you've taken this? I think the Raleigh paper did a story about a medical system, a nonprofit medical system that had a rule that said that doctors aren't allowed to take money from pharmaceutical companies, but they were doing it anyway. So these have this opportunity to really kind of generate journalism.
And I love ProPublica, but we're 40 people in an office in New York City, and we don't know anything about St. Louis. We don't know anything about Raleigh. We can't do a story everywhere in the country. But these news applications give us the ability to be in all of those places. Because we're giving this information to local reporters, we're spending the time to really understand it, to make sure the data is bulletproof so they can trust the information and they can base stories on it.
Moritz StefanerThat's super interesting, this sort of data food chain aspect. You know, we had also people from the World Bank or maybe OECD who take very raw data and make it more refined and consolidated. Then there's people like you who contextualize that information and edit it but still keep it open. And then, as you say, you have maybe local journalists who produce very specific stories for people who then share their comment on Facebook on that story. You know, and sort of these, I think this is getting much more diverse now, this sort of the different granularities and the different, the food chain aspects.
Scott KleinAnd I think there was a time when, you know, I mean, I think in the kind of old way of news, if somebody took your information and made their own story, what you would call it is, quote, scooped on your own story, and you never wanted that. So if somebody followed up on your story, this was a tragedy and you sort of had a meeting to figure out how to stop it from ever happening again. But I think one of the great things about being at ProPublica is that again, our mission is not to just make journalism. Our mission is to change things in the real world, to have real world impact. So when something like that happens, when the St. Louis paper or the L.A. paper or a paper out in the country, without even necessarily letting us know they were doing it, which we hope they do, comes out with a story that's really hard hitting and makes change in a city that's far away, you know, we get very happy here. We sort of get together and we say, how can we make this happen again? So, you know, now we're coming out with these things called reporting recipes, which you can read and which give you ideas for stories that you can do with our data. We have these open conference calls where you can call in and ask any question you want of the people who put the data together so that you can understand it better. And a lot of those calls are filled with people saying, what's interesting here? What story should I do?
Enrico BertiniIt's great.
Moritz StefanerSo you're facilitating, coaching, helping people to do journalism. That's a very interesting perspective. So let's take a minute to talk about our sponsor again. It's Qlik and their new product, Qlik Sense. So Enrico, you tried it out recently, I heard.
Culture of Data: Qlik and Qlik Sense AI generated chapter summary:
Qlik and their new product, Qlik Sense. You can basically create your own charts. Qlik also has a blog post about using visualization for kids. Thanks so much to Qlik and Qlik Sense for sponsoring us again.
Enrico BertiniYes, I just tried Qlik today and I have to say that it's really interesting. So they have two main products. One is called QlikView and another one is called Qlik Sense. And the main difference is that with Qlik Sense, you can basically create your own charts. And the nice thing about Qlik is that everything is web based. And I had very interesting discussions with some people there. And one thing that I didn't realize before is that one big problem for visualization tools like this one is to be able to adapt to different interfaces, different screen sizes and so on. And one nice feature of Qlik is that it's designed in a way that it can shrink without losing too much information. So they have very interesting mechanisms by which, for instance, labels are added or removed according to how much space is there.
Moritz StefanerSo they have to zoom a charity wheel on the right hand or left hand side. See, sort of an indication that there's like more data you're currently not seeing. So very smart ways of working in a couple of environments.
Enrico BertiniYeah, it's very interesting. And I also wanted to mention that last time we talked about visualization for kids. And so the Qlik folks wrote to us saying that they also have something, they have a blog post about using visualization for kids. So we will put this link on our blog post because it's interesting.
Scott KleinYeah.
Moritz StefanerSo it's business intelligence for first graders, which is like an interesting starting point.
Enrico BertiniWe totally need that. So I am totally with Qlik on that.
Scott KleinYeah.
Moritz StefanerBut it's very, very interesting to see, like, which types of charts do these young kids understand, and so on. So, yeah, and there's a nice report there. So the VP of global industry solutions actually tried it out in a classroom, and you can read the blog post to see what the experiences were.
Enrico BertiniYeah.
Moritz StefanerSo thanks so much to Qlik and Qlik Sense for sponsoring us again and supporting us. And now back to the interview.
ProPublica: Impact of Our Work AI generated chapter summary:
How do you define impact at ProPublica? I find it actually a somewhat dangerous question that I have to ask myself sometimes. What I talk about is change in the real world. We're very grateful when it does, and it often leads to real impact.
Enrico BertiniSo one thing I wanted to ask you is, so you briefly mentioned impact and I don't know, recently I've been thinking a lot about impact and relationship between what we do and impact and how we define impact for ourselves. So I'm just curious to hear from you, how do you guys define impact at ProPublica, I guess, what are the type of events that have to happen to let you scream and say, oh, we made it. I mean, it's like. Right. Oh, that's really cool. That's really what we are really meeting our mission here.
Scott KleinYou know, it's a, it's a question we think about a lot.
Enrico BertiniYeah, it's a dangerous question.
Scott KleinRight.
Enrico BertiniIt's quite painful because, I mean, sorry for interrupting, but I think, I mean, we all go about just doing what we like to do. And I don't know if, I mean, we don't have many opportunities to stop and think, am I somewhat useful? I mean, seems a stupid question, but it's not. And I find it actually even a somewhat dangerous question that I have to ask myself sometimes.
Scott KleinWell, I mean, the way that I like to think about impact, I'm a maximalist when it comes to this. I think that having a congressman put out a press release or to send an angry letter or to call for hearings, I think that's all great. But if you've done journalism a long time, you know that it's actually not so hard to make happen. We're very grateful when it does, and it often leads to real impact. But what I talk about when I talk about impact is, you know, change in the real world. You know, the innocent person out of jail, the guilty person into jail, the law changed. You know, one of the early big pieces of impact that we had at ProPublica, we did a story, and actually there was a little news app attached to it about the difficulty that California was having taking nurses who were, you know, committing crimes and stealing drugs from their patients and hurting their patients, doing all sorts of terrible things. The difficulty that California was having getting them out of the system. So the nurses were being put into, you know, these very lengthy diversion programs, and they never seemed to be fired. And we did a story in the Los Angeles Times on Sunday, one Sunday, and the next Monday morning, Arnold Schwarzenegger, then governor of California, fired the entire nursing board, and it made real change. And that's what we mean. Right? We mean, you know, the world is a different place because of the work that ProPublica did.
Enrico BertiniYeah. Yeah. This is what I like. Yeah. Very interesting. So I wanted to ask you something about, I think everyone here is super curious about how do you, how do you actually start a new project? I think we had also one question from Twitter, from Lynn Cherney. She said, how do you pick stories to investigate? And I would add to that, how does a project develop on time?
How Do You Pick a Data Story? AI generated chapter summary:
ProPublica is a very reporter driven place. The people who work for me are called news application developers. The creative act is the same as writing a story. How does a project develop on time?
Scott KleinSo lots and lots and lots of ways, as you might imagine. But ProPublica is a very reporter driven place. And the people who work for me, though, they are called news application developers. They are very much peers with our reporters. They are reporters, right? They make phone calls. They write stories. They, in addition to kind of building these things and really the software that they build, the creative act is the same as writing a story. It just comes out in a different form than a story. So I'm talking about them as well. But ProPublica is very much driven from the bottom up. So reporters pitch ideas. The people on my team pitch ideas. If I come up with an idea, it is a half a sentence. It's a, you know, you know, legislative redistricting is interesting. I think that, you know, we understand the statistics well enough to reverse engineer it. Go, you know, that's the entirety of the instructions that they get. And then as editors, we sort of help develop these ideas and help keep them on track. But, you know, the overwhelming majority of the cases, it's the reporters, the data journalists, the developers who are coming and saying, you know, I think this is interesting. I think, you know, I've got a source who's told me this, this document set is coming out, and I think we can find this story in it. So again, it's all from the journalists and the developers up.
Enrico BertiniSo I'm really curious to understand, how does it work then? So you come up with an idea and not necessarily the data that you need for this idea is already there, right. Or easy to access. So how do you go about finding, or, I don't know, discovering the data that you need? Because I think this is a major, major part of being a data journalist, if I understand correctly.
Scott KleinI mean, sometimes the discovery of the data, or the knowledge that the data is there and that no one else has done anything with it yet, is the impetus that brings the developer or the reporter to us to say, hey, you know, we've got this data on nursing home injuries or something and no one's done a project on it. Or we have this data that a reporter brought in and said, here's the emergency room wait times, which actually turned out to be a key quality metric for emergency rooms in the hospitals. No one's done anything with it yet. Maybe there's something we can do. And we start sort of thinking creatively and whiteboarding, you know, what could we do with this? And what's the story that we're trying to tell? And what's kind of the nugget, what might bring people to it? How can we make this into a resource? So sometimes it starts with the data. Sometimes it starts with a question. Sometimes it starts with a hunch or a hint from a source or a tip from a source, followed by one of us writing a FOIA letter to the government and asking for data that isn't yet released but that we think holds a story in it. So there's lots of ways. Oh, and by the way, before I forget, plenty of times we have done stories where we thought, oh, there's plenty of data for this. Let's dive right in. And the data ends up not being what we thought it was. The data turns out to need months and months of cleaning. We did a project that used federal Department of Education data, which, you know, seemed like it would be a cinch and we could start visualizing right away. And, you know, eight weeks of cleaning later, and, you know, about a quarter of the records thrown away, we were able to start doing something.
Enrico BertiniYeah. So you have an initial phase where you understand whether you can really do something with this data or not, right?
Scott KleinYes, sometimes it's initial. It's an initial phase. Sometimes it's right up until the very end phase, but, yes.
Enrico BertiniOkay. And so, and then how do you come up with the pieces? And so I'm just curious to understand how do you organize work after, you know that there is something interesting there.
Scott KleinLike I said, the department is a lot like other teams of journalists. So I am the editor. The people who work for me, their title is developer, but they're a lot more like reporters. And what they're tasked with doing is making phone calls, finding the people who are the experts in this topic and talking with them. Help me understand what's interesting about this. I looked at the data, and I don't understand this. Can you explain it to me? So it is a process of talking to and listening to academics, scientists, experts in the field. That's sort of a prerequisite to starting to kind of play with the data, because we can't really play with the data if we don't understand it. So that's an enormous part of it. Did that answer your question?
Enrico BertiniYeah, absolutely. And actually, I think this part is really interesting because, of course, you are taking responsibility of, I don't know, taking a data set and extracting information out of it and communicate it to a very large public and probably in trying to argue something on top of this data. So I think it's very dangerous doing. I mean, if you don't do all the good work that is needed in order to understand whether, I don't know, everything is understood very well, you might actually get in trouble. Is that correct?
Scott KleinOh, there's no question. And I think that, I mean...
Enrico BertiniI would be super scared to publish data the way you do it. I mean, sometimes the things you do are very strong statements and.
Scott KleinYeah.
Enrico BertiniAnd you never know what kind of reactions you can get.
Scott KleinWell, I mean, the very few times we've put something up because we thought that we were right and we thought we knew the things very well, and we didn't show it to somebody and say, have we understood this correctly? You know, our average goes down when we do that. You know, when we have made mistakes, and thank God we haven't made many, it's invariably because, you know, some part of this process broke down: the process of talking to someone, even when you think you know, talking to someone who's in a position to push back against your point of view, to tell you where you're wrong, to make the counterargument, to help you make sure that you've understood the information and that you're telling a fair and accurate story with it. We do that every time. That's absolutely required for everything that we do.
Have You Had to Correct a Data Set? AI generated chapter summary:
Did you ever have to retract or fundamentally edit a data set? Did that ever happen? The only time that it sort of happened in a big way, it's the Internet. If people point it out immediately, that speaks for a healthy environment.
Moritz StefanerDid you ever have to retract or fundamentally edit a data set? Did that ever happen?
Scott KleinThe only time that it sort of happened in a big way, it's the Internet. So I think it wasn't up for more than five minutes. But it can be enough to be five minutes. No, but, you know, one of the things I think, if you go to any reporter, any good reporter in the country, and ask them what was their last correction, they'll be able to tell you verbatim, because they're sort of etched in your heart. But we put up a data set of some campaign finance information. It was literally up for five minutes or less, and it had sort of a miscalculation. And luckily there was Twitter, and many of our colleagues wrote to us and said, hey, I think you've done this a little bit wrong. And we took it down, put it back up the next morning. But all's well that ends well. But again, that was something where we thought we knew exactly. We thought we understood it perfectly, and what could go wrong? So we pressed go. And that's a great lesson to learn.
Enrico BertiniYeah.
Moritz StefanerThat speaks for a healthy environment. If people point it out immediately. So that means they are critical and sort of think it through and let you know.
Scott KleinNo, exactly. That's exactly right. And I'll tell you, we'll never make.
Moritz StefanerIt ties to a question.
Scott KleinSure.
In the World of Data Journalism, Documentation of Methodologies AI generated chapter summary:
You cannot just put out the end result of a calculation. You have to sort of also document how you arrived at that. We either do a methodology post every time or close to every time. It's absolutely required.
Moritz StefanerJon Schwabish, so he asked on Twitter, what are your thoughts on publicly documenting and providing data sources and statistical models and methodologies? And I think that's a very interesting one, which has been discussed quite a bit in the data visualization community, but nobody really takes any concrete steps, I feel. But the general thought is, of course, you cannot just put out the end result of a calculation. You have to sort of also document how you arrived at that. Right. Absolutely. Reproducibility is a huge topic. And how do you make sure you can reproduce the same calculations five years later with a new data set? Things like this. Any thoughts on this? How do you handle that?
Scott KleinYeah, we take it incredibly seriously. And we have a methodology story that goes with all of our data journalism work. Either we explain the methodology, you know, right there on the page itself, if it's a short one, and if it's long or requires a calculation or something, we'll have a separate story. Or if it's very complex and technical and we don't want to bore regular readers, we'll put it into its own post. But I would say that we either do a methodology post every time or close to every time. It's absolutely required. I mean, in part because as journalists, we have to admit that we're not experts in everything, and we want to be completely transparent with people and to give them a chance to tell us where we screwed up. So we do methodologies for absolutely everything. And in fact, I can show you examples of data journalism done back in the 1970s that had a little box on the printed page that had a fairly technical explanation of exactly how the numbers were gotten to. It's actually part of the tradition that we've inherited from our kind of, you know, forefathers and foremothers, to really explain to readers, even in a technical way, how you did something so that they can repeat it, so that they can, you know, take other data and use this as an example, or to just say, hey, you know, you screwed up.
Moritz StefanerWouldn't you also put the code, the code on GitHub or share the code? Or is it more like a verbal description of like an extended caption?
Scott KleinNo, no. If we used code, we'll put in the code. If we use a calculation, we'll put in the formula. Though, as I've admitted at the very beginning, I was an English major, so sometimes I just sort of point at those. But still, we put them out there. So, yeah, we put as much as we can out.
Enrico BertiniYeah. And I think what we are discussing is also, I mean, recently I've been thinking a lot about this problem, that it's so easy to be wrong on everything, but at the same time, we cannot be paralyzed by that. Right? I mean, I'm reading this fantastic book, How Not to Be Wrong.
How Not to Be Wrong in Data Journalism AI generated chapter summary:
It's so easy to be wrong on everything, but at the same time, we cannot be paralyzed by that. And I think it's one of the reasons that data journalists are so careful. Do you also test, do you beta test with users if, let's say, you could have all the statistics?
Scott KleinYeah, it's a great book.
Enrico BertiniIt's an amazing book. And I mean, turns out it's not so easy. I mean, I think I'm really struggling with this concept, right. Because the more you dig into the specifics of how we do anything related to data or statistics or science per se, it's so easy to be wrong, right? But at the same time, this is not a good reason for stopping what we are doing. So I don't know, I don't know where the right balance is. And I think it's an interesting problem for everyone. And I guess it's the same in data journalism. I guess it's very easy to be wrong there as well, right?
Scott KleinI know it is. It is. And I think it's one of the reasons that data journalists are so careful. I mean, you know, when we put out a huge news application that exposes sometimes millions of pieces of data, our surface area of things to be wrong about gets bigger and bigger and, you know, we have to be very careful. We have to be very careful to explain what we know and what we don't know. And again, a lot of that we inherit from journalism, how to explain your certainty about things. And we have an entire, very involved process of bulletproofing data. So not only do I edit a news application for sort of interface and are we telling the right story and how do we know we're right, but we have an entire process of back reading statistically significant amounts of data all the way back to the source. What I'm trying to explain here is we have a second person who tries to go back and redo all of the calculations that went into a data set to make sure that they come up with the same results.
Moritz StefanerFact checking.
Scott KleinRight? So we're incredibly careful and we don't draw conclusions that are not, you know, very well supported. So it's great.
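The two checks Scott describes here (a second person redoing the calculations, and back reading a sample of records against the source) can be sketched in a few lines. This is only an illustration, not ProPublica's actual process: the function names, the tolerance, the fixed seed, and the example figures are all hypothetical.

```python
import random

def compare_results(original, recheck, tolerance=1e-9):
    """Return (key, original_value, recheck_value) for every disagreement.

    `original` and `recheck` are dicts of independently computed figures.
    A key missing from either side counts as a mismatch.
    """
    mismatches = []
    for key in sorted(set(original) | set(recheck)):
        a, b = original.get(key), recheck.get(key)
        if a is None or b is None or abs(a - b) > tolerance:
            mismatches.append((key, a, b))
    return mismatches

def sample_for_back_reading(record_ids, k, seed=0):
    """Pick k record ids at random to check by hand against the source.

    The seed is fixed (an assumption of this sketch) so that the same
    sample can be drawn again later.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(list(record_ids), k))

# Hypothetical figures from the first analyst and from the second person.
first_pass = {"county_a": 120.0, "county_b": 75.5, "county_c": 40.0}
second_pass = {"county_a": 120.0, "county_b": 77.5, "county_c": 40.0}

disagreements = compare_results(first_pass, second_pass)
to_back_read = sample_for_back_reading(range(1000), k=25)
```

Any entry in `disagreements` is a figure the two people need to reconcile before publication; how large the back-reading sample should be is a statistical judgment this sketch does not make.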
Moritz StefanerDo you also test, do you beta test with users if, let's say, you could have all the statistics, right? You could use all the technically right terms to describe things. You could have a lot of disclaimers and asterisks, but still, people could understand the wrong thing, right? Because, I don't know, they jump to conclusions or some of the words you use, maybe they over generalize suddenly or something like that. Do you test that? Like, do you expose people on the street, let's say just some guinea pigs, to, let's say, a new tool and see how they interpret it?
Scott KleinNot enough. I mean, the time that...
Enrico BertiniPeople on the street.
Scott KleinRight. The time that we've done that the most formally is actually one of our best projects. And it was amazing. It was this sort of incredible experience to watch people narrate their way through one of our projects. And, you know, I'm sure you guys have both experienced this. You ask them, you know, how do you get from place to place in this? Or, you know, tell me what you understand about this page. And their responses are so different than you thought they would be. You know, a lot of times the things that they see and don't see weren't even on your list of things to worry about. So we need to be doing a lot more of them.
Moritz StefanerI think it's very fascinating, and often it's just you do it somehow in between to verify something, but it would be great if it were more part of the process.
Scott KleinYeah, I'm working very hard.
Moritz StefanerIt's the same for me. I always say, like, oh, it would be fantastic. But then with every new project, you have to really work hard to fit that in.
Enrico BertiniYeah.
Scott KleinWhat I liked about it, though, is that we can have a lot of back and forth and a lot of debate here, and I think everywhere, about design decisions. And so on some level, it comes down to aesthetics, and you say, well, you like this, I like that. But then with the user testing, well, if no one saw that menu bar, it's just gotta go. It doesn't matter if you love it. Even the person who made it says, oh, I loved it, but it's gotta go. No one seems.
Moritz StefanerNobody knows how to use it. Yeah, exactly.
Enrico BertiniYeah. This reminds me of something I always say to my students, I actually said this thing yesterday in class, that when you do this, you have to be ready to trash whatever you've done, that you have to build something in order to understand whether it works or not. And if it doesn't work, it's okay. The outcome is that you now know it. You know better what you need to do. So I think this kind of mindset is really useful.
Moritz StefanerKill your darlings.
Scott KleinKill your darlings.
Enrico BertiniIt's hard sometimes. It's really hard.
Scott KleinIt is. Sometimes it is. But a lot of times those darlings sort of come back in different ways. I mean, it's code and it's design and sort of comes back as a slightly different creature for a different project. But, you know, things get reused.
What's More Important Than a Perfect First Draft? AI generated chapter summary:
Are you more in favor of putting out an almost sloppy, or at least very reduced first version and then iterate publicly on it? We are empowering people with these tools. This is also related to increasing visual literacy or data literacy. As more people are exposed to these things, the more they will learn.
Enrico BertiniYeah.
Moritz StefanerWhat's more important, like, in your mind, to get things absolutely right the first time around and then put it out there and then it's there. Or are you more in favor of putting out an almost sloppy, or at least very reduced first version and then iterate publicly on it?
Scott KleinWe try to put everything out as perfect as we can make it, but we are open minded to the idea that we may not be right. So we did a project, I'll give you an example. And it's gone from the Internet, the earlier version. But we did a project about the way that doctors prescribe drugs and bill Medicare for it. So we are able to see this because Medicare pays for the drugs. So it's not the entire country, but it's this huge set of people who are over 65 or who have disabilities. And we can see how just about every doctor in the country, or most of the doctors in the country, prescribe medicine. And we had this idea that how much you prescribe like your peers, in the sense of how much of a certain set of drugs, how much you are like your peers, said something about you, and if you were some sort of far outlier, then somebody might want to talk to you about that. So we made all of these visualizations using Euclidean distance and all sorts of really exciting math, and we made all of these beautiful things called herd charts, like a cow herd. So we were very excited to sort of show, like, where you were relative to the herd. And we published it, and we were very, very happy with it. And it was very clear from the very beginning that no one understood anything that we. And if we had user tested, we would have known before we published, and that would have been great. But it was not getting the traction we kind of expected. But we started right away thinking about how we could make this more simple. We kept some of the idea of how you prescribe like your peers, but replaced it with much more straightforward bar charts that show, I think, much more easy to grasp things. So it's sort of both. We went out with it, and the numbers were all right. We had done the calculations and the code right, and I think it was quite beautiful. It was really arresting to look at. But I think that we weren't really communicating what we wanted to communicate with people. So we killed many, many darlings that day and replaced them with what? With bar charts, which we're all happy with. I think it did much better.
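For listeners curious about the math mentioned here, the idea of measuring how far one doctor's prescribing mix sits from the peers' can be sketched as a Euclidean distance between vectors of drug shares. The interview does not describe the actual calculation, so this is only a minimal illustration under assumptions: the doctor names, drug names, counts, and the choice to compare each doctor against the average of the others are all made up.

```python
import math

def prescribing_shares(counts):
    """Turn raw prescription counts per drug into shares that sum to 1."""
    total = sum(counts.values())
    return {drug: n / total for drug, n in counts.items()}

def peer_average(share_list):
    """Average share per drug across a list of peers' share dicts."""
    drugs = set().union(*share_list)
    return {d: sum(s.get(d, 0.0) for s in share_list) / len(share_list)
            for d in drugs}

def euclidean_distance(a, b):
    """Straight-line distance between two share vectors keyed by drug."""
    drugs = set(a) | set(b)
    return math.sqrt(sum((a.get(d, 0.0) - b.get(d, 0.0)) ** 2 for d in drugs))

# Hypothetical doctors and counts, for illustration only.
doctors = {
    "dr_a": {"drug_x": 50, "drug_y": 50},
    "dr_b": {"drug_x": 60, "drug_y": 40},
    "dr_c": {"drug_x": 5, "drug_y": 95},
}
shares = {name: prescribing_shares(c) for name, c in doctors.items()}

# Distance of each doctor from the average of everyone else.
distances = {}
for name, own in shares.items():
    peers = [s for other, s in shares.items() if other != name]
    distances[name] = euclidean_distance(own, peer_average(peers))

farthest = max(distances, key=distances.get)
```

In this toy data `dr_c` ends up farthest from the herd. A large distance only says a prescribing mix is unusual, not that anything is wrong, which is in line with the "somebody might want to talk to you" framing above.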
Enrico BertiniYeah, I think the whole idea of user testing this kind of visualization that goes on the web for visual storytelling, it's an interesting research gap that I don't see covered by current research in general.
Moritz StefanerAnd it's not part of the traditional journalistic repertoire. You know, we have to keep that in mind. I mean, the main back channel was maybe letters to the editor or something, but in no way was there this idea of like, well, actually we don't quite know what we're doing. Let's ask the readers. You know, that would be like the traditional newspapers.
Enrico BertiniYeah, but this is fantastic, right. That's a big change. And, I mean, I don't know. We are empowering people with these tools. And as we said before, this is also related to increasing visual literacy or data literacy. Right. As more people are exposed to these things, the more they will learn, especially if, as you said, if what you show is very personal. Right. So making things personal is a very interesting strategy. I like it a lot.
Scott KleinYeah. And it's sort of fascinating to see how people react to that. And one of the things that I want to start understanding more is what are we inspiring them to do once they know these things and we know a little bit about this, we can sort of watch them on Google Analytics. And we sort of, you know, in one of our projects, we made a printable guide that you could sort of print out and bring, you know, to your doctor and have a conversation. We know how many people printed those out, but I would just love to know how the work we're doing is, as I say, you know, supporting ProPublica's mission. How. How are we, you know, impacting people's lives in a real way?
Enrico BertiniYes. Yeah.
Scott KleinIt's difficult to get them to come back and tell you.
Enrico BertiniYeah. I mean, the holy grail is understanding whether people are taking action after reading what you have prepared. Right.
Scott KleinYeah.
Enrico BertiniFor them. Yeah.
Scott KleinRight. And we have echoes and hints, but nothing, you know, nothing we can be certain about.
ProPublica on Newsworthiness AI generated chapter summary:
Do you spend any time discussing whether this might actually have any negative impact on some people's life? I'm just curious. News, what journalists think about is this concept of newsworthiness. Is this information that people need to know to make good decisions?
Enrico BertiniSo another thing that I wanted to ask you, I mean, I think still related to what we were discussing before, I think even if, let's assume that you publish a new piece, that is absolutely right. It's not wrong in any respect. Right. But still, I guess you. So do you spend any time discussing whether this might actually have any negative impact on some people's life? I mean, something like collateral effects or something? I don't know. I'm just curious. So I was thinking, for instance, just to give you a concrete example, even if maybe it doesn't match exactly, you've been publishing a lot of information about doctors on the web so that people can, if I understand correctly, people can see whether their doctor is funded by pharmaceutical companies. Right. But before publishing something like that, do you discuss whether this might actually have negative effects on doctors, even in cases where maybe they've done nothing bad? I don't know. I'm just building the case right now. But I think you understand what I want to say.
Scott KleinYeah. I mean, news, what journalists think about is this concept of newsworthiness.
Enrico BertiniWorthiness.
Scott KleinYes, newsworthiness. So is this newsworthy? Is this, you know, something that people need to know to be able to make good decisions about how to live their lives? You know, and often this question is, is a philosophical one and has no easy answers. But, you know, we, if we got a, you know, I don't know, I can't make it up on the fly, but if we got a data set that was just lurid interest, just nosiness, I don't think it would rise to the level of all the work that it would take to put it on the Internet. So it really does have to pass a sort of bar of being information that people need to know to make good decisions. Right. I would imagine there's a gossip app. I mean, for instance, here's a great example, actually. And I don't see ProPublica doing this. A lot of newsrooms, a few years ago, it was very fashionable to make mugshot apps, so you would build an interactive, you would get a data feed from the local police department.
Moritz StefanerThat's true.
Scott KleinYeah. And it would just be, here's all the people who were arrested last night, because that's a public record. And often, obviously, you're seeing people on the worst day of their lives. Some people, just another day at the office. But most people, a lot of people, it's the worst day of their lives, and the worst moment of the worst day of their life. And you're sort of publishing that via live feed. And I understand, I know some people who built these and they're very popular, and I think some newsrooms were building them in a desperate effort to make money. But I don't think ProPublica would ever do that. It's too much work for it to be a get-rich-quick scheme for us.
Enrico BertiniYeah. So I think many of our reader, not readers, listeners, are always interested about more geeky stuff. So I'm wondering if you can tell us a little bit about what kind of tools you use, whether your team does a lot of coding or, I don't know, anything that is related to tools in general.
ProPublica's technical culture AI generated chapter summary:
ProPublica uses Ruby on Rails and JavaScript and other web technologies. We tend to open source components when we can. We do a tremendous amount of taking data out of PDFs. The easiest of those three to teach is the coding part. The hardest part is design.
Moritz StefanerDo you start from scratch?
Scott KleinYeah.
Enrico BertiniDo you start from scratch every time? Do you build your own libraries? I guess you have your own tool set, right? Yes.
Scott KleinSo we build our interactives using Ruby on Rails and JavaScript and probably a lot of other web technologies that would be obvious to you. We tend to open source components when we can. So if we make a tool that really helps us do something, we'll open source it and hopefully help others do the same thing. One example of that, we worked with another newsroom. It actually turns out that, maybe you guys know this, taking structured data like a table of data out of a PDF is this incredibly difficult thing to do, because PDF is essentially a positioning language. You know, where does this character go relative to the lower left hand corner of the page? So even if you take an Excel spreadsheet and print it to a PDF, the PDF language does not know anything about the structure of that data. So to get it back out again.
Moritz StefanerI say PDF is where data goes to die, right? Yeah.
Enrico BertiniSo there is so much data in these PDFs.
Scott KleinYeah, it's the biggest, it's really quite incredible. And we do a tremendous amount of this. We do a tremendous amount of taking data out of PDFs, a tremendous amount of web scraping. So a lot of the tools that we build for ourselves and open source are related to those. So the PDFs are a very difficult problem. And for Dollars for Docs, there are these multi-thousand-page PDFs that we have to figure out what to do with, that we have to cope with. So in concert with another newsroom in Buenos Aires and an organization called OpenNews, we built a tool called Tabula, which is a little Java app you run on your desktop, and you can load a PDF and it will try to find the tabular information on a page and then export it, structure it for you. So we build that sort of thing, try to open source it as much as possible. A lot of times we use things like MySQL and Postgres and QGIS and a lot of the open source toolkit that I think would be very familiar to listeners. And also we'll build a lot of things ourselves. A lot of times if there's an existing tool to do something, we'll kind of think our way into a more difficult task so that we'll build it ourselves. If something's too easy to do, sometimes we'll try to be more ambitious and build it ourselves.
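To make the "positioning language" point concrete: once the text has been pulled out of a PDF, what you typically have is a bag of fragments with page coordinates and no notion of rows or columns. The sketch below rebuilds rows by grouping fragments whose vertical positions are close together. It is a toy illustration under assumptions (the coordinates, the tolerance, and the function name are made up), and it is not how Tabula works internally.

```python
def rows_from_fragments(fragments, y_tolerance=2.0):
    """Group (x, y, text) fragments into table rows.

    Coordinates are assumed to be measured from the lower left corner of
    the page, so a larger y means higher up. Fragments whose y values are
    within `y_tolerance` of the row's first fragment share a row; cells
    within a row are ordered left to right by x.
    """
    rows = []
    # Walk the page top to bottom, then left to right.
    for x, y, text in sorted(fragments, key=lambda f: (-f[1], f[0])):
        if rows and abs(rows[-1]["y"] - y) <= y_tolerance:
            rows[-1]["cells"].append((x, text))
        else:
            rows.append({"y": y, "cells": [(x, text)]})
    return [[text for _, text in sorted(r["cells"])] for r in rows]

# Hypothetical fragments, as a text extractor might report them.
fragments = [
    (200, 700.5, "Amount"),
    (50, 700.0, "Name"),
    (50, 680.0, "Smith"),
    (200, 679.8, "120"),
]
table = rows_from_fragments(fragments)
```

Real documents are much messier than this: cells wrap across lines, columns drift, and headers span several columns, which is part of why a dedicated tool is worth building.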
Enrico BertiniSure. So people who come working with you, do they get some training directly within ProPublica or they are already skilled in all these kind of technical things?
Scott KleinLike I said at the top, there are sort of three legs to the stool to be a great news app developer. So there's design, there's software development, and there's sort of journalism, data journalism. And I find actually that the easiest of those three to teach is the coding part. The hardest part, I mean, design requires talent, and journalism isn't something that everybody can get used to, thinking journalistically and calling people on the phone and asking the questions they don't want to answer. These are things that not everybody can do. Coding, not only because I think the people who work for me are amazing and brilliant, but because we're writing code typically to a framework, a lot of things have become much easier and we're able to do much more quickly. So Rails or Django or Backbone have made really good decisions about things. So they are coding to that framework. They can start just doing what coders call kind of the business logic, rather than thinking about the data model, in which you're making a lot of low level decisions that make you slow. You don't really need to do those anymore if you have Rails or you have Django or Backbone or things like that, and your code is mostly to kind of encapsulate a real world thing.
Enrico BertiniDo you use R a lot? I know that R is very popular at the New York Times, for instance. Do you have the same in your group?
Scott KleinIncreasingly so. For data analysis, we tend to have a much more freewheeling environment. People use what they're most comfortable with, so it's a much wider set of things. There's a lot of R in the wider newsroom. We've got plenty of reporters here who are terrific data journalists in their own right, and so there's a lot of SPSS, especially among the wider newsroom. Lots of R. There are people who will use the Python statistical libraries as well. The IPython Notebook is super cool, and I can see us using that a lot more, too. There's no sort of rule that you must use this or that, unless it's in production and we have to worry about keeping it alive. Then the choices go down. But if it's about analyzing your data or visualizing your data, just sort of pre-publication visualization, you can use what you like.
There's No Easy Path to Becoming a Data Journalist AI generated chapter summary:
There are lots of jobs out there for data journalists and visualizers. What matters most is that you have interesting URLs. Without a tremendous amount of experience behind you, it's the work that matters most.
Enrico BertiniSo I want to ask you, I'm sure that some of our listeners are people who want to become data journalists. Do you have any suggestions for someone who wants to become a person like those that you have in your team? Is there any kind of path that people can follow, or is it totally random? I don't know.
Scott KleinI mean, there's way more than there used to be. When we first started looking for people, the schools just weren't yet awake to this. People had to sort of make themselves, and we had to find them where we could. Now, schools: Columbia has a terrific program that Mark Hansen runs, who's really smart, and I think it is starting to really teach people how to think like journalists, with statistics and code and things like that. I also teach at the New School, where we're trying to train people how to do this sort of thing, or to think this way. There's lots of jobs out there, a lot of people who are trying to hire data journalists and visualizers. But the easiest path, to be frank, remains just doing the work. When I have a stack of resumes I'm looking at, I look at the resume last. The very first thing I look at is the URLs. Show me your portfolio. What have you done? If it's something I've never seen before, something that's just very artfully done or carefully done, then I look at the rest of the application. It's the URLs that matter most. It's the work that matters most.
Enrico BertiniYeah. I think this is a very interesting general trend, that portfolios tend to count much more than whatever kind of degree you have. I don't know.
Scott KleinYeah, well, there are no degrees for data journalists, or at least not...
Enrico BertiniOh yeah.
Scott KleinSo, and the resumes, you know... I mean, I'm an amateur historian, and data journalism has been around for hundreds of years. But the kind that we do is new enough that nobody has been doing it for, you know, a decade. So what matters most is that you have interesting URLs. And if you have them on your own blog versus having them in a big newsroom, it doesn't matter to me.
Enrico BertiniSure.
Moritz StefanerSo if somebody wants to get started, you would say, just start an investigation and write a Medium article? Is that...
Scott KleinYeah, or, I mean, if you want to do visualization, I think that, you know, learn enough D3 and take a data set from data.gov or from a municipal data contest or any of these other places, these government and corporate sources that are putting data out, and find the story that's hidden inside it. Then either tell it narratively on Medium or visually using D3 or even in Tableau. And I think that you can show what kind of visual and what kind of journalistic thinker you are without a tremendous amount of experience behind you.
What to Read Before Starting Data Journalism AI generated chapter summary:
There are many resources out there for a person who wants to know more about journalism. From propublica.org to OpenNews, these are great places to start. All of the most sophisticated newsrooms are not secretive about their work.
Enrico BertiniSo are there any interesting resources out there for a person who has a technical background? Maybe, I don't know, a major in computer science who wants to know more about journalism, investigative journalism? What do you suggest to this kind of person who already has the technical skills to do all these things, the data analysis, the visualization part, but doesn't know much about journalism in general?
Scott KleinSo I would start with our own blog that we maintain on our site. It's propublica.org/nerds, which I recommend everybody read. It's where we put a lot of our nerdier things, but also our methodology posts and things like that. There's also the organization I was talking about a little while ago, OpenNews, which I think might formally be called Knight-Mozilla OpenNews. They have a site called Source, which I think we can put in the show notes. That is a terrific website where lots of practitioners write up very technical explanations about what they do and how they do it. And I think that a technical audience that would understand the statistics and understand the code and understand the visualization techniques would be right at home reading this. It's often written in very smart and funny and self-deprecating ways that only nerds can write, and it sort of explains how they did these things. People from the Times write in it all the time. People from the LA Times and the Washington Post write in it all the time. Those are all terrific data newsrooms, and these are great places to start. There's also an email list put out by NICAR, which is the National Institute for Computer-Assisted Reporting. It's called NICAR-L, it's very active, and all of the news nerds are on it, arguing with each other and helping each other.
Enrico BertiniIs there anything like books that you would suggest?
Scott KleinWell, there's the Data Journalism Handbook, which I know that you guys have talked about on the show, which is really terrific. I also would focus really hard on just the work. I mean, it's out there, right? I had to assemble a sort of list of really good data newsrooms that talk about their work a lot for the class I taught last semester. And, you know, there are Tumblrs and Twitter lists and Pinterests, and all of the data visualization teams in the most sophisticated newsrooms are not secretive about their work and are sort of explaining what they did. The New York Times has one where they will put up first drafts of their visualizations, which are often hysterical, often beautiful, all by themselves.
Data stories: The process of storytelling AI generated chapter summary:
Scott: Impact is a big thing. We keep discussing it like each of the last five episodes has some reference to that. And I have to say that we didn't really have so far an episode talking explicitly about data journalism and the process. So I'm really glad, Scott, thanks a lot for being here.
Enrico BertiniOkay, I think I would conclude here. Moritz, do you have other questions you want to ask?
Moritz StefanerNo, but it's been super fascinating. I think that's super interesting. And I really love the work you're doing at ProPublica. I'll continue to follow that.
Enrico BertiniYeah, I have to say that.
Moritz StefanerAnd we need to follow up in a year or so, talking about the impact and more measurements of this.
Scott KleinYes, absolutely.
Enrico BertiniImpact is a big thing. Yeah, yeah, absolutely. Absolutely.
Scott KleinYeah.
Moritz StefanerWe keep discussing it like each of the last five episodes has some reference to that.
Scott KleinYeah, yeah.
Enrico BertiniAnd I have to say that so far we didn't really have an episode talking explicitly about data journalism and the process. Right? What it looks like. So I'm really glad. Scott, thanks a lot for being here. That's very useful, and I'm sure that it's going to be useful for our listeners.
Scott KleinThanks for having me.
Enrico BertiniData Stories is brought to you by Qlik, who allows you to explore the hidden relationships within your data that lead to meaningful insights. Let your instincts lead the way to create personalized visualizations and dynamic dashboards with Qlik Sense, which you can download for free at qlik.de/datastories. That's Q-L-I-K dot D-E slash datastories. Don't forget the datastories part.