"I Quant NY" Finding Surprising Stories in NYC Open Data with Ben Wellington
Moritz Stefaner: This episode of Data Stories is sponsored by Quadrigram, a web-based application designed to bring data stories to life. With Quadrigram, you can create and share interactive data stories without the need of any coding skills. Check it out at quadrigram.com. Hey, everyone. Data Stories 66. Hi, Enrico. How are you doing?
Enrico Bertini: I'm doing great. How about you?
Moritz Stefaner: Excellent. Yes, very good. I had a good start into the year, I can say.
Enrico Bertini: I saw it.
Moritz Stefaner: You followed me on Twitter.
Enrico Bertini: Yeah, of course. Yeah.
Moritz Stefaner: No, I had a nice little holiday project that gained some traction over the year change. It was great fun. So I did one thing. I took a list of German place names, like all the towns and villages and so on, and I was looking for the most common endings. Like in Germany, we have a lot of -hausen and -berg and -ingen, you know, like the typical endings of the names. And I made, like, heat maps across Germany where these endings would occur most prominently. Super interesting. A little fun experiment. And I just put it up on the web and on GitHub, and then people started to pick it up and remix it. And somebody made a Romanian version, somebody made a Slovakian version. There's a US version now. I get pull requests on my GitHub repo, you know, it's like one of these things that starts small and people just pick it up and do something with it. It's a great experience.
Enrico Bertini: Is there any Italian version yet?
Moritz Stefaner: Not yet. So you can totally do that.
Enrico Bertini: Oh, yeah, sure.
Moritz Stefaner: There's a little. Yeah, I think it can be done in maybe an hour or something to do a new localized version so we can give it a shot.
Enrico Bertini: Yeah, absolutely.
Moritz Stefaner: Yeah, yeah, yeah. And it's a nice thing. And it's sort of these things, they excite me because all these data sources out there, we can work with them and just do something with them and find something out about the world, even if it's just small things like place name structures. Yeah. I think that brings us also to our guest because he does really interesting things with open data as well. Welcome to the show. Ben Wellington. Hi, Ben.
Open Data in the World AI generated chapter summary:
Yeah. I think that brings us also to our guest because he does really interesting things with open data as well. Ben Wellington. Welcome to the show.
Enrico Bertini: Hi, Ben.
Ben Wellington: Hi. It's nice to be here.
Moritz Stefaner: Great to have you. Ben, can you tell us a bit about yourself? Who are you? What are you doing? And so on?
iQuant: The New Hip Hop in Data Science AI generated chapter summary:
Ben Wellington is an assistant visiting professor at the Pratt Institute in Brooklyn. He says he has a vision for statistics for urban planners. He also runs a blog called I Quant NY, where he quantifies public data. How did that blog come about?
Ben Wellington: Sure. Yeah. So I'm a, I guess the word now is data scientist, right? I did.
Moritz Stefaner: Currently it's still data scientist.
Ben Wellington: That's the new hip thing. So I have a background in natural language processing. I did my PhD at NYU, and now I'm an assistant visiting professor at the Pratt Institute in Brooklyn, where I teach urban planners about statistics, which is kind of fun. And also I work in the financial industry along the way. And lastly, I run this blog called I Quant NY, where I seek to quantify the data that New York City and State put out that kind of affects the lives of New Yorkers.
Moritz Stefaner: Yeah, I saw that. And so I also saw your talk at Visualized and some of your projects. I was really fascinated how much value you can create from these, like, public data sources by looking at them, asking questions, and going on these little data investigations. I think that's very fascinating. How did you start that blog? Or that Tumblr, tumblelog, or however you call it? How did that come? Did you just think it could be fun to do it, or did you have the plan to make it as big as it has become now? How did that come about?
Ben Wellington: Definitely not planning to make it big when it started. It goes back to me marrying an urban planner. My wife, Leslie, she was at Pratt and was taking a statistics class, which was a typical statistics class. You have your textbook, you have your problems, you're learning about t-tests and correlations. But she'd come home every day after class and say, why do I have to learn this? This is kind of silly. And I'm like, no, it's not. This is so important. So that was the conversation around the dinner table, and what I realized is that we're teaching our urban planners statistics from a textbook. At the same time, we're releasing public city datasets in the cities they're learning in. So I realized there's a big disconnect here between the way we're teaching students and the reality of today's time. So I went to the school and I said, hey, my wife goes here. We haven't met. You may not know me, but I have a vision for statistics for urban planners. Let's use open data and public data and make it applied so that instead of learning about fake problems, they're learning about the problems that they're interested in locally, whether it's complaints to 311 lines or stop and frisk or whatever is happening in current events. So to make stats maybe just a little more fun. And then from there, I had to make fun homework problems, and for those homework problems, I started to search for little nuggets of interesting findings in public data. And as I did that, I thought maybe more than just the twelve people in the class might be interested, and hence the blog. I started putting my findings from homework on the blog, and every time I did, they got picked up by a media organization, and I was like, whoa, that was cool. And then I got hooked. So here I am today.
Enrico Bertini: That's an amazing story. So, can you tell us a little bit about a few of your favorite posts or investigations that you had? I know that you have quite a few by now. Are there any specific ones that you want to mention?
Favorite Posts of 2015 AI generated chapter summary:
There are some of your favorite posts. They range from sort of fun to maybe policy and more serious things. Find the fast food chain in New York that had the worst health inspection scores versus the best. Also find the person who lived farthest from a Starbucks in Manhattan.
Ben Wellington: There are some. They range from sort of fun to maybe policy and more serious things. On the fun side were things like finding the fast food chain in New York that had the worst health inspection scores versus the best. And the finding there was that White Castle had the cleanest restaurants, which was a shock for New Yorkers. If you know White Castle, that would be a little bit strange. So that was kind of fun. Or I quantified how far the average person in Manhattan is from a Starbucks. And it turns out that about half of the city, that is, half of Manhattan, is within four blocks, and then I went and found the person who lived farthest from a Starbucks in Manhattan. I didn't go and confront the person, but I did find them. I said, you live farthest. So those are the kind of things that may not change everyone's lives. But then on the flip side, I started to look at the taxi industry, and I guess one of my favorites there was exploring tipping behavior. And when I dove in, I realized that the computers in the back of New York City cabs, which, by the way, when you pay with credit card, a little box comes up and it says, would you like to tip 20%, 25%, 30%, or other? So those are your four options: 20%, 25%, 30%, or other. And so I found most people click 20%. But what was strange was that when people clicked 20%, different things would happen. It turned out that half of the fleet in New York City was programmed to tip on top of tolls and taxes, and the other half wasn't, depending on the vendor of the machine in the back. So two different programmers kind of calculated the tip differently. So half of our fleet was providing $300 to $400 more a year to each cab driver if they had that computer in the back versus the other, which was just surprising in such a highly regulated industry. So when I wrote about that, the city actually worked with the vendor, and they reprogrammed half the cabs in New York within a few weeks.
The Upside of Tipping Cabs AI generated chapter summary:
Benjamin: A blog post exploring tipping behavior in the taxi industry was really interesting. It turned out that half of the fleet in New York City was programmed to tip on top of tolls and taxes. Two different programmers kind of calculated tip differently. Those are the kind of things that can end up making changes.
Moritz Stefaner: Wow.
Ben Wellington: So today, everybody tips on top of tolls and taxes. Yeah, I know. But at least it's equitable. At least I don't have to wonder what the computer is in the back of the cab. So those are the kind of things that can end up actually making changes, which is really, really fun.
Moritz Stefaner: I think that blog post was really interesting because it started from you being curious about why there would be such a peak in the data at that point, or why the data would be skewed in that strange way. And so just inspecting the raw data led to this unfolding of, well, how did that come about? And I think many people would have stopped at that point. Just said, yeah, tipping data is funny. I think it's great that you actually, you look for the cause behind that. Right? And this makes the whole story in the end.
Ben Wellington: And I think what was really neat about it is it actually started with a magazine article that another publication had put out that basically made a histogram of the tipping amounts. And I noticed that everyone was tipping 16%, but no one was tipping 14% or, sorry, excuse me, it was 21% versus 19. So I understood why everyone tipped 20 because there was the 20 button. But all the people who hit other, why were they all choosing 21 instead of 19 when they did the math in their head?
Moritz Stefaner: It doesn't make sense.
Ben Wellington: It didn't make sense. So these are these gut checks. When you're doing data science, you look at a data set and you have to always check in. Does this make sense to me? Or is there something that's worth pulling a thread and seeing where it goes? I saw that in a publication and I was like, something is wrong with their math. Because no one just randomly rounds up to 21 and down, and it was ten times more people tipping 21%. It turned out they had made a math error because they hadn't realized that the computers were different. And that's where it led me to this finding. It wasn't that I went out seeking it. It's that you notice that little, you know, that little sign that something's not exactly like you'd expect, and you start pulling the thread and things unravel to really interesting stories.
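The arithmetic behind the 21% spike can be sketched in a few lines. This is only an illustration, not the code from the blog post: the function names and the fare and surcharge amounts are hypothetical, chosen so that the extras are 5% of the fare.

```python
def tip_from_button(fare, extras, rate=0.20, include_extras=False):
    """Dollar tip charged when the rider presses a percentage button.

    include_extras=True models the machines that applied the rate to
    fare plus tolls, taxes, and surcharges (an assumption for illustration).
    """
    base = fare + extras if include_extras else fare
    return rate * base


def apparent_tip_percent(tip, fare):
    """Tip as a percentage of the fare alone, as a naive analysis computes it."""
    return 100 * tip / fare


fare = 20.00    # hypothetical metered fare
extras = 1.00   # hypothetical tolls, taxes, and surcharges

tip_fare_only = tip_from_button(fare, extras, include_extras=False)
tip_with_extras = tip_from_button(fare, extras, include_extras=True)

print(round(apparent_tip_percent(tip_fare_only, fare), 1))    # 20.0
print(round(apparent_tip_percent(tip_with_extras, fare), 1))  # 21.0
```

Both riders pressed the same 20% button, yet dividing the tip by the fare alone makes the second one look like a 21% tipper, which is the kind of artifact described above.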
Enrico Bertini: Yeah, that's what I really like of this kind of blog post, Ben, because so, first of all, I have to say that living in New York makes me. When I read this post, I can relate so much more than if it were something different. So I think that's a huge component there. But what I really like is that sometimes the way you write this blog post is at the beginning, you show something and then you ask the reader, did you see anything strange here? And it's so fascinating, I think, especially this one that you were just mentioning when I reread the blog post this morning. And it's like there is this chart and there are clearly a couple of peaks which you can clearly explain. And then you ask, is there anything weird here? And I think it's a fantastic kind of game and way to engage people.
Ben Wellington: Yeah. And in the end, so much of data science is that sort of weaving in and out and iteratively doing analysis and then seeing where it brings you and then doing more analysis and seeing where it brings you. It's a back and forth. You don't always know where you're heading from the start. So that's kind of fun.
No More Data Scrills: Unsolved Mysteries AI generated chapter summary:
Do you also have any unsolved mysteries? Oh, man. That's in the data. There's some interesting peaks in. certain types of crimes. You see a data anomaly and someone says, why is that way? It's post hoc rationalization.
Moritz Stefaner: Do you also have any unsolved mysteries?
Ben Wellington: Oh, man. Unsolved mysteries. That's in the data.
Moritz Stefaner: Yeah. Where you had something funny in the data and you still can't tell why it would be so funny, like the distribution or something.
Ben Wellington: There's some interesting peaks in. Sometimes you look at the time of day of certain types of crimes or things like that and you see strange little bumps. And you always ask yourself, is that noise or is that real? And I'm asking myself all the time. So I'm constantly coming across mysteries and I don't have, like a single one that's, you know, that one that got away from me.
Moritz Stefaner: The fun thing that keeps you awake at night.
Ben Wellington: I don't have one of those, but I am constantly saying, that's weird. You know, if I can't explain it, then unless it's super, super, super obvious, I sometimes just say, all right, well, maybe I'll kind of put that to bed and not make something out of nothing.
Moritz Stefaner: Yeah. That's also something I tell my clients always, like, step one is don't trust the data.
Ben Wellington: Oh, yeah.
Moritz Stefaner: It's like the first step in your data science process. What's great about it, usually it's a good heuristic.
Ben Wellington: What's great about this particular article is that they went and interviewed the head of the New York Taxi Workers Alliance because another finding they had in this article was that people tip more during the evening rush hour. And so they went to the Taxi Workers Alliance and said, can you explain why people tip more during the evening rush hour? And she was saying, hmm, I think people are very connected during rush hour to their taxi drivers, and they have an emotional connection.
Moritz Stefaner: Empathy.
Ben Wellington: Yeah, empathy. And I'm like, you know, that explains maybe 1%, but not 10%. And it turned out that it was the same problem. There was a rush hour surcharge that was being tipped on top of. And so the numerator and denominator were off in the math. And so no one was tipping more at rush hour. The 20% button was tipping more at rush hour. And so you have one of these things that's so easy to explain. You see a data anomaly and someone says, why is it that way? We as humans are going to be able to explain it. Right? It's post hoc rationalization. I think that's one of the scariest things about data science, which is you see something and you come up with an explanation and you kind of trust your explanation. But in this case, the explanation was a data error. And yet here we are interviewing people about why it's there and them kind of making statements. So that's always scary.
Moritz Stefaner: Yeah, everybody's speculating. Yeah. There's one more project I found really interesting. It's about parking spots and parking tickets, and it's a good story about impact as well, I think, because, yeah, you actually achieved quite something. Do you want to tell us a bit more about this project?
Parking Tickets and the Impact AI generated chapter summary:
A new blog project focuses on parking spots and parking tickets. Two fire hydrants in New York together were making almost $50,000 a year in tickets. The Department of Transportation responded by repainting parking spots. The blog is about making cities better, slowly but better nonetheless.
Ben Wellington: Yeah, sure. It kind of starts with my fascination with parking tickets. They seem kind of mundane, but the way I kind of run the blog is I take an idea, like a dataset. Sometimes I have an idea and I go look for data, but often I have a data set and I just go hunting for ideas. And this is a great case. You have parking tickets and you start asking questions. Where are the most gold cars in New York? Are they in rich neighborhoods or poor neighborhoods? I don't know. I haven't looked that up yet. But that's fascinating. Are there quotas? Are there more tickets being given at the end of the month? Are police really random when they give out tickets? Or can we predict where they're going to walk? And it turns out you can predict. I was able to use that to my advantage in my neighborhood. But one of the easiest things to do is just to count, any sort of aggregation on a column in a dataset. In this case: in New York, if you park within 15 feet of a fire hydrant, you get a very expensive ticket and possibly get towed away, which can cost you many hundreds of dollars. So I was curious which hydrant in New York had the most parking tickets associated with it. That's a couple of lines of code to do an aggregation. So it turned out that there were these two hydrants on the Lower East Side that together were making almost, I think it was like, $50,000 a year in tickets, which is pretty amazing. And as I investigated, it just became clear that it was just a confusing situation for people parking. There was what appeared to be a bike lane. So imagine you have a curb with a hydrant, and then you have kind of a bike lane along the curb and then parking spots, kind of like those protected bike lanes that you see in different cities. So in that case, can you park in front of the hydrant? You're not really in front of it.
Moritz Stefaner: You're not blocking it.
Ben Wellington: You're not blocking it. So the answer is, if that was actually a bike lane, you can. But on this street, the thing with all the bikes on it that people use to bike is not a bike lane. It's called a curb extension, which is meant to widen the sidewalk and slow down traffic. Now, you wouldn't know that as a parker. So people park there. And for.
Moritz Stefaner: That's a fairly subtle distinction.
Ben Wellington: It's a very subtle distinction. And for, you know, for five, ten years, I don't know exactly how long, at least five years, the NYPD would ticket the spot because the Department of Transportation painted a parking spot, and the police department disagreed with this designation, and they fought about this on the windshields of New Yorkers for a very long time. So in this case, I wrote about it. I think one of the hydrants was making over 33,000. The other one was making 25,000. These are hydrants making more than minimum wage, just being hydrants. Right. It's like. It's pretty amazing. So I wrote about this, and the media loves this stuff. It was first in the Post, and then it made its way to the London Daily Mail. So it was the hydrant heard around the world. Why? Because it was collecting tickets. And the Department of Transportation, to their credit, acted. Within a few weeks, they repainted the parking spots, put down some zebra stripes, or whatever you want to call them, to stop people from parking, and the problem was solved. So this is like taking data at a very local level and finding things in a local community that maybe aren't on the radar of everybody, but are truly making our cities better, slowly but better nonetheless. And it's actually changing our streets, right?
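The "couple of lines of code to do an aggregation" mentioned in the hydrant story could look roughly like the sketch below. It uses only the Python standard library, and the ticket records, violation labels, and location names are made up for illustration; the real parking ticket dataset has its own columns and codes.

```python
from collections import Counter

# Hypothetical ticket records as (violation, location) pairs.
tickets = [
    ("FIRE HYDRANT", "location_a"),
    ("FIRE HYDRANT", "location_b"),
    ("FIRE HYDRANT", "location_a"),
    ("EXPIRED METER", "location_a"),
    ("FIRE HYDRANT", "location_c"),
    ("FIRE HYDRANT", "location_a"),
    ("FIRE HYDRANT", "location_b"),
]

# Keep only hydrant violations and count tickets per location.
hydrant_counts = Counter(
    location for violation, location in tickets if violation == "FIRE HYDRANT"
)

# The locations with the most hydrant tickets, highest first.
top_spots = hydrant_counts.most_common(2)
print(top_spots)  # [('location_a', 3), ('location_b', 2)]
```

With a real file, the same idea is a filter on the violation column followed by a count grouped by address; multiplying each count by the fine amount would give the revenue per hydrant.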
Ben Wellington: Yeah.
Enrico Bertini: That's an amazing story. I so much like the fact that starting from a few data points, a little bit of an analysis, and a blog post, you manage to make a very interesting and impactful change. This doesn't happen very often. I think I've. I found myself in the past debating with some people on give me examples of data or data visualizations that have an impact. And what is impact, right? And then you read this blog post and it's clear, I mean, here we go. That's impact.
Ben Wellington: Small impact, but it's impact nonetheless.
Enrico Bertini: It is, it is, right?
Ben Wellington: And it's a sign of what could come, right? I mean, it's small because, look, so much of data is still closed off, so it's hard to have an impact when you don't have access to any of the raw data. So as things start to open up, I think we'll see more and more of an impact. And so this is to me, I was really excited in the same way you were in that I saw it as a sign that, look, a person who has no connection with government and just some data skills can take open data and change a neighborhood. So I was very excited when they painted that street, I got to tell you.
Enrico Bertini: I think that's one of the most exciting aspects of working with data. Absolutely.
Ben Wellington: Yeah.
Quadrigram.com AI generated chapter summary:
Quadrigram.com is a web based application to create and share these types of data stories on the Internet. Users can merge graphic elements such as texts, images, videos and data visualization modules into a single data story. Free and you just need a Gmail account to start building and sharing your data stories.
Moritz Stefaner: That's a good time to take a little break and talk about our sponsor this week. As you all know, modern life is complex and this creates the need for digital creators to support their arguments with facts and figures. A data-based narrative, which intertwines annotations and media elements with data visualizations, is the perfect way to communicate complex realities. It's not only important to understand and process lots of information, but we also need to have the tools to communicate findings in a structured and nice way. Now, Quadrigram.com is a web-based application to create and share these types of data stories on the Internet. Its intuitive interface allows users to design interactive narratives by merging graphic elements such as texts, images, videos and data visualization modules into a single data story. And you can then publish your work as a fully functional website or interactive slide presentation without the need of any coding skills. Readers can browse the story and discover their own findings, basically create their own unique synthesis. Quadrigram.com is a product by Bestiario, a design firm with more than ten years of experience in the wonderful field of data visualization. Quadrigram is free and you just need a Gmail account to start building and sharing your data stories. So check it out at quadrigram.com. That's quadrigram.com. And now back to the show.
The Process of Writing a Data Science Blog AI generated chapter summary:
Benjamin: Ideas just kind of come from living in the city and thinking about things that are interesting. You look for outliers. Each one can spawn an entire blog post. Given a dataset, I could write 50 blog posts on it.
Enrico Bertini: So Ben, I would like to talk a little bit about the process. How do you come up with the idea in the first place? And how do you retrieve or find the data, analyze the data, and then how do you design a very nice blog post with some visualizations and all the rest. Can you walk us through a little bit of what are the steps that you follow and what can happen there at each step?
Ben Wellington: Yeah, I think there are. So we'll start with the idea. And they come from two places, those which are sort of hypothesis driven and those that are data driven. I think that's sort of a commonality in data science more broadly. Are you setting out to find, you have a specific hunch that when I look at this, I'll find this. In the case of, are there parking ticket quotas? Right. That's an idea. That's a hypothesis that you can go and explore. So those come up. You live in a city, you walk around and you notice things. And I'm often just saying to myself, can I quantify that? Can I understand more about what I'm looking at? So ideas just kind of come from living in the city and thinking about things that are interesting. So that's one line of ideas and the other is, as I mentioned earlier, sort of data exploration on a new data set when a new dataset's released. For example, just a week or two ago, New York City released its crime data for the first time. This is incident level, so you can see where each individual robbery took place, or burglary. And that's a case where I just take the dataset and I just start exploring it. What do you look for? You look for outliers. It's funny because people say, how do you know what to write about? You take a column of data and you look for the most of anything, or the least of anything. And sometimes there's a story there. The most blocked driveway in New York. Why not the area with the most crime? Which block has the most cars stolen, which block has the most burglaries? Every one of those is its own explanation. And to me, each one can spawn an entire blog post. And so I never feel sort of constrained. Given a dataset, I could write 50 blog posts on it. And in fact, I always tell myself, don't, because people will get bored of the same data over and over again. But you just take. Sometimes it doesn't have to be super complicated. It can be a single column or two where you're looking.
A great example of that, by the way, is Citi Bike data. You can do all sorts of things. The data set has locations and people doing all different things. But I was just exploring which stations have the most males versus the most females. And so that's an example of taking one column of data and then splitting it by location, and you get a beautiful map of the city based on the gender of Citi Bike riders, which shows that Midtown, where all the offices are, is predominantly male, with men making up to 80 or 90% of riders. And then if you want to find females, your best chance is heading over to Brooklyn, where they're more likely to ride the bike. So that just came up from just looking at the data and taking one column at a time.
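"Taking one column of data and then splitting it by location" can be sketched as follows. This is a standard-library illustration under stated assumptions, not the analysis from the blog: the trip records, station names, and gender labels are invented, and the real bike share data uses its own field names and encodings.

```python
from collections import defaultdict

# Hypothetical trips as (start_station, rider_gender) pairs.
trips = [
    ("station_x", "male"),
    ("station_x", "male"),
    ("station_x", "female"),
    ("station_x", "male"),
    ("station_y", "female"),
    ("station_y", "male"),
]


def share_by_station(trips, value):
    """Fraction of trips at each station whose gender column equals `value`."""
    totals = defaultdict(int)
    matches = defaultdict(int)
    for station, gender in trips:
        totals[station] += 1
        if gender == value:
            matches[station] += 1
    return {station: matches[station] / totals[station] for station in totals}


male_share = share_by_station(trips, "male")
print(male_share)  # {'station_x': 0.75, 'station_y': 0.5}
```

Joining each station's share to its coordinates is what would turn a table like this into the kind of map described above.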
Enrico Bertini: Nice. So you have some sort of data question templates, right?
Ben Wellington: Yeah, I'd say so, I think. Yeah. You can describe it like that? Yeah. And once you kind of come up with the way you find things, it's easy to write some. It's relatively easy to write some code to start to replicate things. For spatial data, I'm always interested in how it affects different neighborhoods. So I can have a little utility that can take any longitude and latitude and join neighborhoods and police precincts and council districts, because these are the things that people really connect to. And so when you start to get kind of into a flow, you can actually turn these data analyses around very, very quickly. The example of the hydrant is a few lines of code. It's descriptive statistics. It's just counting things. I have yet to do any sort of sophisticated modeling. I mean, I do that more broadly in my life, but on the blog, I haven't put out a predictive model yet. It's just describing things, and that's a lot of fun.
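A utility that takes a longitude and latitude and attaches an area name is, at its core, a point-in-polygon test. The sketch below is not Ben's tool; it is a minimal standard-library version using the ray casting rule, with two made-up square "areas" standing in for real neighborhood, precinct, or council district boundaries. Points exactly on a boundary are not handled specially.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting test: True if (lon, lat) falls inside the polygon.

    `polygon` is a list of (lon, lat) vertices in order.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def tag_point(lon, lat, areas):
    """Return the name of the first area containing the point, else None."""
    for name, polygon in areas.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical boundaries: two adjacent unit squares.
areas = {
    "area_west": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "area_east": [(1, 0), (2, 0), (2, 1), (1, 1)],
}

print(tag_point(0.5, 0.5, areas))  # area_west
print(tag_point(1.5, 0.5, areas))  # area_east
print(tag_point(3.0, 0.5, areas))  # None
```

Running the same function against several boundary sets gives each record a neighborhood, a precinct, and a council district. For real boundary files, a GIS tool or spatial library would do this join far more efficiently.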
Moritz Stefaner: And, I mean, you have, because you limit yourself to New York, you can sort of reuse the same techniques over and over, which is very smart, too. Right. So you don't have to start from zero every time. So that's a smart choice.
Ben Wellington: It's also important to understand what you're working with. Right. When I see crime in New York, I understand it. When I see crime in Seattle, I need to call my friends and say, why is this area like this? That context, when you're exploring something, helps a lot in data analysis. If you don't know and understand your variables and what you're looking at, then those hunches won't follow through.
Moritz Stefaner: Yeah. Just technically speaking, which tools do you use? Or also which tools can you recommend, maybe to people who would like to do similar things for their city?
Ben Wellington: Yeah. Well, so one dirty secret, which I'll share with you if you don't tell anyone, is that I do use Excel, and I'll tell you why. Let me justify my choice. I don't use it for everything, but I'll tell you why. An Excel pivot table is really fast and good at counting.
Beyond Excel: The visualization tools you use AI generated chapter summary:
An Excel pivot table is really fast and good at counting. The best way to learn something is to have a story to tell. Using GIS software empowers you to do really, really fun and cool things. Are there any other or specific visualization tools that you use in the process other than those guys?
Enrico BertiniOh yeah, absolutely.
Ben WellingtonI mean, I don't know, it's just less time than me typing in the name of the file in Python. I can double-click on my CSV, drag and drop, and get counts of anything.
Enrico BertiniYou are a sinner, Ben.
Ben WellingtonI know, I'm embarrassed to admit this, but as long as you don't tell anyone, we're good. So I'm not afraid of an Excel pivot table for quick and dirty analyses that are small and can be handled with that. Excel can't do something like a median, God forbid, in a pivot table. So it's not perfect for everything, but for counting things, it's great. If I'm not using that, then I'm likely using IPython and using pandas within that, which is that data science Python toolkit. I love notebook environments, I really, really do. I don't know where I'd be without them. And the IPython notebook is awesome for me in my workflow. Then lastly, QGIS, which is spatial analysis software. It's free, it's open source, and people are intimidated by it. But the best way to learn something is to have a story to tell. Because if you have a story to tell, you go on Stack Overflow and you figure out how to tell it. So I taught myself GIS and QGIS, and when I'm stuck I can email friends or look around on the web, and it's allowed me to do such fun things. Planners are used to GIS software; they use it all the time. And I'm a computer scientist, I'd never really gotten into it, but the operations are so natural to me. Intersections and unions and Voronoi polygons and nerdy things abound in the software, so if you think kind of mathematically, using GIS software empowers you to do really, really fun and cool things.
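The per-group median that a pivot table handles poorly is a short computation in code. The sketch below is an illustration only, using the Python standard library rather than pandas so that it runs anywhere; the `grouped_median` helper, the column names, and the ticket rows are all made up for the example. With pandas, the same result would typically come from a `groupby` followed by `median`.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Mapping

# Hypothetical rows; in practice these would be read from a city CSV export.
TICKETS: List[Mapping[str, object]] = [
    {"borough": "Brooklyn", "fine": 45},
    {"borough": "Brooklyn", "fine": 65},
    {"borough": "Brooklyn", "fine": 115},
    {"borough": "Manhattan", "fine": 65},
    {"borough": "Manhattan", "fine": 115},
]


def grouped_median(rows: Iterable[Mapping[str, object]], key: str, value: str) -> Dict[str, float]:
    """Collect the values for each group, then take the median of each group."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        groups[str(row[key])].append(float(row[value]))
    return {name: median(values) for name, values in groups.items()}


if __name__ == "__main__":
    for borough, fine in sorted(grouped_median(TICKETS, "borough", "fine").items()):
        print(borough, fine)
```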
Enrico BertiniSo are there any other or specific visualization tools that you use in the process other than those guys?
Ben WellingtonAnd then lastly, I should mention CartoDB. So in fact, you started the conversation mentioning the Visualized conference. And you know, to be honest with you, I was a bit terrified to be there. I mean, I'm not artistic. If you go to my site, you'll see kind of scrappy visualizations here and there. The maps that are cool and look good I made in CartoDB. I've never taken the time to learn web tools and things like that, so it's just not in my sweet spot. So I like telling stories, and I use sometimes ugly charts to show what I'm trying to say. Thank God for CartoDB, because I can upload shapefiles and other geographic data from GIS, where I'm doing it locally, onto the web, where I can craft these dynamic maps. And so that's a great way to learn too, as long as you have some sort of longitude and latitude, or some sort of polygons. CartoDB allows you to be a kind of novice web guy like me and still make a dynamic map. So that's huge for me.
In the Elevator With Data Scientists AI generated chapter summary:
How does that become the narrative and the blog post that you present in the end? Is it more like you take the best three plots and make a nice story around them? Thinking about your story and the narrative is probably one of the most important things about my work.
Moritz StefanerAnd so you have a question, you have a dataset, you have a question, or sometimes you just explore, you make a few maps. Now, how does that become the narrative and the blog post that you present in the end? So is the blog post usually more a recapping of all the steps you took? Is the order of things presented in the blog post often the order of your actions? Or is it more like you take the best three plots and make a nice story around them? What's the usual process leading up to the text of the narrative?
Ben WellingtonRight, so the taxi example we mentioned earlier was more kind of a narrative of my process, though I would say that's less often what I'm doing. There's a lot of great work being done on blogs and data science, but I set off to make mine accessible to everyone instead of being of interest only to programmers or data scientists. Making that choice, I almost never get technical at all in my posts. I found that thinking about your story and the narrative is probably one of the most important things about my work. Thinking about storytelling, relating to people, finding the things that they care about. Sometimes I have little stories, like running for the subway or my bike was stolen, or other things that I put in there to just help people relate and say to themselves, oh, actually, data can be cool. If I write about pandas and IPython, I will lose 99% of my audience in the first few sentences. I'm very careful to try to not make it technical. And so yeah, I do a lot of analysis. Usually I'll find the most interesting thing and then I will work that into a narrative, not necessarily the way I found it, though there are a few exceptions where I think there are good data science lessons that I sprinkle in there.
Is Uber Raising Traffic? AI generated chapter summary:
There's a big fight between the mayor of New York, de Blasio, and Uber about congestion. The administration commissioned a study, which is going to be released any day now, and it cost $2 million. I'd love to see the city use public data that they're releasing and use that as a money saving technique.
Moritz StefanerDo you have a ratio of how much exploration you do, how many individual charts you would produce, and how much then makes it into the blog post? Do you have a feeling?
Ben WellingtonI mean, it can be. I mean, it can be anywhere from. There are times where it's 90% storytelling, 10% analysis. To be honest, the hydrant's a great example. It's like, I found that it didn't take long. Now what? And then there are times where the analysis is a lot more arduous. Large data sets. The taxi data for taxi pickups and drop offs in New York is a very large dataset.
Moritz StefanerThat's a huge one. I love that dataset. It has hundreds of millions of rides.
Ben WellingtonThat's very cool.
Moritz StefanerWow.
Ben WellingtonOne thing it reminds me of: there's a big fight between the mayor of New York, de Blasio, and Uber that's been going on for the last year and a half about congestion. The de Blasio administration was blaming the uptick in Ubers for slowing traffic down in New York and around Manhattan, and they were saying they were going to put a cap on the number of Uber licenses they were going to give out for for-hire vehicles. And this became kind of a fight. So the administration commissioned this study, which is going to be released any day now, and it cost $2 million. Okay, $2 million. Wow. Now, if you take a step back, you understand that along the way, this taxi data has been released, and some of the Uber data has been released through freedom of information requests. And so online you can get access to this data. And if you start to look at some of the data journalism, I did a post in the New Yorker, but so did FiveThirtyEight, which did an excellent article asking this exact question: is Uber raising traffic? And they do great, great analysis, and it gives really compelling answers to these questions, for free, for the government. Right. So we're seeing that. Now, do they pay attention? I don't know. But I do know that they're going to spend $2 million. And what the media is saying is that they came to the same conclusion that all of these journalists and bloggers did, so the city should be taking more advantage of that. I'd love to see them take this public data that they're releasing and listen to people's responses and use that as a money-saving technique instead of spending $2 million to have someone else do the exact same thing. So I'm waiting to read the report, but all the media that I've read on it basically says that it's going to say the same thing that we're seeing in the blogs.
Moritz StefanerYeah.
How do you start learning data science? AI generated chapter summary:
Benjamin: Having a story to tell in some sense is a big part of it. The most important thing is having a goal. I'm working on a book which is also going to be a little bit more international. Is it called iquant Planet?
Enrico BertiniSo, Ben, I have a different kind of question. I'm pretty sure many of our listeners are aspiring data scientists and/or visualizers. So do you have any suggestions for people who are just starting, or didn't even start yet, but they love this field? They are excited about the idea of working with data. How do you start? Do you have any tips in this sense?
Ben WellingtonYeah, you know, I think that having a story to tell in some sense is a big part of it. Right. So you need to motivate yourself to go do something. And I don't know about you, but I was never, I mean, I didn't do my summer reading as a kid. Don't tell my mom. I don't sit down with a textbook and read it end to end. There are people who do that. It's just not me. And so, as much as I want to learn in the world, I'm always distracted by things, and I'm not the type of person that sits down and just studies something for the sake of studying it to increase my knowledge. I wish I were, but I'm not. But when I found that I had a story to tell and I just needed to learn x to tell it, that gets me up, that gets me learning these tools. And so I found that to be just incredibly motivating to get started. Like having that motivation, because, you know, you have your goal in mind and you iteratively learn, because you don't need to learn everything just to tell that story. You need to learn a few things, and then the next story you have to tell, you need to learn a few more things. And as you go, you know, you learn more and more and more and more. But, you know, to recreate the hydrant analysis I did, you need to learn how to do an aggregation, right? You can find that in sort of any tutorial on pandas or R or even Excel pivot tables. So I think that, to me, the most important thing is having a goal. It doesn't have to be a story, but having something you want to know that is going to drive you. From there, you just install some software and you start failing. You fail enough times and you read enough posts and you ask enough people for help until you start to make a little bit of progress. And each time you do it, you'll get faster and faster and faster.
Until today, there are times that, like I said, I can turn an analysis around in 5 to 10 minutes for something that I know I'm looking for, and it's just because of the experience of doing this type of work for so long and so many times.
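The aggregation mentioned here is essentially "count records per key and sort". A minimal sketch of that step follows, assuming made-up ticket records and a hypothetical `top_locations` helper; it is not the code behind the hydrant post, just the same kind of counting done with the Python standard library.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple

# Hypothetical ticket records; a real analysis would read these from an open-data CSV.
TICKETS: List[Mapping[str, str]] = [
    {"location": "Hydrant X"},
    {"location": "Hydrant Y"},
    {"location": "Hydrant X"},
    {"location": "Hydrant Z"},
    {"location": "Hydrant X"},
    {"location": "Hydrant Y"},
]


def top_locations(records: Iterable[Mapping[str, str]], field: str, n: int = 3) -> List[Tuple[str, int]]:
    """Count how often each value of `field` occurs and return the n most frequent."""
    counts = Counter(record[field] for record in records)
    return counts.most_common(n)


if __name__ == "__main__":
    for location, count in top_locations(TICKETS, "location", n=2):
        print(location, count)
```

The locations at the top of such a list are the natural starting point for a story: the next question is why those spots attract so many tickets.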
Moritz StefanerI think it's a great tip to start with an interest. And now that you set this great model for New York, maybe we will soon see I Quant Delhi or I Quant Paris.
Ben WellingtonI've seen. I've seen.
Moritz StefanerThat would be great.
Ben WellingtonI've had a lot of people reach out, actually internationally, who told me they were inspired by the blog and have been starting their own. And so that's been super cool. I've been super touched by that. And actually I'm working on a book, which is also going to be a little bit more international, with Riverhead, which is sort of a division of Penguin. So that book is going to be taking international cities and doing the same thing, finding public datasets and then saying what they tell us about the way we act as humans in cities and the funny things we do. That's going to be a lot of work, and I have to finish writing it by the end of this year. So I've got a lot of work set out for me.
Moritz StefanerOh, wow.
Ben WellingtonYeah, I can keep you busy.
Moritz StefanerIs it called I Quant Planet?
Ben WellingtonYou know, my working title. I wrote iquant the world as a working title, though whether that's going to pass, I don't think that's going to pass. The scruff.
Moritz StefanerI love it.
Ben WellingtonOf a publishing house. I like it. But that's so far ahead. I got to actually write the darn thing. But I'm very excited about that. And that'll be a more outside of New York look at things.
Moritz StefanerThat sounds really amazing.
Ben WellingtonAnd one of the other great things is that some of the interest has led to... I just recently had a little son, he's five months old.
Enrico BertiniAnd congratulations.
Ben WellingtonThank you. And my blogging has slowed down a little bit, I gotta be honest with you. He takes up.
Enrico BertiniYou're gonna be more productive afterwards.
Ben WellingtonDon't you worry.
Enrico BertiniYeah, absolutely.
Ben WellingtonKeep going.
Ben WellingtonBut it's been such a blessing. And along with the blog, like I said, there's been some international interest. And so I've gotten invites to speak in different countries, but we have this little guy, so we said, hey, let's go as a family. So I got him a passport when he was three weeks old. He's been to Tokyo, I mean, not Tokyo. Excuse me, backing up. He's been to Taipei and Amsterdam along with me, just to give talks and stuff. So that's been a blessing. It's been really, really cool. All just from doing some analysis on a blog. Right?
Enrico BertiniAmazing. Amazing. I mean, that's really amazing. So if you are listening to this, start a blog, find some interesting data, and write about some amazing stories.
Data stories for non-quantum people AI generated chapter summary:
Ben: So much of the world has yet to be quantified. People aren't looking at them in depth. So if you are listening to this, start a blog, find some interesting data, and write about some amazing stories.
Ben WellingtonYeah. I mean, it doesn't have to be amazing, but so much of the world has yet to be quantified. These datasets just sit there.
Enrico BertiniOh, yeah, absolutely.
Ben WellingtonPeople aren't looking at them in depth. And so you take a city that just released their parking ticket dataset and you'll find a million stories in there if you have the interest and drive. I don't think these datasets really fascinate people. And so they kind of sit unexplored, and so there's so much to find, and I think that's part of it.
Moritz StefanerThat's a great tip.
Ben WellingtonYeah.
Enrico BertiniBen, thanks a lot for coming on the show. That's a really amazing story.
Ben WellingtonOh, it's such a pleasure.
Enrico BertiniIt's really fascinating what you are doing. So if our listeners want to know more about you, where can they find more information?
Ben WellingtonI guess iquant NYC will bring you to my site, which is a Tumblr. The other one is iquantny.tumblr.com. But iquant NYC, check it out there. I've got blog posts; they used to be weekly, slowing down to maybe once or twice a month with the new baby, but lots more to come. And there's a mailing list on there, too, to stay in touch, as I kind of sometimes go back and forth with city officials. So there's some fun stuff going on, too, and sometimes I just kind of find quirky things, and more to come.
Ben WellingtonYep.
Enrico BertiniWell, thanks a lot. Thank you, Ben.
Moritz StefanerSuper inspiring stuff. Thanks, Ben.
Ben WellingtonThank you. Thanks so much for having me. Bye bye.
Enrico BertiniBye bye bye.
Ben WellingtonHey, guys, thanks for listening to Data Stories again. Before you leave, we have a request: if you can spend a couple of minutes rating us on iTunes, that would be extremely helpful for the show. I also want to give you some information on the many ways you can get news directly from us. We are, of course, on Twitter at twitter.com/datastories. We have a Facebook page at facebook.com/datastoriespodcast, and we now also have a newsletter. So if you want to get news directly into your inbox, go to our homepage, datastori.es, and look for the link that you find on the right. One last thing I want to tell you is that we love to get in touch with our listeners, especially if you want to suggest ways to improve the show, amazing people you want us to invite, or projects you want us to talk about, so do get in touch with us. That's all for now. See you next time. Thanks for listening to Data Stories.
Data Stories AI generated chapter summary:
This episode of Data Stories is sponsored by Quadrigram, a web based application designed to bring data stories to life. If you can spend a couple of minutes rating us on iTunes, that would be extremely helpful. We love to get in touch with our listeners, especially if you want to suggest ways to improve the show.
Moritz StefanerThis episode of Data Stories is sponsored by Quadrigram, a web based application designed to bring data stories to life. With Quadrigram, you can create and share interactive data stories without the need of any coding skills. Check it out at quadrigram.com.