Visualizing Bitcoin with Dan McGinn
Dan McGinnI'm interested in the interaction between human behavior and algorithmic behavior.
Moritz StefanerData Stories is brought to you by Qlik. Are you missing out on meaningful relationships hidden in your data? Unlock the whole story with Qlik Sense through personalized visualizations and dynamic dashboards, which you can download for free at Qlik Datastories.
Surfing Stories AI generated chapter summary:
Moritz just returned from Portugal. Now his only goal is to quit working and learn how to surf. Maybe next year we do a surfing podcast.
Enrico BertiniHey, everyone. Welcome to a new episode of Data Stories. Hey, Moritz.
Moritz StefanerHey, Enrico.
Enrico BertiniIt's been a while.
Moritz StefanerYeah, I'm just back from Portugal. I had some real sun on the beach. Now my only goal is to quit working and learn how to surf. That's my new primary objective.
Enrico BertiniSame here. Same here.
Moritz StefanerSo maybe next year we do a surfing podcast.
Enrico BertiniYeah, I spent some of my summer in San Diego, so I'm all for it now.
Moritz StefanerNice.
Enrico BertiniOh my God.
Wonders of the World: 2017 Recap AI generated chapter summary:
We published some documentation for a big tool I've been working on together with Christian Laesser and Studio NAND for Deutsche Bahn. Maybe we should organize an episode around it. I think it could become an interesting case study.
Moritz StefanerSo what's new for you?
Enrico BertiniAll good. Yeah, we are kind of in the middle of the semester now. I'm doing a lot of teaching recently and, yeah, reasoning a lot about some of these fundamentals. I hope to publish something soon. I have a few reflections on that.
Moritz StefanerYou wrote something on word clouds. It's the year of the word cloud in an unexpected turn of events.
Enrico BertiniWhat's going on?
Moritz StefanerSo we used some in the election project. Now you write an article. It's been big at IEEE VIS. There were a few papers around word clouds. So there's a big, I don't know, resurgence.
Enrico BertiniThe revenge of word clouds. Yeah, yeah. I just published a kind of, as Lynn Cherny said, non-academic summary of our paper on word clouds. It's a post on Medium. And yeah, if you guys are curious about that, you can read it. It's just a five-minute read or so.
Moritz StefanerAnd you found they can work if you use them right.
Enrico BertiniHopefully they can work. In some cases they are much better than I expected. I'm kind of disappointed. What is really interesting is that simple lists work really well. Just use a list and people can get it.
Moritz StefanerSee the surfing career gets closer and closer.
Enrico BertiniYeah, why not? I'm all for it. Let's do that. How about you? What's going on?
Moritz StefanerYeah, things are good. We published some documentation for a big tool I've been working on together with Christian Laesser and Studio NAND for Deutsche Bahn, the German railway company. It's like a super hardcore applied analytics tool. Lots of different views of the data, all web technology, and developed in a very agile way, but with lots of data and also prediction data, like machine learning data, behind it. So I think it's a super exciting project.
Enrico BertiniYeah, I love it. I love it.
Moritz StefanerYeah. Documentation is on my website. There's an article on Fast Co. Design. And yeah, I'm super happy about this project. And now it's actually in use and we get to see how people use it. We get to measure results and how it affects workflows. So I'm really, really excited about something practical.
Enrico BertiniYeah. When you showed it to me the first time you gave me a preview, I was like, that's how visual analytics should be. I really like it. That's great. Maybe we should organize an episode around it. I think it's a really great.
Dan McGinnWe could.
Moritz StefanerYeah, yeah, maybe we should. I think it could become an interesting case study. Fingers crossed.
Enrico BertiniAnd Christian is kicking ass. Yeah, definitely. He just got our Data Stories visualization presented somewhere. What was that in the.
Moritz StefanerYeah, so there's an Art of Networks exhibition in Boston. I think it's a recurring thing. And the visualization he did of our past 100 episodes was selected to be shown there as an art piece. So I think that's fantastic. So shout out to Christian.
Enrico BertiniYeah. Awesome. So, okay, one last thing before we start. I think it's always good to remind everyone that our Patreon initiative is still on. So if you enjoy the show and you want to show your love, you can donate some money. We are still trying to switch to this new system. We didn't reach our goal yet. So if you want to help us reach our goal, go on Patreon and, yeah, and give us some bling bling, little one.
Patreon AI generated chapter summary:
Our Patreon initiative is still on. If you enjoy the show and you want to show your love, you can donate some money. We are still trying to switch to this new system. We didn't reach our goal yet.
Moritz StefanerYeah. Would be much appreciated. And yeah, and thanks to all who already, like, chipped in. And yeah, we haven't started really switching over. So you're not being charged yet. It's at the moment more a symbolic gesture, but we really appreciate all the contributions already.
Enrico BertiniYeah. Okay, I think we can start with our episode. It's been a long, long intro today. So today we talk about a topic that I'm very much interested in right now: visualizing bitcoin. I don't know how many of you are familiar with bitcoin, so in the show we're gonna talk a little bit about what bitcoin is. And we have a special guest to talk about this, a person who's been developing, together with other colleagues, a very interesting visualization of the blockchain. This is Dan McGinn from London, from Imperial College. Welcome, Dan. Hey there, how are you?
Visible Bitcoin: A Data Visualization AI generated chapter summary:
Today we talk about visualizing bitcoin. This is Dan McGinn from London, from Imperial College. He's been developing a very interesting visualization of the blockchain. He uses bitcoin as a toy dataset to do some bottom-up experimental data science.
Dan McGinnHey, Enrico. Hi, Moritz. I'm good, thank you.
Enrico BertiniSo we typically start by asking our guests to introduce themselves. So can you tell us a little bit about who you are, what's your background, what you're working on, what's your position?
Dan McGinnSure, sure. So, a slightly unusual route into data science. For twelve years I was a financial derivatives trader. And then in 2012-13, I got interested in bitcoin during its first price spike. Trying to understand it, I was trying to read the source code, realized I couldn't, came back to college to do a master's in computer science, and I've stuck around since, using bitcoin as a toy dataset to do some bottom-up experimental data science on.
Enrico BertiniAwesome. So you got the bitcoin fever a few years back?
Dan McGinnYeah, you could say that.
Enrico BertiniGreat. Yeah, I think when I started looking into bitcoin, the first thing that I realized was like, oh, I'm too late anyway.
Dan McGinnMoving too fast.
Enrico BertiniIt's the first thing that I realize is like, okay, I'm too late. But it's cool. It's really cool.
Bitcoin and the Blockchain AI generated chapter summary:
Bitcoin is a peer to peer electronic cash system. The blockchain is one of the components of the bitcoin system. It's a chain of transactions grouped together and each block links to the previous block. There are lots of APIs that you can use just to connect with it and work with it.
Moritz StefanerFor our listeners who might not be fully aware of what bitcoin and the blockchain are in detail, can you give a brief rundown, just the basics?
Dan McGinnYeah, I mean, I think it's probably best described in the original paper by Satoshi Nakamoto back in 2008. It's simply a peer-to-peer electronic cash system. It's got a number of things that were added together. They had all been invented at the time, but the way he combined them has quite elegantly produced something that's now worth $6,000 per bitcoin. The motivation behind it to start with was just to digitally effect the cheap transfer of value anywhere in the world, 24/7, without any central authority or counterparty that could censor anyone's participation. And it's stood the test of time since 2009.
Moritz StefanerAnd what is the blockchain, if you can explain?
Dan McGinnIt's one of the components of the bitcoin system. I think people will try and dazzle you with blockchain and distributed ledger technology. I just explain to people, it's just a database. It's a novel and quite cumbersome database technology. It's got some pretty elegant ideas in it. It's just a chain of blocks: one block in the blockchain is a collection of transactions grouped together, and each block links to the previous block. So you can securely guarantee that the data is continuous and ordered and immutable.
Enrico BertiniYeah.
Moritz StefanerAnd this is basically like the definite source of truth of who has transferred money to whom, like at least the identifiers of these people, right?
Dan McGinnYeah, it's immutable. The elegance is that everyone shares the same view of the data. There's no inconsistency in the data. Everyone's looking at the same guaranteed data without the need for any central authority managing participants' identification or granting access to the system. The only rule for playing with the database is that you've got to play by the rules. And if you don't play by the rules, you can't play with the database.
Enrico BertiniYeah, yeah. And there's a whole protocol people can play, play with, and it's open source and you can. Yeah. There are lots of APIs that you can use just to connect with it and work with it. Right?
Dan McGinnYeah. So when people say you own a bitcoin, what do you actually own? You own a single or a collection of write permissions on that database, giving you the authority to transfer that value to someone else.
Enrico BertiniYeah. Maybe we should briefly also mention what a miner is and what mining is. So there are different kinds of actors in the network.
What Is a Miner in Bitcoin? AI generated chapter summary:
The bitcoin system is also a peer to peer network of computers. Miners race amongst themselves in a lottery based competition to publish blocks to the blockchain. It's how the bitcoin economy is inflated, how bitcoins come into existence. There is no good visualization of the blockchain or bitcoin in general.
Dan McGinnOkay, so apart from the blockchain, the bitcoin system, if you like, is also a peer-to-peer network of computers, all exchanging protocol-conformant messages amongst themselves. Miners are simply specialist operators in that peer-to-peer network whose job it is to verify transactions and make sure that they conform to the protocol, and they race amongst themselves in a lottery-based competition to publish blocks to the blockchain. If they're the winner and they're the first publisher finding a solution to a block, then they're able to claim some bitcoins for themselves and get compensated for the work that they do, having proved that they've done the work to validate the transactions.
Enrico BertiniSo they are kind of minting bitcoins, right?
Dan McGinnYes. It's how the bitcoin economy is inflated, how bitcoins come into existence. But, I mean, bitcoin's famously got a geometrically reducing number of bitcoins that will ever be minted: 21 million by the year 2140-something. And people argue that that is essentially deflationary, because you've got a reducing number of new bitcoins, and it leads to people hoarding them in anticipation of them becoming an ever more restricted supply and going up in value, which may explain the $6,000 price tag today.
Enrico BertiniYeah, it's a super clever technology. It just blew my mind when I looked into the details of how this works. I think the way this episode started, the idea is that me and Moritz were like, oh, my God. Is there any good visualization of the blockchain or bitcoin in general? So we did a little bit of research, and surprisingly, there's not much around. I don't know why, but it's such a.
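The "geometrically reducing" issuance Dan mentions can be worked through as a short calculation. The constants below are not stated in the conversation; they are the commonly documented protocol parameters (an initial reward of 50 bitcoins per block, halved every 210,000 blocks, with amounts counted in units of one hundred-millionth of a bitcoin), so treat this as a sketch under those assumptions.

```python
COIN = 100_000_000           # smallest units per bitcoin (assumed protocol constant)
INITIAL_SUBSIDY = 50 * COIN  # block reward at launch (assumed protocol constant)
HALVING_INTERVAL = 210_000   # blocks between halvings (assumed protocol constant)


def total_supply():
    """Sum the block rewards over every halving era until the reward rounds to zero."""
    total, eras, subsidy = 0, 0, INITIAL_SUBSIDY
    while subsidy > 0:
        total += subsidy * HALVING_INTERVAL
        subsidy //= 2  # integer halving, so the reward eventually reaches zero
        eras += 1
    return total, eras


total_units, eras = total_supply()
print(total_units / COIN)  # just under 21 million bitcoins
print(eras)                # number of eras with a non-zero reward
```

At roughly ten minutes per block, 210,000 blocks take about four years, so the eras counted here span a little over 130 years from 2009, which is consistent with the "2140-something" figure.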
Moritz StefanerThere's a lot of real time stuff, a lot of aggregated numbers, but no patterns or no, like, structures. Right. And it's such a fascinating data source, like, how many transactions are there by now? It must be billions of transactions by now. Right.
Dan McGinnRight now... I did write that down. There were about 150 million by block 425,000, which is where my database goes up to. Now we're at block 490,000. So when you consider that each transaction has one or more inputs and one or more outputs, if you start looking at the whole transaction graph, it's over a billion nodes to look at, with some very interesting behaviors embedded in there.
Enrico BertiniMaybe that's because nobody has done it before.
Dan McGinnYes.
Enrico BertiniYeah. It's massive.
Moritz StefanerYeah. I also played around with it a bit, and I think what's kind of difficult about it is you just get hashes of everything, like the people who send money. It's just a hash. It's just a hash. And you're like, where is this person? Who are they? It's very cryptic and mystical, and at the same time, you feel like there's so much out there in terms of things we could learn now. Right. So I think there's an interesting tension. Yeah. So you did like a longer or a continuous project, I guess, working with this data and developing different visualizations and analytics tools. Can you tell us a bit what the process there was, where, how you developed these visualizations, what the main findings were maybe, and what you plan to do in the future?
Visualizations of the bitcoin bubble AI generated chapter summary:
Aims to bring bitcoin to life for visitors to Imperial's data observatory facility. Also interested in the interaction between human behavior and algorithmic behavior. Developed visualizations and analytics tools.
Dan McGinnYeah. So my motivation really was twofold. One was to just kind of bring bitcoin to life for visitors that we've got to our data observatory facility, which is 64 screens for visualizing knowledge datasets. And secondly, I'm interested in the interaction between human behavior and algorithmic behavior. So if you think about the stock exchange, it's pretty much dominated by algorithms these days. But there's no top-down view, there's no radar view on how these algorithms are operating and how they're interacting and if they're hitting resonant frequencies. So I figured if I could do a bit of visual signal processing to start with, to see if indeed the algorithms did have certain signatures and features that could be detected, then you can start to filter out the algorithms, or at least spot the anomalous algorithms when they start to go wrong, and try and avoid some of the flash-crash type stuff that we see. So that was the motivation behind it. And in the course of that, I was lucky that we were having this data observatory being built at Imperial. And I figured having a complex data set with this interesting behavior that I suspected was going on would be a useful thing to visualize.
Black Spot in the Bitcoin Network AI generated chapter summary:
The first visualization shows the locations of the participating nodes in the network. The only interesting thing that really shook out of that was how difficult it was to see the concentration of nodes in China. It's quite an interesting and complex network full of evolving and different behaviors.
Enrico BertiniSo can you maybe describe some of the visualizations that you created? I know it's always a little hard to describe with words what a visualization looks like, but I think you have at least two or three different visual representations there that represent different things. Can you walk us through these three different screens that you have?
Dan McGinnThe first and simplest is I started trying to view the peer-to-peer network. I was trying to look at which computers were connected to which computers and who was originating transactions, who was mining transactions. I thought that would be an interesting thing to look at. As it turned out, it wasn't that interesting. But the first visualization I did was just a globe, GeoIP-ing the locations of the participating nodes in the network, the nodes that form the backbone of the network. So that was a web crawler that just went round and found the addresses of the computers that we could connect to. Typically that was somewhere around 6,000; it's about 10,000 nodes now which form the backbone. But there's a lot more computers participating which are behind NATs or not accessible. They can't accept incoming connections. So it was just a way to show the distribution of the nodes. The only interesting thing that really shook out of that was how difficult it was to see the concentration of nodes in China. Obviously they're behind the Great Firewall of China. So we had this big black spot on the globe compared to really hot spots on the east and west coast of the United States and central Europe.
Moritz StefanerYeah. And it probably just correlates a lot with where general Internet usage and infrastructure is. Right. And so I can imagine it's kind of difficult to find some new insights just from that.
Dan McGinnYeah, yeah, but we knew most of the activity was happening in China at the time. We could just see that on the exchanges. But it was surprising how few computers were actually forming the kind of public service infrastructure that forms the backbone of the network.
Moritz StefanerSo would people use VPNs then, or is it just inside China that the traffic is happening?
Dan McGinnIt's the fact that they don't accept incoming connections over the Great Firewall of China. So it's all filtered at the geographic boundary. So then I started looking at the data that was bouncing around in the network, which is predominantly transaction messages. That's how these write permissions are granted on the database and how they're transferred to people. So I started looking at the transactions bouncing back. It took me a while to figure out that was actually a transaction network. It was a graph. And as soon as I made that leap, I figured that would be a great way to visualize the transaction data, because you'd be able to see the structure of each transaction and also how they're associated with each other.
Moritz StefanerLike, who is sending how much money to whom? And are there some people who send a lot of money to different people, or is it always the same person? Stuff like this?
Dan McGinnYeah, of course, it's kind of a pseudonymous facility, so it's not necessarily whom, but you're able to see repeated patterns of behavior which you can then infer are the same. So I thought that would be pretty cool and a pretty good way to manifest, to physically manifest the 200 bytes of data for people to understand what was going on and how active the system was and how value was effectively transferred.
Enrico BertiniYeah. Maybe we should mention here that every transaction in the blockchain has a number of inputs and can also have a number of outputs, right?
Dan McGinnYes, each transaction has inputs and outputs. You can think of the outputs as sockets. They're open sockets, waiting for people to connect to when you spend. And you spend them by creating some inputs where you cryptographically prove that you own the previous outputs. And in that way, that allows you to spend your bitcoins. So you have this never-ending chain of spends from output to input, and that forms the transaction graph, and it's quite an interesting and complex network full of evolving and different behaviors.
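The output-as-socket picture can be sketched as a small ledger in which each input plugs into exactly one previously unspent output. The `Tx` and `Ledger` names are hypothetical and invented for this illustration; the cryptographic ownership proof is left out entirely, and amounts are plain integers.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

OutPoint = Tuple[str, int]  # (transaction id, output index): one "socket"


@dataclass
class Tx:
    txid: str
    inputs: List[OutPoint]  # sockets being plugged into (spent)
    outputs: List[int]      # values of the new open sockets created


class Ledger:
    def __init__(self) -> None:
        self.unspent: Dict[OutPoint, int] = {}   # open sockets and their values
        self.edges: List[Tuple[str, str]] = []   # transaction graph: (spent tx, spending tx)

    def apply(self, tx: Tx, minting: bool = False) -> None:
        if not tx.inputs and not minting:
            raise ValueError("only a minting transaction may have no inputs")
        if len(set(tx.inputs)) != len(tx.inputs):
            raise ValueError("the same output is listed twice")
        for op in tx.inputs:
            if op not in self.unspent:
                raise ValueError(f"output {op} is unknown or already spent")
        if tx.inputs and sum(tx.outputs) > sum(self.unspent[op] for op in tx.inputs):
            raise ValueError("outputs exceed inputs")
        for op in tx.inputs:
            del self.unspent[op]                 # the socket is now closed for good
            self.edges.append((op[0], tx.txid))  # one edge of the transaction graph
        for index, value in enumerate(tx.outputs):
            self.unspent[(tx.txid, index)] = value


ledger = Ledger()
ledger.apply(Tx("mint", [], [50]), minting=True)
# Spend the 50 in full: 30 to the recipient, 20 back to the sender as change.
ledger.apply(Tx("pay", [("mint", 0)], [30, 20]))
```

Trying to apply another transaction that spends `("mint", 0)` raises an error, because that output has already been consumed. The `edges` list is the kind of structure a graph visualization would draw.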
Moritz StefanerAnd what are some of the patterns you found, or what was the overall shape of the network? Can you tell us a bit about what you learned from looking at the visualizations?
Dan McGinnSure. So it was pretty boring to start with. We could start to see some interesting things, like the different wallets that people were using. Traditionally, bitcoins are indivisible. You destroy some bitcoins and create new ones with each transaction. So if you're not spending the full amount of the bitcoins, then you need to receive some change back. You create that change, and you typically take that back into the same address, but then obviously that reduces your anonymization. So hierarchical deterministic wallets, as they were known, were created, where you would create a new address for each transaction. And then you see this small change in user behavior as people switch from a regular wallet to an HD wallet. That's not very interesting. What we started to see, as I was still working on it, were these incredibly connected and regular patterns forming on the screens as we were watching them. To start with, I thought it was a coding error. I thought it was some recursive link that I was creating, but it turns out that wasn't the case. Effectively, what we were watching was a denial of service attack, and that was a whole series of algorithmically generated transactions with a very regular and high-frequency repeating pattern, all trading amongst themselves with very isolated components in the graph, simply to fill up data space on the blockchain, to effectively force up against a hard 1 MB limit that had been coded into the bitcoin protocol for some time. In the context of the time, there was a big debate about the scalability of bitcoin. Is it really a payment system, and can it accommodate a similar number of transactions as a regular payment processor like Visa or Mastercard? And there's two schools of thought.
One's that it was, one that it wasn't. And if it was a regular payment service, then we would definitely need to increase the 1 MB limit, because that forces a restriction on the transaction rate to about three or four transactions a second. And that was never going to scale unless that 1 MB limit was raised. At the time, in 2014, we were nowhere near that limit. But someone took it upon themselves to attack the network with very small amounts of bitcoin. These attacks were very cheap. It was only $8 to $10 to fill up a block full of data to press upon that 1 MB limit.
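The "three or four transactions a second" ceiling follows from simple arithmetic. The 1 MB limit is from the conversation; the ten-minute average block interval is a well-known Bitcoin parameter, and the average transaction size is an assumption chosen for illustration (it varies in practice).

```python
def max_tx_per_second(
    block_limit_bytes: int = 1_000_000,  # the 1 MB block size limit
    avg_tx_bytes: int = 500,             # assumed average transaction size
    block_interval_s: int = 600,         # one block roughly every ten minutes
) -> float:
    """Upper bound on sustained throughput when every block is completely full."""
    tx_per_block = block_limit_bytes / avg_tx_bytes
    return tx_per_block / block_interval_s


print(round(max_tx_per_second(avg_tx_bytes=500), 2))  # about 3.33 per second
print(round(max_tx_per_second(avg_tx_bytes=400), 2))  # about 4.17 per second
```

With average sizes in the 400 to 500 byte range, the bound lands in the three-to-four range quoted above. Smaller transactions would raise it, but only by a small constant factor, which is why the limit mattered for the payment-system debate.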
Moritz StefanerSo it's basically about gaming this whole system and playing a bit with it. Yeah. Like, what can you do with the whole transaction system that is set in place? Can you force it to behave in a certain way or not? Yeah. In the paper, so there's a journal article we will, of course, link from the blog post, there's a few other patterns. There's, for instance, a lattice-like structure, almost like a fabric of nodes and edges, where, if you look at it, you immediately see it's somehow constructed. Right. It's not an organic, emerging thing. Right. Like, somebody built that.
Dan McGinnThat's right. And I think the useful thing about it is you don't even have to be an expert in bitcoin to know that there's anomalous behavior going on here. You can see it happen immediately and in real time, and you know that it's worth looking at. I mean, the whole thing grows organically, like a Petri dish, anyway, as you watch it. But then when you see these worm structures starting, you're able to see them. You know, they grow really quickly, they're very regular, and you can even start to parameterize the algorithm that's underlying them. You may not know who's creating this denial of service attack, but you're able to see the basics of the algorithm that they're using to generate the transactions. And then you're able to see when the attacks stop, or when there are multiple attacks occurring at the same time with two different algorithms, or when they go to bed in New York time and come back in New York morning.
Moritz StefanerYeah, it's interesting. It's like a little detective game you can play, right? So you see something interesting and you're like, okay, what could it be like? Why would somebody do that? And who is it with which motivations? It's quite fascinating.
Enrico BertiniYeah.
The bitcoin denial of service AI generated chapter summary:
When you know that all the data is public and you can see the data, then you start to see exactly the problems of anonymity that bitcoin doesn't guarantee. Can observing these behaviors help figuring out ways to mitigate or even solve these problems?
Dan McGinnAnd it's always been described as pseudonymous, and it is, because your identity is masked by a simple token. But when you know that all the data is public and you can see the data and you can see the relationships between the data, then you start to see exactly the problems of anonymity that bitcoin doesn't guarantee.
Moritz StefanerAnd once one address is de-anonymized, you can go back all the way and see all the transactions. Of course. So it's sort of all the way.
Dan McGinnBack to the beginning of 2009.
Enrico BertiniExactly, yeah. One thing I like in the paper is that you have these images, like this one of an initial parasitic worm transaction rate attack. And I like the way the network configures itself in the visualization into a sort of worm. Right. So I was wondering if you expected to see these kinds of structures at the beginning, or it just happened to be so visually salient. Right. It really communicates that there's something going on there. Right. So when nothing special is happening, you just see a kind of traditional graph with some clusters here and there. But when this attack is happening, you see these elongated, worm-like shapes, right? Did you expect to see these kinds of shapes? Or at some point were you just looking at the screen like, whoa, oh, my God, what is going on here?
Dan McGinnIt was a question of luck and right place, right time, really. It was definitely not expected. I mean, I knew there was algorithmic behavior. One of the only things you could do with bitcoin at the time was to interact with SatoshiDice, one of the gambling websites. And that was just an algorithm that would give you a return or not. So I expected to see some algorithmic behavior, and I expected to see that forming different patterns to regular human activity. But then when this denial of service attack started, we were quite lucky that we were able to see the attack evolve. We started to see when the algorithm was operational, when it wasn't, when the algorithm had been tweaked to become more pernicious, and when other actors came in to join the attack on the network using a different algorithm. Because we had the parasitic worm structure that we started seeing at the beginning, and then you could see an entirely different algorithm come in later on, which we call the cancerous tumor, because it was much more dense and much more pernicious than the worm attack. It just took up so much space on the blockchain, in terms of the data capacity that it could tolerate, that it was very obviously a different algorithm. And you've got to infer that that's a different actor.
Enrico BertiniSo I'm curious, does observing these behaviors, can it actually help figuring out ways to mitigate or even solve these problems, or you can only observe them?
Dan McGinnI think it can. I mean, since these attacks, the protocol, or miners at least, have enacted some heuristics which prevent these structures from forming. Now they just limit the number of transactions that can happen in quick succession and be linked together with such a high degree. So what it does do is it allows you to start to reverse engineer how these algorithms were created and create the filters to fight against them, which in some way is not in the spirit of bitcoin, because you're censoring a user's participation in the system. But for the survival of bitcoin, it's pretty necessary.
Enrico BertiniSure, sure. So maybe one thing that we didn't explain yet, and I think we should briefly mention it, is that this visualization is a real-time visualization. It's animated, right?
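A heuristic of the kind Dan describes, limiting how many not-yet-confirmed transactions may be linked one after another, could be sketched as a chain-depth check over the pending transaction graph. The function names, the input format and the threshold are hypothetical choices for this example, not the rule any particular miner or node software applies.

```python
from typing import Dict, List


def chain_depths(parents: Dict[str, List[str]]) -> Dict[str, int]:
    """Length of the longest chain of pending transactions ending at each transaction.

    `parents` maps a pending transaction id to the pending transactions it spends
    from. Parents that are not keys of `parents` count as already confirmed.
    The graph is assumed to be acyclic, as a spend graph is.
    """
    depths: Dict[str, int] = {}
    for start in parents:
        stack = [start]  # explicit stack, so very long worms do not hit recursion limits
        while stack:
            txid = stack[-1]
            if txid in depths:
                stack.pop()
                continue
            pending = [p for p in parents[txid] if p in parents]
            unresolved = [p for p in pending if p not in depths]
            if unresolved:
                stack.extend(unresolved)
            else:
                depths[txid] = 1 + max((depths[p] for p in pending), default=0)
                stack.pop()
    return depths


def flag_long_chains(parents: Dict[str, List[str]], max_depth: int) -> List[str]:
    """Return pending transactions sitting deeper in a chain than `max_depth`."""
    depths = chain_depths(parents)
    return sorted(txid for txid, depth in depths.items() if depth > max_depth)


# A 30-link "worm" (each transaction spends the previous one) next to ordinary activity.
pending = {f"w{i:02d}": ([f"w{i - 1:02d}"] if i else []) for i in range(30)}
pending.update({"a": [], "b": ["a"]})
flagged = flag_long_chains(pending, max_depth=25)  # threshold chosen for illustration
```

Ordinary activity produces short chains, so only the tail of the worm is flagged. A real filter would need more than depth alone, since the denser "tumor" pattern described above is characterized by high connectivity rather than length.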
The 64-Screen Bitcoin Visualization AI generated chapter summary:
The bitcoin visualization is real time and animated. All these visualizations are displayed on a large display wall. The main motivation behind it was a space where people could collaborate and work together. One of the challenges with large screens is how do you interact with them?
Dan McGinnYes.
Enrico BertiniAnd you are basically visualizing what is called the mempool. Maybe you want to briefly describe that and explain what it is. So you're not visualizing the whole transaction history, right. You're only visualizing what is happening in the last few. What is that, hours or so? Yeah. What is the time window there on the mempool?
Dan McGinnIt's minutes. Minutes, just because of the transaction rate that we're seeing. We're seeing three or four transactions a second. So it very quickly gets up to two or 3000 transactions. On the visualization, the mempool visualization is real time and live. So you can start seeing these patterns forming in real time, and you start to see the real time associations between transactions. But then we also started to apply the same visualization to the historic data across the blockchain. So now you can look at any block in time again, looking back all the way in history to 2009 and seeing how within a particular block, these patterns form.
Enrico BertiniYeah. And so all these visualizations are displayed in a large display wall. Can you briefly describe the configuration, how this looks like?
Dan McGinnYeah, so this was all in a box when I started this project. I knew we needed some content for this 64-screen visualization wall. It's 64 screens.
Moritz StefanerIt's a nice problem to have, like.
Enrico BertiniA kid with a new toy.
Moritz StefanerExactly. Like, what to do with these 64 screens?
Dan McGinnBitcoin, of course.
Moritz StefanerYeah.
Dan McGinnSo it's 16 columns of four screens arranged in an arc, almost an all encompassing arc. It's 313 degrees around, and that gives us 130 megapixels of screen real estate to look at the big datasets and presentations all in one place. The main motivation behind it was such that it was a space where people could enter and collaborate and work together. Rather than huddling around a screen or zooming in and zooming out of data, we could just throw it up there on the 130 million pixels and walk around it and explore it in a team fashion.
Moritz StefanerLooks really gorgeous. Like the whole space is very envious.
Dan McGinnYeah, we supplement it with some good sound. So we've got some data sonification projects going on. We've got several Kinect motion sensors around, so we can have physical interaction with the data on the screens. And we've even got a portable EEG that we use to explore how people's brains react to different stimuli, presentations of their data.
Enrico BertiniCan you briefly describe how you interact with it? Because one thing I always find really complicated with large screens is how do you interact with them? Because if it's a touchscreen, you need to get close, but you can no longer see everything because you are too close, right?
Dan McGinnThat's right.
Enrico BertiniBut if you are far, now you have to use some awkward kind of interaction method. Some people use iPads indirectly; I think some people have a reproduction of the screen on the iPad so that you interact with the iPad, or some people use controllers. So what do you use there?
Dan McGinnYeah, you're right. I mean, I wasn't involved in the design, but I know touchscreens were explored as an option and it was decided against, simply because there's too much space to touch.
Enrico BertiniAnd when you are too close, you can't see everything.
Dan McGinnExactly, yeah. So this particular visualization, it had a tablet app which would allow you to pull data off the screen and send data back to the screen and apply filters to the screen. So you could filter by address or transaction size, and then you could start to filter out some of the noise from the signal that you were interested in. To be honest, it wasn't very successful. It was very finickety to use and difficult to interact with. So now it's pretty much, I mean, it's dynamic, but it's not very interactive. Yeah.
Enrico BertiniI'm not surprised. I think that's a problem that many people have. It's one of the unsolved problems, I think it's hard.
Moritz StefanerYeah. And you have to really have to take the exact measurements and scales into account when you design for these things. And interaction is tough, but as you say, a lot of the typical interaction, like zooming and overview and detail and so on, can be solved by just moving around and using active perception, basically. Right.
Dan McGinnYeah. That's why, I mean, we are having some success with the Kinect motion sensors and getting people involved, because it's a fun way to interact with the data and it's now pretty accurate for the gross moves like zooming into the data or scrolling the data. People love acting as if they're in a movie, but, yeah. If you're constantly referring to a tablet, then it kind of defeats the point of having 64 screens to be looking at.
Enrico BertiniYeah.
Moritz StefanerInteresting. Yeah. And so, yeah, so you have this running there, and now you're monitoring the blockchain, the bitcoin space. What's next for you? Or what were the biggest, let's say, unsolved challenges or which types of things? Like. Yeah. What's most interesting to you right now in that space? Can you tell us a bit?
Bitcoin Cash: Data Visualization AI generated chapter summary:
This visualization is directly applicable to other cryptocurrencies. Nobody can see what is actually happening in such a cryptocurrency system unless you visualize it. There could even be a meta visualization task, or visualization analyzing all these cryptocurrencies. If any of our listeners wants to start playing with this data, what would be the easiest way?
Dan McGinnThis was a bottom-up approach. It was quite speculative in what we'd find in this visualization. And I think it's a very important function for visualization itself. One of the first stages can be to just throw some kind of visualization at it and see what information is in your data and what is interesting to look at. So now we can take a step back and we can go do some numerical heavy lifting and apply some machine learning to find the patterns and the types of patterns that we know are there and we know are interesting to research more. And then hopefully we'll come back to visualization at the end to present the results once we've done that intermediate step of doing the maths on the dataset.
Moritz StefanerThat's a very interesting approach to first eyeball a few interesting structures and then say to the machine, okay, can you find more of these worms, or can you find more of these interesting chains or something like this? Yeah, that's interesting.
Dan McGinnYeah. We had no idea which features would be most important. You'd suspect that the reuse of the same address would be the most important feature to get interesting behaviors and associate them together. But it turns out that it's not. Actually, for algorithmic behavior, it's just the degree of the transaction, because no one codes their algorithms to be stealthy.
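As a rough illustration of using degree as a feature, the sketch below flags transactions by how many inputs and outputs they touch. The record layout, the definition of degree as inputs plus outputs, and the threshold are all assumptions made for this example, not the method used in the project.

```python
def transaction_degree(tx):
    """Degree of a transaction, taken here as its number of inputs plus outputs."""
    return len(tx["inputs"]) + len(tx["outputs"])


def flag_high_degree(txs, threshold):
    """Return the ids of transactions whose degree reaches the threshold."""
    return [tx["id"] for tx in txs if transaction_degree(tx) >= threshold]


# Hypothetical toy records: "b" fans 40 inputs out to 60 outputs.
sample = [
    {"id": "a", "inputs": ["addr1"], "outputs": ["addr2", "addr3"]},
    {"id": "b", "inputs": ["addr4"] * 40, "outputs": ["addr5"] * 60},
    {"id": "c", "inputs": ["addr6", "addr7"], "outputs": ["addr8"]},
]

flagged = flag_high_degree(sample, threshold=50)
print(flagged)
```

A simple count like this needs no address clustering at all, which fits the point made above: behavior that is not trying to hide shows up in the plainest structural feature.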
Enrico BertiniAnd I think you mentioned to me offline that you are also trying to look into other cryptocurrencies, is that correct?
Dan McGinnYes. So this visualization is directly applicable to other cryptocurrencies. Ethereum is one we've looked at. I mean, there's been some research recently that suspects 50% to 60% of all Ethereum transactions are related to or controlled by a single actor. And we can see that when we just directly apply this same graph visualization to the Ethereum blockchain. And that's a real thing. It's like the cancerous tumor that we saw in the denial of service attack. It's just transactions in a very isolated island with few connections to the real transactions going on outside of it.
Moritz StefanerAnd there's a whole zoo of cryptocurrencies by now. So there could even be a meta visualization task, or visualization analyzing all these cryptocurrencies and comparing them against each other, because there's like millions of them by now, I feel.
Dan McGinnWe've done a bit more work on bitcoin because this visualization is limited to just looking at the transactions as they come in and the blocks in history. But we've zoomed out and started to abstract the data so that we can now actually take a visual look at the entire blockchain, which is like 130 GB now, and we can look at the associations between blocks simply as an adjacency matrix. It's quite a difficult job to get that data out, but once it's out, you start to see more interesting patterns of association and periods of time where curious things are going on.
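To make the adjacency-matrix idea concrete, here is a small sketch under one assumed definition of association: entry (i, j) counts outputs created in block i that are spent in block j. The function name and the toy data are hypothetical, and the actual abstraction used for the full blockchain may differ.

```python
def block_adjacency(spends, n_blocks):
    """Build an n_blocks x n_blocks matrix of spend links between blocks.

    spends is a list of (created_in, spent_in) block-height pairs, one per
    spent output. matrix[i][j] counts outputs created in block i and spent
    in block j.
    """
    matrix = [[0] * n_blocks for _ in range(n_blocks)]
    for created_in, spent_in in spends:
        matrix[created_in][spent_in] += 1
    return matrix


# Toy data: five spent outputs across three blocks.
spends = [(0, 1), (0, 2), (1, 2), (1, 2), (2, 2)]
matrix = block_adjacency(spends, 3)
for row in matrix:
    print(row)
```

Because an output can only be spent in the same block or a later one, a matrix built this way is upper triangular, and dense bands near the diagonal indicate coins being moved again shortly after they were received.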
Moritz StefanerYeah, I think it's such a great case where data visualization is absolutely needed because you have this contract system and the rule system and then people use it, but you never really see what's going on. It's just messages being exchanged between computers. But nobody can see what is actually happening in such a cryptocurrency system unless you visualize it. And as you say, you wouldn't even know what to look for. If you were just to look for specific patterns, you might have some guesses of, okay, let's look for very asymmetric or very symmetric or very regular or very irregular transactions or something like this. But just to get a sense of what you might find, you need to visualize. So I think it's a great case for visualization.
Dan McGinnAnd because the data is so clean.
Moritz StefanerIt's perfectly machine readable.
Dan McGinnEverything's protocol conformant, it's perfectly machine readable, no sanitization required, and you can see the anomalous activities immediately.
Moritz StefanerYeah, it's very hard facts, like no dispute about what's in the blockchain. It's a great case for visualization. And I also played a bit with the blockchain.info API, and it's quite easy to get a live feed of all the unconfirmed transactions happening right now, or to get the last block or block number x. It's very easy. And so I can only encourage people to play a bit with this data source because it's just fascinating to dip a toe into that data stream and just speculate a bit about what's going on there. It's quite interesting.
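For listeners who want to try this, the sketch below shows one possible starting point. The endpoint URL and the JSON field names (`txs`, `hash`, `inputs`, `out`, `value`) are recalled from the public blockchain.info data API and are assumptions here; they may have changed since the episode, so check the provider's current documentation. The helper names are hypothetical. Amounts are assumed to be in satoshis, of which there are 100 million per bitcoin.

```python
import json
from urllib.request import urlopen

# Assumed endpoint; verify against the provider's current documentation.
UNCONFIRMED_URL = "https://blockchain.info/unconfirmed-transactions?format=json"


def fetch_unconfirmed(url=UNCONFIRMED_URL, timeout=10):
    """Download the current list of unconfirmed transactions (needs network)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response).get("txs", [])


def summarize(tx):
    """Reduce one transaction record to a few fields worth plotting."""
    inputs = tx.get("inputs", [])
    outputs = tx.get("out", [])
    return {
        "hash": tx.get("hash"),
        "n_inputs": len(inputs),
        "n_outputs": len(outputs),
        # Output values are assumed to be satoshis; convert to bitcoin.
        "total_out_btc": sum(o.get("value", 0) for o in outputs) / 1e8,
    }


# Offline example with a hand-written record in the assumed format.
sample_tx = {
    "hash": "example-hash",
    "inputs": [{"prev_out": {"addr": "addrA", "value": 200000000}}],
    "out": [
        {"addr": "addrB", "value": 150000000},
        {"addr": "addrC", "value": 50000000},
    ],
}
summary = summarize(sample_tx)
print(summary)
```

To work on live data instead of the sample record, something like `[summarize(tx) for tx in fetch_unconfirmed()]` would produce one summary per pending transaction, assuming the endpoint still responds in this shape.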
Enrico BertiniYeah, that's what I was about to ask then. So if any of our listeners wants to start playing with this data, what would be the easiest way to start?
Dan McGinnWell, you're right, there are a ton of free data providers out there. Bitcoin.info are our friends, and that's how I started assimilating this data together. But I would recommend that it's worthwhile digging into the actual protocol specification. Obviously, these data providers parse the data themselves and they add metadata to it. It's quite interesting to see on a byte-by-byte level exactly what's going on with the data. But these kinds of tools are lacking for the other cryptocurrencies. We've only started looking at Ethereum recently, but Zcash, which is a lot more anonymous than bitcoin, if not perfectly anonymous, would be an interesting case just to see what's going on there and see what algorithmic behavior is going to be obvious.
Enrico BertiniOkay, well, thanks so much, that's so fascinating. I'm really glad that somebody decided to visualize the blockchain. I'm blown away by the whole system and by these visualizations. I really encourage the listeners to take a look at our blog post and see the images because they're stunning. And also, I think we're going to post a link to a video you have that shows how this real-time visualization works and what it looks like in your, I think you call it data observatory, this whole system of big screens.
Dan McGinnThat's right.
Enrico BertiniAnd, yeah, it's really cool. It's a great example of really, really interesting and useful visualization. Thanks so much, Dan, for coming on the show.
Dan McGinnYou're welcome. Thank you for inviting me.
Moritz StefanerThank you.
Enrico BertiniThank you. Bye bye.
Moritz StefanerCheers.
Dan McGinnBye bye.