The VAST Challenge: Visual Analytics Competitions with Synthetic Benchmark Data Sets
Enrico Bertini: Hi, everyone. This is Data Stories number 24, if I'm correct. Hi, Moritz. How are you?
Moritz Stefaner: I'm good. How are you, Enrico?
Enrico Bertini: Good, good. Everything is fine and more relaxed than usual. Summer is coming.
Moritz Stefaner: Did you recover from my visit in New York?
Enrico Bertini: Did you recover from your visit? Yeah. You had to deal with a few kids in my house, right?
Moritz Stefaner: Yeah, it was nice.
Enrico Bertini: It was nice. How was Eyeo?
Moritz Stefaner: Eyeo was fantastic. Again, it's just my favorite conference. It's always so heartwarming and inspiring to be there. So I had a blast.
Enrico Bertini: Yeah. I've heard fantastic things about your workshop. Insect Smarts.
Moritz Stefaner: Insect Smarts, that's right. We worked on swarm intelligence and collective intelligence. Yeah, it was nice. I have a few ideas for how to improve it next time, but it was the first time and I'm happy.
Enrico Bertini: Okay, cool, cool, cool. Okay, so no more trips in the summer, or what?
Enrico on his trip to Rio AI generated chapter summary:
Enrico was in Rio last week and calls it an amazing place. Moritz has nothing big lined up for the summer.
Moritz Stefaner: Yeah, I mean, you know, I'm always... But nothing big is lined up right now.
Enrico Bertini: Well, last week I was in Rio for a tour.
Moritz Stefaner: Fantastic.
Enrico Bertini: That was amazing. Amazing place.
Moritz Stefaner: They never invited me to Rio.
Enrico Bertini: Well, you know, it was fantastic. Really, really fantastic. I spent five or six days there. It was amazing. Amazing place. Now I want to go back there in winter, you said. Well, okay, let's start. We have another special episode. Today we are going to talk about the VAST Challenge, which is a very special challenge, and we will tell you why it is special. It is organized every year at the VIS conference, formerly known as VisWeek. To discuss it, we invited two special guests. We have Professor Georges Grinstein from the University of Massachusetts Lowell. Hi, Georges.
The VAST Challenge AI generated chapter summary:
Today we are going to talk about the VAST Challenge. It is organized every year at the VIS conference, formerly known as VisWeek. We have Professor Georges Grinstein from UMass Lowell and Celeste Paul from the National Security Agency. They introduce their work in visualization and visual analytics.
Georges Grinstein: Hi, Enrico.
Enrico Bertini: And we have Celeste Paul from the National Security Agency. Hi, Celeste, how are you?
Celeste Paul: Hello. I'm good.
Enrico Bertini: What we normally do is let our guests introduce themselves. Georges, do you want to start? And then Celeste.
Georges Grinstein: Sure. I'm a professor of computer science at UMass Lowell, and I head the Institute for Visualization and Perception Research. I got my PhD in mathematics from the University of Rochester. Most of my work centers around visualization: perceptual and cognitive foundations, high-dimensional data, theory and applications. For the last eight years I've co-chaired the IEEE VAST challenges and contests in visual analytics. I've taught Radical Design, a course on how to develop radical new products instead of evolutionary ones. I'm a member of the Homeland Security center CCICADA, and I direct the development of Weave, an open source, web-based, interactive, collaborative visual analytics system that incorporates a lot of modern research.
Enrico Bertini: I guess you could go on for another 30 minutes, Georges.
Georges Grinstein: Perhaps.
Enrico Bertini: I'm so happy to have an academic on the show again. It's been a while.
Georges Grinstein: That's true.
Enrico Bertini: That's true.
Moritz Stefaner: No, what you're working on sounds really exciting. I need to check it out.
Enrico Bertini: Yeah. And Celeste?
Celeste Paul: Hi, my name is Celeste Lynn Paul, and I'm a computer scientist at the National Security Agency in the Research Directorate. I do research on visual analytics with a focus on human-centered computing and cybersecurity. We are a funder of the VAST Challenge. I'm also about four weeks away from defending my dissertation at the University of Maryland, Baltimore County. So.
Georges Grinstein: Awesome.
Enrico Bertini: Awesome. Sounds great. And I guess it's a thesis on visual analytics, right?
Celeste Paul: Actually, no, it's more focused on human-centered computing. It's on interruption management with visual interruptions. So it is visualization in a sense, but not analytics.
Enrico Bertini: Okay. But it's very much related. Fantastic. So we have a whole array of academics today. I'm excited. Okay, so we want to talk about the VAST Challenge. There are many contests and challenges around, and I have always championed the VAST Challenge because to me it looks a little different from the usual stuff we see around. I think the main difference is that it's mainly organized by academics. As far as I know, it's the only contest or challenge in this area that is organized by academics. There are also some other details that make it really interesting to me, especially the fact that the organizers spend a lot of time thinking about how to design the VAST Challenge and how to evaluate the entries submitted to it, which is not easy at all. So I would actually like to start with Georges. Georges, can you give us a little perspective on how the VAST Challenge was born, how it developed over time, and why you organized it in the first place? A little bit of historical perspective.
The VAST Challenge AI generated chapter summary:
The VAST Challenge started in 2006. As far as Enrico knows, it is the only contest or challenge in this area organized by academics. The goal is to embed ground truth in synthetic or quasi-real data sets, so as to measure the capabilities of visual systems as well as of the analysis.
Georges Grinstein: Sure. It actually started in 2006. Prior to that, there had been other challenges. I ran a network intrusion one about a decade before. The InfoVis community had a contest at that time that I was co-chairing with Catherine Plaisant and Jean-Daniel Fekete, and the KDD Cup, the Knowledge Discovery and Data Mining Cup, was also running. But in general there was no theme in the evolution of these, except to build a challenge test. And there were a lot of issues around the fact that we needed a broader analytical perspective on these problems. We wanted to eventually be able to measure the capabilities of visual systems, as well as the analysis, as well as insight into the problems. Jim Thomas was the original founder and program manager at PNNL, and he gave the team some basic instructions. The data has to be heterogeneous. Ideally, the problem and data should be difficult for a team to solve without tools, so that somebody couldn't just read the data and say, "Ah, I've come up with a solution." It really ought to involve the human in the loop: human analysis aided by tools. That says, aha, we have to have visual analytics. And how could you evaluate without ground truth? So the goal is to embed ground truth in these synthetic or quasi-real data sets. That's how it started, and it has evolved a great deal. PNNL were the initial developers of the first data sets. Then UMass Lowell got involved, and for the last couple of years PNNL, together with the various funders, NSA and others, has again been enriching the ground truth within larger and more complex data sets.
The VAST Challenge AI generated chapter summary:
The VAST Challenge is built around having ground truth. The first challenge included one of the first data sets with text. Now, with text data, there's no need for the organizers to generate intermediary data sets. You can see the rapid evolution of technology.
Enrico Bertini: Okay, so this is exactly what I wanted to mention. It looks to me that the special thing about the VAST Challenge is having this ground truth. I don't know how you evaluate the entries right now, but I think I remember that from the very beginning this was the main goal of the challenge: having a way to evaluate entries by how much of the ground truth each one covers. Is that correct?
Georges Grinstein: That's part of it. The other part was to select problems that are difficult and that we would like the community to be aware of. We generated one of the first data sets with text as part of our first challenge, where people had to analyze news reports, voter registrations, and phone call logs. At the time, I remember, many people didn't even know how to tokenize data, so we actually generated a tokenized database for people who didn't know how to do that. Now, with text data, there's no need for us to generate intermediary data sets, because everyone has those capabilities. So you can see the rapid evolution of technology, and I think in some fashion the VAST Challenge has contributed to making people aware of that.
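Georges mentions that the first challenge shipped a pre-tokenized database alongside the raw news reports. The episode doesn't describe that database's format, so the following is only an illustration of what "tokenizing" means here, assuming a lowercase regex tokenizer and a hypothetical (document id, position, token) row layout.

```python
import re

# Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> list[str]:
    """Split raw text into lowercase word tokens, dropping punctuation."""
    return TOKEN_RE.findall(text.lower())


def build_token_table(docs: dict[str, str]) -> list[tuple[str, int, str]]:
    """Flatten documents into (doc_id, position, token) rows.

    This row layout is a hypothetical stand-in for a "tokenized database";
    it is not the format the VAST organizers actually distributed.
    """
    rows = []
    for doc_id, text in docs.items():
        for position, token in enumerate(tokenize(text)):
            rows.append((doc_id, position, token))
    return rows


if __name__ == "__main__":
    docs = {"news_001": "Phone call logged at 9am.", "news_002": "Voter registered."}
    for row in build_token_table(docs):
        print(row)
```

A table like this can be loaded straight into a database and queried by token, which is the step many early participants reportedly lacked tooling for.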
Enrico Bertini: Sure, sure. Another thing I was wondering: did you take inspiration from other computer science challenges, like the KDD Cup? I was always frustrated by the fact that in other disciplines of computer science it seems somewhat easier to define the metric and also to calculate it. If you have some sort of analytical algorithm that has to have predictive power, you can establish a couple of measures and compare the entries very systematically. But in visualization or visual analytics it's much, much harder, because the output is stuff that comes out of people's brains, and everything gets much more complicated. So what was the initial motivation for organizing the VAST Challenge in this format? Did you get inspiration from other challenges, like the KDD Cup?
The VAST Challenge AI generated chapter summary:
Enrico asks whether the organizers took inspiration from other computer science challenges like the KDD Cup; unlike those, every VAST data set is built around a scenario. Moritz asks how much time goes into constructing these data sets, imagining hundreds of hours; Georges suspects much more.
Georges Grinstein: Yes, there was TREC, the Text REtrieval Conference, and so on. There were many. The difference is that these were specific problems that could be solved either algorithmically or visually. I think what we did was make a drastic move to identify a scenario, and that is very different. We spend months ahead of time identifying the scenario and building a data set, or data sets, that match it so that the problem looks real. If you look at every single challenge we've put out, there's a story, and that's a very different aspect. We first identify the target research problem we want to focus on. Sometimes we offer one, two, or four, and then we build a scenario that connects them, so that you can solve a particular problem or combine it with different aspects, one of which might have video, another text, and another logs from network intrusions, for example.
Enrico Bertini: Okay. Okay.
Moritz Stefaner: It's very interesting. It's almost like writing a movie script, coming up with a whole situation, and then...
Georges Grinstein: That's correct.
Moritz Stefaner: ...the data.
Celeste Paul: That's what we really do. We sit around a table and talk about the story before we think about the data at all.
Moritz Stefaner: Do you have rooms covered with sticky notes and lines between them, like mad scientists?
Georges Grinstein: Yes. And I think PNNL also does at times.
Moritz Stefaner: I can imagine, yeah. Can you estimate how much time goes into constructing these data sets? I imagine it must be hundreds of hours, right?
Georges Grinstein: Oh, I would suspect much, much more than that.
Celeste Paul: Oh, yeah. For example, this year's Mini-Challenge 3 has two weeks' worth of data, and we generated that data in real time. So the final data set took at least two weeks to generate. We also did several test runs. Sometimes the test runs were only a couple of hours or a couple of days, but then we needed to generate the entire two weeks, because a couple of hours or a couple of days of data generation is fine and normal, but with two weeks' worth of the detailed data we were trying to create, we ran into a lot of problems. So there were many, many iterations of testing our tools, testing data sets, and doing test runs, in addition to all of the data structure planning, setting up the simulations, and that sort of thing. Generating the data has definitely been an interesting challenge. We choose these challenges because we want to test the research community and give them something meaningful that is a little further out than what we're comfortable with. But to generate data sets reasonably, with a known set of ground truth in them, we need tools to verify our data, and those tools don't exist yet. So it's a lot of iteration and a lot of manual analysis to make sure the data is what we think it is before we can give it to people.
Georges Grinstein: And let me add that in some cases, like the virus mutations in one of the past challenges, we generate the data set and then we test it out. Just as Celeste pointed out, we find flaws, so we have to regenerate. And if the mini-challenges are integrated into a larger story, you often have to change things, because all of a sudden a data set doesn't connect with the previous one or the one that's about text. Then we have to make modifications and regenerate. So it's an ongoing process.
Celeste Paul: Yeah, these are very complex data sets. We're not just generating one type of log with one type of event. We're sometimes trying to coordinate four or five different data sets so that they match up, in addition to regular noise and reality, so that these aren't purely computer-generated data sets. If a data set is too artificial, it's not really a challenge anymore. It's been such a challenge, especially with this last challenge. So last year, people just asked where...
Moritz Stefaner: We were setting up the thing.
Georges Grinstein: Well, yeah, we have an extremely humorous story to tell about the Twitter data we had last year. We were generating flu data. Was it last year or the year before?
Celeste Paul: It was 2011.
Georges Grinstein: Yeah, so it was two years ago. We took tweet data to try to analyze flu, picking the ones that had flu in them and so on. And, my God, they had to be sanitized. So we wrote programs to clean up the data. Even after we had run all those programs, I had to divide up my whole lab and give everyone 1,000 of them to go through by hand, because the programs we had written didn't clean them completely. Another time we used text from old newspapers and had to go through and change dates and names, and of course we all made errors. In one of the first data sets we generated, we had voter registrations, and without realizing it we had one individual register every single day. So the registration dates were consecutive.
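The voter-registration slip, one new registration every single day, is the kind of generator artifact a cheap sanity check can catch before release. A minimal sketch, assuming registrations are available as a list of dates; the rule used here (consecutive days with exactly one record each) is an illustrative heuristic, not a check the organizers are known to run.

```python
from collections import Counter
from datetime import date, timedelta


def one_per_day_pattern(dates: list[date]) -> bool:
    """Return True if records fall on consecutive days with exactly one per day.

    Real registration data tends to be bursty (many records on some days,
    none on others), so a perfectly regular sequence suggests a generator bug.
    """
    counts = Counter(dates)
    if len(counts) < 2:
        return False  # too little data to call it a pattern
    days = sorted(counts)
    consecutive = all(b - a == timedelta(days=1) for a, b in zip(days, days[1:]))
    return consecutive and all(c == 1 for c in counts.values())


if __name__ == "__main__":
    start = date(2006, 1, 1)
    synthetic = [start + timedelta(days=i) for i in range(30)]
    realistic = [start, start, start + timedelta(days=3), start + timedelta(days=4)]
    print(one_per_day_pattern(synthetic))
    print(one_per_day_pattern(realistic))
```

A check like this only flags one known failure mode; as Georges says, you can't cover every single thing, which is why manual review still matters.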
Enrico Bertini: It's amazing.
Georges Grinstein: I mean, you can't cover every single thing. It's very, very interesting.
Enrico Bertini: So I guess the tools available today just cannot support this kind of deep analysis, right?
Georges Grinstein: In general, they're not quite there. If they were, we'd also be able to generate data by reverse engineering those tools. In general, it's much, much harder. Think about automatic video recognition. The tools are there, or were in the past, for minimal tasks, but any complex problem still required the human in the loop. And I think we still believe that complex problems require humans.
Enrico Bertini: Sure, sure. Can you clarify what exactly you mean when you say you generate the data? Are you starting from real data sets and manipulating them, or is it all synthetic data?
Network Security: The quality of the data AI generated chapter summary:
Enrico: Are you starting from real data sets and manipulating them, or is it all synthetic data? Both. The organizers set up virtual networks to generate the data. The data sets they generate will be used for many, many years by classes, other researchers, and companies.
Georges Grinstein: Both.
Enrico Bertini: Both, yeah.
Celeste Paul: It's a hybrid approach. I can give an example.
Enrico Bertini: Yeah, I think an example from security...
Celeste Paul: The security data that we're using this year and have used for the past few years is all synthetic in the sense that it's not coming from the real world. We generate every bit. However, we do set up virtual networks to generate the data. There are actual machines, virtual ones, like a cloud cluster, running services and doing things. For example, if we have a virus propagating through a network, the virus is actually moving through the virtual network, infecting machines and generating certain types of data. Then we have sensors on the network, typical commercial tools, that pick up the packets and the logs, and that's what we provide to participants. We don't manually insert data, and we don't generically create records and then put them into the data. This is all part of a live network. It's just that we create the conditions for the network.
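Celeste's key distinction is that the ground truth is the process running on the network, while participants only see what the sensors capture. The toy below is a deliberately simplified stand-in, not the organizers' setup (they ran real virtual machines and commercial sensors): it spreads an infection breadth-first over a hypothetical host graph, keeps the full infection timeline as ground truth, and emits a log only for hops that touch a monitored host.

```python
from collections import deque


def simulate_spread(edges, patient_zero, monitored):
    """Spread an infection breadth-first over an undirected host graph.

    Returns (ground_truth, sensor_log):
      ground_truth maps every infected host to the step it was infected;
      sensor_log holds (step, source, target) only for hops that touch a
      monitored host, i.e. the partial view a network sensor would record.
    """
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)

    ground_truth = {patient_zero: 0}
    queue = deque([patient_zero])
    sensor_log = []
    while queue:
        host = queue.popleft()
        for neighbor in sorted(graph.get(host, [])):
            if neighbor in ground_truth:
                continue  # already infected
            ground_truth[neighbor] = ground_truth[host] + 1
            queue.append(neighbor)
            if host in monitored or neighbor in monitored:
                sensor_log.append((ground_truth[neighbor], host, neighbor))
    return ground_truth, sensor_log


if __name__ == "__main__":
    # Hypothetical host names; only "srv" has a sensor attached.
    edges = [("ws1", "ws2"), ("ws2", "srv"), ("srv", "ws3")]
    truth, log = simulate_spread(edges, "ws1", monitored={"srv"})
    print(truth)
    print(log)
```

Here the first hop (ws1 to ws2) never appears in the sensor log, so an analyst has to infer it, which is exactly the gap between ground truth and observed data that makes such data sets a challenge.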
Enrico Bertini: Okay, so you basically simulate it, right?
Celeste Paul: Yes, yes.
Enrico Bertini: And is that true for every other mini-challenge, or...? No, no. Okay. Okay. Do I remember correctly, Georges, that one challenge included some video analysis, as you mentioned before?
Georges Grinstein: Yes.
Enrico Bertini: Did you go around with a camera and record yourselves, or stuff like that?
Georges Grinstein: That one was actually quite interesting. It was a camera outside a building that was called an embassy, rotating regularly and just collecting hours and hours of data. We actually had two individuals walk in there, meet, and exchange briefcases. That was the anomalous event we wanted people to identify automatically. So in this particular case there was an injection of real, non-synthetic data: the two people walking and making the exchange, because it was real video. That's what I meant: the range goes from purely real data where you embed something, all the way to purely synthetic data, where you simulate and model a situation occurring.
Enrico Bertini: Okay, okay.
Georges Grinstein: Which, by the way, if you think about it, makes the evaluation process much, much more complex, because you could find something we did not intend to be discovered that might be naturally in the data.
Celeste Paul: Yes. This is especially true for the network simulation data, because there are actual virtual computers on the network. We could have induced an event that we didn't intend, and if people find it, that's valid and they get credit for it in our scoring.
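Enrico's idea of measuring how much of the ground truth an entry covers, combined with Celeste's note that real but unintended events also earn credit, suggests a two-part tally. The episode doesn't spell out the actual scoring rubric; this sketch assumes findings can be reduced to hypothetical event identifiers, computes coverage of the planted events automatically, and routes everything else to human judges.

```python
def score_entry(ground_truth: set[str], reported: list[str]) -> tuple[float, list[str]]:
    """Score one entry against the planted ground-truth events.

    Returns (coverage, needs_review): coverage is the fraction of planted
    events the entry reported; needs_review lists reported findings that were
    not planted. Those may be mistakes or genuine, unintended events in the
    data, so a human judge has to decide whether they earn credit.
    """
    found = set(reported)
    hits = ground_truth & found
    coverage = len(hits) / len(ground_truth) if ground_truth else 0.0
    needs_review = sorted(found - ground_truth)
    return coverage, needs_review


if __name__ == "__main__":
    # Hypothetical event identifiers, for illustration only.
    planted = {"briefcase_exchange", "virus_outbreak", "fake_registration", "data_exfiltration"}
    entry = ["virus_outbreak", "briefcase_exchange", "port_scan"]
    print(score_entry(planted, entry))
```

Keeping the unplanned findings as a separate list, rather than counting them as false positives, reflects the point that a simulated network can produce valid events nobody planted.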
Enrico Bertini: Sure, sure. It just came to my mind that the same thing could be done by... well, it's probably very hard, but it could be done by anyone, right? If I want to do the same thing you did, once I know the rules, I could generate my own data set, post it online, and create my own challenge, right?
Georges Grinstein: That's correct. And make sure you have a whole year planned for it.
Celeste Paul: If it were simple, we'd have lots of high-quality data sets to do research with.
Georges Grinstein: I can tell you that we took one mini-challenge in 2011, and we had a lot of tools because we had been working with PNNL for a long time and had built our own, and it still easily took three of my students five to six months of detailed work to make sure the data set was correct. It's a really large effort if you want quality. And we expect the data sets we generate to be used for many, many years by classes, other researchers, and companies to do good, interesting evaluations.
Celeste Paul: Yes. That brings up a great contribution of the challenge beyond the single-year cycle in which we provide an interesting, relevant problem to the research community. These are very high-quality data sets, and many of them have already shown up in publications unrelated to the challenge.
Enrico Bertini: Okay, well, that sounds like real benchmark data sets.
Moritz Stefaner: Where you know what you can pull out of them, and you can really compare what different tools deliver, right?
Georges Grinstein: That's correct. In fact, that's what we call them: the VAST benchmark data sets.
Enrico Bertini: Okay, so are all the data sets available online somewhere?
Georges Grinstein: Yes.
Enrico Bertini: Okay.
Moritz Stefaner: And you could go back, download the past data sets, and also see the solutions?
Georges Grinstein: Yes. And people's papers and videos. Everything is available.
Enrico Bertini: So is there a specific link we can mention so people can go there and see what it looks like?
Georges Grinstein: Yes. You're putting me on the spot.
Enrico Bertini: Okay, I will add it to the blog post if you can send me a link later.
Moritz Stefaner: I think nobody will type in the URL anyway.
Celeste Paul: The current challenges are on the Visual Analytics Community website, and the older challenges are hosted by the University of Maryland.
Georges Grinstein: By the Human-Computer Interaction Lab, yes. And if you search for visual analytics benchmarks or VAST benchmarks, the results will all lead you to them.
Celeste Paul: Yeah, we can get you the link.
Enrico Bertini: Okay. So you can get the data sets and also the solutions if you want.
Georges Grinstein: Yes.
Celeste Paul: The challenge problems and the solutions and...
Georges Grinstein: ...everyone's entries, including their briefs, their two-pagers or more, and their videos.
Enrico Bertini: Okay, so besides actively participating in the VAST Challenge, people can just go there and have fun with these data sets, right?
Georges Grinstein: Yes.
Enrico Bertini: Fantastic.
Moritz Stefaner: Could be nice for teaching as well, you know, if you have a course and present five different solutions to the same data set.
Celeste Paul: Class submissions, yes. Last year, I forget the professor's name, but he had several. Yes. Oh, okay.
Enrico Bertini: I think there are several professors using the VAST Challenge data sets. I've used them myself.
Celeste Paul: Yeah, and from the Universidade Federal in Brazil I think we had three or four student team submissions as well.
Enrico Bertini: Okay, and do you also have the entries people sent to the past editions of the VAST Challenge on the website? Can people access them as well?
Georges Grinstein: I'm not sure what you mean. If you mean anyone who submitted something, the answer is yes.
Past entries online AI generated chapter summary:
Anyone who submitted something has their entry online. For past editions, not just the solutions but the entries people sent are available, so people can go there and see what others have done.
Enrico Bertini: For the past editions, do you also have the solutions online? Not the official solutions, the entries. I mean the solutions that people sent.
Georges Grinstein: Yes.
Enrico Bertini: Okay. So people can go there and see what the others have done.
Georges Grinstein: That's right. There were times when we had 60 or 80 entries, and you can actually go there and see them all.
Enrico Bertini: Okay, cool. Fantastic. So why don't we talk about the VAST Challenge 2013? I think most people will be interested in hearing what they can do there, how to participate, why to participate, and so on. Can you give us a brief introduction to the VAST Challenge 2013 and the mini-challenges you have this year?
The VAST Challenge 2013 AI generated chapter summary:
The VAST Challenge 2013 has three mini-challenges. One is about predictive visual analytics, and the other two are cybersecurity related. The predictive challenge has been running for the past six months and has close to 100 registrants.
Georges Grinstein: Celeste?
Celeste Paul: Sure. This year we have three challenges. One is more about predictive visual analytics, and the other two are cybersecurity related. Georges, do you want to talk about the predictive analytics challenge?
Georges Grinstein: Sure. The concept is very simple. New movies come out every week, and the idea is simply: can you predict which one is going to make the most money, or how much they're going to make? What can you predict about them? This is an interesting problem because we're running it every two weeks, so people can participate continually. So: guess the box office gross of all the new movies. The data comes from Twitter, Bitly, and IMDb, the Internet Movie Database.
Enrico Bertini: So this is not synthetic data. It's real data?
Georges Grinstein: That's right.
Enrico Bertini: Oh, cool. So that's new.
Celeste Paul: Yes.
Enrico Bertini: You never did that before?
Georges Grinstein: We've never done that before.
Moritz Stefaner: And how does it work? You collect data on these movies for two weeks, then the data set comes out, and people quickly analyze it? How does it work?
Georges Grinstein: Go ahead, Celeste.
Celeste Paul: The challenge has been running for the past six months. In the first round, they provided some sample data, or training data sets, but the rest of the data sets have been live challenges, and people have been using each round to refine their analytics.
Enrico Bertini: Okay.
Moritz Stefaner: Okay.
Georges Grinstein: It started in January, so they have...
Enrico Bertini: ...to predict which movie will be the most successful among those coming out that week?
Georges Grinstein: Actually, they predict the dollars for all of them. Yes, the dollars.
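Since entrants predict a dollar figure for every new release, each round needs some way to compare predictions with actual grosses. The episode doesn't say how predictions were scored; one common choice, used here purely as an assumption, is the mean absolute percentage error (MAPE) across all movies in a round, which normalizes each miss by that movie's actual gross so a blockbuster doesn't swamp everything else.

```python
def mape(predicted: dict[str, float], actual: dict[str, float]) -> float:
    """Mean absolute percentage error over movies with a nonzero actual gross.

    Movies missing from `predicted` are simply skipped in this sketch; a real
    contest would more likely penalize them.
    """
    common = [m for m in actual if m in predicted and actual[m] != 0]
    if not common:
        raise ValueError("no movies to compare")
    return sum(abs(predicted[m] - actual[m]) / abs(actual[m]) for m in common) / len(common)


if __name__ == "__main__":
    # Hypothetical titles and dollar figures, for illustration only.
    predictions = {"Movie A": 12_000_000.0, "Movie B": 30_000_000.0}
    grosses = {"Movie A": 10_000_000.0, "Movie B": 40_000_000.0}
    print(mape(predictions, grosses))
```

Lower is better; a purely algorithmic entry and a visually guided one could be compared on the same number, which is the comparison Enrico raises next.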
Enrico Bertini: Okay. This sounds interesting to me because it looks like the kind of problem you could tackle with a purely algorithmic approach, right? Without any visual or human interaction.
Moritz Stefaner: You could also use external data, like, I don't know, have ten movie critics and factor in their reviews, stuff like that. Would that be allowed, or do you have to use the data provided?
Celeste Paul: You have to use the data provided. You can't use external sources. That's mostly to make it easier for us to judge.
Moritz Stefaner: Sure.
Georges Grinstein: Right. And there have been close to 100 registrants, just to give you a sense of it.
Enrico Bertini: Wow.
Georges Grinstein: And a lot of student teams.
Enrico Bertini: So are all the participants using a visual analytics approach, or are some of them submitting purely algorithmic solutions?
Celeste Paul: It's a visual analytics challenge, so to get full points they have to show how they created or designed visualizations to support the analysis for predicting these things. Because you're right, this is also a very algorithmic problem. What we wanted to explore was how you could use visual analytics to supplement predictive analytics, or even make it better.
Enrico Bertini: Okay, so are there people comparing the purely algorithmic approach to the algorithmic approach plus visualization?
Georges Grinstein: I don't think so, not yet. I don't think that was one of the requests.
Enrico Bertini: Okay, that would be interesting. Okay, so we have two other challenges.
The 2013 Network Data Visualization Challenge AI generated chapter summary:
The challenge focuses more on the human part of the visualization problem than the data and the processing part. This is the first year for the challenge. With the way that the challenge is designed, we have no idea how many people might actually participate.
Celeste PaulOne is new and one is a continuation of a theme. I'll start with the new one. Visualization tends to focus a lot on the data and the analytics, and we use visualization to reveal what is in the data, or whatever the analytics reveal in the data. This is a very traditional process when it comes to visualization. But it also seems like we've been going down the same path for a long time, and not many really innovative or different visualizations have come out recently. So we wanted to create a challenge that would inspire people to think differently about the problem from a design perspective, and to focus more on the human part of the visualization problem than on the data and processing part. We also wanted to engage more designers and artists, who have a lot of creative experience, in our community, so that we can start matching the technically competent visualization engineers with the highly creative visual artists. For the design challenge, we provide a written scenario: a two-page story that describes the life of a typical network operations manager. We tell participants about some of the problems she has, and we simply state: we need you to come up with a situation awareness visualization that will help her do her job. The network she works on is very large and very complex, and many things happen on it, but we don't tell you how large the network is, and we don't tell you anything about the complexity of the network or the events on it. We didn't want to create any constraints. We wanted people to go out, think of something crazy, and bring it back into the realm of visualization. So it's a very new type of challenge, especially for a very computer-science-oriented community. But with some other activities that we've done, we've gotten some very interesting results.
Enrico BertiniMoritz, it sounds like something you should participate in.
Moritz StefanerAlthough I really have trouble working without data. Usually I say, listen, people, I need some data, otherwise I'm in trouble here. But it definitely sounds like a fascinating task. And there's a clear description of what the situation is and what the problem to solve is.
Celeste PaulYeah, imagine a movie script. Imagine you were a subject matter expert for a movie, and they said, we need a cool display that looks realistic; this is what the actor is doing, this is what the actor is trying to find. Go and design it.
Enrico BertiniYeah, no problem.
Celeste PaulSomebody's phone is ringing.
Enrico BertiniYeah, it's not my phone. Yeah, it can happen. Don't worry.
Celeste PaulOkay.
Enrico BertiniYeah, go ahead. Go ahead.
Celeste PaulI mean, imagine if it was a movie script and they asked you...
Georges GrinsteinSorry about that.
Enrico BertiniYeah, no problem. No problem.
Georges GrinsteinKeep going. Go ahead. I don't know how to shut it off here.
Celeste PaulMaybe the story isn't worth it.
Moritz StefanerNo, it's just on George's track.
Celeste PaulYes, but imagine a movie script where you know what the actor is doing and what the actor is going to get, and you get to design anything that is convincing but also a little bit science-fictiony. What boundaries could you really push if you didn't have the restrictions of today's data or today's analytics?
Moritz StefanerOr today's hardware, maybe, you know.
Celeste PaulYeah. What can we do in five or ten years? Exactly. So this is the first year for that challenge. Because of the way the challenge is designed, we have a good idea of how many people have viewed the page, but since there's no data to download, we have no idea how many people might actually participate. So it'll be very interesting. Judging will be interesting, too.
Enrico BertiniSo you're not providing any data sets with this one?
Celeste PaulNo data at all.
Enrico BertiniNo data at all.
Celeste PaulNo data at all. It's a two-page story about a data scenario and the person who is in it.
Enrico BertiniSo that's the madness section of the VAST Challenge. Okay. So you expect very creative, bold kinds of solutions, right?
Celeste PaulYeah. We've passed the challenge on to some art schools and to some other information visualization professors who were very interested in this type of challenge. So we're looking forward to seeing the types of results we get this year.
Enrico BertiniOkay, and how are you going to evaluate this? It looks complicated.
Georges GrinsteinWe have some specifications on what they're going to submit: a video, an image. We also suggested storyboards. And it's very similar to the classic VAST challenges, where we have analysts and experts who participate heavily in the evaluation along with the committee. In this case, designers, InfoVis designers, and others will participate as well.
Celeste PaulYes, it's a subjective evaluation, but artistic critique is pretty common, and we've provided some very good guidelines so that everything will be consistent: novelty; transference of ideas from one domain to another; how much complexity they tried (the more complexity they try, the more points they get, even if it might seem like too much); how large a network they tried; how many objects they're working with. There are a lot of things. I don't want to give away too much of the judging criteria, on purpose. Maybe we should strike that. It'll be interesting.
Enrico BertiniOkay, but you have a specific committee for this challenge, right?
Celeste PaulYes. We'll have subject matter experts with experience in cybersecurity and situation awareness, and we'll also have the technical experts, who in this case will be designers and artists volunteering to review. This is very similar to our other challenges: in the past we would have subject matter experts (say, last year's challenge was cybersecurity-related), but we would also have visualization experts rate every single submission.
Enrico BertiniAll right.
Georges GrinsteinAnd we have told the participants that they can connect with cybersecurity practitioners if they want to.
Enrico BertiniYes.
Celeste PaulWe definitely encourage any type of domain or user research, not just in the design challenge but also in the technical data challenges.
The 2013 Cybersecurity Mini Challenge AI generated chapter summary:
The third mini challenge is the really big data challenge. It's the third challenge in a series of cybersecurity situation awareness challenges. This year's challenge, in terms of size, is 91 million records. We're very interested to see how well people deal with the complexity.
Enrico BertiniOkay, and what about the third mini challenge?
Celeste PaulYes, so the third mini challenge is the really big data challenge. It's the third challenge in a series of cybersecurity situation awareness challenges. If you remember, in 2011 we had a small situation awareness challenge with 150 computers: firewall logs, packet captures, intrusion detection logs, Nessus scans, syslogs, lots of different things, but for a small set of computers. That was really to introduce the community to the volume and complexity of this type of data. Last year both challenges were focused on cybersecurity, and we looked at depth and breadth. We had a network of nearly a million nodes, for which we generated high-level network health data, so that was definitely a volume challenge for a lot of people. But we also created an in-depth situation awareness and analysis challenge using 5,000 computers with firewall logs and intrusion detection system logs. That was moderate complexity, but it was still relatively easy for participants. This year's challenge, in terms of size, is 91 million records.
Moritz StefanerOh, wow.
Georges GrinsteinWow.
Celeste PaulYes. It's a combination of Big Brother records, which are similar to health and status data but come from an actual product, so these are very realistic logs, and they tell you whether or not a policy deviation has happened. It has full NetFlow data, which are the records of one computer talking to another. And we also have one week of intrusion prevention system logs. The entire challenge takes place over two weeks of real time. It is a lot of data, and it's not just volume this year; it is also very complex. There's a lot going on. So we're very interested to see how well people deal with the complexity in addition to the volume, and also exactly what everyone finds, because we have some interesting things hidden in the data.
Enrico BertiniOkay, so that's the most traditional one, right? The structure of this one is the usual one, where you have lots of data and a very clear analytical task, and you've injected some ground truth, right?
Celeste PaulYes. It's very large and very complex data, and we know what we put in it, but there might be other things as well.
Georges GrinsteinAnd I do want to point out that mini challenge one is also somewhat traditional, because the ground truth is whatever happens that day.
Enrico BertiniSure, sure. The ground truth is embedded in the design, right?
Georges GrinsteinI mean, well, in number one, yes. It's whatever the public decides to go see.
Celeste PaulChallenge number one is real data. Challenge number two is no data.
Enrico BertiniOkay.
Celeste PaulAnd challenge number three is high-quality simulated data.
Enrico BertiniSo you have the whole spectrum this year.
Georges GrinsteinYes.
Enrico BertiniI guess it's the first time you have that, right?
Georges GrinsteinYeah.
Celeste PaulSuch diverse challenges? Yes, that's definitely true.
Georges GrinsteinI think so. We've had diversity in terms of data and problems, with mini challenges that were separate. But this also brings it back in some way.
Enrico BertiniRight.
Georges GrinsteinIn fact, this is very similar: you can work only on mini challenge one if you want to, and not deal with the others.
Enrico BertiniSure, sure.
Georges GrinsteinAnd so on.
Enrico BertiniOkay. And the mini challenges, of course, are no longer connected as they were in previous years, right? In previous years, most of the mini challenges were connected, so you could actually participate in the grand challenge and connect all three pieces together. Is that correct?
Georges GrinsteinThat's right, yes.
Celeste PaulBut as our data sets became more and more complex, creating a coherent story within one mini challenge was a big enough challenge, because we really have three types of data in the mini challenge. So in a way, that's its own little grand challenge if you analyze each type of data separately, and that's true for the other data sets as well.
Georges GrinsteinAnd I think one thing to realize is that the VAST challenges are continually evolving as we try to solve more and more complex problems, and we actually get some fantastic results from people, as Celeste was describing, especially in mini challenge two. One of the challenges we had in the past was to analyze gene mutations to identify the most virulent mutation so it could be quarantined. Most of the solutions that came in used typical BLAST-style sequence differencing to identify where the mutations occurred, from the bioinformatics side. But one group used plagiarism detection tools, and that was brilliant, very creative. Those are the kinds of things where communities go out, know a particular area, and solve a problem in a tremendously creative way. And by having these challenges available to everyone, that's what we're hoping for: that communities talk to each other.
Celeste PaulYes, exactly.
How do you evaluate the VAST Challenge entries? AI generated chapter summary:
We're pretty broad in evaluating a submission from many, many different aspects. We focus on the justification through the analysis and not just on finding something. We want to encourage people to take that approach whenever they're creating their analytics.
Enrico BertiniOkay. And another thing I wanted to ask you: for instance, for mini challenge three, the most traditional one, how do you evaluate the entries? Is it just counting how much of the ground truth is covered, or do you have several different criteria?
Celeste PaulWell, certainly finding the events that we inject into the data is beneficial. Sometimes people find an event but don't quite explain it the way we expected; if they can justify it with the visualizations they created, they often still get the points. Sometimes they find things that we did not intend or did not put in the data, but if they can justify that with their analytics, that will definitely give them credit. We look at things like, how can...
Enrico BertiniCan you get negative scores for something? False findings, false positives?
Georges GrinsteinIt really depends on what is found and on the approach. The awards we give are not just related to ground truth. They could often be for the approach, for integration issues, for being able to scale beyond 90 million, maybe to a trillion. Those are the kinds of things we look at. We're pretty broad in evaluating a submission from many, many different aspects.
Celeste PaulYeah, someone might come up with a really interesting visualization for one type of data problem but not really get any of the other parts of the challenge. If it's a really interesting, really creative way to solve that one tiny part of the problem, we may recognize that. But in terms of judging, finding the right answer, or finding things in the data and describing the analysis, is, like George said, just one part of everything we consider. We focus on the justification through the analysis, not just on finding something. If you find something but can't explain how or why you found it using your analytics, it doesn't mean much to us. One of the problems we had in 2011 was that people took more of a forensics approach to the data than an active analysis approach. So we try very hard to create active analysis situations, because if these tools were being used in the real world, they wouldn't be used after the fact; they'd be used in the moment, because these are situation awareness problems. So we want to encourage people to take that approach when they're creating their analytics and not focus too much on post hoc forensics.
How to participate in the VAST Challenge AI generated chapter summary:
To participate, you don't necessarily have to build your own tools. It's certainly possible to use existing tools to solve the problem. But building your own custom tools allows a lot of flexibility in terms of how you process large amounts of data.
Enrico BertiniOkay, I guess to participate you don't necessarily have to build your own tools, right? You could actually use existing tools, and you would still judge positively the entries that discover the ground truth and explain how they did it and what the process was, right? Is that correct?
Celeste PaulOh, yes, that's true. Last year we had many very good submissions made with Tableau worksheets. We've also had people create very simple analytics using R, D3, and a MySQL database. However, those tools are already available to a lot of people, so their limits have already been explored. It's certainly possible to use existing tools to solve the problem in a new and interesting way, perhaps even better than somebody who comes up with a custom tool. But building your own custom tools allows a lot of flexibility in how you process large amounts of data, because very few commercial tools (and I'm not sure there are any, unless you do streaming) can handle 91 million records, or let you tweak the visualization for the data, for the analysis process, for the scenario, or for the user you're tuning it for.
Enrico BertiniSure. Sure. Well, this is what I noticed when I was using the VAST Challenge for teaching in Konstanz, Germany. I taught a visual analytics course there for a couple of years, and the whole course was organized around the VAST Challenge: the project assigned to the students was participating in the VAST Challenge. One thing I noticed and really liked is that the students end up using a lot of different tools to solve the problem. Typically they start with some data processing tool, then they hit a limit and switch to another one. I remember, for instance, last year they had a lot of problems with the data size, as you mentioned before. Did they start with some kind of SQL stuff? No, I think they started with some kind of text editing tool, something really simple. Then they moved to SQL, then to something else again. Then they started thinking, what does this data look like? And they started using Tableau; somebody else used R. At some point they discovered there were limitations in these tools and thought, oh, maybe I should come up with my own design, and they designed something themselves. At the end of all this process, you've learned so much, because you've been through all these phases and tried all these tools. You know exactly what you can achieve with these tools and where the limits are. So I personally found it really, really enriching as a process. Did you get similar feedback from other professors?
Georges GrinsteinOften.
Enrico BertiniOften, yeah. The only thing I notice is that now that I'm in the US, I can no longer organize the course around the VAST Challenge, because the deadlines are no longer aligned. In Europe they are aligned. And I find that really interesting, because the students love the fact that they can actively participate in the challenge and take the course at the same time. So, yeah, that's my little protest.
Celeste PaulEvery year we say we're going to get the data out early this year. Early, early. But we also forget that we decided to come up with a larger, even more complex, and even more realistic data set. And no matter how hard we try, we need that extra month or two.
Enrico BertiniEven if you were able to put the data sets out earlier, I think it's very nice when the deadlines fall within the schedule of the course, and that's not going to happen here. I think that's a pity, actually.
Georges GrinsteinYeah.
Celeste PaulWe don't know about the future.
Georges GrinsteinYeah, we keep trying. But on the other hand, mini challenge one did start in January.
Enrico BertiniOh, yeah, sure, sure. Yeah. I could have done that.
Celeste PaulYes, it's a challenge. And we are working out ways that we could possibly change that, because one of the challenges is that we're coming up with bigger and crazier data sets, and people are going to need more than three or four months to work on them.
Enrico BertiniYeah, yeah, yeah, sure. And do you know of anyone who keeps working on these data sets after the challenge? Does this happen?
Georges GrinsteinYes, yes.
Celeste PaulWe've seen the data sets, and iterations on submissions, show up at conferences the next year and sometimes in other publications.
Enrico BertiniOkay. So I was wondering, and in fact.
Georges GrinsteinI want to point out that there's a special issue. Of course, I'm going to blank on where it is. Information Visualization. There's a special issue where some of the past challenge participants are actually giving updates on their research.
Enrico BertiniOh, fantastic. Is that in the Information Visualization journal?
Georges GrinsteinYes.
Enrico BertiniOkay. Is it out already?
Celeste PaulIf not, it's coming out very soon.
Enrico BertiniIt's coming out very soon. Okay, good.
Georges GrinsteinYeah. I think we had three or four additional papers. I think the papers came from Daniel Keim, Chris North, and Catherine Plaisant on evaluation, and there's a whole bunch of them. I'm just picking out a few, but I think there were three, four, five, basically.
Enrico BertiniOkay, good. So I guess some of our listeners are now thinking, should I participate? I'm sure many people will be undecided. So this is the time to advertise the VAST Challenge. Why should someone participate in the VAST Challenge, and why the VAST Challenge rather than other challenges? Assuming the VAST Challenge is in competition with other challenges, why choose this one?
Why participate in the VAST Challenge? AI generated chapter summary:
The vast challenge looks at analysis, visualization, and the human as a system to work together. Every single submission gets reviewed by at least two people, one subject matter expert and one visualization expert. Can you engage with the other participants before submitting and with the committee?
Georges GrinsteinI think this is the time for challenges. They're occurring everywhere, and they're all very interesting; many of them are providing prizes and money. The VAST Challenge is probably the only one, in my view, that looks at analysis, visualization, and the human as a system working together. That's quite different, and I think that's a really key factor. If you start extending that, you start thinking, oh my goodness, we have to think about human factors, design issues, cognition, and so on. That's really what the VAST Challenge offers. So if you want to do just analysis, you can. If you want to do just visualization, you can. If you want to combine all of them and have the human drive or steer computation, you can. So it provides lots of opportunities to do different things. That's just on the data and problem side. But there's also the community. We have a workshop, for example, a whole-day workshop where people can come, hear speakers, present what they're doing, discuss it with others, and hear other ideas on these complex problems. And finally, you get rewards: students can get publications and can rub shoulders with the top researchers in this area. So I think it's an extremely enriching experience.
Celeste PaulYeah. Another thing I'd like to mention is that every single submission gets reviewed by at least two people: one subject matter expert and one visualization expert. So if you have an idea that you're maybe still working out, and this is the first time you've run it through a data scenario, this is a really great place to test your ideas and get really good feedback, because all of our reviewers are awesome and spend a lot of time giving high-quality feedback on the many submissions we receive.
Georges GrinsteinRight. We're talking about actual analysts, design artists, and information visualization experts, and the committee also reads most of them, if not all.
Celeste PaulSome of us read all of them.
Enrico BertiniOkay. So if I want to participate, what does it look like? Do I just go to the website? Do I have to register or anything?
Georges GrinsteinYes, if you want to get updates, I think. But otherwise you can just download the data.
Enrico BertiniOkay.
Celeste PaulYeah, it's on the VA community website. It should be on the front page, and we can give you a link for that. You can register for updates, which is always good in case we add corrections or updates to the data, or sometimes clarify instructions. But otherwise you just download the data, look at the requirements for submission, and submit your solution.
Enrico BertiniCan you engage with the other participants and with the committee before submitting?
Georges GrinsteinWe recommend that all the time, and there are people who coordinate together or ask the committee questions, but we usually don't get that many questions, actually.
Enrico BertiniOkay. Okay. And of course, if you're looking for...
Moritz Stefaner...a collaborator or so: is there a forum or some place for people to find each other? Maybe you're alone and you think it would be nice to team up.
Georges GrinsteinWe did in the past, but we got very little response.
Celeste PaulYeah, it's interesting you mention that, because many people during the workshop said, hey, it would be great to have an online community to coordinate. I'm pretty sure there is a forum for the challenge on the VA community website, but there hasn't been a whole lot of activity.
Moritz StefanerYeah, I've noticed that in MOOCs, these online courses, a lot of the actual interesting stuff happens among the participants. But maybe because it's a competition, it's a bit different.
Celeste PaulAnd, you know, I think a lot of interaction happens at the workshop, where people meet each other, discuss things, exchange email addresses, and begin collaborating that way. Last year was the first year the VizSec workshop was at VisWeek, and that was great, because all of the subject matter experts who were dabbling in information visualization met all the information visualization experts who were dabbling in the cybersecurity realm. Getting them together to discuss problems and solutions was really valuable. That's what we're hoping for this year with the design challenge, where we're going to introduce artists and designers to this community so that they start working together.
What happens after the VAST Challenge? AI generated chapter summary:
The deadline for the contest this year is the end of the second week of July. Many people continue working on their submissions well past the deadline. The VAST Challenge event lasts a full day. Do you give any kind of financial support in case it's needed for traveling?
Enrico BertiniOkay. And assuming you get an award, what happens next? You get invited to the...
Georges GrinsteinYou get invited, you become world famous. You get everything that comes with the award.
Celeste PaulMany of the award recipients are invited to give a talk. Depending on the schedule and on what their award was for, maybe it's a ten-minute talk, maybe a 30-minute talk, they'll be able to walk through and present their contest submission. And often people continue working on their submissions well past the deadline. The deadline for the contest this year is the end of the second week of July, but VisWeek isn't until October. That's many months during which people can keep working on their software, and some of the advancements they make over those months are really, really interesting.
Georges GrinsteinAnd they can show demos. We have places where they can bring a laptop or more, alongside a poster if they want, to show live demos.
Enrico BertiniFantastic. So the VAST Challenge event lasts, what, half a day?
Celeste PaulA whole-day workshop.
Georges GrinsteinIt's a full-day workshop.
Enrico BertiniOkay. And do you give any sort of financial support in case it's needed for traveling?
Georges GrinsteinWe used to in the past. I don't think this year we've planned for that. No, I don't think we've planned for that this year.
Enrico BertiniOkay, so VIS is going to be in Atlanta, right? I'm just trying to give people some basic information. And the deadline for submission is when?
Georges GrinsteinSecond week in July.
Enrico BertiniSecond week in July. Okay. So it's basically one month from now, right?
Celeste PaulYes.
Enrico BertiniIf somebody wants to participate and didn't know anything about it before.
Georges GrinsteinThat's right.
The Competitors' Challenges AI generated chapter summary:
It's not uncommon for commercial companies to submit. If you offer a visualization tool, I think it's a perfect showcase if you manage to do something meaningful with it. It sounds like something that, if you really dive into it, can be very rewarding.
Enrico BertiniOkay. Moritz, is there anything else you want to ask?
Moritz StefanerNo, it sounds really interesting. I can only comment from my perspective as an ex-academic: I always wanted to take part. But it sounds like a lot of work you have to invest, and even if you win, you have to fly there yourself. So for me it's difficult to factor in the time, but it sounds like something that, if you really dive into it, can be very rewarding.
Celeste PaulIt's not uncommon for commercial companies to submit. Last year, business forensics participated, and we've had Tableau and some other commercial products.
Moritz StefanerIf you offer a visualization tool, I think it's the perfect showcase, if you manage to do something meaningful with it.
Georges GrinsteinI think there are lots of companies and business groups that participate, because receiving an award in one of the mini challenges clearly highlights the strengths of that particular company. And we've had companies participate many years in a row because they keep improving their tools using these data sets.
The impact of the VAST Challenge AI generated chapter summary:
The VAST Challenge has real problems. It's a scenario with people that could happen out in the real world. How does it help make progress in visual analytics in general? Could be a whole genre, like telling fictional stories through data.
Enrico BertiniOkay, well, this sounds like a great outcome of the VAST Challenge. This is something I actually wanted to ask you: what's the impact of the VAST Challenge beyond the challenge itself? How does it help make progress in visual analytics in general?
Georges GrinsteinI think that as people look at the tools and what they've solved, they get ideas for the evolution of their own tools. Discussing with others: how did you handle that? What did you do about the large data? Oh, you presented it this way; oh, I see, you used this design idea. So what you have is a mechanism by which people can exchange extremely creative solutions: what works, what doesn't, what scales, how well your system integrates this, and so on. That's a rarity; it's very difficult to get that from reading a paper today. But it's terrific if you're there and you actually see it happen.
Celeste PaulAnd I think that happens because the VAST Challenge has real problems. It's not just problem one, problem two, problem three, give us answers. It's a scenario with people that could happen out in the real world, and the data is very real. It's much easier to transfer a realistic scenario to other applications than a highly academic discussion.
Enrico BertiniSure.
Georges GrinsteinYeah.
Moritz StefanerAnd you never have that direct comparison because you might read one paper on one technique for one problem, but then another paper on another technique for another problem. But you rarely see the same tool or different tools used for the same problem, right?
Georges GrinsteinThat's right, yes.
Moritz StefanerAnd I think that's very interesting and very, very instructive. I find that with all the contests. Actually, we started out Data Stories with sort of a rant on contests, not the VAST Challenge, but a few of the others. But one of the real benefits is that you have all these different solutions to the same issue, the same problems, and that can be very instructive.
Celeste PaulWe're glad we've changed your mind.
Moritz StefanerYou can go back and listen. We actually excluded the VAST Challenge from our rant, I think.
Enrico BertiniGood. I had forgotten about it.
Georges GrinsteinAs you can see, it is an extremely large amount of work to generate these data sets.
Moritz StefanerYeah.
Celeste PaulThis is not a trivial volunteer project. We spend a lot of time and effort to really deliver quality data sets and a quality challenge because we really believe that it's making an impact on the research community. It's really helping students start off their careers, and it's showing the, you know, the real-life applicability of visual analytics research to the commercial industry.
Enrico BertiniSo did you ever publish anything on the generation process itself?
Georges GrinsteinYes, PNNL published a paper initially, I think, and there have been several others. So there are techniques out there: threat generation, stream generation.
Celeste PaulYeah. Data generation changes from year to year as we change the focus of the data challenge, but also its scale. So I would keep an eye out for newer publications that deal with this cybersecurity data and all of the challenges that we dealt with.
Moritz StefanerOkay. Yeah, it's interesting. There's also another nice art parallel, because there's this mini genre called data fiction. For instance, Kim Asendorf, a media artist, told a fictional story about what happened in a biology lab, an outbreak of a virus and so on, but he told it only through charts. The more you study these charts and diagrams, the more you understand what happened. It's sort of like a movie plot. I think it could be a whole genre, telling fictional stories through data.
Enrico BertiniAbsolutely. Yeah, absolutely.
Moritz StefanerIt's very interesting. And I mean, you are doing it to a degree that probably nobody else does.
Celeste PaulYeah. Storytelling is a really interesting way to shift the perspective. Just thinking about the problem a little bit differently will make you go down paths that you never would have considered if you were stuck with a database.
Moritz StefanerIt inverts the whole process. Usually you're looking for the story out there, and here you have to come up with that story.
Celeste PaulExactly, now it's about telling the story. And that's why we've chosen situation awareness for that challenge: you don't know what the story is yet. With situation awareness, your job is to tell the story in real time.
Enrico BertiniSure, sure. So in a way, you are evaluating the storytelling capabilities of the entries as well, right? Yeah, that too.
Moritz StefanerYeah.
Celeste PaulTheir analysis process and how everything fits together.
Georges GrinsteinThat's right.
Enrico BertiniDid you hire any screenwriters or people like that?
Celeste PaulWe did put out a call for volunteers among amateur science fiction writers a few times.
Moritz StefanerJust within our friends to get some Hollywood people.
Georges GrinsteinLike, I think we're probably all aspiring.
Moritz StefanerMovie writers, so there might be a second career.
Enrico BertiniFantastic. Amazing. Amazing. So the VAST Challenge is always going to be in the context of the VIS conference, right? You never thought about making it a separate event?
Georges GrinsteinI'm not sure what. I know what you mean.
Celeste PaulSo VAST stands for Visual Analytics Science and Technology. At VIS, there is also a track for VAST specifically, and the challenge is attached to that. Well, unofficially, it's linked to the VAST track.
Georges GrinsteinYeah, it's part of the symposium, the VAST Symposium. But it's also separate; we run it completely separately. It draws a lot of participants, because quite a number of people participate in the VAST Challenge and therefore attend the VAST Symposium. So it's a part of it. It's just that, historically, the challenge has grown into a separate entity because of its large size.
Enrico BertiniOkay, sure. But it's always co-located with the VIS conference.
Georges GrinsteinIt has always been co-located. And so next year we'll be in France.
Enrico BertiniOh, that's fantastic. Sure. In Paris, right?
Celeste PaulIt's in Paris.
Enrico BertiniSo one more motivation to participate.
Georges GrinsteinThat's right. And maybe we'll get grant money to provide free trips for students.
Enrico BertiniThat would be awesome. Yeah, yeah, yeah, yeah. Okay. I think we can stop here, unless there is anything else you want to add about the VAST Challenge?
Georges GrinsteinNo, I think that we've discussed the impact and all of the activities that are involved. I want to thank you for all the questions and your...
Celeste PaulYes, you asked all the right questions. We had all the right answers.
Enrico BertiniNo, no. I am a big fan of the challenge myself, so I'm really glad that we managed to invite you and learn all the details about it. It's really interesting what you guys are doing. And as we said, it's not easy at all. It doesn't look easy. Moritz, is there anything else you want to add?
Moritz StefanerNo, I'm good. Super interesting, I think. Learning what's going on behind the scenes, behind these innocent-looking datasets, was really interesting. I will see them with different eyes now, I guess.
Enrico BertiniYeah, yeah. I really hope that some of our listeners will pick it up, even if there is not a lot of time until the deadline.
Moritz StefanerI think it's fine.
Enrico BertiniIt's one month.
Moritz StefanerYeah, exactly. I mean, realistically, you wouldn't start earlier anyways, right?
Enrico BertiniYeah.
Moritz StefanerYou might as well just start now.
Georges GrinsteinGreat. Well, thank you so much, Moritz. Enrico, thank you.
Enrico BertiniThank you.
Celeste PaulThank you for having us.
Moritz StefanerYeah, thanks for joining us. Fantastic.
Enrico BertiniBye bye. Bye. Bye bye.
Moritz StefanerThank you. Bye.