On Saturday, March 12th, around 400 people gathered in Ballroom B of the Austin Convention Center. With all seats taken and no space left to stand along the walls, the queue outside the doors was still 100 people strong. I am not going to lie: my excitement rose from minute to minute. With me on the stage were Eric Friedman of Foursquare, Robin Richards of JESS3 and Adam Bly of Visualizing.org. The four of us, all coming from slightly different areas, were about to share our thoughts on social media data visualization.
What it is — How it is done — and most importantly: Why you should care.
I had the pleasure of hosting this panel and guiding the audience through three brief presentations and a discussion afterwards. During the discussion and Q&A we tied together the three presented topics: the gathering of data, its visual representation and the resulting social impact. We had a great conversation about the opportunities and challenges that we as visualization practitioners face in our daily work. Once again, the saying that the audience is often more knowledgeable than the presenter proved to be true. Thus, we had a lot of well-founded questions that ignited a healthy discussion. The audience responded very positively to the provided content and luckily was forgiving of my Swiss accent.
Here is the complete audio recording of the panel as well as the Q&A session afterwards.
If you want to read anything more in-depth, you can find the full transcript below.
Good afternoon everybody, and welcome to our panel. We’re here to talk about Social Media Data Visualization: Mapping the World’s Conversation. Just some housekeeping. This session will be recorded so at the Q&A, please step up to the mic so everybody can hear you clearly. The hashtag for this presentation, if you feel like tweeting about it, will be #smvis, which is short, sweet, and sounds a little bit dirty. That’s okay. This panel will consist of three short presentations that cover the field of data visualization, from data mining to presentation to measuring and evaluating the impact of data visualization. So let’s go with the introduction of our panelists.
Good afternoon. I’m Eric Friedman, Director of Business Development at Foursquare. I’m here to talk about a lot of the interesting things we’ve done with some of the information we have available to make it a little bit easier to understand, a little bit easier to use, and partnerships with some of the folks up on stage and some of the things we’ve done on our own.
Hi, I’m Robin Richards. I’m the Information Design Director at JESS3, and it’s nice that you all could attend this afternoon. It’s good to see you also. I currently do the infographics and database output with JESS3 for projects, like you can see here with Samsung, which I’ll touch on later. And I’ve been working with JESS3 for about a year now.
Hi, I’m Adam Bly. I’m a former scientist who turned from science to communicating science and using design to help people understand really complex scientific issues and recently founded Visualizing.org, a new platform to advance data visualization and open data.
I’m Benjamin Wiederkehr. I’m from Switzerland and the Managing Director of a user experience design company called Interactive Things, and I write a blog which is called DataVisualization.ch, which as you might guess is all about data visualizations. As a non-native speaker and a South by Southwest virgin, it’s my pleasure to be the moderator for today. So, how’s that for a challenge?
All right. Without further ado, I hand it over to Eric to give his first presentation.
Thanks very much. We combined all of our slides together so we can, I guess, just jump right into what I’ll be talking about, which is data visualizations and what we’ve done to try and explain them. The first thing I think that’s really important, a lot of people talk about these types of things, data visualizations or infographics. I think it’s important to back up a moment and just understand the story that you want to tell. From our perspective, we like to think that adding a location layer adds a very interesting element to any of these narratives that you are trying to visualize. And we do that by introducing some things like demographics, popularity, time, things like that. Then, of course, there are various different ways to show a lot of information.
So using all of these services that are out there, Foursquare included, I like to think of the things that you are giving off or leaving behind as data exhaust. What you put into these services hopefully makes them better, both for the aggregate service and for your next time using the service. So, whether it’s Twitter or Facebook or Foursquare, any social media service or really anything that gives you a better experience, you’re really giving off data exhaust. In my world, that exhaust is what we use to turn into understandable visualizations.
So, there are a lot of complicated and large numbers out there, one of which is up on the board here, which is very hard to wrap your head around. This number is almost 400 million, and that’s the number of check-ins that we had last year. The question becomes, how can I portray that to someone in a meaningful way? And the answer is putting each of those check-ins on a map. This is the result of almost 400 million check-ins on a map of the world, and this is part of an infographic that we came up with to allow people to really wrap their heads around how big of a number that is, where we’re getting coverage, and how to really understand putting something out there like that.
Another example, 75,000 unlocks and 23,000 check-ins. What does this refer to and how do you really understand these numbers? This is actually for a project we did in November in association with JESS3, if you go to the next slide. This was on voting day, and we put together a map that showed where people were checking in around the U.S. and how many people were actually checking into these locations.
This was actually a fully interactive visualization, where the map was actually interactive. You could zoom into the state, county, and individual voting location level to really let people understand not only where to vote, but where activity was happening. On the bottom here, you see a graph in terms of time. You can see, like many of you, people got out to vote in the evening. So again, it’s really about understanding the story that you want to tell and visualizing it in a way that someone can understand who might not have a lot of background in your field or industry or expertise, and showing it in a way that is hopefully very easy to understand.
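A time graph like the one described here comes down to counting check-ins per hour of the day. The following is only a rough sketch of that idea. The function name `checkins_per_hour` and the sample timestamps are invented for illustration; this is not Foursquare's or JESS3's code or data.

```python
from collections import Counter
from datetime import datetime


def checkins_per_hour(timestamps):
    """Return a list of 24 counts, one per hour of the day (0-23)."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


# Invented sample check-in times on election day 2010 (not real data).
sample = [
    datetime(2010, 11, 2, 8, 15),
    datetime(2010, 11, 2, 18, 5),
    datetime(2010, 11, 2, 18, 40),
    datetime(2010, 11, 2, 19, 10),
]

hourly = checkins_per_hour(sample)
busiest_hour = max(range(24), key=lambda hour: hourly[hour])
print(busiest_hour)
```

Plotting the 24 counts as bars would give a chart of the same kind: an evening peak shows up as taller bars toward the right.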
Another quick example, how can you wrap your head around 200,000 things happening in a single day? This actually happened on Super Bowl Sunday, and this is the amount of people that checked into the Super Bowl, that we actually put as a trending venue in Foursquare. This maps to 200,000 people around the U.S. who shouted either Green Bay or Packers or Steelers, the time in which they shouted those things, and where they were in the United States. So again, getting out there with a story of 200,000 people participated in this event is visually shown in a way that I think shows how it maps to our service, a lot of activity on either coast, and in a way that’s very easy to understand.
There are a lot of things you can do with this information. One of the things that we’re thinking about is how we can use this information to have better forecasting. This can be used in things like, if you go to the next slide, this is movie theater check-ins by release date. So we looked at all of our movie theater check-in data over the past year, in 2010, and mapped it to the release of some of the biggest movies that are out there. And so you can see there are some spikes on holiday times, like Christmas Day, a lot of people like to go to the movies. Then you can see things that are really interesting like July 4th, obviously when some hit movies like “Iron Man 2” and “Inception” came out. What’s most interesting is for prediction modeling, if people are doing this behavior Friday, Saturday, and even Sunday, this data can actually preclude what the box office can show in terms of picking those winners and modeling out when the release schedule works and showing the aggregate behavior of people.
So I like to think of it as a lot of data plus location gives you context, and this is part of the visualization that we gave at the end of last year, part of our infographic. You see the top six categories and what the most popular places are in those categories. So, when I tell people that 10,000 people checked into art galleries, that doesn’t have a lot of meaning. But when they look at the top three art galleries around the world, they get some context. And I’ll leave this up there, because you guys can read faster than I can speak, but I think there are some really interesting data points in terms of these six examples.
You can see this speaks to where our user base is. Those maps you saw earlier, we have a lot of activity on either coast of the United States, so a lot happening in New York, a lot happening in California. But as the user base grows, this information becomes even more meaningful for some of the things that I mentioned. Thank you.
I’m just going to continue on from Eric there and just give you some insights into visual storytelling and how we at JESS3 go about visual storytelling, and just look at the processes of how that relates to mapping the world’s conversation and how we actually get to the final product with things like “I Voted”. So, to kind of begin, we need to look at the fundamental question of what is conversation. Obviously, it’s verbal, as I’m talking to you now. It’s also visual with the written language. But ideally, when we’re talking about conversation in a social context, we’re kind of talking about the exchange of information. So we’re talking about check-ins. We’re talking about your posts on Facebook. We’re talking about your tweets. It’s just the exchange of information that is generating the conversation. So that’s what we kind of look at in terms of social conversation and then how we go about mapping that conversation.
So, essentially what you are doing is you’re sharing your story, you’re sharing your personal story. On Facebook, it’s a little bit more personal with your friends. On Twitter, it’s a little bit anonymous. I follow lots of people, I’m sure you do, that you never meet, you never will meet. So it’s just you’re sharing whatever your story is. And here, you see an example of the geosocial universe that we did, and just trying to tell the story of how big geosocial is. So we’re looking at these astronomical numbers in relation to each other that are very abstract. Well, then how do they relate to each other? Which services are they using? And just by using simple things like universe helps to tell the story, because then you get an idea of it’s a large, big thing.
So, that’s what we’re trying to get at when we talk about visual storytelling. But then, how do we go about visual storytelling in terms of process? We kind of have different stages, and the first stage is you have to ask questions. At JESS3, we ask like a million questions on every single project. For this one, for example, with the Conversation Prism, which hopefully some of you guys have seen, and it’s available as a poster if you think it’s really that cool. I do. I have one on my wall, so I recommend you do. And we’re looking at where the conversation is happening on the Web, which services people are using, and how they relate to each other. So that is the context of the story.
The first question you really should ask, as Eric indicated, is what is the story? That is your core question. Everything else relates to that. You have to figure out what your story is and then build from there, and then you carry on asking questions. But then, the other thing that we find at JESS3 is where do we get the data from? Those sources shown here are the social APIs, and we use those a lot. Sometimes they’re reliable. Sometimes they’re not. But we use a lot of social APIs. With the conversation prism or the geosocial universe, there’s a lot more research base and a lot more pulling and looking for facts. But you always want to have reliable data, and you always want to have accurate data. Without that as your core, you can’t really tell the story.
Then the other thing we take into consideration is the audience, and then dovetailing in with the audience is your delivery, what means you’re actually delivering the visualization. For example, geosocial universe and conversation prism are prints. They’re meant to be distributed via the Web, so they’re images. Whereas something, which I’ll touch on a little later with the Samsung visualization that you see on the wall here, that’s a big installation. Is it a website? Is it a video? That also dictates how you approach a project.
Now, one of the other things before we actually get into just the fine nitty-gritty of the project, which we like to do at JESS3, and this seems a little odd to have this statement on the screen, I’ll admit. But I think it’s brilliant. I live by it. And it’s just, “Make mistakes faster.” What I mean by that is once you’ve asked all of your questions, you have to have a complete open mind. That’s what we do. We have a complete open mind in that we’d like to make mistakes because we learn from them. If you’re afraid to make mistakes, you’re kind of always stuck in a bubble, and you just tend to repeat yourself. So, adding to making mistakes faster is my personal motto that I live by is, “Don’t be afraid to fail.” And that, I think, is one of the ethos we have at JESS3, and that we’re prepared to take risks. We’re not scared of making mistakes or failing, and so you have to be prepared to do that, and that’s what we like to do.
Now once you’ve got an open mind and you’ve worked that out, how do you actually go about the process of coming up with these ideas? What I personally like to do is to just sketch and get it all out of your head. We’re all seeing stuff in this field. We all have reference points. We’ve had discussions when we’re asking the questions and finding the data. But then you want to sketch. Here is just a page out of my sketchbook. I just use very quick thumbnails, and then I also annotate. So I’m having a conversation with myself, so when I look back to it, I can see my thoughts and what I’m thinking.
Then once we finish sketching, and this is for the Intel CES media wall we did, which is very similar to the Samsung one. Once we finish sketching, we then do wire frames. We do a lot of wire frames and look at every single possibility from every different angle that we can do to visualize this kind of stuff. And just to give you an example, for this particular project, we went through about 20 or 30 different ideas at the initial stage. We have been known to do up to 150, which is just immense and very hard work. So, it’s just getting all those ideas out of your head down and looking at all different ways and possibilities of how you tell the story and how that translates to the viewer or the user. Then, we kind of work it up, and like you see here, this looks very much like Intel, obviously because it’s their brand.
Then the other stage that we find is we go through the sketching and the wire framing. But once you get to design, things might change in terms of how color relationships work and how the user looks at a page with different colors and a different flow. So what we tend to do also is we’re not that precious with our wire framing and our initial ideas. Yes, ask are stories. We ask questions. We constantly ask questions for every single stage, but one of the things you want to do is once you go from the wire frame stage to the design stage is you’re not too precious with your wire frames. You have to be prepared to break stuff and to evolve it.
One of the things I think sums this up very simply is Brendan Dawes, who’s a designer out of Manchester. If you don’t know him, I recommend you look him up, because he’s pretty awesome. But he has this design thought of boil, simmer, and reduce. I think that’s the core of what we do in terms of social media visualization in that we have a boiling of ideas, we have numerous ideas. We simmer that down into ones that we think work, and then we reduce it to the core of what the story is. Then once we’ve done that, we build it and we launch it. Here is just an example of a video for The Economist. So it’s just visualizing data through the medium of video. Again, I’m sure some of you guys have probably seen it already, but we build it and we launch it.
The other key thing that we do is we monitor and adapt. So once we’ve put our baby out there in the world, we don’t just leave it to fend for itself. We help it to grow, help it to walk. And this is where I like to bring in the example of the Samsung Media Hub that we’re doing here for Samsung at South By. I’m sure some of you guys have seen the wall that is in the foyer, and this is the accompanying website. So what we’re doing here is we’re curating the best content. That is the story that we’re trying to tell. We’re not trying to tell the story of all of the social media content that’s been generated by South By. We’re trying to tell a story of the best. So we’re actually going in and hand curating it, and we’re selecting it for you guys to find and discover.
Also, if you think we’ve missed something, you can also contribute to that website. But I’d just like to end, again, with the “I Voted” thing which Eric touched on in the site. It gives a very clear example of how we go about social media visualization in that you become the visual storyteller, and that we tell the story of where the check-ins are happening, the different locations. We’ve told the story of the times that they’re telling, and we’ve told the number and the gender. That is the core. That’s what we’ve reduced the entire thing to. All the data sets that we have, we’ve reduced it down to the core of that story. That’s what we want to tell, and that’s the process that we go through in trying to map the conversation.
Thank you very much, and I look forward to your questions.
All right. So, I mentioned earlier, my background is in science, and my interest here, what gets me excited is what happens when you start to visualize these immense data sets. Eric mentioned this data exhaust. When you take this data exhaust and you start creating interfaces to interact with them, what can we learn? What’s the so what here? Beyond creating pretty pictures, and in many cases they are, how can we start to really understand things about various systems in society, complex systems, like cities, like markets, like human behavior, in a way that we weren’t able to before?
This is an incredibly powerful amount of data, an incredibly powerful frequency of data. And now we have, of course, the tools of data visualization. So what I want to share with you are some new insights that are starting to be gleaned in these three areas of cities, society, and human behavior. By taking these amazing projects, using data visualization, using data sets from a variety of different social media tools, and share with you how we’re starting to understand different things and the tools we’re starting to build to understand further things.
So, everything I’m going to share with you today is up on Visualizing.org, which is a new platform that we launched a few months ago together with GE. This is really about creating an open community for data visualization, connecting the open data community with the data vis community and allowing designers to upload and share their work under a CC non-commercial license. So everything that we share with you today is up on Visualizing. You can actually interact with the work up on Visualizing.
So we’re going to jump into cities and share with you our first project. So top left, this is work that was done at MIT at the SENSEable City Lab. In most cases, a lot of the work here starts with mobile and using mobile data, because we have more history with mobile data and we have more of it for now. And then starting to develop some insights that are now being translated into further projects using the social media data. So this is a project that is looking at the globalization of cities done for the Museum of Modern Art for an exhibit called “Design and the Elastic Mind”. The aim was to look at calls in and out of New York over a period of time and actually understand to what extent New York was connected to the rest of the world.
What they found is that the patterns that New York displayed were actually representative and could actually be viewed in other cities around the world. So, in fact, when you plot the relationship between the city’s population and the degree to which it’s connected through phone calls with New York, it actually exhibits not a linear relationship, that as cities get bigger their number of calls to New York just get bigger proportionally, it actually demonstrated a superlinear relationship at a certain point of critical mass. And it starts to demonstrate these emergent phenomena. It starts to demonstrate that cities are in fact complex systems.
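One common way to make "superlinear" concrete is a power law, calls ≈ a · population^β, where β = 1 would be the proportional case and β > 1 the superlinear one. The sketch below estimates β as the slope of a straight-line fit in log-log space. It is an illustration under stated assumptions, not the method used in the project described: the function name `scaling_exponent` is hypothetical, and the numbers are synthetic, generated with β = 1.2 purely so the fit has something to recover.

```python
import math


def scaling_exponent(populations, call_volumes):
    """Least-squares slope of log(calls) against log(population)."""
    xs = [math.log(p) for p in populations]
    ys = [math.log(c) for c in call_volumes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Synthetic data, invented for illustration only (not measured call volumes).
populations = [1e5, 1e6, 1e7]
call_volumes = [2.0 * p ** 1.2 for p in populations]

beta = scaling_exponent(populations, call_volumes)
is_superlinear = beta > 1.0
print(round(beta, 3), is_superlinear)
```

With real data the points would scatter around the fitted line, so the estimated exponent would come with uncertainty that this sketch does not compute.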
This dovetailed really nicely with some work that was being done at the Santa Fe Institute at the time about cities as complex systems, and it’s changing the way cities are starting, mayors are starting to now think about the innovation, culture, energy, infrastructures, and other sort of complex issues related to cities. This came about through a data visualization project.
The project over to the right in black in the background, this is a project called Invisible Cities. It’s using the language of architecture and topology to take data from Twitter and Flickr and geotag it and look at . . . really, the aim here was to look at the collective memory of a city. So, what you start to see is basically memory links to places in a city where the peaks and valleys in these visualizations represent data intensity. Where is the data coming from within the city? Then what’s happening is that nodes are being connected to sort of share a path of how stories are unfolding in a city. So we might look at a particular event that unfolds, a particular happening, an accident, something like that. Take all this data, map it into sort of a virtual space, create a parallel city, and start to actually experiment on that and give us a new way of interacting with a city and seeing how a memory is formed for a city.
The project on the bottom is a project called “Tourists versus Locals”. So this is really interesting, using six years of Flickr data to map really the dispersion of people across a city. So by looking at “Tourists versus Locals,” in blue you can see locals, in red tourists, yellow can’t really define. This is based on how many times a user was uploading work photos to Flickr over the course of a month. If they were there for longer than a month, then they were a local. Less than a month, they were tourists. As you start to look at these maps across 130 cities, it gives you a potentially new tool for urban planning and thinking about where locals are distributed and where tourists are distributed across the city.
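The rule as described here can be sketched in a few lines: look at the span between a user's first and last photo in a city and compare it with a month. This is a simplified reading of the description above, not the project's actual code. The 30-day cutoff, the function name `classify_user`, and the choice to label users with fewer than two photos as "unknown" are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Assumed stand-in for "a month"; the exact cutoff used is not given here.
MONTH = timedelta(days=30)


def classify_user(photo_times):
    """Label one user's activity in one city as local, tourist, or unknown."""
    if len(photo_times) < 2:
        # Assumption: a single photo gives no time span to measure.
        return "unknown"
    span = max(photo_times) - min(photo_times)
    return "local" if span > MONTH else "tourist"


# Invented example timestamps (not real Flickr data).
resident = [datetime(2010, 1, 1), datetime(2010, 3, 1)]
visitor = [datetime(2010, 1, 1), datetime(2010, 1, 5)]

print(classify_user(resident), classify_user(visitor))
```

Coloring each geotagged photo by its owner's label would then give a map of the kind described: one color for locals, one for tourists, and a third for the undetermined cases.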
Then finally on the cities front, far right, is a project called “Crowdsource City.” This is a student project, and it’s really about getting students out into the community using social media tools and using mobile devices to help organizations be able to use that information to better plan social infrastructure. So this is actually a project looking at healthy versus unhealthy food choices in a particular area of a city so that they could perhaps advise a non-profit organization about how to guide locals within the city towards healthier options. So using community data, using local data to provide feedback back into that community using data visualization.
So moving on to society at large, the bottom visualization, bottom left, is a project I’m sure many of you have seen called “Truthy” from Indiana University. So this is a system to analyze and visualize thousands of tweets an hour to identify new emerging bursts around memes. The meme here is Tea Party. What they’re trying to study is how data or misinformation is being spread, and be able to actually characterize through the connections between tweets whether in fact something is misinformation, whether rumors are being circulated, or whether it’s actual, legitimate information.
What we’re starting to find is there may be relevance to these kinds of tools after a natural disaster. So it might be interesting to look at this system now in the wake of Japan to see how misinformation spreads and how rumors are spread in the aftermath of a natural disaster to determine whether things are accurate or misinformation. They’re able to identify how certain political organizations are engineering tweets on Twitter to spread misinformation. So we’re starting to understand how misinformation spreads through the Twitter universe.
Two projects here are interesting, unrelated. Top right and top left. It’s looking at Twitter over the course of Egypt and Iran revolutions, and it’s by a designer called Kovas Boguta. What you see in the case of the leftmost visualization, of looking at Twitter, is that really there was a rise of an information elite. We could start to see that five tweets actually started to drive the entire conversation, and the interest in Iran outpaced the ability for a social structure to actually be developed. Whereas, in the case of Egypt, a couple years later, maybe a more evolved Twitter adoption, we saw this connection between Arabic users of Twitter and English users of Twitter, and it suggested that there was a more democratic discussion, dialogue forming around Egypt than there was around Iran.
Bottom right is looking at mapping of the Persian blogosphere by a team at Berkman at Harvard. What they started to uncover by mapping and visualizing the entire Persian blogosphere was that we had the sense that there was really one voice in the Persian blogosphere, and it was about dissent. But in fact when you map the entire richness of the blogosphere, which is arguably maybe the fourth largest in the world, you see that there’s more than just dissent. It’s a much richer blogosphere and a much richer discussion. They created a tool to be able to go into those discussions, kind of a portal into that blogosphere.
There’s the famous visualization looking at the full 500 million relationships across Facebook, which started to show us a little bit of where the distribution of the Facebook population is, which is interesting. More interesting now is the derivative works that are being produced based on that alpha visualization. This is work that overlays the entire Facebook relationship map on NASA’s illumination study. So this is basically looking at illumination all over the world as a proxy for development. So we can look at nations around the world and see how they’re developing by looking at illumination through satellite imagery. So the purple is where Facebook is currently represented. The white dots are where there is illumination and not yet Facebook. So this might be an interesting new tool to look at proxies for development around the world.
Finally on this one, topmost visualization is a tool by Johan Bollen at Indiana, and they just uncovered some interesting trends. If you visualize Twitter data and look for calmness using Google’s mood index, you can actually potentially forecast the Dow Jones Industrial Average six days ahead using Twitter data. It’s pretty remarkable stuff.
Then finally on the behavioral front, the visualization in the bottom left is work from the Barabasi Lab. So this is really interesting, looking at the mobility. Again starting with mobile because we’ve got more flexibility with mobile to begin with, this is looking at 50,000 people’s movements over three months. They were able to basically come to a theory that 93% of our movements are predictable. 93% of our movements are predictable. Regardless of whether we’re close to home or veering far away, we can build common theories looking at this data that predicts it to 93% accuracy. So what’s really cool about this, beyond the gee whiz factor, is that a lot of mass infrastructure systems, transportation, energy, are based on a notion of randomness, that we’re going to do things randomly at any given moment, when in fact, we may be able to predict things about our behavior and where we’re going to go in certain instances. This could have revolutionary impact on construction of cities and mass infrastructure systems in cities.
I’ll show you just two more here. You’ve seen, I’m sure, this visualization from David McCandless looking at peak breakup times. Some interesting insights into when we see shifts in relationship status and what that could tell us about our own behavior over different courses of the year. The pink visualization is looking at predicting our individual interests using Twitter and Facebook, and the study here is about to what extent can this be predictable? To what extent can we use this to determine our individual interests? So that’s some interesting work being done now.
Then finally, two last projects. So the pulse of the nation, this Twitter mood map, the one where you see maps of the country, this is really interesting to look at happiness across the country at any given moment using Twitter data. What we can find by looking at those visualizations is West Coast is happier than East Coast on a general basis, that we’re happier early in the morning and late at night. So, what I find interesting about that is that it’s predictable. We know that. We know that people in San Francisco are happier than people in New York. We know that we’re happier at night when we come home than we are in the afternoon. So what? It seems to me this is interesting, because we’re sort of calibrating a new instrument. It’s as if we’ve now calibrated it to zero. We know that it works. We know that we’re able to detect things that we know to be true, that we’ve been able to measure through other devices. Now we can start testing really cool things with that.
The final project I want to share is one of my favorites. This is Notabilia, that many of you have seen. It was released by Moritz Stefaner on the anniversary of Wikipedia. So this is looking at the nature and shape of collective decision making. So every time that there’s a keep or delete vote on Wikipedia, it adds another element to this visualization. What you can start to detect is how difficult it is to actually have something removed or changed ultimately on Wikipedia and how we reach consensus. So this might be really interesting to think about negotiating, how we think about reaching consensus in business or in geopolitics.
What’s most powerful about this, and I’ll leave it here, is that to me this is a case where the choice of the designer aesthetically actually raises the bar entirely on this project. So there are three shapes when you look into this project. You can go onto Visualizing and look at the project. There are basically three shapes. There’s one where there’s a sort of veering of the trajectory in one direction. So clearly, there’s sort of consensus to keep it or delete it. There’s another where it’s kind of straight, because one guy says keep, one guy says delete, one guy says keep, one guy says delete. So it basically stays centered. And then when you see this sort of circular pattern, there’s just immediate consensus. It just keeps turning on and on and on itself until it reaches consensus.
So I think this is a case where the aesthetic choice actually gives us a completely new insight into the data. Beyond just visualizing it, the aesthetic choices are giving us another layer of understanding. All the visualizations, I didn’t want to run through them all. You can get them on Visualizing.org/SouthBySouthwest and interact with them. Thanks.
All right. Thanks for the presentations so far. I am here to ask some harder questions, and I would like to start with one question that I have for, probably best suited for Robin and for Adam. Now that we have the visualization, we’ve gathered the data, we’ve visualized it, we’ve published it for the public, how can we measure the impact of a visualization? Feel free to approach these questions from a more practical point of view, like what tools do you use, or from a more theoretical point of view, like how do you measure acceptance of a visualization or of a representation of a fact that you just visualized in a graphic.
Yeah, okay, I’ll start. I think the difference that I see between what we do in terms of visualizing data and the examples Adam has presented is that we tend to deal more in real-time data, whereas the examples that Adam presented are more analytical. They’re analyzing large data chunks. What we tend to do is we tend to deal real-time, real check-ins, real tweets, and then present that. So in response to your question, it’s a little tricky in terms of us to measure how that is impacted. What we tend to try and do is to focus on the actual value that is added to a visualization and how a user gains insights and value from using that in real-time that you might not get from say a normal Twitter stream with all its noise.
So in terms of the way we measure the impact of a visualization is how much value the user actually takes away from it, in terms of if it’s interactive through the website, does it add value to that experience of that event, and then what new insights we can offer the user that they might already miss, which is slightly different in terms of the examples of Adam, where it’s more analytical. It’s more analyzing past events. It’s analyzing a data set over a year or a period, as indicated in Wikipedia over 10 years. So, we just measure it in what is the value we’re adding to that data and that real-time data.
I would just add I think that it depends on the intent of the designer or the intent of the developer in all cases. So in some cases, the intent of the designer is to elucidate something, is to make something simplified. Data visualization is an incredibly powerful tool to make sense of complex issues. We run challenges on visualizing every month with NGOs, international organizations. We have one now with Circle of Blue around water using open data. And when our juries judge the visualizations, most points go to understanding. Does this visualization help you make sense of urban water or whatever the issue might be.
So I think understanding is a key criteria of the effectiveness. But in other cases, it might be to what extent does this open up new questions. A lot of the tools that I shared are not designed to be finite answers. They’re designed to be bases for you to ask questions now of data and involve data earlier into the thought process, business planning process, policy planning process, whatever the case may be. These allow you to sort of explore data sets. A final example, we just did a project for a gathering, a big brainstorm, and the goal was to use visualization to help find unexpected connections, sort of connect dots that you might have not thought to be connected.
So in some cases, the effectiveness of a visualization can be best determined through what sort of unexpected connections it generates because it’s a new language. We shouldn’t just look at this as replacing something. This is a completely new form of communication of understanding. So, what new things did it spawn?
All right, thanks. Like some critics just call data visualizations beautiful lies, as lying with statistics is very easy. So I have one question, and maybe Eric, you’re like the best one to jump in here. How can we prove the integrity of the data, and what tactics do you have here?
So for us, it’s only as good as the underlying system. You spoke to data reliability in terms of where the source is. I think you have to believe in the underlying data set, and for us, making that available to anyone is a way of crosschecking that information. So in the case of any of our projects, someone could replicate any of these things that were done through our API. It just so happens we worked with specific partners to visualize what we’ve done with the API. But I think allowing anyone to crosscheck that information and replicate it again and check those results is really the best way to be honest, because if you’re not, you’ll get caught.
All right. Well, I think it’s a good point to start with the Q&A. So if you have any questions, please step up to the mic, and we are all happy to answer whatever question you might have.
Is this on? Okay. This is closely related, I think, the mass data visualization with the crowd sourcing concept, and if we put those two things together . . . I guess my question is, are there any studies that show how people react when they, as a group, as a large group see directly the results of their individual decisions in some kind of visualized manner that brings it to a point for them? And does that change their behavior? Is anybody aware of any?
I can share one. Oh, go ahead.
No, you go ahead.
It turns out we just wrote about one, published one, uploaded one on Visualizing. So it’s a project in Latin America where communities are coming together in areas that you wouldn’t think, are using data visualization or using sophisticated tools for data discovery and analysis. But they’re coming together in a simplified form of data visualization and creating a map in a sort of community center allowing people from any part of a local community to come and actually affix data in a very physical way.
So it might be that I saw a particular species here, or the water in this part of the village is X, Y, and Z, or the air quality here is something. They’re creating kind of community maps of their community from data that they’re crowd sourcing. It’s sort of crowd visualization we called it. And then those maps are driving policy changes in these local communities. It’s an amazing organization that’s doing this. It’s up on Visualizing. They’re doing some great work. So that’s an example where we’re seeing it in a more physical way. So physical data visualization without interactive stuff, but an effective tool.
Yeah, I can think of another one. It’s an actual project right here in America whereby they’ve been mapping the health information of this small town for about 40 years. And what they’ve discovered is just a relationship between individuals and their health. So just as an example, they’ve discovered that if you’re slightly obese and if your friends are slightly obese, then you’re more likely to be obese, whereas previously they thought it was more linked to your family. So they’re looking at the social relationships between you and your health. And they’ve mapped this for like 40 years. So they’ve recently just visualized that in many different ways to show the small town and offer insights into how your social activity relates to your health.
Let me follow up briefly on a comment that you made in the same vein. If somebody were to put out some kind of visualization that was based on false data under the premise that in fact it represented the opinion of a certain mass of people, you felt like it would be corrected, like they would be caught.
There have been numerous examples, high profile examples of people being caught. Personally it’s happened to me, which is very embarrassing, I will admit. But in the community, I believe that if something is false, they’re pretty quick to call it out.
But if it were a matter of the opinion of the people, in other words, you were basically putting out a poll that was blatantly false.
Yeah. I think to Eric’s point about data, if the underlying data set has integrity and it’s true, then I wouldn’t have thought you could really call someone out in the way they visualize it. Obviously, you can look at the way they’re showing the relationships and the way they’re visualizing it, obviously. But if the data is not false, then they have the opportunity to visualize it the way they choose. I don’t know if you have any examples of people using false data in terms of Foursquare check-ins and falsifying those that you can . . .
No, I don’t have any specific examples, but I think the integrity of the data and the things you put forth, if there are cracks in that model, it undermines the reputation of the data itself and the underlying company. So I think the onus is on us to always be open and honest, and that is to be out there and have someone else look at the data. And if that can be done, transparency is the answer to get around is this what it says it is.
I guess my point is if the visualization has power to then influence people’s real opinion, then it’s possible that the false opinion that was put out could in fact become the real opinion by virtue of that feedback loop. I guess that’s my . . .
Yeah, I think that would probably fall, as Eric said, onto the integrity of the individual who’s creating.
Two quick questions. One, maybe you could just go through and talk about the tools you like to use. Are you using homegrown tools? Are you using libraries? That’s kind of one question. But the other one really is sort of like a tail wagging the dog threshold sort of question, because what’s interesting is you’re sort of saying, figure out your story. But a lot of times, you don’t want to go into this with a preconceived idea, right? So that’s a real problem. And the other problem which is what I came across . . . I did data visualization for EMI music, mapping a lot of different social and media consumption against sales data.
The problem we had was trying to coalesce very large numbers with very small numbers and looking at changes and rates of changes overlapped and giving people very misleading ideas about well, you know, our followers on Twitter are going through the roof. Well, everyone’s numbers are going through the roof, because the adoption for Twitter is just rapid across the board. So you have to be very careful about how you balance and tweak the presentation of the data so things don’t look too vertical or too horizontal. This gets to what he was saying about you’re seeding ideas merely in the choices you make about what you show. And this is about people who know what they’re doing versus people who don’t really know.
Why would you visualize? You’re visualizing to bring people into the conversation who may not understand the statistics. But if you tweak it visually the wrong way, you’ve now completely caused more chaos than to begin with. And the second question was very long, the first one was very short, but hopefully you can address them both.
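An editorial aside on the question above: the worry about follower counts that go "through the roof" simply because the whole platform is growing can be made concrete with a little arithmetic. The Python sketch below was not shown at the panel. Every number and function name in it is invented for illustration. It compares an account's period-over-period growth with the platform's growth, and also computes the account's share of the platform, two simple ways to avoid reading a rising tide as individual success.

```python
# Hypothetical figures, invented purely for illustration (not real Twitter data).
platform_users = [100_000_000, 130_000_000, 170_000_000, 220_000_000]
account_followers = [10_000, 13_500, 17_200, 22_500]


def growth_rates(series):
    """Period-over-period growth as fractions (0.30 means +30%)."""
    return [(curr - prev) / prev for prev, curr in zip(series, series[1:])]


def relative_growth(account, baseline):
    """Account growth minus baseline growth, per period.

    Values near zero mean the account is only keeping pace with the platform.
    """
    return [a - b for a, b in zip(growth_rates(account), growth_rates(baseline))]


def share_of_platform(account, baseline):
    """Account size as a fraction of the baseline at each point in time."""
    return [a / b for a, b in zip(account, baseline)]


if __name__ == "__main__":
    for period, diff in enumerate(relative_growth(account_followers, platform_users), start=1):
        print(f"period {period}: growth relative to platform = {diff:+.1%}")
```

Plotting the relative growth or the share, rather than the raw follower count, keeps the chart from looking "too vertical" for reasons that have nothing to do with the account itself.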
I think those are all.
All right. Now, to the second question, does anybody have something specific directly to say? It was about you as a practitioner, you definitely have the capabilities to skew a little bit the visualization to really make a point here. Based on the data set that you use and based on the metrics that you apply to the visualization, you definitely can underline or skew the visualization to make a certain point, and this definitely is a dangerous field. So how can we prevent that?
I’ll offer something. One of the goals that we had in creating Visualizing was because all the work that’s being uploaded (designers are agreeing to make it available under a CC license allowing for derivative works to be produced) is for derivative works to be produced. It’s to force reproducibility into the system, to force peer review, to force reproducibility. So I think that’s one check and balance that any new system like this needs to have. The other is frankly for designers to recognize that this is an incredibly powerful new tool. This is like saying, “How do we know that advertising about products is authentic and companies are being truthful about the advertising?”
There’s a little bit of a buyer beware. There’s a little bit of a responsibility to historically the ad agency, now the designers/developers who are attacking this. I’m not sure that there’s necessarily an absolute that you can instill here. What we’re trying to contribute to this discussion is some sense of peer review so that if all the visualizations, at least in public domain, so this is excluding commercial work, can be up for discussion, up for commenting, up for derivative works. The source code can be deposited in various places, so we’re working on new tools for that. Then it just creates a little bit of an open culture around data visualization and lets the community sort of start to respond and create derivative works. So that’s what we’re trying to do is sort of solve it a little bit with community and with peer review.
Yeah, and just to add to that, I think the community in terms of just checking the integrity of the work is all important. I think it’s very dangerous at this present time, because you have data visualization exploding, particularly in the media, which I think it’s incredibly dangerous in terms of just the general public outside of the community not understanding data vis and how that can drive a story. They see a pretty graphic, yes, and they think, oh, that must be correct, because that is presented to them. So there is a tipping point. It’s a very dangerous point at the moment, I think, whereby you have that, and it falls to the responsibility and the integrity of the person producing the graphic.
At the moment, I’m seeing most of it in media, and it falls to the integrity of that media organization to be responsible for when they put that out. And the community in general has a great kind of speed and spirit in kind of calling that out. But then that needs to be translated across to the general public, which I don’t think there is a solution at present. But it just needs to fall to the creator of that graphic to have the integrity not to skew it. And that, I don’t think is solvable at this particular moment.
It seems like a lot of the state of visualization is useful for data that’s in the aggregate. So if you use all Twitter data or all Foursquare check-ins. But then on a smaller scale, like if you think about Foursquare badges, that’s sort of a form of visualization in and of itself. And I know like Nicholas Felton does work that’s very personal visualization. There’s Daytum.com, which enables anybody to visualize any aspects of their lives. So I’m wondering maybe, can you speak a little bit about the impact of individual visualization, and if that’s going to become more of a trend and what information individuals can get on their own behavior, and what information other people can glean from the individual acts and representations of their own day to day lives?
Sure, maybe I can start. I think some of the work of the Felton Report has certainly inspired a lot of the personal history that you get when you use Foursquare. And I think that speaks to a product we launched down here called Explore. We finally have something powerful to do with everything, with all the check-ins that you’ve contributed to the service. So I think over time, you’re going to start to see, instead of a history of what you’ve been doing, a visualization of that history, and I think that’s pretty important.
I think in terms of presenting that publicly, there’s a relationship you have with a person who’s on a service, and you don’t want to breach that privacy. So it’s about maybe consuming that information and their own personal visualization on the site within a login environment. I think you see a lot of aggregate information used because it’s presented as fact in terms of how things are doing overall. But when you get down to the individual level, that’s on a one-to-one basis with a person, and it’s very hard to sometimes promote that in not a generic way, unless someone agrees to do so.
Hello. First of all, thank you guys very much. This has been incredibly incredible. I’m going to try to get this into . . . I think it’s easier if I just ask a few real basic aspects of it, and then I’ll give the actual question. I’ll try to make it useful for everyone else, even though it’s very self-serving. Systems operate on core functional values. In the visualized data sets, those are . . . can a visualized data set be broken into basic human traits in a general context for use of comparative values in the abstract form before something actually reaches a point of physical testing in terms of observing humans while they’re doing something?
I’m happy to… I think this is exactly where this is going. I think this is the power, is that when you have all of this data and you can start testing hypotheses, some of these are really obvious. So, what we know now from merging social media and data visualization is a little bit. This is going to climb rather significantly over the next couple of years. But now we know that these tools are viable, and there’s a whole domain of social sciences now that is emerging. It’s kind of reinvigorating the social sciences and giving a new, empirical tool for social sciences to test hypotheses and use these parallel systems, parallel cities that you saw, all sorts of things like that.
The market example, looking at anticipating what might happen to the Dow Jones Industrial Average. These kinds of examples, that’s really powerful stuff. So I think that’s where we’re going, and that’s where social sciences is going to take it, really over the next year or two.
I think also, to add to that, it’s an important point of the willingness of people to share information, and there’s a growing trend of people just being more willing to share that information. So people check in, people tweet, and they share it. I think that will speak to some of the social science stuff. The data is now being created out of that cultural change that people just want to share it. They don’t want to keep it private anymore.
I would just introduce what Eric said is data exhaust. It’s kind of this notion that it’s a little bit unintentional. It’s kind of the more intentional it is, maybe the less interesting it is. It’s almost better that it’s anonymous, unintentional exhaust data in terms of looking at broader social footprints and things like this than intentional acts. So I think looking at it as that metaphor of exhaust is a good one to think about your question.
Thank you. The reason I’m asking is for a redesign paradigm that gets down into changing the precepts on the values that we used to create the original system, what would your advice be for somebody that’s already in that direction but wants to speed up the amount of time it would take before people will actually listen to information that, in their minds, isn’t probable or possible because they can’t think of it themselves?
I might let the scientist answer that one.
The scientist can’t answer that one either. How long before somebody will change their behavior based on that, is that what you’re asking?
I was asking a personal question. What direction would you recommend I go in, in terms of being able to integrate and start to head in this field, because it’s so new, for the purpose of the redesigned advertising paradigm.
Yeah. So I think on a very practical level, there are varying degrees of sophistication in data visualization now, and a lot of it is designed to really just be really about simplifying something complex. Infographics are part of data visualization in this broad space. Data visualization, there’s some discussion as to what constitutes data visualization. Maybe it is just the immensity of the data, the complexity of the data and trying to render that more simple.
But information visualization, information architecture, these are things that precede us, precede our companies, precede what we do, and I think that there’s some first principles that live in the great information architects and information visualization people that I would recommend looking at. I think we all derive inspiration from those original texts, those Tuftes and Richard Wurmans and things like that. We probably each have our favorites, but there’s a lot in just basic communication of complex information that doesn’t have to go to this level of complexity that is universally valuable for any design project, I think, that involves information.
Thank you very much.
Okay. So it’s time for us to wrap up. I want to thank my panelists, and I want to thank the audience. It was a blast, and have a good time.
Without a doubt, I learned a lot during this panel, from my co-panelists as well as from the audience’s questions. With regard to the topic at hand, I think our content may have drifted a bit away from the “Mapping the World’s Conversation” part of the title and more towards the role of visualization in the world’s conversation. Personally, I believe that this is also the more important question to investigate.
I would be very happy to hear your feedback on both the content we discussed and the presentation itself. What topics would you like to hear more about at such an event?