We were asked by the National Geographic Channel to visualize the history of the National Geographic Bee Contest. They provided us with a dataset containing detailed information about the finals ranging from 1989 to 2011. Our task was to make this data explorable and entertaining for their online readership.
The Geo Bee Story
Every year thousands of schools in the United States participate in the National Geographic Bee using materials prepared by the National Geographic Society. The contest is designed to encourage teachers to include geography in their classrooms, spark student interest in the subject, and increase public awareness about geography. Schools with students in grades four through eight are eligible for this entertaining and challenging test of geographic knowledge.
Exploring the Data
At the start, we analyzed the data to get a feel for the content we had available. We had to decide which data we would use and which combinations would add the most value for the user.
Unfortunately, the data wasn’t completely consistent (more on that later), so we looked at which facts were available for the whole time period.
Based on this, we started to build up the story we wanted to tell.
Telling a Story
Defining and knowing the story you want to tell is one of the most important steps in making a successful visualization. Thus, instead of trying to visualize every last detail, we narrowed the focus and formulated our story as a question:
“How did individual U.S. states perform compared to each other over the last 23 years?”
It’s one question, but it has three aspects: geography, rank, and time. We didn’t want to tell a linear story, but rather to let the user explore and discover.
To find an appropriate way to visualize the data, we sketched different models and refined them into a form that is easy to understand and invites the user to explore in a playful way.
We decided to go with a triad of visualization methods, each of which answers one aspect of our main question.
Each part of the visualization (rank chart, map, timeline) is interactive, and if a selection is made in one, it affects the other two. The user can dive in at any point she’s interested in.
As a bonus, to make the topic of the contest more tangible and challenge the user in a fun way, we included each year’s final question as a small quiz. Clicking on the question reveals the answer (hey, no cheating!).
The final visualization consists of four views:
All Years View: This is the starting view. It shows a summary of all states in all years.
State Focus: When the user selects a state, its position in the overall ranking gets highlighted, and its rankings in each year get displayed color-coded on the timeline.
Year Focus: When the user selects a year, the top ten states get highlighted and color-coded according to rank on both the map and the ranking. The final question of the year appears.
Answer View: The answers are separated from the visualization and appear only if the user wants to see them.
Shaping and Visualizing the Data
The original data we got wasn’t a database or even an Excel file, but a collection of PDFs and Word documents.
After fooling around with OCR, we quickly realized it wouldn’t work, and decided that we’d have to transcribe the files manually. We prepared an Excel template and outsourced this task to a relative with great typing skills. After a few days, the first transcriptions came in, and we could have a closer look at the data.
With the help of Google Refine, we discovered and cleaned up typing errors and data inconsistencies (e.g. differing name spellings). As we intended to use the state names for the visualization, it was important that they were consistent. The contest is open not only to students from the 50 states, but also to students from US territories and Department of Defense schools in foreign countries. Using the USPS list of state and territory abbreviations as a reference, we brought the state names into a form that we could later parse reliably.
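To give an idea of what this kind of normalization involves, here is a minimal sketch in Python. It is not the Refine workflow we actually used: the lookup tables are small hypothetical excerpts, and the function name `normalize_state` is made up for illustration.

```python
# Hypothetical excerpt of the USPS abbreviation table; the real list covers
# all states and territories.
USPS_CODES = {
    "texas": "TX",
    "new york": "NY",
    "district of columbia": "DC",
    "puerto rico": "PR",
    "guam": "GU",
}

# Hypothetical spelling variants of the kind a manual transcription produces.
ALIASES = {
    "n.y.": "new york",
    "washington dc": "district of columbia",
    "washington, d.c.": "district of columbia",
}


def normalize_state(raw: str) -> str:
    """Return the USPS code for a transcribed state name, or raise KeyError."""
    key = " ".join(raw.strip().lower().split())  # trim and collapse whitespace
    if key.upper() in USPS_CODES.values():       # already an abbreviation
        return key.upper()
    key = ALIASES.get(key, key)                  # map known variants
    return USPS_CODES[key]                       # unknown names fail loudly
```

Letting unknown spellings raise an error, instead of passing them through, makes it easy to spot names that still need a manual fix.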
At this point, we had a set of cleaned-up data, exported from Refine in JSON format. But before we could actually visualize it, we had to perform a few additional steps (for which we wrote a Ruby script).
- All excess data which we didn’t need for the visualization (street addresses, grades, etc.) had to be removed to save space.
- The data had to be restructured, so it could be understood by Flare, the visualization library we used. For this, we created two new data sets, one grouped by state, the other by year.
- Since we wanted to create an overall ranking, we had to calculate a score for each state, based on its performance across all years. Here we used a scoring system similar to the one used in Formula 1 (1st = 10pts, 2nd = 6pts, 3rd = 4pts, 7–10 = 2pts, finalist = 1pt).
- At the same time we generated a few statistics for the data and ran some checks to make sure there weren’t any loose ends.
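The steps above can be sketched roughly as follows. This is a Python illustration, not our original Ruby script. The record fields (`year`, `state`, `rank`) and all function names are assumptions, and because the point values listed above do not mention 4th to 6th place, the sketch assumes those places receive the 1-point finalist score.

```python
from collections import defaultdict

# Points per final placement, following the values quoted above. The text
# gives no value for 4th-6th place, so this sketch ASSUMES those places fall
# back to the 1-point finalist score.
POINTS = {1: 10, 2: 6, 3: 4, 7: 2, 8: 2, 9: 2, 10: 2}
FINALIST_POINTS = 1


def points_for(rank):
    """Score one placement; unlisted ranks count as plain finalists."""
    return POINTS.get(rank, FINALIST_POINTS)


def restructure(records):
    """Strip unused fields and group results by state and by year."""
    by_state = defaultdict(list)
    by_year = defaultdict(list)
    for rec in records:
        # Keep only the fields the visualization needs (hypothetical names).
        slim = {"year": rec["year"], "state": rec["state"], "rank": rec["rank"]}
        by_state[slim["state"]].append(slim)
        by_year[slim["year"]].append(slim)
    return dict(by_state), dict(by_year)


def overall_ranking(by_state):
    """Return (state, score) pairs, highest total score first."""
    totals = {
        state: sum(points_for(r["rank"]) for r in results)
        for state, results in by_state.items()
    }
    # Sort by descending score, then alphabetically to break ties.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))
```

Keeping the point table in one dictionary means the weighting can be adjusted without touching the grouping or ranking logic.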
Also, since we used Flex, we wanted to avoid any woes with loading data from an external file. So our script didn’t write the processed data out as JSON, but directly generated an ActionScript class with the data objects inside. In the end, we’d get a single, independent SWF file. (Yes, this could also have been solved with Flex embedding, but it was more straightforward like this.)
We used Flex and the Flare visualization library to build the visualization. The two major points that influenced this decision (instead of building something SVG- or Canvas-based) were browser compatibility and ease of deployment. We didn’t know in what environment the visualization was going to be placed, so going for a single SWF with no dependencies greatly reduced the potential for conflicts.
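As a rough illustration of the idea, the following Python sketch renders two data sets as the source text of an ActionScript 3 class. It is not our original generator: the class name, the constant names `BY_STATE` and `BY_YEAR`, and the function `to_actionscript` are hypothetical, and the sketch assumes that JSON literals are acceptable as ActionScript object and array literals.

```python
import json


def to_actionscript(class_name, by_state, by_year):
    """Render the two data sets as the source of an ActionScript 3 class."""
    # Assumption: JSON object/array syntax is reused as AS3 literal syntax.
    state_literal = json.dumps(by_state, sort_keys=True)
    # JSON keys must be strings, so year keys are converted explicitly.
    year_literal = json.dumps(
        {str(year): results for year, results in by_year.items()},
        sort_keys=True,
    )
    return (
        "package {\n"
        f"    public class {class_name} {{\n"
        f"        public static const BY_STATE:Object = {state_literal};\n"
        f"        public static const BY_YEAR:Object = {year_literal};\n"
        "    }\n"
        "}\n"
    )
```

The returned string would be written to a `.as` file and compiled together with the rest of the project, so the data ends up inside the SWF without a separate loading step.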
Describing the coding of the visualization is beyond the scope of this article, and would probably be better placed in a tutorial (if anyone’s interested in Flash tutorials these days).
Discussion & Outlook
Our main goal for this visualization was to come up with an interactive data graphic that gives the user some essential and basic information about the Geo Bee contest over the past 23 years in an attractive way.
We created a visual landscape which allows the user to focus on different aspects. To achieve this, we looked for patterns in the data that we could make accessible and understandable. The user can switch between an overall state ranking, a state map, and a timeline.
One of our favorite features is the addition of the final questions and answers. They enrich the visualization with more content and give the user an idea about the level of difficulty of this contest.
The visualization we created can stand by itself because it tells only one story: how a state ranks in comparison to the others and how it has performed since the beginning of the contest.
It could be interesting to extend the visualization with more parameters and explore the dataset from different perspectives. Another story could tell more about the individual contest participants, how a student performed during the contest, or what she has done after the competition. The visualization could also picture the participant’s career and show how Geo Bee can change your life. This would be a more emotional and personal, but entirely different, story.
Don’t forget to check out the interactive visualization!
This project was realized by Interactive Things. As publisher of Datavisualization.ch we regularly give you insights into the development processes of our own work.