One aspect that the release of the Google Books Ngram Viewer last week shows really well is the power of visualization: instead of offering a huge but abstract data set, as it did back in 2006, Google created a simple visualization tool that displays the data and makes it easy to query. It’s not as visually appealing as what people like Chris Harrison have done with similar data, but it doesn’t have to be! The purpose of this tool is to provide first insights and spark ideas, which can then lead to deeper analysis.
What I find most exciting about this project is that Google enables everyone (no programming skills necessary) to ask questions and dig into a centuries-old corpus of accumulated wisdom: over 5 million books in 6 languages.
While playing with the Ngram Viewer and looking through other people’s queries (click on the charts to go to the source), I noticed that people tend to ask different kinds of questions, so I came up with this incomplete and unscientific categorization of what the Ngram Viewer is about, which I’d like to put up for discussion.
It’s About Comparing Things
A very simple but powerful use case of the Ngram Viewer is comparing ideas, products, concepts, etc. over time. People like to think in comparisons such as “good” and “bad”, so this is an ideal entry point for anyone who doesn’t quite know what to do with the tool. As a case in point, I wanted to see how the pie chart stacks up against other visualization methods, and made a first observation: these charts are always opinionated. You can (and have to) leave words out, you may forget some, or you may spell them differently than others would.
Another comparison I wanted to make was how communication media have developed over the years. Here, I ran into a difficulty: the Ngram Viewer is case-sensitive, so be careful how you spell “Internet”, as there will be fewer results when it is written in lower case.
It’s About Patterns
Many people discover interesting patterns, like the occurrence of year numbers. It seems logical once you see it, but would you have thought of it beforehand?
It’s About Correlations
If you suspect that one thing could have an influence on another, just go to the website, try out some terms, and see whether they occur in literature during the same time periods. This, of course, is not a definitive answer, but it’s a good starting point for an investigation.
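Once you have the yearly relative frequencies of two terms as plain lists of numbers (how you obtain them, for example from Google's downloadable n-gram data sets, is outside this sketch), a correlation coefficient puts a rough first number on “do these rise and fall together?”. The following is a minimal Python sketch of the Pearson coefficient; the function name `pearson` and the sample series are hypothetical illustrations, not real Ngram data, and a high value still says nothing about cause and effect.

```python
from statistics import mean, pstdev


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length yearly series (population form)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length, at least 2 points")
    mx, my = mean(xs), mean(ys)
    # Population covariance: average product of deviations from the means.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


# Hypothetical relative frequencies (percent) of two terms over the same five years.
term_a = [0.010, 0.012, 0.030, 0.028, 0.011]
term_b = [0.004, 0.005, 0.014, 0.015, 0.006]
print(round(pearson(term_a, term_b), 3))
```

Values near 1 mean the two terms rise and fall together, values near -1 mean they move in opposite directions, and values near 0 mean there is no linear relationship. On Python 3.10 and later, `statistics.correlation` computes the same coefficient.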
It’s About Phrases
An n-gram is a sequence of n consecutive words (or characters). The Google data is available for n-grams of up to 5 words, which means that you can search not only for single words, but also for phrases and sayings.
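To make the definition concrete, here is a minimal Python sketch that splits a text into word n-grams and counts every n-gram from 1 up to 5 words long, the range the Google data covers. It assumes naive whitespace tokenization; Google's actual tokenization (punctuation, hyphenation, and so on) is more involved and is not reproduced here, and the function names are made up for this illustration.

```python
from collections import Counter


def word_ngrams(text: str, n: int) -> list[tuple[str, ...]]:
    """Return every sequence of n consecutive words, splitting on whitespace."""
    if n < 1:
        raise ValueError("n must be at least 1")
    words = text.split()
    # A text of k words has k - n + 1 n-grams (none if n > k).
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def count_ngrams(text: str, max_n: int = 5) -> Counter:
    """Count all n-grams with 1 <= n <= max_n (5 mirrors the Google data)."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(text, n))
    return counts


phrase = "it was the best of times it was the worst of times"
counts = count_ngrams(phrase)
print(word_ngrams(phrase, 2)[:3])  # [('it', 'was'), ('was', 'the'), ('the', 'best')]
print(counts[("of", "times")], counts[("it", "was", "the", "best", "of")])  # 2 1
```

A phrase query for a five-word saying then roughly corresponds to looking up one of these multi-word keys, counted separately for each publication year.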
It’s About Language
Because the data goes back to the 17th century, this tool can give us interesting insights into the development of languages, as in the visualization below, which shows how the long or medial s (ſ) was superseded by the “normal” s. When looking for insights with this tool, always keep in mind that words may have been spelled differently centuries ago, so they may not show up if you don’t know what to look for.
It’s About History
Books reflect the history of the world, so I queried the Ngram Viewer for “guerre” (French for “war”), war being a sadly omnipresent part of human history. I did the query in French because many historic wars were fought in France, and the chart does indeed show it: the French Revolution (1789–1799), the French Revolutionary and Napoleonic Wars (1792–1815), the Franco-Prussian War (1870–1871), and then, of course, the two World Wars. If you run the same query in American English, you’ll also notice a strong bump in the 1970s, the Vietnam War, which didn’t have the same impact on France as it did on the USA.
I also queried “baïonnette” (bayonet), a weapon of war, and indeed, it correlates with the wars. We can also see when it came into use, and that it is mentioned less today (I guess it still shows up because it is written about in history books).
This points to another interesting use case for the Ngram Viewer: a teacher could show her students a chart like this and ask, “What do you see?” They’ll (hopefully) know about the two World Wars, but they’ll then have to do some research to find out what the earlier spikes might mean.
It’s About Society
One last example I want to go into is one that isn’t directly possible with the current version of the Ngram Viewer: comparing societal change across different language areas. I supposed that “racism” would have had different impacts in different regions of the world, particularly in the USA. And indeed, when we superimpose queries in American English, British English, German and French using Photoshop (be sure to align the percentage scales correctly), we can see the bump in the late Sixties in American, but not in British literature. Also interesting is the development in France, which is strangely linear and different from all the others.
I hope you had as much fun, and gained as many insights, as I did while researching this article. I strongly believe that by making a visual viewer available for this huge data set, Google has done a great service to many people who otherwise wouldn’t have had a chance to dig into this data at all.
So, go try the tool yourself and post interesting queries in the comments or to the Ngrams Tumblelog. Also be sure to read Google’s introduction to the Ngram Viewer, which has some interesting background information. And don’t forget that you can click the links at the bottom of the charts, which take you to the sources in the huge repository of books that Google has digitized.