Amazon recently released its Good Omens miniseries, based on the book co-written by Neil Gaiman and Terry Pratchett. Around the same time, I happened to be attending a course on Stylometry with R at the Digital Humanities Summer Institute. A mini-project there let me combine my love of fantasy literature with my burgeoning skills in the programming language R. In the course, we learned how to use statistics to analyze writing style and attribute authorship, so I decided to see whether I could figure out which sections of Good Omens were written by Gaiman and which by Pratchett.
Gaiman has been asked this question before. He describes nine weeks of feverish, glorious collaboration filled with writing, rewriting, swapping, and editing of sections, and concludes: “People still ask us who wrote what, and, mostly, we’ve forgotten.” Well, stylometry can help!
Using a training set of texts by Pratchett and Gaiman, I analyzed Good Omens with the R package stylo, specifically its rolling classification with the nearest shrunken centroids (nsc) method, 50 features, and 5,000 words per slice. The figure below shows my results. The x-axis tracks progress through the words of the novel. The band below the horizontal white line shows the author to whom the program attributed the majority of each section (Gaiman in red, Pratchett in green). The fainter band above it roughly shows how much signal comes from the other author. Together the two add up to 100% for each section of the text.
As a sanity check, I ran the same analysis on Pratchett's Moving Pictures and Gaiman's Coraline, and the algorithm classified them as written exclusively by Pratchett and Gaiman, respectively.
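For readers curious about what a rolling classification does under the hood, here is a minimal Python sketch of the general idea. It is not stylo's implementation: it swaps nearest shrunken centroids for a plain nearest-centroid rule (no shrinkage, no per-feature standardization), uses a crude regex tokenizer, and all function names are hypothetical. The steps are: take the k most frequent words in the training texts as features, build an average frequency profile for each author, cut the disputed text into fixed-size slices, and assign each slice to the author whose profile is closest.

```python
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def make_slices(words, size=5000, step=None):
    # Fixed-size windows; non-overlapping by default, overlapping if step < size.
    step = step or size
    return [words[i:i + size] for i in range(0, len(words) - size + 1, step)]


def most_frequent_words(texts, k=50):
    # The k most frequent words across all training texts become the features.
    counts = Counter(w for words in texts for w in words)
    return [w for w, _ in counts.most_common(k)]


def profile(words, features):
    # Relative frequency of each feature word within one slice.
    counts = Counter(words)
    return [counts[f] / len(words) for f in features]


def centroid(profiles):
    # Mean profile across all slices of one author's training text.
    return [sum(col) / len(col) for col in zip(*profiles)]


def sq_dist(a, b):
    # Squared Euclidean distance between two frequency profiles.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rolling_classify(training, test_words, size=5000, k=50):
    # training: dict mapping author -> token list. Returns one label per slice.
    features = most_frequent_words(training.values(), k)
    centroids = {}
    for author, words in training.items():
        # Fall back to the whole text if it is shorter than one slice.
        slices = make_slices(words, size) or [words]
        centroids[author] = centroid([profile(s, features) for s in slices])

    labels = []
    for s in make_slices(test_words, size):
        p = profile(s, features)
        labels.append(min(centroids, key=lambda a: sq_dist(p, centroids[a])))
    return labels


# Toy usage: author "A" favours "the", author "B" favours "and".
train = {
    "A": tokenize("the cat the dog the " * 20),
    "B": tokenize("and cat and dog and " * 20),
}
test = tokenize("the cat the dog the " * 4 + "and cat and dog and " * 4)
print(rolling_classify(train, test, size=10, k=3))  # ['A', 'A', 'B', 'B']
```

The real analysis differs in important ways: nearest shrunken centroids standardizes each feature and shrinks author centroids toward the overall mean, which dampens noisy features, and stylo reports a score per candidate author rather than only the winning label.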
I then divided the text into 5,000-word chunks and reread each one to work out what happens in it. Here is a version of my visualization with a rough description of what is going on in each section. Enjoy! And check out both the miniseries and the book (although if you are reading a post about stylometry, you are probably already a fan of Good Omens).
I had so much fun revisiting this amazing book.