Counting Words and Worlds — Elizabeth Callaway

This is the first of a series of two projects on environmental nonprofits and biodiversity discourse.

I am particularly interested in the kinds of stories nature nonprofits put forward about biodiversity. These organizations actually carry out biodiversity conservation programs, so the biodiversities they define are then getting inscribed upon the world. If they are conserving a biodiversity that is viewed as located only in certain areas, for example, then those areas are the areas whose biodiversity we will be left with. Or if their representations of what biodiversity is do not include local people but do include international ecotourists, then those are the kinds of landscapes and interactions that will be available in the future.

And since NGOs like this actually implement their goals out there in the world, their discourse has real effects on how the world looks.

The WWF is the largest nature nonprofit in the world with over 5 million members worldwide and over 1300 ongoing conservation projects.

Here are their goals and mission:

So, the very first mission is to conserve the world’s biodiversity, and they do this by 1) conserving land (places), and 2) conserving species (probably by conserving their habitat, usually land). But this raises the question: what “outstanding” places do you conserve to conserve biodiversity? In other words: where is the world’s biodiversity?

*Tiger, elephants, and coral reef = biodiversity*

In this screenshot from the homepage of the WWF, you can see the dropdown menu with featured places they work. This menu highlights certain places out of the many regions in which they pursue conservation. The Amazon, Borneo and Sumatra, the Congo Basin, the Coral Triangle, and the Galapagos are all tropical locations. The Arctic, Eastern Himalayas, and the Northern Great Plains are the only non-tropical places (though parts of the base of the Himalayas are covered in sub-tropical forests).

But I wanted to examine a broader swath of WWF discourse and ask where the organization represents biodiversity as existing. What follows is the first half of a distant reading of the publications of the WWF. Here I look at over 3,000 publications the WWF put out over the past 20 years. I eventually plan to include all the publications from the top 4 environmental nonprofits. (You can see the R code for the analysis that follows here.)

Starting with a simple word count (I know, I know, the most boring of distant reading techniques), we see that the word “tropical” occurs more frequently than “temperate” or “boreal” in their literature.
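For readers who want to try the counting step themselves, here is a minimal sketch in Python. It is not the R code linked above; the function name `zone_term_rates`, the tokenizing rule, and the sample sentence are all my own illustrative assumptions. It simply reports how often each zone term occurs per 100 words.

```python
import re
from collections import Counter

# The three zone terms discussed in the post.
ZONE_TERMS = ("tropical", "temperate", "boreal")

def zone_term_rates(text, terms=ZONE_TERMS):
    """Return occurrences of each term per 100 words of `text`.

    Tokenizing on runs of letters is an assumption made for this sketch;
    a real corpus would need more careful cleaning.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = len(words)
    if total == 0:
        return {term: 0.0 for term in terms}
    return {term: 100.0 * counts[term] / total for term in terms}

# Made-up sample text, only to show the call.
sample = "Tropical forests and tropical reefs outnumber temperate woodland here."
rates = zone_term_rates(sample)
```

Exact-match counting like this would miss variants such as “tropics” or “subtropical”; whether those should count is an interpretive choice, not something the sketch settles.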

And these word distributions differ significantly from what would be expected (at least given the expected occurrences I set). I know we digital humanists are wont to be asked about statistical significance, even though I don’t think that scientific standards of significance apply to most of the work we do. I'm with Stephen Ramsay: more invested in asking "is it interesting?" "does it invite new readings?" than "is it right?"

Nevertheless, in the following graph, dark blue is the observed occurrence of each zone term per 100 words, and light blue is the expected occurrence per 100 words. To calculate significance I did a chi-squared test comparing the observed distribution of terms to the expected distribution of terms. A considerable element of interpretation comes into my choice of metric for expected occurrence. I chose word frequencies proportional to the land surface area of each zone. (So, keeping the overall occurrence of all three terms combined the same, I calculated the expected occurrence of each term based on the percentage of land surface area in each of the three zones.) This, I believe, is a fair way to calculate expected values for an organization whose first goal is to preserve “the Earth’s most outstanding places.” The WWF works by conserving land, so it is reasonable to expect that they might conserve land proportionally to where the most land is. Species number per zone might be another metric by which to calculate expected value. Here, I didn’t use species number because the number of named species is highly influenced by where people choose to look for species, which is highly influenced by where the discourse I’m analyzing tells people to look for them!
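The logic of that test can be sketched in a few lines of Python (again, not the R code linked above). Everything numeric here is a placeholder: the observed counts and the land-area percentages are invented so the arithmetic is easy to follow, and the helper names `expected_counts` and `chi_squared` are hypothetical. The sketch works on raw counts, which is what a chi-squared goodness-of-fit test is normally run on, and compares the statistic to 5.991, the standard critical value for two degrees of freedom at the 0.05 level.

```python
def expected_counts(observed, area_share):
    """Spread the total observed count across zones in proportion to land area."""
    total = sum(observed.values())
    share_total = sum(area_share.values())
    return {zone: total * area_share[zone] / share_total for zone in observed}

def chi_squared(observed, expected):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E over categories."""
    return sum((observed[z] - expected[z]) ** 2 / expected[z] for z in observed)

# Invented counts, NOT the WWF corpus results.
observed = {"tropical": 600, "temperate": 300, "boreal": 100}
# Placeholder percentages, NOT real land-area figures.
area_share = {"tropical": 40, "temperate": 40, "boreal": 20}

expected = expected_counts(observed, area_share)
stat = chi_squared(observed, expected)

# Critical value for df = 2 (three categories minus one) at alpha = 0.05.
CRITICAL_DF2_P05 = 5.991
significant = stat > CRITICAL_DF2_P05
```

With these toy numbers the expected counts are 400, 400, and 200, and the statistic comes to 175, far above the critical value. Swapping in a different baseline (species number, say) only means changing `area_share`, which is exactly where the interpretive choice lives.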

Looking at the relative frequencies of continent names gives a slightly different picture:

Here is a similar graph for continent names. (I included the Arctic, though it is not a continent, because I thought it might feature prominently in conservation discourse, considering how the polar bear has come to represent what we stand to lose in climate change.) But here you see that Africa is mentioned the most, followed by Europe (a surprise to me), Asia, Australia, and the Arctic.

Again, if you use land area to calculate expected occurrences, you can see that Africa, Europe, Australia, and the Arctic are mentioned more than expected. Asia, South America, North America, and Antarctica are mentioned less than would be expected. Thus our observed distribution is, once again, significantly different from the expected.

Let’s take a look at the real surprise of the bunch: Europe. Given the WWF’s homepage (reproduced at the top of this post), I would have expected Europe to be underrepresented, but instead it occurs significantly more often than expected based on land area. What this word-count test has no way of indicating is how Europe and European diversity are conceptualized in this literature. Is European biodiversity figured in terms of tropical biodiversity, for example?

Here, the Danube is explicitly referred to as the “Amazon of Europe,” which still treats the Amazon as the reference point when talking about biodiversity.

To look at this portrayal of Europe more closely, I had R print out a “keyword in context” listing for each time the word “Europe” occurred. Here’s a screenshot with the instances of “Europe” underlined for quick scanning.
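A keyword-in-context listing is simple enough to sketch. The Python below is an illustration, not the R routine used for the screenshot; the function name `kwic`, the window size, the bracket markers, and the sample sentence are all assumptions of mine.

```python
import re

def kwic(text, keyword, window=5):
    """List each occurrence of `keyword` with `window` words of context per side.

    Matching is case-insensitive and on whole tokens only, so "European"
    would not match "Europe". That is a choice made for this sketch.
    """
    tokens = re.findall(r"\w+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            # Brackets stand in for the underlining in the screenshot.
            lines.append(f"{left} [{token}] {right}".strip())
    return lines

# Made-up sentence, only to show the output shape.
sample = "Forest cover in Western Europe and in Central and Eastern Europe rose."
hits = kwic(sample, "Europe", window=2)
```

For the sample sentence, `hits` holds two lines: `in Western [Europe] and in` and `and Eastern [Europe] rose`.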

It seems from this that Europe might be getting double-billed. Much of the time, especially in the legend of a graph, North America is mentioned once, Africa once, etc., but Europe is broken up into “Western Europe” and “Central and Eastern Europe,” which might be inflating its mentions relative to what I’m trying to test. If the WWF conceptualizes Europe as two distinct areas, both containing the word Europe, and each other continent as one area, then my continent name test might be skewed. Additionally, there are many graphs where the legend mentions “Europe” twice: once as the bar label and once to clarify what parts of Europe are included or excluded. On the other hand, this skewing might be no mere artifact but an important aspect of the WWF’s conceptualization of the world. If Europe is treated as so important and special and heterogeneous that it simply can’t be lumped together like the rest of the continents, but has to be broken out into smaller regions, then this is an important worldview to critique.
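One rough way to gauge how much of Europe’s count comes from these sub-regional labels is to tally each mention by the word immediately before it. The Python sketch below is only a hypothetical follow-up, not something run on the WWF corpus; the qualifier list, the function name `europe_mentions_by_qualifier`, and the sample sentence are my own assumptions.

```python
import re
from collections import Counter

# Directional qualifiers to look for; this list is an assumption.
QUALIFIERS = {"western", "eastern", "central", "northern", "southern"}

def europe_mentions_by_qualifier(text):
    """Tally mentions of "Europe" by the word immediately preceding each one."""
    tokens = re.findall(r"[A-Za-z]+", text)
    tally = Counter()
    for i, token in enumerate(tokens):
        if token.lower() != "europe":
            continue
        previous = tokens[i - 1].lower() if i > 0 else ""
        key = previous if previous in QUALIFIERS else "unqualified"
        tally[key] += 1
    return tally

# Made-up sentence, only to show the call.
sample = "Western Europe, Central and Eastern Europe, and Europe as a whole."
tally = europe_mentions_by_qualifier(sample)
```

In the sample, “Central and Eastern Europe” is tallied under `eastern` because only the directly preceding word is checked. A large qualified share in the real corpus would suggest the double-billing is substantial, though it would not say whether that is an artifact or a worldview.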

More analysis is obviously needed. Word counts/frequencies are a simple entrance into the realm of distant reading, and if they don’t provide definitive answers to questions, they can at least point to interesting areas for further work. Next up in this project? A pilot topic model run on these same documents.