Chelsea Francis and Christopher Oscarson; Humanities, Classics, and Comparative Literature
While it is not difficult to isolate the first uses of the word “ecology,” it is far harder to trace the origins of the idea before the term became commonplace. In this project, Dr. Oscarson and I attempted to map the changing concept of nature in turn-of-the-century Sweden in literary and scientific sources to clarify and define this important paradigm shift, particularly through the works of the Nobel Prize-winning Swedish author Selma Lagerlöf (1858-1940). Lagerlöf is a popular figure in Swedish literary history, and this project proposes to study her in a new light. Although she has not historically been read for her environmental thought, there are key themes in her works that can be understood as ecological, that is, as interested in investigating interdependencies between organisms, identities, and environments. The advantage of using “macroanalysis” for this reading of Lagerlöf’s work is that it lets us consider her work as a whole and deduce trends, patterns, and topics that might be obscured in more traditional approaches.
Computer-assisted text mining and topic modeling are part of what is now referred to as “macroanalysis” and are powerful tools for understanding large collections of texts in a new way. We ran the open-source MALLET package from the programming language R, following the standard established by Jockers in his seminal work Macroanalysis, to perform the statistical analysis. The program groups the words of the corpus into clusters called topics, sets of words that tend to occur together throughout the corpus. We then interpret the words of each topic to see what they have in common. The program can also show where in the corpus a topic is present, allowing a close reading in addition to the “distanced reading” enabled by these digital techniques. Significant preparation of the corpus is necessary before topic modeling can be performed: finding good source texts in digital form, creating a list of character names for the program to ignore (to avoid the novel-specific topics that arise from including character names), and identifying the stems of words so that, for example, multiple conjugations of the same verb, or the singular and plural of the same noun, are treated as identical. After this preparation, the computer assists us in identifying clusters of topics in the corpus, which we then analyze and statistically evaluate to determine whether they indicate larger patterns in Lagerlöf’s authorship that would be difficult to identify by reading all of the texts sequentially.
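The preparation steps described above can be illustrated with a minimal sketch. It is written in Python rather than the R and MALLET setup used in the project, and it is not the project's actual code: the function name `prepare_tokens`, the sample sentence, the character name, and the hand-made stem table are all hypothetical, and a real pipeline would rely on a proper Swedish stemmer or lemmatizer rather than a lookup table.

```python
import re

def prepare_tokens(text, character_names, stem_map):
    """Lowercase and tokenize text, drop character names, and map word forms to a shared stem.

    character_names: names the topic model should ignore (hypothetical list).
    stem_map: hand-made table from inflected form to stem (an assumption;
              stands in for a real stemmer).
    """
    ignore = {name.lower() for name in character_names}
    # Runs of letters only; digits, punctuation, and underscores are dropped.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [stem_map.get(tok, tok) for tok in tokens if tok not in ignore]

# Toy input, invented for illustration only.
sample = "Ingrid gick genom skogen. Skogarna var stora."
names = ["Ingrid"]
stems = {"skogen": "skog", "skogarna": "skog"}

prepared = prepare_tokens(sample, names, stems)
print(prepared)  # ['gick', 'genom', 'skog', 'skog', 'var', 'stora']
```

In this toy case the character name disappears and the two inflected forms of the noun collapse into one token, which is the effect the preparation aims for before the text is handed to the topic modeling program.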
Several topics unique to Lagerlöf’s work have arisen from our foray into topic modeling. Although more work needs to be done to isolate ecological topics (removing characters from works in which animals play direct roles in the narrative, for example), significant topics such as “the eternal feminine” and human reaction to the hostility of nature have been identified. The results of our efforts have been visualized in the form of scatter plots (such as Figure 1). This figure shows the concentrations of words associated with the eternal feminine. Each circle corresponds to a 1,000-word chunk of the corpus, and its height equals the fraction of the words in that chunk that are associated with the eternal feminine, as identified by the topic modeling program. The blue lines mark the boundaries between novels; the topic's presence across those boundaries suggests that it is typical of Lagerlöf as a writer and not simply unique to a particular work. Sections of novels in which a certain topic is particularly concentrated can then be read closely with that interpretation in mind, and an exhaustive list of passages that touch on a particular subject is immediately available for further research.
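The quantity plotted for each circle can be sketched as follows. This is an illustrative Python example, not the project's R code, and it assumes that each word in the corpus has already been assigned a topic number by the topic modeling program; the function name `topic_fraction_by_chunk` and the toy assignments are hypothetical.

```python
def topic_fraction_by_chunk(token_topics, topic_id, chunk_size=1000):
    """Return, for each consecutive chunk, the fraction of tokens assigned to topic_id.

    token_topics: one topic number per word, in corpus order (assumed input).
    chunk_size: words per chunk; 1,000 in the figure described above.
    The final chunk may be shorter than chunk_size.
    """
    fractions = []
    for start in range(0, len(token_topics), chunk_size):
        chunk = token_topics[start:start + chunk_size]
        hits = sum(1 for topic in chunk if topic == topic_id)
        fractions.append(hits / len(chunk))
    return fractions

# Toy example: 8 words, chunks of 4, tracking topic 1.
assignments = [0, 1, 1, 2, 1, 0, 2, 2]
fractions = topic_fraction_by_chunk(assignments, topic_id=1, chunk_size=4)
print(fractions)  # [0.5, 0.25]
```

Plotting these fractions against chunk position, with vertical lines at the chunk indices where one novel ends and the next begins, would give a scatter plot of the kind described for Figure 1.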
This project has encountered numerous setbacks endemic to working on the cutting edge of a new and emerging field of inquiry. Most have eventually been overcome, and we have now produced a sufficiently large amount of data, with enough coherent topics to be assigned topic names. One of the most valuable products of this project is the large body of data produced by the topic modeling, which can now be used to judge more accurately to what extent the methods of macroanalysis are useful in answering literary questions, such as the emergence of ecological thought in the work of Lagerlöf. With the tools we have developed and the experience we have gained from trial and error, Dr. Oscarson will now be able to expand the project to include additional texts and authors.
We made significant progress in reaching our goals, and our work has opened up new and important doors for future research. There are still additional technical processes that need to be explored, which will refine the topics and further our understanding of how an author reflects the ideas and intellectual currents of her time. Already, there are seemingly “strong” topics (topics that frequently reappear regardless of the program’s parameters), such as the eternal feminine, the domestic home, and the relationship between humans and nature. This research will cast new light on how we understand the work of Lagerlöf, particularly with regard to the emerging fields of environmental humanities and ecocriticism. Dr. Oscarson is currently using the results of our work in an article on ecological themes in early 20th-century literature and will present this research at upcoming conferences, such as that of the Society for the Advancement of Scandinavian Study (SASS). The research will also be used to create multiple articles to post to the upcoming Humanities Center Digital Lab webpage.
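One way to make the notion of a “strong” topic concrete is to check whether a topic's top words reappear in runs made with different parameters. The sketch below is in Python and is an assumption throughout, not the procedure used in the project: it compares top-word lists by Jaccard overlap, and the 0.5 threshold, the function names, and the toy word lists are all hypothetical.

```python
def jaccard(words_a, words_b):
    """Jaccard overlap of two word lists: size of intersection over size of union."""
    a, b = set(words_a), set(words_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def recurring_topics(reference_run, other_runs, threshold=0.5):
    """Return labels of reference topics that have a close match in every other run.

    reference_run: dict mapping a topic label to its list of top words.
    other_runs: list of runs, each a list of top-word lists.
    threshold: minimum Jaccard overlap counted as a match (an arbitrary choice).
    """
    strong = []
    for label, words in reference_run.items():
        matched_everywhere = all(
            any(jaccard(words, other) >= threshold for other in run)
            for run in other_runs
        )
        if matched_everywhere:
            strong.append(label)
    return strong

# Toy runs, invented for illustration only.
reference = {
    "home": ["house", "mother", "kitchen", "door"],
    "sea": ["boat", "wave", "storm", "shore"],
}
others = [
    [["house", "mother", "kitchen", "window"], ["horse", "road", "cart", "inn"]],
    [["mother", "house", "door", "kitchen"], ["church", "priest", "bell", "sermon"]],
]
strong = recurring_topics(reference, others)
print(strong)  # ['home']
```

A topic that survives this kind of comparison across many parameter settings is a reasonable candidate for closer reading, although the choice of overlap measure and threshold would need to be justified for a real corpus.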
Jockers, Matthew Lee. Macroanalysis: Digital Methods and Literary History. Urbana: University of Illinois Press, 2013.
Figure 1.