Maciej Maryl

Computational Monograph: Reading and Writing Distant Horizons

Review of: Ted Underwood, Distant Horizons. Digital Evidence and Literary Change. Chicago: The University of Chicago Press 2019. 200 p. [Price: $ 27.50]. ISBN: 9780226612836

Digital approaches have always been met with mixed responses in literary studies. A decade ago, Eric Hayot likened Franco Moretti to a guy bringing liquor to a party of bored literary scholars who had run out of beer.[1] Some of them are unwelcoming and stay close to their empty keg (»either because they have grown to love it, or because they think there’s still beer in there«[2]), while the others praise the newcomer for saving them from boredom. Moretti’s approach was meant to stir things up and breathe new life into the discipline. Today, Ted Underwood brings his Distant Horizons to the same party and presents it not as a rupture or a radical novelty, but rather as an approach that remains in harmony with the history of the discipline, providing new types of insights into decades-old questions. Indeed, it is hard to imagine that Underwood believes in ruptures at all, given his dedication to viewing cultural phenomena in a broad, longue durée context.

Two decades have passed since the publication of Moretti’s »The Slaughterhouse of Literature«,[3] one of the essays later collected in the eponymous Distant Reading,[4] which resonates nicely with the title of Underwood’s book. This period saw many developments in the field of digital literary studies (DLS, also known as cultural analytics, computational literary studies, etc.), which has grown rapidly, with scholars, teams, and projects on all continents working on a variety of topics. During those decades we saw the emergence of new approaches and a new generation of scholars, accompanied by critical debates that shook the field and forced it to react and constantly redefine itself. It is worth highlighting that over this period Moretti’s work was fruitfully criticised by both digital and non-digital literary scholars, and, for many reasons, he is no longer a central reference point in the field.[5]

So, no matter what type of beverage would best convey, metaphorically, Underwood’s contribution to the party of literary scholars, this span of almost twenty years provides a good opportunity to look at those changes through the lens of Distant Horizons, and, more broadly, through the author’s approach and the arguments presented in his other works. His claims may not seem so spectacular at first, but a closer look reveals their methodological soundness, theoretical awareness, and strong connection with the problems of literary scholarship, such as genres, processes of literary history, and literary prestige. I would like to discuss Distant Horizons as a work somewhat typical of the current stage of DLS and of the problems faced by its practitioners and readers.

Computational monograph

Before we get into details, let us start with a meta-commentary on the very form of this book, as I think it conveys a similar message to its contents. Distant Horizons is a monograph, which is the fundamental genre for the humanities in general and for literary scholarship in particular. At first glance the book seems to be pretty similar to Underwood’s previous monograph, Why Literary Periods Mattered, which wrestled with similar issues of literary and disciplinary history, such as the conceptualisation of literary processes by the scholarly community, the problems of prestige in literary culture, and the durability of its models.[6] However, if we lay both books side by side on the table, interesting differences begin to emerge. Let us compare the tables of contents. Why Literary Periods Mattered has six chapters and an introduction, whereas Distant Horizons consists of five chapters, a preface, and two appendices. This already signals that something has changed; but let us dig deeper. The earlier work has three figures and no tables at all, compared to, respectively, twenty-four and four in the recent book. Moreover, the median number of notes per page in the former book (1.82) is almost twice that in Distant Horizons (0.95), which hints at different argumentative styles: the first volume is devoted to discussing other texts, and the second is focused on the presentation of empirical results. Text mining these two volumes might yield even more insights, but I just wanted to use these examples to show that although Distant Horizons is a traditional monograph, it seems to have evolved into a slightly different genre, one which I would call here a ›computational monograph‹.

The erosion of the genre begins at the level of form, as the traditional printed codex, which has been around for over half a millennium, seems to be ill-equipped to handle and connect heterogeneous methodologies and supplementary materials such as code and data. The discussion of data and method has been quite artificially moved from the main thread to the appendices, as if the main argument were more about the result than the method. It may also suggest that this book invites two modes of reading: the ›regular mode‹ of following the core scholarly argument, and the ›enhanced mode‹ that assumes a more digressive engagement with the method and data. It is important to mention that the latter option invites readers to go beyond the book and consult supplementary material stored on GitHub and safely deposited in the trusted Zenodo repository.[7] Actually, the common concept of a ›supplement‹ is quite misleading here, as this is rather a ›primary‹, ›basic‹, or ›fundamental‹ collection of code and data used at various stages of the research process, along with notes for those who wish to use them. Materials deposited online can be (and have been) updated after the book was published. Thus, the printed volume is only one of the gates to this research project and its outputs.

Such exemplary transparency in allowing for data and code reuse, and for the replication of the results, deserves loud commendation, as it is still not common in DLS. I read it as a practical response to the so-called replication crisis and to the criticism directed at the first wave of DLS scholars like Moretti and Jockers, who were reluctant to publish and describe the sources of their conclusions.[8] Underwood encourages rigorous replication and testing of his results to ensure the proper development of the field (cf. 174).

The publication of these materials also gives a hint of the scale of the work required to reach the conclusions presented in this book. Hundreds of lines of code and large datasets had to be created beforehand and worked through for long hours in order to arrive at results that could be discussed in the chapters of Distant Horizons. Moreover, some software was also specifically written for data preprocessing (as was BookNLP in the case of this book), to make further exploration even possible.

Authorship is another interesting feature that differentiates the two books in question. While both volumes are based on previously published work, in the case of Why Literary Periods Mattered these were Underwood’s solo outputs, while Distant Horizons draws on co-authored papers and collaboratively created software and datasets. However, the final output in both cases is presented in the form of an individual monograph. Thus, Distant Horizons contains previous teamwork that has been subordinated to the overall (and individual) argument of the monograph and then filtered through it. And I am not suggesting here that Underwood considers this solely his own work – on the contrary, he has minutely acknowledged all of his contributors. Yet what is interesting is that our current modes of authorship do not handle multiple contributions to a monograph well: in the end, a monograph reflects the work of an individual scholar, but it would not be possible without the help of others.

Defining the target readership of such monographs poses similar challenges. Digital humanities work has always had at least a dual audience: scholars of a particular discipline, who know the context but not the method; and digital humanists, who can critically evaluate the method but not the originality of the contribution to the state of the art, as they specialise in different (sub)disciplines. In Underwood’s case we may also count a growing number of scholars who belong to both groups, that is, experts in both method and subject. However, bridging the gap between these communities is a routine challenge for most writing in DLS (or perhaps in DH). It results in a certain duality of argument, which has to present computational methodology to non-experts as a valid way of arriving at scholarly conclusions, while giving enough detail for digitally savvy researchers to be able to scrutinise the results of distant reading. It is also hard to keep every reader of an interdisciplinary work satisfied, as Underwood acknowledges: »[c]olleagues from sciences will urge a writer to add more statistical tests. Formalist critics will ask for more close readings. Book historians will ask for a bibliography that separates different editions« (156). A distant reader has to address all these audiences sufficiently while presenting the outcomes of their analysis.

This duality also highlights the problems with evaluating a computational monograph. Reviewers are usually tasked with assessing the claims being made and their contribution to the relevant field of study. They may discuss the method, theoretical assumptions, and interpretive strategy, judging their validity and suitability. However, a review of computational work also invites different kinds of scrutiny, ones that are not very common in the humanities, like the replication of analytical procedures or testing them on different material. Underwood expects to trigger a different kind of reception and criticism by sharing the supporting material online. He also treats it as a way of avoiding superficial criticism and inviting deeper engagement with the project: »[w]hen an author has shared code and voluminous data, it is no longer enough to draw up a list of scattered errors and omissions« (183). That sets the bar very high and requires a huge amount of additional work, especially if one wants to collect new material to test these conclusions.

Such a challenge was recently undertaken by Nan Z. Da, who reviewed multiple DLS articles in her hotly debated »The Computational Case against Computational Literary Studies« (2019). Da ventured to scrutinise the use of statistics in these papers, trying to replicate their results whenever she could access the data and code. She arrived at the conclusion that there is »a fundamental mismatch between the statistical tools that are used and the objects to which they are applied«.[9] One of the articles she criticised was Underwood’s »The Life Cycle of Genres«, later expanded into chapter two of Distant Horizons. Da questioned the procedure of model training (although Underwood had actually applied it in the manner she advocated) and criticised him for using a different definition of genre than Moretti, whose work Underwood had endeavoured to scrutinise. She concluded that Underwood’s results are too peculiar to mean anything: »if everyone can agree that something is changing – even Underwood concedes that genres evolve – but you have devised one way that concludes that it does not, it does not necessarily mean that you have found something. It just means your instrument of measurement might be too weak – your method might have too little power – to capture this kind of change«.[10] Underwood’s subsequent rebuttal resonates with other reactions by DLS scholars taking part in the discussion on the Critical Inquiry online forum: »Da’s own argument remains limited by its assumption that statistics is an alien world, where humanistic guidelines like ›acknowledge context‹ are replaced by rigid hypothesis-testing protocols. But the colleagues who follow her will recognize, I hope, that statistical reasoning is an extension of ordinary human activities like exploration and debate«.[11]

I find this exchange, together with the previously discussed issues of form and authorship, illuminating as to what constitutes a computational monograph. A computational monograph relies on the productive tension between the confirmatory approaches of statistics and the exploratory applications of those methods in digital humanities work. This mode of analysis thus entails the qualitative interpretation of quantitative insights, which has to be anchored in a particular context or field. In other words, quantitative approaches provide some hints, the meaning of which is elucidated by qualitative explanations. For Underwood, statistical methods serve as tools that enable literary insight and help answer truly disciplinary questions. This seems to be the spirit of the computational monograph, which is both singular and plural at the same time: it collates the evidence but channels it through the individual viewpoint of a scholar, who then forms it into a narrative.

Distant perspective

This dualism is present in the very title of this book, Distant Horizons, which – aside from sharing its initials with Digital Humanities – refers to the limitation of a singular perspective. »A single pair of eyes at ground level can't grasp the curve of the horizon, and arguments limited by a single reader's memory can't reveal the largest patterns organising literary history« (x). Only if we position ourselves above ground level may we notice the curve. And the same goes for literary studies: computational methods give us a different scale of reading; they enable a processual, rather than a discrete, understanding of literary history. The foundational claim of this book is that literary studies have a rich literary-historical knowledge, within »certain chronological limits«, that allows authors, movements, and periods to be characterised; but its application on the macro-scale of broader processes poses serious difficulties (cf. 8). In other words, Underwood focuses on the mid-range claims of literary scholarship, for example those dedicated to a particular period, and raises them above ground level to explore their validity as long-range phenomena spanning centuries along a continuous timeline (cf. 20).

In many ways, Distant Horizons continues Underwood’s quest for computationally savvy literary studies, as initiated in the last chapter of Why Literary Periods Mattered, yet with a certain twist. He used to believe that »[q]uantitative methods will be easier for us to assimilate when they conform to our preference for discontinuity«;[12] that is, to a way of looking at history as a series of graspable, discrete periods. However, in Distant Horizons, instead of having these methods adapted to the sequential perspective of literary studies, he would rather expect literary studies to embrace continuity.

So, why does literary scholarship fail to recognise these methods and remain largely sceptical? Underwood seems to locate the reason for this slow expansion in disciplinary specificity. Firstly, computers have made textual modelling possible only recently, which may explain textual scholars’ earlier lack of interest in statistics and thus its limited uptake. Given that only a handful of humanists have mastered statistics, we cannot expect massive engagement with such scholarship. The second reason, closely interlinked with the first, is that DLS results cannot yet be fully appreciated and may be ahead of their time, as »researchers have begun by tracing phenomena that don't yet have literary significance for a community of readers« (16f.).

There are, therefore, certain findings that arouse our curiosity but are hard to embed in broader literary-historical knowledge. Underwood stresses that although computational results may sound superficial or simplistic, they demand proper interpretation. For instance, some interpretation is required of the fact that certain terms are clearly gendered, as in the analyses described in chapter four, which reveal that fictional »[w]omen smile and laugh, but midcentury men, apparently, can only grin and chuckle« (124). Just as the humanities are sometimes called upon to prove their usefulness to society, DLS (and more broadly DH) is asked to show its relevance to the humanities. This is why chapter two is not so much about defining genres and their characteristic features as about the usefulness of genre models for resolving theoretical debates. On a broader scale, Distant Horizons aims to bridge the gap between literary scholarship and computational findings by consistently presenting the latter in a coherent, exploratory workflow.

Exploratory approach

Let us try to distil the main features of Underwood’s approach. It entails building models through perspectival modelling and providing a qualitative interpretation of quantitative findings. We already know the basic assumption that quantitative and qualitative analyses are intertwined in a single workflow, which is exploratory rather than confirmatory. Underwood is very clear that mere numbers do not substitute for interpretation, as they are just signs with »no special power to settle questions« (xviii). Instead, they serve as a vehicle for viewing and interpreting problems at a larger scale. And precisely this feature seems to be a bone of contention both for statisticians, who are critical of loosening the restrictions of the quantitative approach, and for humanities scholars, who see the use of numbers as a reductionist simplification of complex textuality. Quantitative models, Underwood reminds us, are »no more objective than any other historical interpretation; they are just another way to grapple with the mystery of the human past« (xix). Computational analysis serves here as a higher-order modelling system that translates literary history into models, which may provide new insights that still need to be interpreted.

Before moving on to the interpretation, let us discuss how the models are constructed. Underwood criticises the notion of operationalisation advocated by Moretti for DLS (who understood it as »building a bridge from concepts to measurement«),[13] by calling out the arbitrariness of choosing what actually constitutes such a measurement. Instead, he employs ›perspectival modelling‹, that is, he uses the actual perspectives of historical readers to train his models. For instance, he does not try to establish which features constitute the genre of detective fiction, but rather takes the actual texts which are labelled by scholars as belonging to this genre, and extracts the features which distinguish this group of works from other texts. It should be noted that he does not abandon operationalisation altogether, but rather advocates supporting it with available evidence, because, in the end, some operationalisation is needed to convey fuzzier notions like that of literary prestige, which Underwood operationalises as the »probability that an author will be discussed in certain elite periodicals« (73). Similarly, academic citations serve as a proxy for cultural capital, and the number of editions in libraries stands for economic success (cf. 98).

Another important assumption is that models should go beyond simple word counts and include multiple textual features. Writing about models for gender in fiction, Underwood spells out the main reason behind this principle: »everything affects everything: as usual with statistics, there is no single cause« (128). Underwood applies supervised machine learning, training his models on one part of the sample and testing them on another. This procedure creates a literary model, which is just another representation of literary history, and needs further interpretation and historical contextualisation to acquire meaning (cf. 37).
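To make the train-and-test procedure concrete, here is a minimal sketch of supervised classification on word frequencies. It is not Underwood’s code: the toy sentences, the stop-word list, and all function names are hypothetical, and the model is a bare logistic regression without intercept or regularisation, chosen only to keep the example short.

```python
import math

# Hypothetical stop-word list; real projects use far larger vocabularies.
STOPWORDS = {"the", "a", "in", "of", "and", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def featurize(text, vocab):
    """Relative frequency of each vocabulary word in the text."""
    tokens = tokenize(text)
    counts = [0.0] * len(vocab)
    for tok in tokens:
        if tok in vocab:
            counts[vocab[tok]] += 1.0
    total = len(tokens) or 1
    return [c / total for c in counts]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, epochs=200, lr=1.0):
    """Batch gradient descent for logistic regression (no intercept)."""
    weights = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad[j] += err * xi / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def predict(weights, x):
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if sigmoid(score) > 0.5 else 0


# Toy, invented examples: label 1 = tagged as detective fiction, 0 = other.
train_texts = [
    ("the detective found a clue near the body", 1),
    ("inspector questioned the suspect about the murder", 1),
    ("a clue led the inspector to the murder weapon", 1),
    ("she walked through the garden in spring", 0),
    ("the garden was full of roses and sunlight", 0),
    ("they sailed home across the quiet sea in spring", 0),
]
test_texts = [
    ("the detective examined the murder weapon", 1),
    ("roses grew in the quiet garden", 0),
]

# The vocabulary is built from the training part only.
vocab = {}
for text, _ in train_texts:
    for tok in tokenize(text):
        vocab.setdefault(tok, len(vocab))

weights = train(
    [featurize(t, vocab) for t, _ in train_texts],
    [label for _, label in train_texts],
)
predictions = [predict(weights, featurize(t, vocab)) for t, _ in test_texts]
accuracy = sum(
    p == label for p, (_, label) in zip(predictions, test_texts)
) / len(test_texts)
print(predictions, accuracy)
```

The point of the held-out texts is that the model is judged only on material it has not seen; the learned weights are then the ›model‹ that still awaits interpretation.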

This is a tricky part of the method, as it is also the essence of the mixed-methods approach: statistically generated models display certain patterns (i.e. particular groups of words are linked to the phenomenon being researched and others are not), which are then qualitatively interpreted by scholars and assessed against the existing body of literary scholarship. Chapter three is a good example of the difficulties in moving from the distant view to interpretation, as it tries to juxtapose nine hundred words that increase the probability that a book of poems will be reviewed with another nine hundred that point in the opposite direction (cf. 84 sq.). Two sets amounting to almost two thousand words are pretty difficult to interpret, even when some vocabulary is presented in the context of a sample poem that highlights certain differences. Underwood then aims to reduce this complexity by classifying these words using the Harvard General Inquirer lists, grouping the vocabulary into categories such as colours, body parts, first-person singular, natural objects, and weakness. But even if Underwood shows that there is a weak correlation between a text’s usage of words from these groups and its being reviewed, readers are left needing more explanation. And it is not that simple to put all these features into a coherent and engaging explanatory narrative, which – as he notes elsewhere in the book – poses the real, aesthetic challenge: »it is simply hard to write with sweep and verve about thousands of books« (156). DLS seeks to find ways of presenting such results in a meaningful and accessible way, and although Underwood’s work serves as a good example here, there is still more to be achieved in this respect.
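The grouping step described above can be sketched as follows. This is only an illustration under stated assumptions: the two tiny word lists stand in for lexicon categories and are not the actual General Inquirer lists, the four ›poems‹ and their reviewed flags are invented, and the resulting coefficient says nothing about Underwood’s data.

```python
import math

# Hypothetical stand-ins for lexicon categories (not the General Inquirer).
LEXICON = {
    "colour": {"red", "blue", "green"},
    "body": {"hand", "eye", "heart"},
}


def category_share(text, category):
    """Share of a text's tokens that belong to one lexicon category."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in LEXICON[category])
    return hits / len(tokens)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented toy data: 1 = volume was reviewed, 0 = it was not.
poems = [
    "red rose blue sea",
    "green hill bright eye",
    "duty calls weak men",
    "weak heart red dawn",
]
reviewed = [1, 1, 0, 0]

colour_shares = [category_share(p, "colour") for p in poems]
r = pearson(colour_shares, reviewed)
print(colour_shares, round(r, 3))
```

Collapsing hundreds of words into a handful of category shares makes the pattern easier to narrate, but, as the paragraph above notes, a correlation of this kind still has to be explained rather than merely reported.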

Discoveries

Now that we have reconstructed the main assumptions behind his work, let us try to assess the outcomes of their application – something Underwood calls discoveries. What all chapters have in common is the scale of the description, which embraces the longue durée of literary history. Wide perspectives allow isolated facts (or observations confined to certain ›periods‹) to be understood as elements of continuous, long-lasting processes. Thus, a discrete understanding of literary history as a series of consecutive periods is replaced by reflection on the broader, underlying processes.

This continuity is already visible in the scope of the chapters, which explore some of the traditional topics in literary studies. The first three chapters analyse the dimensions of literary history – namely forms, genres and prestige – with volumes discussed as wholes. Chapter one tracks genre transformation in the fiction of the eighteenth, nineteenth, and twentieth centuries, framing it as a continuous process of differentiating literary language from ›non-fiction‹. Chapter two provides a similar perspective on the evolution of three genres (detective, gothic, and science fiction). In some cases (e.g. science fiction) it argues that genre consolidation took place even earlier than suspected. Interestingly, some models trained only on earlier works can recognise the same genre at later stages. This seems to oppose Moretti’s claim about the ›seasonality‹ of genres, in favour of positing more durable patterns, although Underwood does not look at subgenres as Moretti did in Graphs.[14] Chapter three seems to confirm (albeit with some level of complication) the evolutionary differentiation between popular and critically acclaimed works. Instead of picturing the separation of ›highbrow‹ and ›lowbrow‹ outputs as a rupture or the ›great divide‹, Underwood provides a model of the slow, continuous diversification of these types of writing (cf. 72 sq.). This very chapter may be seen as additionally validating his entire approach, because events that, within a short time frame, were considered revolutionary can now be seen in the broader picture as elements of an evolutionary trend. The fourth chapter zooms in to the level of textual content, exploring how fictional characters are shaped by assumptions about gender and how this implicit gendering grows blurrier after 1840.

Underwood depicts the ›long arc‹ of certain literary processes, claiming that changes are not so rapid and generational as had been believed. Or rather, we scrutinized these processes at a smaller scale, unable to account for them in their entirety. This, of course, may raise some doubts as to whether the scale itself does not impose certain limits on granularity; that is, looking at the dominant arc may keep the influence of alternative or short-lived processes in the background.

The Future

In the last chapter, under the banner of »Risks of Distant Reading«, Underwood explores the disciplinary and institutional dimensions of the shift towards computational methods. He moves from the defensive approach of arguing for computation in the humanities, which he presented in Why Literary Periods Mattered, to the more offensive stance of seeing it as an integral part of the future of the discipline. Underwood is pretty optimistic in assuming that computational methods can be applied everywhere in literary studies, and can also supplement every close reading in the manner proposed in his book and discussed throughout this review (cf. 147). He assumes that the complexity of the new methods, which require not only access to data but also coding and statistical skills, is the reason behind their low popularity among humanists (cf. 145). However, he argues that this is not as big an obstacle as it may seem, as distant readers »need a semester of statistics and some programming experience (and perhaps a course in social science) added to their training in literary history« (163) in order to start posing valid questions and applying these methods to answer them.

However, there are already many tools available that serve users who do not have an advanced knowledge of programming (e.g. Voyant), which assist in close readings and have various word-frequency features. Still, the uptake is quite slow, which may suggest that it is not so much a question of competence as of accepting these tools as valid instruments of analysis in literary studies. And here again, Underwood has something to add about the fundamentally interdisciplinary nature of DLS. He encourages us to go beyond the binary opposition between qualitative and quantitative approaches and embrace new modes of inquiry suitable for the twenty-first century. This involves not only humanists understanding and applying quantitative methods, but also scientists opening up to the doubt and uncertainty of humanities research. »Mutual understandings would lead us away from the fantasy that objectivity and subjectivity represent firmly distinct alternatives and toward a more flexible conversation about human attempts to understand the world by modelling it« (158). So, as he put it in another essay, humanists need to know how machine learning works to »understand why the boundary between quantitative and qualitative reasoning is growing fuzzier«.[15]

This is something more than just a methodological endeavour. Embracing new methodologies is also important for students, in order to prepare them »for a world where information is filtered by computers«.[16] So the computational method is actually not only a handy tool for literary analysis but also a vehicle for understanding contemporary culture. He points out that statistical methods have already found their way into departments of sociology, communication, and information science, so there is no excuse for literary studies to lag behind. And this is precisely the approach Underwood consistently advocates in his works; so let us conclude with the title of one of his essays: »Dear Humanists: Fear Not the Digital Revolution«. Embrace it!

Acknowledgements

Parts of this work were supported by Shaping Interdisciplinary Practices in Europe (SHAPE-ID), H2020 project #822705 and Preparing Open Access in the European Research Area through Scholarly Communication (OPERAS-P), H2020 project #871069.

Notes

[1] Cf. Eric Hayot, A Hundred Flowers, in: Jonathan Goodwin / John Holbo (eds.), Reading Graphs, Maps & Trees: Responses to Franco Moretti, Anderson, SC: Parlor Press 2011.

[2] Ibid., 65.

[3] Cf. Franco Moretti, The Slaughterhouse of Literature. Modern Language Quarterly 61:1 (2000), 207–28. https://doi.org/10.1215/00267929-61-1-207.

[4] Cf. Franco Moretti, Distant Reading, London: Verso 2013.

[5] Cf. Lauren Klein, Distant Reading After Moretti. ARCADE, https://arcade.stanford.edu/blogs/distant-reading-after-moretti.

[6] Cf. Ted Underwood, Why Literary Periods Mattered: Historical Contrast and the Prestige of English Studies. Stanford, CA: Stanford University Press 2011. Henceforth referred to as Why…

[8] Cf. Katherine Bode, The Equivalence of Close and Distant Reading; or, Toward a New Object for Data-Rich Literary History. Modern Language Quarterly 78 (2017), 77–106. https://doi.org/10.1215/00267929-3699787.

[9] Nan Z. Da, The Computational Case against Computational Literary Studies. Critical Inquiry 45:3 (2019), 601.

[10] Ibid., 608.

[11] Ted Underwood’s response to the Critical Inquiry forum dedicated to Da’s article: https://critinq.wordpress.com/2019/03/31/computational-literary-studies-a-critical-inquiry-online-forum/

[12] Cf. Ted Underwood, Why…, 166.

[13] Cf. Franco Moretti, »Operationalizing«, or, the function of measurement in modern literary theory. Literary Lab Pamphlet 6 (2013), 3. http://litlab.stanford.edu/LiteraryLabPamphlet6.pdf

[14] Franco Moretti, Graphs, Maps, Trees: Abstract Models for a Literary History. London: Verso 2005. It should be stressed that in his discussion of subgenres Moretti, unlike Underwood, analysed metadata, not the actual texts.

[15] Ted Underwood, Dear Humanists: Fear Not the Digital Revolution. The Chronicle of Higher Education, 27 March 2019, https://www.chronicle.com/article/Dear-Humanists-Fear-Not-the/245987.

[16] Ted Underwood, Why an Age of Machine Learning Needs the Humanities. Public Books, 5 December 2018, https://www.publicbooks.org/why-an-age-of-machine-learning-needs-the-humanities/.

2020-10-17

JLTonline ISSN 1862-8990

Copyright © by the author. All rights reserved.

This work may be copied for non-profit educational use if proper credit is given to the author and JLTonline.

For other permission, please contact JLTonline.