Evelyn Gius
Digital Humanities as a Critical Project: The Importance and Some Problems of a Literary Criticism Perspective on Computational Approaches
James E. Dobson. Critical Digital Humanities. The Search for a Methodology. Urbana, Illinois: University of Illinois Press 2019. 175 p. [Price: $ 25.00 (Paper)]. ISBN: 978-0-252-08404-1
With Critical Digital Humanities. The Search for a Methodology, James E. Dobson presents a book-length contribution to the current debate about the orientation and foundations of the digital humanities. His aim is to practice the digital humanities as critical digital humanities in the context of literary studies. Here, »critical« means two things to him: On the one hand, it is about criticizing digital methods in the sense of an analysis and reflection of digital humanities methodology. On the other hand, Dobson calls for the digital humanities to be practiced as critical in the sense of literary criticism (cf. 3f.). Since Dobson focuses on literary criticism, digital humanities primarily stands for literary text analysis, implemented computationally with methods of corpus creation and text analysis.
Dobson outlines his project in the foreword, where he also states the central theses and goals of his contribution (vii-xii). He points out that computational procedures bring with them both new possibilities and new challenges for the humanities, and demands: »Neither the complications nor the opportunities necessitate outright rejection or unreflective acceptance« (viii). Accordingly, he sets out to examine to what extent data-based and algorithm-driven approaches contribute to the »interpretative goals of humanities« (ibid.). In doing so, he wants to distance himself from the view, which he believes to be widespread, that automated analysis is in itself a goal of the digital humanities. In addition, Dobson criticizes approaches that treat literary texts mainly as cultural information, as well as the concept of evidence associated with these approaches. According to him, the »documentary realist treatment of digital or digitized sources« (ix) is not suitable for capturing the full range of literary interpretation practices. Instead, »computational criticism is a situated rather than an empirical and objective activity« (x), i.e., a matter of context-dependent practices.
The following chapters essentially revolve around this situatedness, with Dobson marshaling a wealth of discourses, arguments, and examples to elaborate on it. He operates both on a critical or reflective level and on the level of the implementation of computational text analysis. The latter comprises exemplary approaches of other researchers as well as his own, in which he applies the procedures under discussion and makes their data and code available. Dobson thus works on his topic of critical digital humanities on two levels, which he to some extent combines by discussing already implemented approaches with regard to their (non-)contribution to the critical digital humanities.
The number of research approaches discussed, together with Dobson's own approaches, theses, and demands, and the different target groups implicitly addressed, therefore results in a rather heterogeneous text in which the guiding threads are sometimes difficult to identify.
Nevertheless, Dobson's book can be read – in some cases with great profit – from three perspectives that correspond to the presentation and reflection of digital humanities approaches at different levels: (i) as a demand for reflection and critique of computational text analysis from the perspective of literary criticism, (ii) as a rather introductory description of the basic steps in computational text analysis, and (iii) as a critique of exemplary implementations of computational text analysis. These levels are, of course, interconnected. For a differentiated review of Dobson's contribution, though, it is worth considering them individually.
1. Reading 1: The Critical Digital Humanities Program
Dobson's contribution should first of all be read as one advocating a critical approach to the digital humanities in a productive sense. The programmatic title of his book highlights this aspect as the central argument.
His overall aim is to establish, within the digital humanities, a practice of literary criticism that can be described as suspicious reading (following Ricœur's hermeneutics of suspicion). Dobson identifies a number of important and appropriate aspects which, in his view, should play a role in the consideration and critical reflection of digital humanities approaches. The wealth of discourses Dobson draws on for this purpose cannot be reproduced here, but the most important aspects will be summarized cursorily.
In the first chapter, Dobson discusses the consequences of digitization for text and text analysis.
Dobson states that the digital humanities certainly stimulate the current discussion on approaches in the humanities. This makes it possible to come back »to a series of important debates concerning the protocols of reading and the task of interpretation« (10). In addition, computational methods »both expand what we do and raise important questions that proposed alternative methods might not« (11). Thus, Dobson regards computational approaches as a heuristic for the theory and methodology (in the sense of a systematic methodological reflection) of the analysis and interpretation of texts. This is a critical view that is in part already established but, according to Dobson, must be further intensified (cf. 15f.). Consequently, the specificity of the digital humanities is not determined by their methods, but by their critical claims. The central questions of the critical digital humanities are accordingly: »where the output came from, what computational transformations were executed to produce them and why these, what influenced the possible uses and processes afforded by these transformations were, and where they came from« (20f.).
Dobson criticizes the attitude that ›mining is not meaning‹ by pointing out that even a simple text mining procedure such as collocation analysis is based on a certain theory of meaning, namely the context theory of meaning established, among others, by Firth (cf. 22).[1] Additionally, Dobson regards the distant reading approaches of Matthew L. Jockers, Franco Moretti, and others as belonging to the social sciences rather than the humanities. In their propagated separation of method and interpretation he sees an attempt to cover up structuralist approaches in order to avoid poststructuralist critique (cf. 29). According to Dobson, the methodological core of the humanities consists of a humanities »metamethod« with its question »How exactly shall we justify this one?« as well as of their various heterogeneous methods (30). Furthermore, Dobson states that it is difficult to reconcile the methods of the humanities with computational methods based on validity. Nevertheless, one should not defer the use of computational methods because of doubts about them arising from »suspicious and critical readings« (31).
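Dobson's point that even simple text mining embeds a theory of meaning can be illustrated in a few lines of code. The following sketch is a deliberately naive collocation finder written for this review; it is not NLTK's collocations() (which ranks bigrams with association measures such as the likelihood ratio), but it rests on the same Firthian premise that a word's meaning can be read off its habitual neighbors:

```python
from collections import Counter

def top_collocations(text, n=3, min_count=2):
    """Toy collocation finder: the most frequent adjacent word pairs.

    NLTK's collocations() ranks bigrams with association measures;
    this sketch uses raw frequency only, but presupposes the same
    context theory of meaning: that recurring neighbors are telling.
    """
    tokens = text.lower().split()
    bigrams = Counter(zip(tokens, tokens[1:]))
    return [pair for pair, count in bigrams.most_common()
            if count >= min_count][:n]

sample = ("the good lord helped me and the good lord "
          "kept me for the good lord is merciful")
print(top_collocations(sample))  # the recurring pairs, most frequent first
```

Even this toy version involves decisions (lowercasing, whitespace tokenization, a frequency threshold) that are theoretical commitments rather than neutral defaults.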
In the second chapter, Dobson deals with text-mining approaches as »the strongest form of digital humanities« (34).
Dobson agrees with Katharine Bode's demand for new digital objects that represent literary works appropriately in their historical context.[2] He mentions TEI as a possible starting point for the development of further standards, but does not specify these standards beyond the separation of text and paratext, which is already possible with TEI. In addition, Dobson argues that a formalized description of the computational processes involved in creating these objects is essential (cf. 36f.). Dobson predicts a separation between open source software and Voyant-like tools and then proposes ›the‹ open source model as the appropriate model for the required transparency, whereby he equates open source with minimal computing approaches. The latter are suitable for approaches oriented more towards extending than replacing critical practices in the humanities, as Dobson says with reference to Phillip R. Polefrone et al. (38).[3] According to Dobson, »The human-scale, minimal computing model coupled with richly and openly available datasets answers some of [sic] important critiques launched against computational methods as now used within the humanities« (ibid.). He therefore considers minimal computing and minimal methods a way to democratize participation in the digital humanities due to their comparatively low costs (cf. 38f.).
As a solution for publishing workflows he suggests Jupyter Notebooks (cf. 39), which he praises for including not only the possibility of integrating text blocks but also the possibility of inserting code comments (cf. 40). This is a bit puzzling: While Jupyter Notebooks certainly are a great way of documenting the process of analysis, code comments are not a particular feature of them; they are available in all common programming languages.
Using the example of machine learning, Dobson explains that data »cannot ever be said to be computed, distilled, and analyzed free of subjective intent« (43); rather, even unsupervised learning always reflects the researcher's decisions, since setting up the procedure requires many choices, such as the selection of data and algorithms. The same also applies to the »automated reading of the text« (45). Dobson exemplifies these decisions, among other things, by means of his own topic-modeling experiment and an application of Jockers’ model (cf. 50–57).[4]
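The kind of researcher decisions Dobson has in mind can be made visible even before any model runs. The following sketch is purely illustrative (the stop words, threshold, and tokenization rule are invented for this review, not taken from Dobson's experiments); each named constant is a subjective choice that shapes what the ›unsupervised‹ procedure will later see:

```python
import re

# Every constant below is a researcher decision, not a given.
STOPWORDS = {"the", "a", "and", "of", "to", "in"}  # choice 1: which words to drop
MIN_TOKEN_LENGTH = 3                               # choice 2: what counts as a word
LOWERCASE = True                                   # choice 3: collapse case or not

def preprocess(document):
    """Prepare a document for, e.g., topic modeling."""
    if LOWERCASE:
        document = document.lower()
    tokens = re.findall(r"[a-z]+", document)       # choice 4: tokenization rule
    return [t for t in tokens
            if t not in STOPWORDS and len(t) >= MIN_TOKEN_LENGTH]

print(preprocess("The Autobiography of a Female Slave"))
# → ['autobiography', 'female', 'slave']
```

Changing any of these four choices changes the input to the model, and thus its output, before the ›learning‹ has even begun.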
Dobson then turns to modeling in a more general sense and refers to Richard Jean So and his appeal to redirect criticism away from false or dirty data and towards modeling, i.e. towards the abstraction of interesting phenomena and their testing in computational application (cf. 46).[5] Dobson then discusses the End of Theory debate with reference to Chris Anderson.[6] However, neither So's view of models as statistical testing grounds nor Anderson's replacement of models by data and correlationalism offers a productive concept of modeling for the digital humanities: the one is too narrow, the other too broad. According to Dobson, the solution lies in the methods of the humanities, which provide the tools to design and test models (cf. 49).
The role of the (historical) context for text interpretation is emphasized by Dobson in the third chapter. There is a risk associated with current computational projects and digitality in general: »what might be called the fantasy of total history through the use of tools and methods that conceive of archives as a closed and complete source of knowledge, that operate on and can only address objects residing in the digital archive« (67).
Dobson outlines the (literary studies) development that led from historical criticism, which resulted in canonization with the integration of the historical context, via New Criticism, which attempted to counter this canonization through decontextualization (according to Dobson: in vain), to New Historicism in the 1980s, which juxtaposed literary texts with other contemporary texts in order to provide the anthropological »thick description« of Clifford Geertz (cf. 71–75).
Dobson presents the approach of Amy Earhart, who conceives archives as a possibility to enrich literary textual analysis with context, whereby the underlying technologies are not only to be understood as facilitators, but also as constructive in themselves.[7] According to Dobson, researchers who use computational methods for text analysis and thereby establish or reproduce structuralist, ahistorical approaches are to be seen in contrast to this (cf. 76f.).
At the same time, he demands that literary scholars create their archives and collections according to the same contextualization considerations that are applied to digital archives. Historicist methods need not conflict with text-mining methods and could even complement them (cf. 79). Dobson points out that corpora are normally treated as ›self-enclosed bodies‹ and examined with methods that are essentially text-immanent (even if, for example, stop-word lists can be created using further texts, cf. 83). Secondary databases could therefore be used as heuristics to attribute meaning to texts in computational approaches. Dobson mentions lexical databases as an example of such a »text-external referential system«, as he calls them, following the linguistic designation of references to something outside the text (cf. 85). Although these systems, as abstractions of data sets, dictionaries, program libraries, etc., interfere with their own function as alternative referential systems, they point to the situatedness of all data (cf. 86).
Using examples of sentiment analysis, Dobson criticizes the fact that the reference to text-external referential systems results in a one-sided gesture of historicization that »yokes text to context without the possibility of exchange« (97). In addition, the methods mentioned are based on empirical studies carried out in the present or recent past and thus cannot do justice to historical texts. This can be attributed to an uncritical attitude of computer science, which adheres to a ›documentary realism‹ in Dominick LaCapra's sense and has a naïve understanding of databases as transparent and of analysis results as facts (cf. 100).[8]
In the fourth and final chapter, »The Cultural Significance of k-NN« (101–130), Dobson addresses the critique of algorithms, which he considers, as a cultural and historical critique, to be complementary to the otherwise mostly formalistic or phenomenological approaches. Through a discussion of the k-nearest neighbor (k-NN) algorithm, he aims to contribute to the critique of »algorithmic culture« or »algorithmic governmentality« (102).
He argues that we should examine and understand the historicity of algorithms in order to understand (i) which cultural assumptions are formalized in them and (ii) how they generate the conditions of possibility in our present (cf. 111). Dobson implements this first by outlining the genesis of Bayes’ theorem. He points out that, on the one hand, it is based on a subjective »prior« and, on the other hand, implies an intelligent cause (i.e. God) (cf. 113f.).
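How strongly the subjective »prior« shapes the outcome of Bayes' theorem can be shown with a minimal calculation (all numbers are invented for illustration): under identical evidence, two researchers with different priors reach very different posteriors.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E),
    with P(E) expanded over H and not-H."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Same evidence (likelihood 0.9, false-positive rate 0.1),
# two different subjective priors:
print(round(posterior(0.5, 0.9, 0.1), 3))   # → 0.9
print(round(posterior(0.01, 0.9, 0.1), 3))  # → 0.083
```

The subjective element Dobson points to is thus not a side effect but an input: the prior is chosen, and the output inherits that choice.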
From a more general point of view he emphasizes that data with many outliers, such as data from the humanities, leads to poor results in the application of many algorithms (cf. 116f.).
Dobson then traces the development of the NN algorithm in the context of the US military and the Second World War and presents its further development into the k-NN algorithm (cf. 121–126). Referring to Wendy Chun, Dobson criticizes that the algorithm applies the principle of homophily.[9] He thereby uncovers a central assumption contained in the algorithm that leads to the creation of homogeneous groups, called ›Virtually Gated Communities‹ by Chun. According to Dobson, the developers of the algorithm used a pseudodemocratic language to describe its function (cf. 126), at a time when neighborhoods were in fact being segmented and homogenized. Dobson further problematizes this through MacCormick, who unreflectively uses k-NN to classify real neighborhoods in terms of their political donation behavior (cf. 127f.).[10] Finally, Dobson points out that the supposedly democratic and the homogenizing principle of the algorithm continue to operate in current machine learning procedures and must be criticized in order to understand their consequences. Thus, »[i]n claiming that data make decisions, scientists and others displace multiple forms of ideologically influenced subjectivity« (128). He then states: »Pattern recognition gives name to theological beliefs in algorithmic form that were deployed on behalf of those seeking to restore order and regularity in the twentieth century« (ibid.).
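The homogenizing assumption Dobson and Chun criticize is visible in even the most minimal implementation of the algorithm. The following toy sketch (one-dimensional data invented for this review) classifies a point by the majority vote of its nearest labeled neighbors, i.e. strictly by ›like belongs with like‹:

```python
from collections import Counter

def knn_classify(point, neighbors, k=3):
    """Minimal k-NN: assign a point the majority label of its k
    nearest labeled neighbors. The homophily assumption is the
    whole algorithm: proximity is taken to imply sameness."""
    by_distance = sorted(neighbors, key=lambda nl: abs(nl[0] - point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

labeled = [(1.0, "A"), (1.2, "A"), (1.4, "A"), (8.0, "B"), (8.5, "B")]
print(knn_classify(1.1, labeled))  # → A (surrounded by A-neighbors)
print(knn_classify(8.2, labeled))  # → B
```

Nothing in the procedure can ever place a point against its neighborhood; the output necessarily reproduces and reinforces whatever grouping the labeled data already encode.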
Finally, Dobson refers again to the already discussed concept of algorithmic governmentality by Antoinette Rouvroy and her Foucault-influenced claim that there must be »a discursive space in which subjects can call into question the operations of power and governmentality« (129).[11] According to Dobson, there has been a movement in the history of our algorithms away from »contestable narratives of causation« and towards »profiling through correlation« (ibid.). Hence an understanding of this is important for both the humanities and everyday life. »One cannot isolate and distinguish interpretive and computational methods« (ibid.). The humanities are consequently important, and »Critical theorists play a crucial role in creating alternative fields in which computational critique can flourish and have an uptake« (ibid.). Here, Dobson clearly points out the substantial contribution of the humanities: They can expand evidence, question the representativeness of data, provide examples of the whole range of the phenomenon studied, and question the existence of a natural order, as in ›like belongs with like‹. »And finally, they can understand that classification itself is biased because while a list of samples must be finite, the range of possible objects is infinite« (130).
In his »Conclusion« (131–140), Dobson presents word embeddings as a helpful method in the context of critical digital humanities. He outlines their possible application, especially the alignment of the semantic spaces of one or more historical/specific texts to other texts, and concludes: »It may be the case that such alignment methods are the best materialization of the notion of deformation as computational humanists willfully warp imagined, constructed, and measured semantic space to other constructed spaces and the ever-receding horizon of futurity« (140).
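The alignment of semantic spaces that Dobson describes is commonly implemented as an orthogonal Procrustes problem; whether this is exactly the method he has in mind is not spelled out, so the following NumPy sketch, with toy two-dimensional ›word vectors‹ invented for this review, should be read as one plausible materialization of such a warping:

```python
import numpy as np

def align(source, target):
    """Orthogonal Procrustes: the rotation R minimizing
    ||source @ R - target||_F, a standard way to warp one
    embedding space onto another (e.g. a historical corpus's
    semantic space onto a modern one)."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

rng = np.random.default_rng(0)
source = rng.normal(size=(50, 2))            # 50 toy 'word vectors' in 2-D
theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
target = source @ rotation                   # the 'other' space: same, rotated

recovered = align(source, target)
print(np.allclose(recovered, rotation))      # → True
```

The deliberateness Dobson calls deformation is palpable here: the researcher chooses which space counts as source, which as target, and which loss the warp minimizes.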
Overall, Dobson’s major claim could be summarized as ›mining is meaning‹, and his most important demand is for scholars to explicate the whole process of computational text analysis. Both are crucial for a reflected approach in the digital humanities.
Yet, when it comes to the more detailed argumentation, Dobson’s understanding of computational aspects is in some cases somewhat confusing. For example, he discusses the bag-of-words model as being particularly suitable for researching culture, since it does not distinguish between text types (cf. 78). What remains unclear, however, is why it should be the bag-of-words model in particular that is text type-agnostic (and other common text models are not), as well as why this would even be an important feature. Additionally, Dobson partly contests the suitability of the model further on when, discussing sentiment mining, he criticizes the bag-of-words model in text mining approaches for leading to decontextualization (cf. 95). There, Dobson seems also to include approaches such as the discussed approach of Bing Liu. But these rely on dependency analyses and thus are not bag-of-words models, as they conceptualize text as consisting of more or less linear sentences in which every word has a defined relation to at least one other word of the sentence. Here, a more detailed definition of bag-of-words models would have been helpful. Moreover, the question of which text model is appropriate for specific texts or analysis tasks could have been raised and perhaps discussed to some extent.
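What the bag-of-words model actually discards can be stated precisely: word order, and with it syntactic context. A minimal sketch (the example sentences are invented for this review) shows how two sentences with opposite emphasis collapse into one and the same representation, which is exactly the decontextualization at issue:

```python
from collections import Counter

def bag_of_words(sentence):
    """A bag-of-words model keeps only token counts and discards
    word order; syntactic relations between words are lost."""
    return Counter(sentence.lower().split())

a = "the critics praised the novel not the film"
b = "the critics praised the film not the novel"
print(bag_of_words(a) == bag_of_words(b))  # → True
```

A dependency-based model such as Liu's, by contrast, would keep »not« attached to a specific noun, which is why it does not fall under the bag-of-words critique.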
These issues become even more apparent when reading Dobson’s contribution from a less reflection-oriented and a more computational standpoint, which is discussed in the next section.
2. Reading 2: An Introduction to Computational Text Analysis
In his introduction, Dobson announces that the assumptions underlying most text-based computational approaches will be identified as core issues in the four chapters of his book. These are viewed in terms of activity classes: segmentation, normalization, classification, clustering, and modeling (cf. x). Thus, Dobson's contribution can also be read as a generalized representation of the computational analysis of literary texts.
Although these five activity classes mentioned in the preface are not explicitly taken up again in the four chapters or the conclusion, from a process perspective the book can nevertheless be read as a presentation of the following sequential aspects of computational text analysis, which are also echoed in the chapter headings. After a brief introduction to the digital humanities, Dobson begins with the question of the text basis and the transformations of texts in the course of digitization (Chapter 1: »Protocols, Methods, and Workflows. Digital Ways of Reading«, 1–31), then moves on to the question of algorithms and text-immanent aspects of analysis (Chapter 2: »Can an Algorithm Be Disturbed? Machine Learning, Intrinsic Criticism, and the Digital Humanities«, 32–65), discusses the context dependency of texts and their analysis (Chapter 3: »Digital Historicism and the Historicity of Digital Texts«, 66–100), and subsequently addresses statistical models and in particular the k-NN algorithm (Chapter 4: »The Cultural Significance of k-NN«, 101–130). The conclusion gives an outlook on the applicability of word embedding approaches in the spirit of the critical digital humanities (»Conclusion«, 131–140).
The reading of Dobson's contribution as an introduction to digital humanities includes, in addition to the general explanations and considerations of the above-mentioned aspects of computational text analysis, Dobson's applications of some typical digital humanities methods. Dobson applies the methods of collocation analysis, topic modeling, sentiment analysis and clustering based on the kNN algorithm to narratives from the North American Slave Narrative Archive from the University of North Carolina. In addition, he addresses the compilation of the corpus and emphasizes its relevance: by choosing that specific corpus, he is using an archive that has an organizational form typical for digital humanities. Furthermore, by viewing those specific texts, he argues, he is at the same time breaking open the restrictive canon of literary texts (cf. x).
While these aspects indeed cover most computational approaches to textual analysis, Dobson’s book (with the exception of the introduction and Chapter 4) is probably not readily understood by digital humanities novices. This is partly because the discourses and references Dobson responds to are too numerous to allow for a coherent overview.
In Dobson’s defense, one could argue that he himself did not present his book as introductory reading. However, he does imply this: through the introduction to the digital humanities given right at the beginning (cf. 1–5), through his general workflow model (cf. 9), and through his concluding remark that he performed »a recursive walking down of [...] the ›stack‹ of any workflow making use of computation« (131). Moreover, the part of his audience that is not necessarily proficient in digital humanities, for example scholars of traditional literary criticism, depends on the computational procedures being explained to a certain extent in order to be able to understand his criticism of them.
In addition to the sound parts, such as the already mentioned account of the history of the digital humanities, the explanation of Jupyter Notebooks (38–39) or machine learning (43–44), there are also a number of less accessible or even misleadingly presented aspects of computational procedures. These might cause some problems of comprehension for the audience or even lead them to inaccurate conclusions.
For instance, while Dobson does refer to the importance of workflows and discusses their relevance (cf. 8) when introducing his general workflow, he does not describe his visualization further (cf. 9). Apparently, the visualization is a flowchart based on the ISO standard and thus contains a series of fixed, meaningful symbols, which Dobson uses and varies in his other workflow representations (cf. 55, 87, 121). However, this is not explicated at any point.
Later, he presents the default values of a collocation function in the form of the output of the help command (cf. 23) and reveals this procedure only in a footnote: »This is the output produced by executing ›help(nltk.Text)‹ after importing the NLTK version 3.2.2 library with the python interpreter« (footnote 37, 145). This representation might be difficult to read for someone without programming expertise. Moreover, the explanation in the footnote is only of limited use or even counterproductive: Although Dobson did introduce the Natural Language Toolkit (NLTK) as well as Python in the main text, finding an explanation involving these concepts helpful also requires at least a vague idea of help functions in programming languages and of what an interpreter and a library are.
Similarly, in the context of Topic Modeling, Dobson numbers the topics starting from #0 (cf. 50–52). While it is true that the programming language Python used by Dobson, just like other programming languages, assigns position 0 to the first element of any sequence, there is no reason to adopt this enumeration system in an otherwise barely technical representation in human communication such as this book.
The most blatant of these rather user-unfriendly depictions concerns the display of the default stopword list in the Natural Language Toolkit. Instead of simply listing the words, a screenshot of a Jupyter Notebook is tacitly inserted (Figure 2.3, p. 53). As a consequence, the figure contains doubly framed information (Python call, Jupyter Notebook) that is not made explicit. Moreover, there is no discernible reason for or added value to this type of framing, making it likely to cause confusion.
Here, the obfuscation which Dobson rightly criticizes in existing tools and algorithms is performed by Dobson himself. All these cases are unnecessary displays of programming aspects, which do not contribute to the reader’s understanding of the underlying issues. On the contrary, they will probably lead to a representation of digital humanities as being difficult to understand and rather nerdy.
For beginners, the criticism of the rightly diagnosed opacity of modern, modular programming languages (cf. 40) is also of little help. Here, it would have been helpful to compare programming languages with other communication systems (such as natural language) or with other forms of modeling (such as the description of certain phenomena in natural language). After all, maximum transparency as well as all-encompassing modeling resulting in a perfect representation pose similar problems for all of these systems: Without a certain amount of abstraction, further consideration and contextualization cannot take place; but abstraction comes at the expense of transparency, because it is not always comprehensible or even reversible. The decisive problem here is therefore not the computational aspect, but rather human cognitive capacity and perhaps even more so our practice of meaning generation.
Finally, from the point of view of the digital humanities, some explicit or implicit assertions Dobson makes about basic concepts are irritating: for example, the unexplained (and wrong) equation of open source and minimal computing (cf. 38), the labeling of the Topic Modeling package MALLET as a »free« TM package rather than as open source (54), or the missing problematization of the topic concept, which differs from our everyday and professional understanding and would have been crucial (cf. 49).
These inaccuracies and obscurations unnecessarily obstruct access for interested readers or even lead to false assumptions or conclusions. They could have been avoided without weakening the book's argument for the need for critical digital humanities.
3. Reading 3: A Critique of Exemplary Approaches to Computational Text Analysis
In its third reading, Dobson's book can be regarded as a contribution that presents and discusses exemplary computational analyses of literary texts and thus implements his conception of critical digital humanities as criticism. In the course of the discussion of the typical aspects of a workflow for computational literary text analysis, Dobson discusses exemplary instances, some of which he supplements through his own approaches.
Dobson explains that his focus in computational approaches is on »the use of sophisticated quantitative methods for text and data mining« (1), because: »These text-mining approaches collectively represent what I take to be the strongest form of digital humanities. They are strong methods in the sense that they are well understood, testable, and from certain quarters defensible« (34).
Dobson describes a series of approaches that he examines against the background of his critical digital humanities. The text-mining approaches of literary studies are presented by him more or less explicitly as not critical enough. For instance, he criticizes Matthew L. Jockers’ Macroanalysis as ignorant with regard to methodological history (cf. 24), and his approach in Text Analysis with R[12] to the chapter on cetology in Moby Dick as not informed by literary studies, since Jockers tries to grasp a phenomenon with lexical diversity that cannot be grasped that way (cf. 25f.).
Geoffrey Rockwell's and Stéfan Sinclair's Hermeneutica, on the other hand, downplays the role of algorithmic procedures in Dobson’s view.[13] He criticizes the authors for providing opaque tools whose functionality they conceal by referring to their materiality (26f.), and which in his view should not be conceptualized as hermeneutica (cf. 85). Dobson develops his own concept of hermeneutica in opposition to theirs: »The category of hermeneutica that I am calling text-external referential systems are deeply embedded within specific cultural moments and, because they are constructed and highly curated systems, they carry with them the assumptions, preferences, and prejudices of their creators« (ibid.).
Regarding the work of Andrew Goldstone and Ted Underwood on the development of literary criticism in a corpus of literary journal articles, he acknowledges their efforts to reflect methodically on their approach.[14] At the same time, unquestioned assumptions persist in it, namely their operationalization of critique via the word ›critic‹, which ignores other linguistic realizations of literary criticism that could be made accessible through close reading (cf. 56f.). In addition, Dobson bemoans that Goldstone and Underwood do address the hermeneutic circle and the meaning of interpretation, but only proceed from the algorithmic output (cf. 63).
In the context of what Dobson calls text-external referential systems, he deals in detail with sentiment analysis approaches. He presents Jockers’ implementation of the Syuzhet Package,[15] which builds on the work of Bing Liu and others, but is used for the analysis of canonical literary texts (cf. 90f.).[16] Dobson uses examples to demonstrate the problems of sentiment analysis, emphasizing the subjectivity and situatedness of the human reading process. He also compares the approaches of Andrew J. Reagan et al. (2016)[17] and Jianbo Gao et al. (2016)[18] and notes that the former measure reader reaction and the latter the plot movements (cf. 94). Finally, he cites Jockers’ and Jodie Archer's Bestseller Code as an example, criticizing it at the same time for obscurity due to the algorithm's inaccessibility.[19] »[E]xamples such as this provide evidence that computational methods, especially those methods making use of large numbers of undisclosed variables and features, can also invoke the same sense of depth and obscurity as found in demystifying logics of Marxist and psychoanalytical approaches« (96). Dobson considers such approaches a contrast to surface reading, which he says is usually associated with the digital humanities.
As a possible problem for digital humanities, Dobson sees the fact that it is often not humanities scholars but computer scientists who deal with such questions. However, due to their under-theorized and uncritical understanding of culture and literature, these scholars draw weak conclusions, as he shows with reference to Jean-Baptiste Michel et al. (2011)[20] and James M. Hughes et al. (2012)[21] (cf. 99). He further problematizes this with John MacCormick, who unreflectively uses k-NN to classify real neighbourhoods in terms of their political donation behaviour (cf. 127f.).
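For readers unfamiliar with the method discussed here and in Dobson's fourth chapter: k-nearest neighbors (k-NN) classifies an item by majority vote among the k labeled items closest to it. A minimal sketch with invented two-dimensional toy data and Euclidean distance, not MacCormick's or Dobson's actual setup:

```python
from collections import Counter
import math

def knn_classify(point, labeled_points, k=3):
    """Classify a point by majority vote among its k nearest neighbors."""
    by_distance = sorted(
        labeled_points,
        key=lambda item: math.dist(point, item[0]),  # Euclidean distance
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Invented toy data: two clusters labeled "A" and "B".
data = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
        ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
print(knn_classify((0.5, 0.5), data))  # "A"
print(knn_classify((5.5, 5.5), data))  # "B"
```

The sketch makes the critical point tangible: the classifier can only ever reproduce the categories of its labeled data, which is precisely why Dobson insists that the choice of labels and features is a situated, interpretive decision.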
Apart from this work on computational text analysis, Dobson also cites positive examples: He presents the Archive of American Slavery (cf. 70) as an example of an archive dealing with problems in archival work that go beyond the usual difficulties.[22] He also points out that computational methods can help to discover gaps, and refers to Lauren Klein's work on »archival silence« in this archive (70f.).[23]
The example of the »thick digital maps« by Todd Presner et al. shows that historical geodata should not be mapped onto current map material without further ado, but must be enriched with suitable descriptions in order to do justice to their historical dimension (79f.).[24]
As already mentioned, Dobson supplements the examples of computational text analysis with his own implementations on texts from the corpus of slave narratives: He uses the function collocations() from NLTK for the analysis of »An Autobiography: The Story of the Lord's Dealings with Mrs. Amanda Smith the Colored Evangelist« (Amanda Smith, 1893) (cf. 22f.), performs Topic Modeling for the »Autobiography of a Female Slave« (Martha Brown, 1857), applies Jockers’ Syuzhet Package to »Incidents in the Life of a Slave Girl« (Harriet Jacobs, 1860–61), and finally uses the k-NN algorithm combined with »simple text-mining-tools« for the automatic literary periodization of his entire corpus (cf. 120–122).
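For orientation regarding the first of these applications: NLTK's collocations() surfaces word pairs that co-occur more often than chance would predict. A minimal pure-Python approximation, ranking adjacent bigrams by raw frequency rather than by NLTK's association measures, on an invented toy text rather than Smith's autobiography:

```python
from collections import Counter
import re

def top_bigrams(text, n=3, min_count=2):
    """Rank adjacent word pairs by co-occurrence frequency.
    A rough stand-in for collocation scoring; NLTK additionally
    filters pairs with association measures such as likelihood ratio."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(zip(words, words[1:]))
    return [pair for pair, c in counts.most_common(n) if c >= min_count]

text = ("the colored evangelist spoke and the colored evangelist sang; "
        "the colored evangelist travelled far")
print(top_bigrams(text))
```

Even this crude frequency ranking shows why such output demands interpretation: which recurring pairs are meaningful collocations and which are mere artifacts of grammar is a question the algorithm cannot answer.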
Dobson's elaborations on the methods are at times incomplete in their reasoning or even inconsistent. For example, he notes that Jockers’ approach to sentiment analysis may improve on Liu's dictionary through its use of scalar values, while the latter relies on a much more complex understanding of sentence structure (cf. 93). However, he explains neither why a scalar approach would be better than a binary one, nor why sentence structures are relevant. Instead, he claims that sentiment analysis has led digital humanities researchers back to the analysis of individual texts (cf. 95). This is unexpected, given the many corpus-based approaches he presents. The only single-text analysis that he features is his own, and as he shows, its outcomes should definitely not be adopted. Dobson does not remark, however, on how the limited functioning of the method for individual texts could inform a critical reflection of the corpus-based results. Instead, he assumes: »Perhaps [...] processing sentiment at the macro level just does not make much sense« (ibid.). To a certain extent, he revises this with the very next sentence: »But in the aggregate, as part of a model, these plots might have some value« (ibid.). It remains open to what extent macro levels and aggregates can be distinguished here.
Another somewhat bewildering aspect is a statement in the context of Gadamer's hermeneutic circle: »The poles in Gadamer's circle are less rigidly defined than allowed for much work in the digital humanities« (63). It is certainly true that one cannot implement Gadamer’s concept in a straightforward manner. But this holds for almost all sophisticated concepts in the humanities. If Dobson is right in this assessment, critical digital humanities will hardly be possible at all. Instead of presenting the hermeneutic circle as not implementable in the digital humanities, it would be more interesting to discuss problems of formalization and operationalization from a literary criticism perspective.
In general, Dobson does not state his criteria when criticizing the (un-)suitability of a method. Therefore, his critique of his core examples of Topic Modeling and Sentiment Analysis oftentimes seems undifferentiated. In particular, it is not clear whether he considers the method as a whole unsuitable or if it is rather the concrete implementation that needs to be improved. Furthermore, general criteria for an alternative, adequate implementation are not given.
In addition, there is a lack of positive examples that could exemplify the critical digital humanities in the context of text mining. Even Dobson's own text-mining activities do not meet the demands he makes of the approaches he criticizes. In most cases, Dobson neither formulates a question nor interprets and contextualizes the results of his activities. Instead, he limits the application to one text only (with the exception of the periodization approach in chapter 4) without providing reasons for this. The approaches and the incorporation of their results into further analysis are justified neither on the basis of the specific approach presented nor more generally on the basis of a literary studies perspective in the sense of the critical digital humanities. This may be explained by the fact that Dobson’s main objective was to present the respective procedures. At the same time, in the context of the propagated critical digital humanities, an isolated consideration of the functionality of these procedures for mainly singular texts is at least unfortunate, because it misses the opportunity to highlight the potential added value of critical access using concrete examples.
4. Parenthesis: Criticizing Digital Humanities with Reflected Acceptance? Structuralism as a Pseudo Problem
So, how should we handle digital humanities approaches as humanists and literary critics? As already mentioned above, Dobson recommends neither »outright rejection« nor »unreflective acceptance« (viii). He himself can certainly not be suspected of unreflective acceptance in his criticism of computational procedures; however, there is also little reflective acceptance to be seen in detail. Conversely, one arguably cannot speak of outright rejection by Dobson at any point either. Dobson's own approach, however, leaves open the question as to how a reflected rejection can be distinguished from an outright rejection. Here, a formulation of what Dobson initially described as an extension of »the horizon of interpretative possibilities, in the Husserlian sense« by the digital humanities would be helpful (x). In the beginning, he states: »This book examines the conditions of what might be called the seen and unseen of the interpretive scene in computational criticism« (ibid.), but he does not explicitly refer back to this.
In part, Dobson's self-declared goal of ascribing structuralism, and thus an inadequate approach, to the digital humanities seems to stand in the way of his project. This goal could also have been used as an additional reading perspective for his book. Since the debate as to whether structuralist or post-structuralist approaches should be the basis for digital humanities analyses is not an intrinsically digital humanities debate, it is not regarded as decisive for the discussion of the critical digital humanities here. Even though the debate has been fueled by (mainly academic) developments in digital humanities, there is no compelling connection between digital humanities and (neo-)structuralism, at least not beyond the fact, also mentioned by Dobson, that structuralist approaches can be more easily modelled computationally.
While computational approaches, and thus the digital humanities, are in principle open to all literary studies approaches to text analysis, the more historical and situated an approach is overall, the greater the challenges it poses for modelling and the more profound the necessary human intervention in the actual analysis. However, there is no reason why deconstructivist approaches should not be approached computationally. On the contrary, it is precisely those approaches that go beyond the text that advance the discourse on a suitable computational implementation of literary practices. Such approaches involving context will very probably be seen as an enrichment also by those who in Dobson’s view adhere to the structuralist paradigm.
Additionally, without the objection of structuralism, the requirements for the critical digital humanities are higher: If the reference to structuralist approaches is no longer sufficient to criticize an approach, other, more elaborate criteria must be adopted for digital humanities criticism. This is crucial precisely because the challenge in digital humanities is to mediate between the more discursive, coherent and plausibility-oriented procedures of the humanities and the computational procedures that are more oriented towards repeatability and probability. Thus, quality criteria must be developed that do justice to both.
5. Conclusion: Critical Digital Humanities as an ambitious endeavor to be continued
In his book, Dobson provides a broad perspective on core issues of text-analytical digital humanities: the effects of digitization, the situatedness of text mining, the need for context, and the need to understand biases in algorithms. These issues are discussed with numerous references to both reflections on digital humanities issues and implementation-oriented approaches.
Dobson’s merit is that he tries to integrate numerous discourses and to combine fundamental digital humanities procedures with reflections from literary studies. The majority of his statements aim to highlight the situatedness of digital texts and computational approaches, and they do so convincingly. Dobson's plea that the modelling of literary (or humanities) textual analysis should go beyond the texts and include historical contexts as well as the subjective perspective of the researchers is sound and of great importance.
The humanities, however, have a duty to go further: Reflection and criticism should be followed by suggestions of ways of doing things better. The humanities, and especially literary criticism, are called upon to provide even more adequate, situated, and transparent approaches to computational text analysis. The goal should be to establish a variety of computational approaches in a characteristically humanistic way. These should in turn be discussed discursively and in terms of their epistemological gain as well as the resulting epistemological consequences for the whole process of computational textual analysis.
There are already approaches that try to implement this, and Dobson does name some of them. However, none of the positively discussed approaches are taken from the field of computational text analysis in literary studies, and most of the research is North America-centered. Therefore, a search for more positive examples within the field and with a broader geographic scope seems advisable.
Additionally, the approaches that are presented in the book are not developed to such an extent that they could serve as an orientation for a critical digital humanities practice exceeding the critique of existing approaches.
The somewhat metaphorical reference to the bag-of-words approach (cf. 78), or Dobson's demand that the humanities oppose »algorithmic failure« to algorithmic success (64), are less helpful here, since they ignore core computational models and procedures.
Dobson's workflow model, although it has the potential to serve as a model for the conception and criticism of computational approaches in the spirit of the critical digital humanities, is not explicated appropriately either. In his general workflow model, Dobson introduces a step of »selection« before the computational processing, which is the only decision process in a workflow otherwise consisting mainly of processes and activities (9). He thus introduces situatedness and interpretation dependency at an early stage and opposes the separation of method and interpretation that he criticizes. What he leaves out here is a more comprehensive integration of decision-making processes into the workflow, as well as its adaptation for concrete workflows in the critical digital humanities.
The same applies to the strong fourth chapter, »The Cultural Significance of k-NN«. In this balanced and comprehensible chapter, Dobson shows what the critical digital humanities he is calling for can achieve. But here, too, he limits himself to critique.
Only his concluding sketch of the use of word embedding methods can be used as a starting point for a text analysis in the sense of the critical digital humanities, even though it is still quite general.
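The basic operation behind such word embedding methods is comparing words via the cosine similarity of their vectors. A minimal sketch with invented three-dimensional toy vectors; real embeddings would be high-dimensional and trained on a corpus, e.g. with word2vec:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Invented toy vectors, not trained embeddings.
embeddings = {
    "freedom":    [0.9, 0.1, 0.2],
    "liberty":    [0.8, 0.2, 0.1],
    "plantation": [0.1, 0.9, 0.7],
}
print(round(cosine_similarity(embeddings["freedom"], embeddings["liberty"]), 2))
print(round(cosine_similarity(embeddings["freedom"], embeddings["plantation"]), 2))
```

Because such similarities are entirely derived from co-occurrence patterns in a training corpus, they carry the assumptions and biases of that corpus, which is precisely the point of entry for the critical perspective Dobson calls for.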
In summary, while there are some issues with implementation-related aspects in the book, Dobson provides a thought-provoking overview of critical views on digital humanities. He points repeatedly and with vigor at crucial aspects to consider when doing digital humanities in the tradition of literary criticism.
From here we should proceed and think about concepts for the development of approaches in the sense of the critical digital humanities. Among others, we should answer the questions Dobson’s book poses, at least implicitly, such as: How can we computationally exploit a theory of meaning (or, ideally, several) that goes beyond intratextual contexts? Which machine learning methods do not establish a dictatorship of the majority or of the known, or can even model deviations from it in terms of their innovativeness, in order to analyze developments in literary history?
Such alternative approaches to the dominant, implicit epistemological and political standpoints in algorithms and their application criticized by Dobson would be highly interesting. The challenge will be to design them according to the knowledge and expertise present in literary criticism and computational text analysis.
[1] John Rupert Firth, Papers in Linguistics: 1934 – 1951, London 1957. [zurück]
[2] Katherine Bode, The Equivalence of »Close« and »Distant« Reading; Or, Toward a New Object for Data-Rich Literary History, Modern Language Quarterly 78.1 (2017). [zurück]
[3] Phillip R. Polefrone/John Simpson/Dennis Tenen, Critical Computing in the Humanities, in: Constance Crompton/Richard J. Lane/Raymond George Siemens (ed.), Doing Digital Humanities: Practice, Training, Research, New York 2016, 85–103. [zurück]
[4] Matthew Lee Jockers, Macroanalysis. Digital Methods and Literary History, Urbana 2013. [zurück]
[5] Richard Jean So, »All Models Are Wrong«, PMLA 132 (3, 2017), 668–73. [zurück]
[6] Chris Anderson, The End of Theory: The Data Deluge Makes the Scientific Method Obsolete, Wired, 23 June 2008. [zurück]
[7] Amy E. Earhart, Traces of the Old, Uses of the New: The Emergence of Digital Literary Studies. Ann Arbor 2015. [zurück]
[8] Dominick LaCapra, Soundings in Critical Theory, Ithaca London 1989. [zurück]
[9] Wendy Hui Kyong Chun, We’re All Living in Virtually Gated Communities and Our Real-Life Relationships Are Suffering, Wired UK, 13 April 2017, https://www.wired.co.uk/article/virtual-segregation-narrows-our-real-life-relationships (24.09.2019). [zurück]
[10] John MacCormick, Nine Algorithms That Changed the Future: The Ingenious Ideas That Drive Today’s Computers, Princeton 2013. [zurück]
[11] Antoinette Rouvroy, The End(s) of Critique: Data Behaviourism versus Due Process, in: Mireille Hildebrandt/Katja de Vries (ed.), Privacy Due Process and the Computational Turn: The Philosophy of Law Meets the Philosophy of Technology, London 2013, 143–167. [zurück]
[12] Matthew L. Jockers, Text Analysis with R for Students of Literature, New York 2014. [zurück]
[13] Geoffrey Rockwell/Stéfan Sinclair, Hermeneutica: Computer-Assisted Interpretation in the Humanities, Cambridge, Massachusetts 2016. [zurück]
[14] Andrew Goldstone/Ted Underwood, The Quiet Transformations of Literary Studies: What Thirteen Thousand Scholars Could Tell Us, New Literary History 45 (2014), 359–84. [zurück]
[15] https://github.com/mjockers/syuzhet (24.09.2019). [zurück]
[16] Bing Liu, Sentiment Analysis and Opinion Mining, San Rafael 2012. [zurück]
[17] Andrew J. Reagan/Lewis Mitchell/Dilan Kiley/Christopher M. Danforth/Peter Sheridan Dodds, The Emotional Arcs of Stories Are Dominated by Six Basic Shapes, EPJ Data Science 5 (1, 2016): 31. [zurück]
[18] Jianbo Gao/Matthew Jockers/John Laudun/Timothy Tangherlini, A Multiscale Theory for the Dynamical Evolution of Sentiment in Novels, Proceedings of the International Conference on Behavioral, Economic and Socio-cultural Computing 2016. [zurück]
[19] Jodie Archer/Matthew Lee Jockers, The Bestseller Code: Anatomy of the Blockbuster Novel, New York 2016. [zurück]
[20] J.-B. Michel/Y. K. Shen/A. P. Aiden/A. Veres/M. K. Gray/The Google Books Team/J. P. Pickett et al., Quantitative Analysis of Culture Using Millions of Digitized Books, Science 331 (6014, 2011), 176–82. [zurück]
[21] J. M. Hughes/N. J. Foti/D. C. Krakauer/D. N. Rockmore, Quantitative Patterns of Stylistic Influence in the Evolution of Literature, Proceedings of the National Academy of Sciences 109 (20, 2012), 7682–86. [zurück]
[22] Saidiya V. Hartman, Scenes of Subjection: Terror, Slavery, and Self-Making in Nineteenth-Century America, New York 1997. [zurück]
[23] Lauren F. Klein, The Image of Absence: Archival Silence, Data Visualization, and James Hemings, American Literature 85 (4, 2013), 661–88. [zurück]
[24] Todd Samuel Presner/David Shepard/Yoh Kawano, HyperCities: Thick Mapping in the Digital Humanities. MetaLABprojects, Cambridge, Massachusetts 2014. [zurück]
2020-01-24
JLTonline ISSN 1862-8990
Copyright © by the author. All rights reserved.
This work may be copied for non-profit educational use if proper credit is given to
the author and JLTonline.
For other permission, please contact .