While most of my work on the Shakespeare corpus thus far has analyzed linguistic variation to trace the development of genre, I have also been keen to find ways of visualizing plot structure. Franco Moretti's essay "Network Theory, Plot Analysis," published in New Left Review last year, delved into the problem of using network visualization to account for plot through an interesting analysis of Hamlet as a series of character-networks. Moretti notes an inherent problem of this approach: it results in "time turned into space," as the linear progress of the play is mapped onto a two-dimensional diagram representing interactions between characters. However, one might argue that to conceptualize plot at all is to move away from the immediacy of the play's linear progress, and to abstract from or perhaps even impose on the play a somewhat arbitrary structure of relations. Moretti gestures at this arbitrariness when he notes the apparent absurdity of discussing Hamlet purely as a function of character-interactions with no reference to Shakespeare's language.
However, most digital approaches to textual analysis - from n-gram based approaches to topic models - struggle to register any sense of plot development and instead treat texts as bags-of-words. In this sense, the abstraction of structure from the text is a necessary complement to other digital approaches. Drama, more than any other narrative form, may be thought of as bouncing characters off each other with minimal intervention from an authorial (or authoritative) voice. One might say that all drama, at its very core, can be thought of as the pure interaction of character - a pure comedy of humors! If so, perhaps a network of bouncing nodes pulling and pushing at each other is not a bad metaphor for dramatic form after all.
Building the Network
This visualization grew out of some work on network analysis (partially described here) that I was doing in collaboration with Prof. Jonathan Hope. As I explored network algorithms, it seemed that with a properly curated set of texts encoded in TEI or some other XML format, it would be reasonably easy to extract a set of relations between the characters. Some bit-twiddling in Python allowed me to parse Jon Bosak's wonderful XML-encoded edition of the Shakespeare corpus and retrieve a set of weighted edges among all characters who speak to each other. Essentially, I scanned each scene for consecutive speakers. The assumption here is that if B speaks right after A, s/he is responding to A, and hence I count this as a relationship (or an edge) within the network. Of course, there can be dramatic situations where this does not necessarily hold - Falstaff's uppity chatter in the presence of royalty after the battle might be an example - but it seems a fair assumption for the overwhelming majority of dramatic situations.
Once the relationships were established, I assigned a weight to each character (or node) as a simple percentage of lines spoken. Giving weights to individual interactions turned out to be trickier. To get a proper dramatic sense of the characters' relative importance, I weighted each speech both by its length and by the dramatic importance (node weight) of the addressee. This seems to give the best representation of relative importance. Thus, a line spoken to Hamlet by a minor character gets slightly more weight than a line spoken between two minor characters, and conversations among major characters get correspondingly more weight. One might think of many cases where a character's importance is not captured by a simple count of lines, and the cross-weighting of speakers addresses precisely such situations. Overall, this algorithm seems remarkably effective at picking up significant interactions and assigning them proper weights.
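The extraction described above can be sketched roughly as follows. This is a minimal illustration, not my original script: it assumes the SCENE / SPEECH / SPEAKER / LINE element names of Bosak's markup, runs on a tiny made-up sample rather than a real play, and takes "weighted by length and by the addressee's node weight" to mean a simple product, which is only one plausible reading. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Tiny invented sample in the style of Bosak's markup (not a real play).
SAMPLE = """<PLAY><ACT>
<SCENE>
<SPEECH><SPEAKER>A</SPEAKER><LINE>one</LINE><LINE>two</LINE></SPEECH>
<SPEECH><SPEAKER>B</SPEAKER><LINE>three</LINE></SPEECH>
<SPEECH><SPEAKER>A</SPEAKER><LINE>four</LINE></SPEECH>
</SCENE>
<SCENE>
<SPEECH><SPEAKER>C</SPEAKER><LINE>five</LINE></SPEECH>
<SPEECH><SPEAKER>A</SPEAKER><LINE>six</LINE><LINE>seven</LINE><LINE>eight</LINE></SPEECH>
</SCENE>
</ACT></PLAY>"""


def scene_speeches(root):
    """Yield, for each scene, a list of (speaker, line_count) in speaking order."""
    for scene in root.iter("SCENE"):
        speeches = []
        for speech in scene.iter("SPEECH"):
            speaker = (speech.findtext("SPEAKER") or "").strip()
            if speaker:
                speeches.append((speaker, len(speech.findall("LINE"))))
        yield speeches


def node_weights(root):
    """Node weight = a character's share of all spoken lines."""
    lines = Counter()
    for speeches in scene_speeches(root):
        for speaker, count in speeches:
            lines[speaker] += count
    total = sum(lines.values())
    return {speaker: count / total for speaker, count in lines.items()}


def edge_weights(root, nodes):
    """Treat each speech as a reply to the previous speaker in the same scene.

    Assumed formula: speech length * node weight of the addressee,
    summed over an undirected pair of characters.
    """
    edges = defaultdict(float)
    for speeches in scene_speeches(root):
        for (addressee, _), (speaker, length) in zip(speeches, speeches[1:]):
            if speaker == addressee:
                continue  # the same character speaking twice in a row
            pair = tuple(sorted((speaker, addressee)))
            edges[pair] += length * nodes[addressee]
    return dict(edges)


root = ET.fromstring(SAMPLE)
nodes = node_weights(root)
edges = edge_weights(root, nodes)
print(nodes)
print(edges)
```

Because the scan restarts in every scene, the last speaker of one scene is never linked to the first speaker of the next.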
Once extracted, relations are visualized as a force-directed network with proportionately sized nodes. Force-directed algorithms deploy a kind of toy-physics in which nodes try to arrange themselves to achieve a set of target distances from one another, and continue readjusting their positions until the distance between a given node-pair cannot be significantly improved without making the distance between another pair much worse. Note that this relatively stable condition may be reached without necessarily achieving the target distances; with nodes that have multiple edges this is almost always the case, because each edge puts a constraint on the node's position. Thus, while networks can behave like pseudo-clustering algorithms, it is not the positioning of the nodes but rather the edge-relations among them that represent the significant information.
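To make the toy-physics concrete, here is a deliberately simplified spring relaxation in Python. It is not the layout engine behind the visualization, only a sketch of the general idea under stated assumptions: each edge has a target distance, and on every pass its two endpoints are nudged a little toward that distance. The function names, step count and rate are all hypothetical choices.

```python
import math


def distance(pos, a, b):
    """Euclidean distance between two named nodes."""
    return math.hypot(pos[b][0] - pos[a][0], pos[b][1] - pos[a][1])


def relax(positions, targets, steps=500, rate=0.1):
    """Nudge nodes until edge lengths settle near their target distances.

    positions: {node: (x, y)} starting coordinates
    targets:   {(node_a, node_b): desired_distance}
    """
    pos = {node: list(xy) for node, xy in positions.items()}
    for _ in range(steps):
        for (a, b), target in targets.items():
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-9
            # Positive when the pair is too far apart, negative when too close;
            # each endpoint takes half of the correction.
            shift = rate * (dist - target) / dist / 2
            pos[a][0] += dx * shift
            pos[a][1] += dy * shift
            pos[b][0] -= dx * shift
            pos[b][1] -= dy * shift
    return pos


# A single edge can always reach its target distance.
pair = relax({"A": (0.0, 0.0), "B": (5.0, 0.0)}, {("A", "B"): 2.0})
pair_distance = distance(pair, "A", "B")

# Three edges whose targets cannot all hold at once (1 + 1 < 3),
# so the layout has to settle on a compromise.
targets = {("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 3.0}
triangle = relax({"A": (0.0, 0.0), "B": (1.0, 0.5), "C": (2.0, 0.0)}, targets)
errors = {edge: abs(distance(triangle, *edge) - t) for edge, t in targets.items()}
print(pair_distance)
print(errors)
```

The triangle case illustrates the point made above: a node with several edges is pulled by several constraints, and the resting layout is a compromise rather than an exact realization of every target distance.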
Tweaking the Network: Exploring Patterns
To begin with, the graphs represent every relationship in the play and can contain a large number of edges, and therefore a large number of constraints on the positioning of individual nodes. How can we bring out more significant plot patterns from this network? We might think of snipping off the most insignificant links and preserving the more important ones to get a better sense of structure, and the slider under the visualization-panel lets us do just that. When placed at the extreme left, the slider preserves every edge and node, while at the extreme right position, only the most significant relationship in the play is preserved. Nodes that no longer have any edges vanish to clear clutter. Playing with the slider might require a little trial-and-error since small movements over a section can have significant effects while there might be other stretches where no new edges are cut.
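The slider's behavior can be approximated with a simple weight threshold. The sketch below is an assumption about how such a filter might work, not a description of the visualization's actual code: the slider position is taken as a number between 0 and 1 and mapped linearly onto the range of edge weights. The function name is hypothetical and the weights are invented purely for illustration.

```python
def prune(edges, slider):
    """Keep edges at or above a weight threshold set by the slider.

    edges:  {(node_a, node_b): weight}
    slider: 0.0 (far left, keep everything) to 1.0 (far right, heaviest only)
    Returns the surviving edges and the nodes that still have an edge.
    """
    if not edges:
        return {}, set()
    lightest, heaviest = min(edges.values()), max(edges.values())
    # Linear interpolation between the lightest and heaviest edge weight.
    threshold = (1 - slider) * lightest + slider * heaviest
    kept = {pair: w for pair, w in edges.items() if w >= threshold}
    # Nodes with no remaining edges drop out of the picture.
    nodes = {name for pair in kept for name in pair}
    return kept, nodes


# Invented weights, for illustration only.
demo = {
    ("Hamlet", "Horatio"): 40,
    ("Claudius", "Hamlet"): 30,
    ("Claudius", "Gertrude"): 12,
    ("Bernardo", "Francisco"): 2,
}

all_edges, all_nodes = prune(demo, 0.0)
mid_edges, mid_nodes = prune(demo, 0.5)
top_edges, top_nodes = prune(demo, 1.0)
print(sorted(mid_nodes))
```

With a threshold of this kind, a stretch of the slider that spans no edge weights cuts nothing, while a small movement across a cluster of similar weights removes several edges at once.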
As you tweak the network you will notice that while edges approximate their ideal distances better and better, unconnected nodes don't respond to each other or arrange themselves mutually in any meaningful way. Thus, for example, in the visualization of 1 Henry IV, the thieves might occasionally land amidst the noblemen even though they are not connected by an edge. This does not represent any significant relationship (it is a function of a "gravitational field" that prevents nodes from flying off the canvas), and it might aid interpretation if such nodes are simply dragged apart. In fact, dragging node-groups is a great way to explore the networks, highlight interlinkages and centrality, and most importantly arrange groups of nodes together to reflect particular interpretations of plots.
Overlaying Gender and Status
The drop-down selections allow you to switch plays and also to switch the way the nodes are color-coded. Gender and status are the currently available categories, although much of this information is still missing and will be filled in only gradually (see below). "Not available" is visualized as black for status and gray for gender. While gender tags are clear for most characters, the notion of coding by "status" is fraught with danger. When I was writing my dissertation, I would be paranoid about deploying status or the concept of "class" in the context of early modern England. The only cure for this fear of anachronistically projecting nineteenth-century ideas and social structures onto sixteenth-century England was voluminous apologetic footnote references to the Marxist and Weberian traditions, and somewhat more defiant citations of the many recent historians who have fruitfully discussed class and status in the period. Thus, while this is a problem best discussed at length elsewhere, let me just note some of the underlying assumptions of this encoding. It is of course problematic to take Shakespeare's England and its social structure as the benchmark for plays that are set elsewhere, or in different periods, or both. What might have been extremely productive and relatively easy to agree upon for, say, city comedy becomes a muddier problem for many Shakespeare texts. Also, such tagging can involve major interpretive decisions - for example, I decided to tag Sir John Falstaff with the vagrants and criminals who keep him company because I felt this represented his role better than the position of knight that he officially holds. This decision arises, of course, out of a particular reading of the play - of Hal as torn between and linking two distinct worlds - rather than any objective and absolute measure of social position. However, given that our aim here is to explore plots in innovative ways, even such subjective encodings can add great insights.
A Little Crowdsourcing?
Some rudimentary crowdsourcing would not be out of place here to help me gather more data for the status and gender categories. To this end, I've put up a simple spreadsheet on Google Docs (edit: this is no longer active) that has four columns: the first two are uneditable, but I'd appreciate some help gathering information for the last two, which represent gender and status. The gender column should be self-explanatory, and I've already managed to mine enough data from Wikipedia to fill most of its entries, although there are still many slippages and mistakes that need correcting. It is the final column where I need the most help. So if you're reading a play and wouldn't mind filling in some information on it, I'd really appreciate it. Every so often, I'll update the data-files with new information, and hopefully we'll eventually have enough information to cover all characters.