Network Visualization with Gephi

By: Ashlyn Keil

What is Gephi?

Gephi is a software program for network visualization and analysis. It is aimed at data analysts, but really anyone can use it, because it’s free!

It’s used to create 3D network visualizations that, according to the Gephi website, “intuitively reveal patterns and trends, highlight outliers, and tell stories” with the data that you are using. For Digital Humanists, if you have or create your own set of data based on a text, you can create a network!

Below is an example of a network made with Gephi. The circular bubbles that vary in size are called nodes, and the lines that connect them are called edges, arcs, links, ties, or relations. Edges are the relationships between the nodes, and can represent whatever you want.

Edges can be directed, connecting one node to another with an arrow, or undirected, drawn as a simple line. Edges can be weighted as well, heavier or lighter depending on what criteria you set for the data. Each color shows the type of node, or how the nodes are grouped. Gephi also measures the centrality of nodes, so in this example the largest nodes are the most important to the network.

For instance, each node could be a character in a book, and each color could be the family that they belong to. The edges/relationships could then be who they talk to, heavier for those who have talked with each other the most times.
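To make the idea of weighted, directed edges and node importance concrete, here is a minimal sketch in Python. It is not how Gephi works internally, and the character names and counts are invented for illustration. It uses weighted degree (the sum of the weights on every edge touching a node), which is only one simple way to measure centrality; Gephi offers several others.

```python
from collections import defaultdict

# Hypothetical directed, weighted edges: (speaker, listener, times_spoken).
# These names and counts are invented for illustration only.
edges = [
    ("Anna", "Ben", 5),
    ("Ben", "Anna", 2),
    ("Anna", "Cara", 1),
    ("Cara", "Ben", 3),
]

def weighted_degree(edge_list):
    """Sum the weights of all edges touching each node (incoming plus outgoing)."""
    totals = defaultdict(int)
    for source, target, weight in edge_list:
        totals[source] += weight
        totals[target] += weight
    return dict(totals)

degrees = weighted_degree(edges)
# The node with the highest total would be drawn largest under this measure.
most_central = max(degrees, key=degrees.get)
print(degrees)
print(most_central)
```

Under this measure, a character who is spoken to a lot counts as much as one who speaks a lot, which is a choice you would want to think about before reading too much into node size.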

(image courtesy of Gephi’s WordPress Blog)

This example is a basic network, with only one kind of node, as opposed to a 2-mode/bimodal/bipartite network or a multimodal network. According to Scott Weingart in “Demystifying Networks” in the Journal of Digital Humanities, “2-mode networks are difficult enough to work with, but once you get to three or more varieties of nodes, most algorithms … simply don’t work.”

So, it is highly suggested that you use Gephi with single-mode networks, and that you use your network to find more interesting questions to ask of your text rather than to draw conclusions.

Outlined below are two networks that were made using Shakespeare’s Hamlet in our class. Our data for our nodes looked like this:

The edges for the examples outlined below were the direct conversations, where the sources and targets correspond to the ID numbers of the nodes:
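As a rough sketch of how node and edge tables of this kind can be laid out, the Python below builds two small CSV tables. The rows are placeholders, not our class data, and the "Group" column is a hypothetical attribute name. The column headers Id, Label, Source, Target, Type, and Weight are the ones I understand Gephi's spreadsheet import to recognize, but check them against your version of Gephi.

```python
import csv
import io

# Placeholder rows only; these are not the class's Hamlet tables.
nodes = [
    {"Id": 1, "Label": "Character A", "Group": "Court"},
    {"Id": 2, "Label": "Character B", "Group": "Soldier"},
]
edges = [
    {"Source": 1, "Target": 2, "Type": "Directed", "Weight": 3},
    {"Source": 2, "Target": 1, "Type": "Directed", "Weight": 1},
]

def to_csv(rows):
    """Render a list of dictionaries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Every edge must point at node IDs that actually exist in the node table.
node_ids = {row["Id"] for row in nodes}
dangling = [e for e in edges
            if e["Source"] not in node_ids or e["Target"] not in node_ids]

nodes_csv = to_csv(nodes)
edges_csv = to_csv(edges)
print(nodes_csv)
print(edges_csv)
```

In practice you would save each string to its own file (for example a nodes file and an edges file) and import them separately. The `dangling` check is there because a mistyped ID is an easy error to make when entering this data by hand.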

Classmate Example A: Direct Conversations Organized by Class in Hamlet

Megan created a Gephi network in which the nodes are characters, the directed edges are direct conversations, and the colors are classes (Members of the Court are purple, Commoners are green, and Soldiers are light blue). She used the Force Atlas layout in Gephi, and spread the nodes apart so it was easier to see who was talking with whom.

She noted that “It was obvious to note that Hamlet is the most prominent character, conversing with almost everyone,” but most characters “do not converse with people from different classes.” She also mentions that Hamlet is the character that connects “all these different classes together, while Ophelia only ever converses with people from her same class.” King Hamlet is an outlier, a node with no edges connecting him to anyone, which can raise some interesting questions about the differences between King Hamlet and Hamlet and the importance of titles, or even questions concerning the centrality of young Hamlet’s father.

I would add that it is telling to see how many heavily weighted edges are directed toward Ophelia compared to the edges directed out from her node: she is talked to more than she talks to others. This could add to a character analysis of Ophelia if a scholar were to go back to the text and analyze those conversations.
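The comparison between edges coming into a node and edges going out of it can also be checked with numbers rather than by eye. This is a minimal sketch in Python, assuming an edge list of (source, target, weight) rows; the letters and weights are invented and are not counted from the play.

```python
from collections import Counter

# Hypothetical (source, target, weight) rows; invented for illustration.
edges = [
    ("A", "B", 4),
    ("C", "B", 3),
    ("B", "A", 1),
]

in_weight = Counter()   # how much each node is talked to
out_weight = Counter()  # how much each node talks to others
for source, target, weight in edges:
    out_weight[source] += weight
    in_weight[target] += weight

for node in sorted(set(in_weight) | set(out_weight)):
    print(node, "in:", in_weight[node], "out:", out_weight[node])
```

A node like "B" here, with a high incoming total and a low outgoing total, is the pattern described above: talked to far more than talking.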

This network would also be interesting with self-loops added, in which an edge coming out of Hamlet, for example, would loop back in on his own node. Since Hamlet talks to himself often, it would be telling to see these kinds of edges as well. This would necessitate a change in the data set to include the times a character talks to himself or herself, recorded with the same character as both source and target.
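In the edge data, a self-loop is simply a row whose source and target are the same ID. A small sketch in Python, again with invented IDs and weights, shows how such rows could be picked out of an edge list:

```python
# Hypothetical (source_id, target_id, weight) rows; invented for illustration.
edges = [
    (1, 2, 3),
    (1, 1, 2),  # a self-loop: node 1 "talks to" node 1
    (2, 1, 1),
]

# Keep only the rows where a node is connected to itself.
self_loops = [edge for edge in edges if edge[0] == edge[1]]
print(self_loops)
```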

Classmate Example B: Direct Conversations Organized by Gender in Hamlet

Kyle created a Gephi network using indirect conversations as the edges and characters as the nodes, but organized by gender. He used the Fruchterman Reingold layout because he “thought it would produce a particularly interesting graph.”

He notes that the two female characters, Ophelia and Gertrude, are “obviously vastly outnumbered” by the male characters, and “it was interesting to see how Gertrude was talked about by Hamlet almost as equally as by the first player.” He also mentions how “Ophelia, on the other hand, was most talked to by Hamlet.”

I would also note that the sizes of Ophelia’s, Claudius’s, and Gertrude’s nodes are fairly similar, which is interesting considering that Claudius is a much more central character, being the one Hamlet wants to take revenge upon. This might also mean that Ophelia and Gertrude are more like pawns in the conflict between the two male characters. This network could spark a gender and/or feminist analysis of the original text.

Gephi Networks: What Do They Provide Us?

Gephi networks for Digital Humanists cannot give us definitive conclusions, but can help raise more questions about the text that we are using. Unexpected results can provoke a new question about the text we would have never asked before: for instance, is Horatio more of a central character in Hamlet than a reader would originally think? If so, how is that done, and how is it hidden in the text and revealed in the network visualization?

Another example could be a network visualization of a novel in which the characters are nodes and the edges are direct actions against the antagonist: maybe the character with the largest node and the heaviest edges was never intended to have the most effect on progressing the novel in this way. This could lead to a rethinking of the character and the novel itself!

Though transforming information such as character details, locations, plot points, and any other interesting tidbits into numerical data to input into Gephi is time-consuming and tedious, it means that we have control over exactly what data we are collecting and what we really want to see in a 3D visualization. And though the networks cannot give us definitive answers, since the text has been reduced in this way, they are an easier way to see what we cannot see just by reading a text.