Simple Network Visualisation with R

There are a number of user-friendly tools for visualising networks that don’t require any programming knowledge. These include Cytoscape and Gephi, among others. The language R, however, has become a popular and powerful platform for data analysis. It is also used for cleaning data and for visualising texts, networks, and geographical information. R benefits from a large ecosystem of open-source packages. In recent years, a collection of them known as the Tidyverse has made exploring data significantly easier. On the network analysis front, mature packages like statnet and igraph have been joined by newer ones, including the pair tidygraph and ggraph, which make it fairly easy to visualise and explore networks in R.

Network science is an active and well-developed field in the sciences and in social sciences such as sociology. In history, however, references to “networks” are now very common in publications, but the use of formal network analysis has been rather more limited. Learn more about work in this area at the website dedicated to Historical Network Research.

Effective use of formal network analysis depends on strong familiarity with the science and mathematics of graphs. However, there are many contexts in which visualising a historical network is useful even without the more advanced techniques that tap the full analytical potential of network exploration.

Illustrating historical research with a network diagram can help the reader better grasp the scope and connections of a group of individuals you discuss in your research. This illustrative value of social network visualisations depends in great part on the ability to craft visualisations that communicate well.

Network visualisation is also a way to explore connections and patterns in your historical materials. This matters especially once your collection of individuals and organisations (if you create a “bimodal” network, see below) grows beyond the scale at which you can easily spot patterns by browsing a table or spreadsheet. We might call this the heuristic value of social network visualisations. Such exploration may include some use of basic network analysis tools. Usually, though, these serve as a path to finding new questions to ask about your material, or as a way to cast the spotlight on possible patterns. You can then explore those patterns in depth, perhaps returning to other sources and methods.

This tutorial is primarily for students and scholars in the humanities who are interested in network visualisations for their illustrative and heuristic potential, but who may also want to gain some exposure to their analytical potential.

This tutorial was designed for history students in a master’s-level skills module in the University of St Andrews MLitt programme in Global, Transnational, and Spatial history, to give them a first taste of how R might be used to explore historical networks. In this exercise we will practise creating some simple network visualisations using a fictional network of East Asian gangsters and revolutionaries.

Prerequisites: My students working with this tutorial R Notebook have done a little previous work with R and text analysis. They used material from Text Mining with R by Julia Silge & David Robinson and Text Analysis with R for Students of Literature by Matthew Jockers. They have also read some of A User’s Guide to Network Analysis in R on igraph and completed the DataCamp module Introduction to the Tidyverse. I would suggest trying this tutorial if you have had at least a basic introduction to R and some familiarity with RStudio.

This tutorial is inspired by, or adapts material from, Jesse Sadler and Douglas A. Luke, among others (see the bibliography below). It was created as an “R Notebook”, which anyone with R installed can use directly by opening the file in RStudio. You can download the files used in this tutorial from the GitHub repository here. If you open this notebook in RStudio, you will see the code and can run all of it in one go with cmd-option-R on a Mac (ctrl-alt-R on Windows or Linux). Alternatively, you can run the code in a single chunk with cmd/ctrl-shift-enter. Many of the questions below ask you to see what happens when you tweak some of the code found here.

For this exercise you need the packages:

readr dplyr tidyr stringr forcats tibble ggplot2 ggraph tidygraph igraph visNetwork scales

In RStudio, choose “Install Packages…” from the Tools menu, paste in the list of packages above (separated by spaces, as they are), and press Install. Once the packages are installed (you may already have some of them), load them as follows:

library(readr)
library(igraph)
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)
library(forcats)
library(ggplot2)
library(ggraph)
library(tidygraph)
library(visNetwork)
library(scales)
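If any of the library() calls above fails because a package is missing, the sketch below offers a console alternative to the Tools menu. It checks the same list of packages and installs only those that are absent.

The variable names needed_pkgs, is_installed, and missing_pkgs are my own. requireNamespace() and install.packages() are standard base R functions. Wrapping the install step in interactive() is an assumed safeguard, so that nothing is installed when the file is run as a script.

```r
# Packages used in this tutorial (the same list given above)
needed_pkgs <- c("readr", "dplyr", "tidyr", "stringr", "forcats", "tibble",
                 "ggplot2", "ggraph", "tidygraph", "igraph", "visNetwork", "scales")

# requireNamespace() returns FALSE, without stopping, when a package is not installed
is_installed <- vapply(needed_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Only install when someone is working at the console
  if (interactive()) install.packages(missing_pkgs)
} else {
  message("All packages are installed.")
}
```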

Importing the Data

Now we need to get our data into our network.

The raw data used for network analysis and visualisation usually comes in the form of an edge table and a node table. When visualised, these become the lines and points of a graph diagram. The nodes of your network are very often individuals, along with any attributes tied to those individuals. You can style the nodes in your network graphs using these attributes.

Even more important than the nodes is the table of edges, which contains the relational information of your network. These are the relationships between your agents, or between agents and organisations in the case of bimodal graphs (see below). These relationships may also have attributes that can be visualised with styling. John Scott’s introductory text Social Network Analysis has a useful chapter on collecting and organising your data for analysis and visualisation that you might want to consult.

Put the nodes.csv and edges.csv files that I have shared with you into the working directory where you keep this notebook file (or set the working directory to the right place from the Session menu).

We will now load the nodes into a nodes data frame and the edges into an edges data frame. The head() command with 10 as a parameter gives you a peek at the first ten rows of each.

nodes<-read_csv("nodes.csv")
edges<-read_csv("edges.csv")
head(nodes,10)

person	location	age	nationality	mentions	discuss	gender
Tomohiko	Tokyo	22	Japan	14	1	m
Kyŏngmin	Seoul	55	Korea	12	0	m
Jiurong	Shanghai	44	China	3	0	f
Sangok	Pusan	33	Korea	5	0	m
Yoshinobu	Tokyo	66	Japan	67	1	m
Wei	Qingdao	57	China	30	1	f
Songbae	Seoul	36	Korea	26	0	m
Minjun	Pusan	55	Korea	4	0	m
Hayun	Pusan	22	Korea	2	0	m
Minjae	Pusan	30	Korea	12	0	m

head(edges,10)

from	to	kind	intensity	year_start	year_end
Chŏngsu	Minjae	3	3	1907	1921
Hayun	Minjae	3	1	1902	1943
Jiurong	Tomohiko	3	1	1896	1947
Kyŏngmin	Jiurong	3	1	1895	1920
Minjae	Chŏngsu	3	3	1907	1921
Takamasa	Kei	3	2	1898	1934
Tomohiko	Jiurong	3	1	1910	1936
Wei	Guoran	3	3	1872	1920
Yoshinobu	Tomohiko	3	2	1898	1915
Chŏngsu	Yŏngsik	2	1	1919	1931

Notice that the nodes have age, nationality, location, gender, and mentions (let us say this is the number of times they appear in some source or collection of sources). I have also added an arbitrary binary discuss column, in which I have manually flagged a few important characters I might want to emphasise.

When preparing a collection of nodes and edges for network visualisation, it is usually best to have a column in the nodes table with unique id numbers. These serve as a reference key to all other information about each agent. The edges table would then contain only the relevant id numbers instead of names. However, for this simple example I have kept the files readable while we learn the basics. I use the given names of the fictional individuals, without any special id column. (Just one or two of these are real given names that fit the description of individuals in this network, to add to the fun for East Asian historians.)
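To see what the id-based approach would look like, here is a minimal sketch with two tiny hypothetical tables. id_nodes and id_edges are invented for illustration, with names borrowed from the printed nodes above. graph_from_data_frame() matches the from and to columns of the edge table against the first column of the node table. The ids therefore act as keys, while the readable names remain available as a node attribute for labelling.

```r
library(igraph)

# Hypothetical toy tables: each node has a unique numeric id,
# and the edge table refers to nodes only by those ids
id_nodes <- data.frame(
  id = 1:3,
  person = c("Tomohiko", "Jiurong", "Yoshinobu"),
  nationality = c("Japan", "China", "Japan"),
  stringsAsFactors = FALSE
)
id_edges <- data.frame(
  from = c(1L, 3L),
  to = c(2L, 1L),
  intensity = c(1, 2)
)

# The first column of `vertices` (here, id) becomes each vertex's name,
# and the ids in the edge table are matched against it
id_network <- graph_from_data_frame(d = id_edges, directed = TRUE, vertices = id_nodes)

V(id_network)$name    # "1" "2" "3": ids are stored as character names
V(id_network)$person  # readable names survive as an attribute for labels
```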

Creating a Simple Network

Let us create a network object from our nodes and edges:

my_network=graph_from_data_frame(d=edges,directed=TRUE,vertices=nodes)

This creates an igraph network object, a format that ggraph understands and that works with most of its features. Later in this exercise we will convert it to a tidygraph tbl_graph. For now, we can very easily create a simple graph diagram using the ggraph() command. It works in a very similar fashion to ggplot, which it extends. You tell it which network to use, assign a layout type, and then add layers and options. In this case we will simply add a geom_edge_link(), which draws the edges, and a geom_node_point(), which draws the nodes as points.

ggraph(my_network, layout = "kk") +
      # Add an edge link geometry
      geom_edge_link() +
      # Add a node point geometry
      geom_node_point()

This is very simple. The nodes are placed on an x,y plane, so the result looks like a special kind of ggplot diagram. There are lots of things we might want to do to improve it.
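Before improving the plot, it can help to check what is actually inside a network object. The sketch below builds a hypothetical three-node miniature of our data (toy_nodes and toy_edges are invented for illustration). It then inspects the result with igraph's vcount(), ecount(), vertex_attr_names(), and edge_attr_names().

Running the same calls on my_network is a quick sanity check. vcount(my_network) should match nrow(nodes), and ecount(my_network) should match nrow(edges).

```r
library(igraph)

# Hypothetical miniature versions of nodes.csv and edges.csv
toy_nodes <- data.frame(
  person = c("Tomohiko", "Jiurong", "Wei"),
  nationality = c("Japan", "China", "China"),
  stringsAsFactors = FALSE
)
toy_edges <- data.frame(
  from = c("Tomohiko", "Jiurong", "Wei"),
  to = c("Jiurong", "Tomohiko", "Jiurong"),
  intensity = c(1, 1, 3),
  stringsAsFactors = FALSE
)
toy_network <- graph_from_data_frame(d = toy_edges, directed = TRUE, vertices = toy_nodes)

vcount(toy_network)             # number of nodes: 3
ecount(toy_network)             # number of edges: 3
vertex_attr_names(toy_network)  # "name" "nationality": person is stored as name
edge_attr_names(toy_network)    # "intensity"
```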

Adding Labels

Let us start by adding labels to the graph. Below geom_node_point() we will add a geom_node_label(). Inside its aesthetics, aes(), we map the label to the name attribute of our nodes. graph_from_data_frame() stores the first column of the node table, person, under that name.

Then, outside aes(), we set the font family to serif and the opacity to 60% (alpha=0.6). These may seem like things we would put inside the aesthetics. They go outside because we are giving them fixed values rather than mapping them to our data. The partial transparency lets us see any edge lines and nodes behind the labels.

We also add repel = TRUE, which nudges the labels apart so that they overlap less with each other and with the nodes.

ggraph(my_network, layout = "kk") +
      # Add an edge link geometry
      geom_edge_link() +
      # Add a node point geometry
      geom_node_point() +
    geom_node_label(aes(label = name),family="serif", alpha=0.6, repel=TRUE)

Questions 1

Try the following questions below.

What happens if you remove the repel=TRUE (remember to cut out the trailing comma too)?
How would you show the age instead of the name?
What would you do if you wanted to remove the nodes altogether and use labels in their place? Try this without transparency and without the repel feature.

Adding a Theme

Both ggplot and ggraph can work with “themes” that store lots of custom settings that we can apply to our graphs. You can wrap a theme in a function that returns it, and then add a call to that function to any graph. You might, for example, create several themes to suit different purposes. See the books R for Data Science or ggplot2: Elegant Graphics for Data Analysis, the DataCamp class Communicating with Data in R (Tidyverse), or just run ?theme for more on themes.

Let us create a theme to use for our graphs. It will set all text in a bold serif font, make the plot background light grey, and widen the margins. It will also remove the axis text, ticks, and titles, as well as all the grid lines and the panel background.

network_theme<- function() {
  theme(
    text=element_text(family="serif",face="bold"),
    plot.background = element_rect(fill="gray95"),
    plot.margin = unit(c(20,20,15,10), units="mm"),
    axis.text = element_blank(),
    axis.ticks = element_blank(),
    axis.title = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.background = element_blank()
  )
}

We need only call network_theme() at the end of our graphs to apply these settings.

In the next graph, notice the changes caused by the theme alone; nothing else has changed:

ggraph(my_network, layout = "kk") +
      # Add an edge link geometry
      geom_edge_link() +
      # Add a node point geometry
      geom_node_point() +
    geom_node_label(aes(label = name), family="serif", alpha=0.6, repel = TRUE) +
    network_theme()

Adding Labels and More

Now let us add some more elements to our graph. Using the labs() function, we can add a title and a caption at the bottom right giving the source of the data. Also in labs(), we can rename the legend titles. Notice that I use the newline escape \n in one case to create a two-line legend header. Notice also that for the edge width I had to use an “edge_” prefix (edge_width) when naming its legend header.

We’ll also make some other additions to our graph diagram. In the geom_edge_link() aesthetics, we tell it to vary the width of each edge by the intensity column of our data. Then, outside the aesthetics, we fix the colour of the lines to a mid-level grey (grey50).

We can control the scale of the width with the scale_edge_width() function. It rescales the intensity values so that the thinnest edge has a width of 0.2 and the thickest a width of 2.

For our nodes, our aes() now scales the size of the node by the number of mentions in the sources, and the color of the nodes according to the nationality.

ggraph(my_network, layout = "kk") +
      # Add an edge link geometry
      geom_edge_link(aes(width=intensity),colour="grey50") +
    # What happens when you change grey50 to grey20 or grey90? 
    scale_edge_width(range = c(0.2, 2.0)) +
      # Add a node point geometry
      geom_node_point(aes(size=mentions,color=nationality)) +
    labs(
      title = "Toilers and Gangsters, 1860-1950",
      caption = "Data from the 'Toilers and Gangsters' public dataset",
      size = "Source\nMentions",
      color = "Nationality",
      edge_width = "Weight"
    ) +
    geom_node_label(aes(label = name), family="serif", alpha=0.6, repel = TRUE) +
    network_theme()
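The rescaling that scale_edge_width(range = c(0.2, 2.0)) performs can be approximated with scales::rescale(). This assumes that edge widths are mapped linearly. In the versions I am familiar with, ggraph builds this scale on scales::rescale_pal(), but check ?scale_edge_width for your version. The intensity values 1 to 3 are those seen in head(edges); the vector itself is made up.

```r
library(scales)

# A made-up vector of edge intensities in the 1 to 3 range seen in head(edges)
intensity <- c(1, 2, 3, 1)

# Linear rescaling onto the range passed to scale_edge_width()
edge_widths <- rescale(intensity, to = c(0.2, 2.0))
edge_widths  # 0.2 1.1 2.0 0.2
```

The default scale for node size, scale_size(), behaves differently. It maps values to the area of each point rather than its diameter, so a node with twice the mentions is not drawn twice as wide.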

Questions 2

How would you change the code so that the transparency varied according to mentions?
How would you vary the color by location rather than nationality?

Showing Direction with Arrows

We have a directed network: each relationship runs from one individual to another. Some pairs are linked in one direction only, others in both. You can add arrows to geom_edge_link() as below. Notice that I have also switched from width to transparency to show varying intensity. The node colour now shows gender instead of nationality.

ggraph(my_network, layout = "kk") +
      # Add an edge link geometry
      geom_edge_link(arrow = arrow(length = unit(1.5, 'mm')), 
                   end_cap = circle(1.3, 'mm'), aes(alpha = intensity)) +
    # What happens when you change the 1.5 in the arrow length? Or the 1.3 in the end_cap circle?
    scale_edge_width(range = c(0.2, 2.5), guide = "none") +
      # Add a node point geometry
      geom_node_point(aes(size=mentions,color=factor(gender))) +
    scale_colour_manual(values=c("f"="green4","m"="sandybrown")) +
    labs(
      title = "Toilers and Gangsters, 1860-1950",
      caption = "Data from the 'Toilers and Gangsters' public dataset",
      size = "Source\nMentions",
      color = "Gender",
      edge_alpha = "Weight"
    ) +
    # Notice I used the edge_ prefix in order to give a legend title to the edge attribute
    geom_node_label(aes(label = name), family="serif",alpha=0.6, repel = TRUE) +
    # Try changing geom_node_label to geom_node_text above. What happens?
    network_theme()
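With arrows in place, a relationship that runs in both directions is drawn as two edges lying on top of each other, which can be hard to see. igraph’s which_mutual() and reciprocity() offer a quick check of which ties are reciprocated.

The toy network below is a hypothetical three-edge subset built from pairs visible in head(edges). You can run the same calls on my_network to check the full data.

```r
library(igraph)

# Hypothetical toy subset: Jiurong and Tomohiko point at each other,
# while here Yoshinobu's tie to Tomohiko runs one way only
toy_edges <- data.frame(
  from = c("Jiurong", "Tomohiko", "Yoshinobu"),
  to   = c("Tomohiko", "Jiurong", "Tomohiko"),
  stringsAsFactors = FALSE
)
toy_network <- graph_from_data_frame(toy_edges, directed = TRUE)

# TRUE for each edge whose reverse edge also exists
mutual <- which_mutual(toy_network)
mutual  # TRUE TRUE FALSE

# Proportion of edges that are reciprocated (here 2 of 3)
reciprocity(toy_network)
```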

Questions 3

Change the code so that it sizes each node by the discuss column of the node table (either 0 or 1) instead of by the number of source mentions.
Then add a scale_size() so that the node sizes range from 2 to 8.

Creating a Subgraph by Filtering on an Edge attribute

One of the variables we haven’t used yet is the kind column in the edge data, which is a number from 1 to 3. What if we wanted to create a second diagram that only shows relationships of kind 3?

The tidygraph library has a handy activate() function that sets whether the verbs that follow manipulate or filter the nodes or the edges. Instead of calling activate(edges) before working on edges, you can use the shortcut pipe %E>% in place of the usual %>%. Likewise, %N>% lets you work with your nodes. For this we need to convert our igraph network to a tbl_graph with as_tbl_graph(). We can then use the filter() command to keep just the edges with kind==3. If we graphed this immediately, we would see the filtered edges, but also a number of isolated nodes no longer connected to the rest of the graph. So we then activate the node layer and filter out the isolated nodes with filter(!node_is_isolated()).

as_tbl_graph(my_network) %E>%
  filter(kind==3)  %N>%
  filter(!node_is_isolated()) %>%
  ggraph(layout = "kk") +
      # Add an edge link geometry
      geom_edge_link(arrow = arrow(length = unit(1.5, 'mm')), 
                   end_cap = circle(1.5, 'mm'), aes(alpha = intensity)) +
    # What happens when you change the 1.5 in the length? or the 1.5 in the end_cap circle?
    scale_edge_width(range = c(0.2, 2.5), guide = "none") +
      # Add a node point geometry
      geom_node_point(aes(size=mentions,color=nationality)) +
    labs(
      title = "Toilers and Gangsters, 1860-1950",
      caption = "Data from the 'Toilers and Gangsters' public dataset",
      size = "Source\nMentions",
      color = "Nationality",
      edge_alpha = "Weight"
    ) +
    # Notice I used the edge_ prefix in order to give a legend title to the edge attribute
    geom_node_label(aes(label = name), family="serif", alpha=0.4, show.legend=FALSE, repel = TRUE) +
    # Try changing geom_node_label to geom_node_text above. What happens?
    network_theme()
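To see the two halves of this pipeline in isolation, here is a minimal sketch on a hypothetical five-node edge list (toy_edges is invented for illustration). It runs the same filter twice. The first version uses explicit activate() calls, and the second uses the %E>% and %N>% shortcuts, to show that the two forms are equivalent.

```r
library(igraph)
library(tidygraph)
library(dplyr)

# Hypothetical toy edge list with a `kind` attribute, as in edges.csv
toy_edges <- data.frame(
  from = c("A", "B", "C", "D"),
  to   = c("B", "C", "A", "E"),
  kind = c(3, 3, 1, 2),
  stringsAsFactors = FALSE
)
toy_graph <- as_tbl_graph(graph_from_data_frame(toy_edges, directed = TRUE))

# Long form: activate() switches which table the next verb works on
kind3_long <- toy_graph %>%
  activate(edges) %>%
  filter(kind == 3) %>%
  activate(nodes) %>%
  filter(!node_is_isolated())

# Short form with the %E>% and %N>% pipes
kind3_short <- toy_graph %E>%
  filter(kind == 3) %N>%
  filter(!node_is_isolated())

# Both keep the two kind-3 edges and drop D and E, which are left isolated
V(kind3_short)$name  # "A" "B" "C"
```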

Questions 4

Go back and change the filter to look only for edges of kind 2, then again for kind 1.
Instead of filtering by kind, create a graph diagram of only the people based in Tokyo, or only the Koreans in the network.
How would you create a subgraph showing only those whose relationship year_start was before 1890 and year_end after 1910? You can do this with two filter commands, or with a compound & statement.

Community Detection

Network scientists have developed a variety of algorithms to detect communities in a network. The analytical value of such algorithmically derived groupings may be limited in historical research. In larger networks, however, they can help you identify clusters to explore. For more on this, read the chapter on “Subgroups” in A User’s Guide to Network Analysis in R. The tidygraph package wraps many of the community detection algorithms built into igraph and makes them available to us:

- Edge-betweenness (group_edge_betweenness)
- Leading eigenvector (group_leading_eigen)
- Fast-greedy (group_fast_greedy)
- Louvain (group_louvain)
- Walktrap (group_walktrap)
- Label propagation (group_label_prop)
- InfoMAP (group_infomap)
- Spinglass (group_spinglass)
- Optimal (group_optimal)

Some algorithms take direction or edge weights into account, while others ignore them. Below we convert the network to an undirected graph and try Walktrap, which is not designed for directed networks. Try comparing its results with those of the other algorithms and note the differences.

as_tbl_graph(my_network) %>% 
  # to_undirected is a tidygraph morpher; convert() applies it and returns the new graph
  convert(to_undirected) %>%
  mutate(community = as.factor(group_walktrap())) %>% 
  ggraph(layout = "kk") +
      # Add an edge link geometry
      geom_edge_link(aes(alpha = intensity), show.legend = FALSE) +
    # No longer need the arrow because we have made our graph undirected  
    scale_edge_width(range = c(0.2, 2.5), guide = "none") +
      geom_node_point(aes(size=mentions, color=community)) +
    labs(
      title = "Toilers and Gangsters, 1860-1950",
      caption = "Data from the 'Toilers and Gangsters' public dataset",
      size = "Source\nMentions",
      color = "Community"
    ) +
    geom_node_label(aes(label = name),family="serif",alpha=0.6, repel = TRUE) +
    network_theme()
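Because different algorithms can disagree, it can be useful to run several side by side and compare their assignments. The sketch below does this on a hypothetical toy network of two triangles joined by a single edge, where the “right” answer is obvious. group_walktrap() and group_fast_greedy() are both tidygraph functions from the list above; fast-greedy only works on undirected graphs. On our real network, nodes that the algorithms place differently may be worth a closer look.

```r
library(igraph)
library(tidygraph)
library(dplyr)
library(tibble)

# Hypothetical toy network: two triangles joined by a single bridge (C - D)
toy_edges <- data.frame(
  from = c("A", "A", "B", "D", "D", "E", "C"),
  to   = c("B", "C", "C", "E", "F", "F", "D"),
  stringsAsFactors = FALSE
)
toy_graph <- as_tbl_graph(graph_from_data_frame(toy_edges, directed = FALSE))

# Run two algorithms side by side and pull out the node table
communities <- toy_graph %>%
  activate(nodes) %>%
  mutate(walktrap = as.factor(group_walktrap()),
         fast_greedy = as.factor(group_fast_greedy())) %>%
  as_tibble()

communities
```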

Questions 5

Had you done so manually, would you have divided up the graph into “communities” along these lines? Which assignments by the algorithm look out of place to you?
Try the other community detection algorithms and compare the results.

Bimodal Networks

Bimodal, bipartite, or affiliation networks have two different types of nodes and generally only link between the two types of nodes. As the term “affiliation network” suggests, this is often in the form of the affiliation of an individual to an organisation of some kind.

Let us import a list of edges between individuals and organisations.

affiliations<-read_csv("org-edges.csv")
head(affiliations,10)

From	To
Tomohiko	Toilers of the Great East
Jiurong	Green Crane Society
Minjun	Workers Alliance
Hyejin	East Wind
Yoshinobu	Kawakami-gumi
Wei	Great Harmony Society
Wei	Green Crane Society
Hyejin	Toilers of the Great East
Kyŏngmin	Toilers of the Great East
Sangok	East Wind

We now have a table of relationships between individuals and organisations. It would also be useful to have a merged node table combining the attribute data for individuals with the attribute data for organisations, which includes the location of each organisation’s headquarters. We can use full_join() for this.

org_nodes<-read_csv("orgs.csv") # Orgs have a name, but also a HQ location

merged_nodes<-full_join(nodes,org_nodes,by = c("person" = "Name"))
tail(merged_nodes,20)

person	location	age	nationality	mentions	discuss	gender	HQ
Yŏngsu	Tokyo	31	Korea	34	0	f	NA
Yōsuke	Nagoya	26	Japan	10	0	m	NA
Kei	Osaka	24	Japan	3	0	m	NA
Senjūrō	Kagoshima	35	Japan	1	0	m	NA
Masahirō	Kagoshima	41	Japan	1	0	m	NA
Takamasa	Kōchi	45	Japan	4	0	m	NA
Michiō	Niigata	37	Japan	1	0	m	NA
Kanno	Osaka	32	Japan	44	1	f	NA
Fumiko	Seoul	29	Japan	31	1	f	NA
Kikue	Tokyo	40	Japan	14	0	f	NA
Zhen	Yizheng	30	China	29	1	f	NA
Jongmyung	Seoul	23	Korea	10	0	f	NA
Toilers of the Great East	NA	NA	NA	NA	NA	NA	Pusan
Green Crane Society	NA	NA	NA	NA	NA	NA	Beijing
Workers Alliance	NA	NA	NA	NA	NA	NA	Seoul
East Wind	NA	NA	NA	NA	NA	NA	Shanghai
Kawakami-gumi	NA	NA	NA	NA	NA	NA	Tokyo
Iwaguchi-gumi	NA	NA	NA	NA	NA	NA	Kagoshima
Great Harmony Society	NA	NA	NA	NA	NA	NA	Beijing
Red Wave Association	NA	NA	NA	NA	NA	NA	Tokyo
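The table above shows how full_join() behaves. Every row from both tables is kept, and columns that exist in only one table are filled with NA in the other table’s rows. The hypothetical miniature below reproduces this on four rows. people and orgs are invented for illustration, with values borrowed from the printed tables.

```r
library(dplyr)
library(tibble)

# Hypothetical miniature versions of nodes.csv and orgs.csv
people <- tibble(person = c("Tomohiko", "Wei"), location = c("Tokyo", "Qingdao"))
orgs <- tibble(Name = c("East Wind", "Kawakami-gumi"), HQ = c("Shanghai", "Tokyo"))

# No person matches an organisation name, so full_join() keeps every row
# from both tables and fills the other table's columns with NA
merged <- full_join(people, orgs, by = c("person" = "Name"))
merged
```

One caveat of this approach: if an organisation happened to share its name with a person, full_join() would merge the two into a single row. It is therefore worth checking that the names in the two tables do not overlap.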

Now we can create a network object from this merged information. To keep track of which mode (individuals or organisations) each node belongs to, we’ll add a type attribute to the nodes that is TRUE for organisations and FALSE for individuals.

affiliation_network=graph_from_data_frame(d=affiliations,directed=FALSE,vertices=merged_nodes)
V(affiliation_network)$type<-V(affiliation_network)$name  %in% org_nodes$Name

Now we can create a graph diagram of our bimodal network. In the code, I have made a few customisations to our usual graphs. The shape of each node depends on whether it is an individual or an organisation: a circle (ggplot shape number 19) or a square (15) respectively. I also increased the fig.width chunk option to make the chart wider. Finally, I used ifelse() conditionals so that only the organisations are coloured and only the individuals are labelled.

Note: If you run this code in RStudio, compare the appearance of the plots within RStudio with the exported web page version.

ggraph(affiliation_network, layout = "kk") +
      # Add an edge link geometry
      geom_edge_link() +
    geom_node_point(aes(size=type, shape=type, color=ifelse(type==1,as.character(name),NA))) +
    scale_color_discrete(breaks=unique(affiliations$To)) +
    scale_size_discrete(range=c(2,4), guide = "none") +
    scale_shape_manual(values=c(19,15), guide = "none") +
    labs(
      title = "Toilers and Gangsters, 1860-1950",
      caption = "Data from the 'Toilers and Gangsters' public dataset",
      shape = "Person or\nOrganization",
      color = "Organization"
    ) +
    geom_node_label((aes(label = ifelse(type==0,as.character(name),""))), family="serif", alpha=0.7, show.legend=FALSE, repel = TRUE) +
    network_theme()

You can also use a special bipartite layout, which places the two modes in separate rows and gives the graph a hierarchical look. Sometimes the tree layout also produces a desirable effect.

ggraph(affiliation_network, layout = "bipartite") +
      # Add an edge link geometry
      geom_edge_link() +
    geom_node_point(aes(size=type, shape=type, color=ifelse(type==1,as.character(name),NA))) +
    scale_color_discrete(breaks=unique(affiliations$To)) +
    scale_size_discrete(range=c(2,4), guide = "none") +
    scale_shape_manual(values=c(19,15), guide = "none") +
    labs(
      title = "Toilers and Gangsters, 1860-1950",
      caption = "Data from the 'Toilers and Gangsters' public dataset",
      shape = "Person or\nOrganization",
      color = "Organization"
    ) +
    geom_node_label((aes(label = ifelse(type==0,as.character(name),""))), family="serif", alpha=0.7, show.legend=FALSE, repel = TRUE) +
    network_theme()

Bimodal graphs are nice for visualising the connections between two different types of things. As Scott Weingart has argued in several web posts, including his overview of bimodal networks, they are significantly more difficult to analyse using formal network analysis methods. Exploring various forms of centrality or clustering coefficients, for example, becomes more challenging.

They are valuable, however, as heuristic visualisations for exploring your network and discovering new questions, or areas to focus on in further research. They can also serve simpler illustrative purposes when you discuss a historical network in your narrative. They let you show visually the relationships between individuals and organisations, or some other combination of two modes, even without carrying out formal analysis.

One transformation of bimodal networks can be particularly useful, especially for networks larger than the one we are dealing with here. It explores the connections between the nodes of one mode by means of their shared links to the other mode. In our historical example, we might explore how organisations are connected through the members who tie them together. Or we might ask which individuals are connected by virtue of sharing membership in an organisation. These are called projections of bimodal networks.

To create these projections we can use the igraph function bipartite_projection() (called bipartite.projection() in older code). It returns a list with two projections, proj1 and proj2. The first is for the nodes whose type is FALSE (here, the individuals) and the second for those whose type is TRUE (the organisations). Let us assign each one to its own network object and then plot them.

network_projections<-bipartite_projection(affiliation_network)
member_projection<-network_projections$proj1
org_projection<-network_projections$proj2
ggraph(member_projection, layout = "kk") +
      geom_edge_link(aes(alpha=weight, width=weight)) +
    scale_edge_width(range = c(0.1, 1.5), name="Weight") +
    scale_edge_alpha(range = c(0.3, 1), guide = "none") +
    geom_node_point(size=3) +
    labs(
      title = "Toilers and Gangsters, 1860-1950: Bipartite Projection of Members",
      caption = "Data from the 'Toilers and Gangsters' public dataset",
      shape = "Person or\nOrganization",
      color = "Organization"
    ) +
    geom_node_label((aes(label = name)), family="serif", alpha=0.7, show.legend=FALSE, repel = TRUE) +
    network_theme()

ggraph(org_projection, layout = "kk") +
      geom_edge_link(aes(alpha=weight, width=weight)) +
    scale_edge_width(range = c(0.1, 1.5), name="Weight") +
    scale_edge_alpha(range = c(0.3, 1), guide = "none") +
    geom_node_point(size=3) +
    labs(
      title = "Toilers and Gangsters, 1860-1950: Bipartite Projection of Organisations",
      caption = "Data from the 'Toilers and Gangsters' public dataset",
      shape = "Person or\nOrganization",
      color = "Organization"
    ) +
    geom_node_label(aes(label = name), family="serif", alpha=0.7, show.legend=FALSE, repel = TRUE) +
    network_theme()

In the first plot, the lines are thicker where two members are linked by shared membership in more than one organisation. In the second plot we see that four of the organisations each share two members. This is not terribly revealing in this case. In much larger networks, however, projections may reveal interlocking organisations with overlapping memberships that would not be obvious from perusing a table of membership data.
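To make the link between shared memberships and edge weights concrete, here is a minimal sketch on a hypothetical toy affiliation network. It has three people (P1 to P3) and two organisations (O1 and O2). bipartite_projection() stores the number of shared neighbours in an edge attribute called weight. That attribute is what the width and alpha of the edges in the plots above are mapped to.

```r
library(igraph)

# Hypothetical toy affiliations: people P1 to P3, organisations O1 and O2
toy_aff <- data.frame(
  from = c("P1", "P1", "P2", "P2", "P3"),
  to   = c("O1", "O2", "O1", "O2", "O2"),
  stringsAsFactors = FALSE
)
toy_bipartite <- graph_from_data_frame(toy_aff, directed = FALSE)
# As in the tutorial: type is TRUE for organisations, FALSE for people
V(toy_bipartite)$type <- V(toy_bipartite)$name %in% toy_aff$to

toy_proj <- bipartite_projection(toy_bipartite)

# Edge weights count the neighbours two nodes share in the other mode
people_weights <- as_adjacency_matrix(toy_proj$proj1, attr = "weight", sparse = FALSE)
org_weights <- as_adjacency_matrix(toy_proj$proj2, attr = "weight", sparse = FALSE)
people_weights  # P1-P2 share O1 and O2 (weight 2); P3 shares only O2 with each
org_weights     # O1-O2 share two members, P1 and P2 (weight 2)
```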

One Plot to Rule Them All

Bimodal networks include only connections between two different modes. But there is nothing preventing you from flattening a bimodal graph and including all the edges from our unimodal network. That is, you can create a visualisation, for illustrative or heuristic purposes, that depicts both relationships between individuals and between these individuals and the organisations. Please note that if formal analysis plays any role in your exploration of these networks, this is not methodologically sound for any number of reasons. Among the issues is that we are mixing a directed network (of individuals) with an undirected network (of affiliations).

To create our mega plot, we will use bind_rows() to combine two edge tables: the relationships between individuals and organisations, and the relationships between individuals. For simplicity, we first assign an intensity of 1 and a kind of 4 to all affiliation relationships, and leave all date information as NA. We’ll also standardise the column names, since “From” and “To” are capitalised in one table and not in the other. mutate() copies them into lowercase from and to columns, and select() keeps only the columns we need, in the same order as in the edges table.

We can then visualise all the edges together and use various visual features to make the plot more readable. Anyone who has used software such as Cytoscape, however, will see that it is much easier to customise the visualisation of multiple networks together there than here, as far as I have been able to determine. Especially if the aim is just to explore your data as part of the research and thinking process, Cytoscape is a much easier alternative to R and igraph/ggraph.

prep<- affiliations %>%
  mutate(from=From,to=To,kind=4,intensity=1,year_start=NA,year_end=NA) %>%
  select(from,to,kind,intensity,year_start,year_end)
merged_edges<- bind_rows(edges,prep)
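A slightly more direct way to standardise the column names is dplyr’s rename(), which renames columns in place instead of copying them as mutate() does. The sketch below uses one row from each of the two printed edge tables; toy_edges and toy_affiliations are invented names. It also shows that bind_rows() matches columns by name, filling the date columns of the affiliation rows with NA.

```r
library(dplyr)
library(tibble)

# Hypothetical one-row versions of edges.csv and org-edges.csv
toy_edges <- tibble(from = "Wei", to = "Guoran", kind = 3, intensity = 3,
                    year_start = 1872, year_end = 1920)
toy_affiliations <- tibble(From = "Wei", To = "Green Crane Society")

# rename() takes new_name = old_name; mutate() then adds the constant
# kind and intensity values used for affiliation ties
toy_prep <- toy_affiliations %>%
  rename(from = From, to = To) %>%
  mutate(kind = 4, intensity = 1)

# bind_rows() matches columns by name; year_start and year_end,
# absent from toy_prep, are filled with NA
toy_merged <- bind_rows(toy_edges, toy_prep)
toy_merged
```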

Now let us create a new network object with this merged edge table and our previously merged node table and plot the results:

fully_merged=graph_from_data_frame(d=merged_edges,directed=TRUE,vertices=merged_nodes)
V(fully_merged)$type<-V(fully_merged)$name  %in% org_nodes$Name

ggraph(fully_merged, layout = "kk") +
      # Add an edge link geometry
      geom_edge_link(aes(width=intensity)) +
    scale_edge_width(range=c(0.1,1.5), name="Intensity") +
    geom_node_point(aes(size=type, shape=type, color=ifelse(type==1,as.character(name),NA))) +
    scale_color_discrete(breaks=unique(affiliations$To)) +
    scale_size_discrete(range=c(2,4), guide = "none") +
    scale_shape_manual(values=c(19,15), guide = "none") +
    labs(
      title = "Toilers and Gangsters, 1860-1950",
      caption = "Data from the 'Toilers and Gangsters' public dataset",
      shape = "Person or\nOrganization",
      color = "Organization"
    ) +
    geom_node_label((aes(label = ifelse(type==0,as.character(name),""))), family="serif", alpha=0.7, show.legend=FALSE, repel = TRUE) +
    network_theme()

This plot includes too much information to communicate its contents clearly at this size. If you plan to create complex plots, I suggest using ggsave() (see below) to export large versions of the graph after experimenting with the figure widths and heights.

Other Layouts

Up until now we have mostly been using the Kamada-Kawai layout algorithm ("kk") to determine the look of our network. There is a range of other layouts you can use simply by changing the layout type.

Below is our graph with the Fruchterman-Reingold ("fr") layout.

ggraph(my_network, layout = "fr") +
      # Add an edge link geometry
      geom_edge_link(arrow = arrow(length = unit(1.5, 'mm')), 
                   end_cap = circle(1.5, 'mm'), aes(alpha = intensity)) +
    # What happens when you change the 1.5 in the arrow length? Or the 1.5 in the end_cap circle?
    scale_edge_width(range = c(0.2, 2.0)) +
      geom_node_point(aes(size=mentions,color=nationality)) +
    scale_size(range = c(2,10)) +
    labs(
      title = "Toilers and Gangsters, 1860-1950",
      caption = "Data from the 'Toilers and Gangsters' public dataset",
      size = "Source\nMentions",
      color = "Nationality",
      edge_alpha = "Weight"
    ) +
    geom_node_label(aes(label = name), family="serif",alpha=0.6, show.legend=FALSE, repel = TRUE) +
    network_theme()

There is also a “circular” layout, which takes a bit more tweaking of the parameters and size to get it to fit well:

ggraph(my_network, layout = "linear",  circular = TRUE) + 
  geom_edge_arc(aes(width = intensity), alpha = 0.8) + 
  scale_edge_width(range = c(0.2, 1)) +
  geom_node_label(aes(label = name), size=2.2) +
  labs(edge_width = "Weight") +
  theme_graph() +
  theme(legend.position="bottom")

Questions 6: Playing with the Layouts

Try changing layout = "" to each of the following layouts and compare the results: sugiyama, star, dh, gem, graphopt, drl.
Why did I add the fig.width=5 option to the declaration of the R code chunk for the circular layout? What happens if you remove it?
Why did I hard-code the font size, size=2.2, outside of the aes() for geom_node_label()? What happens if you cut it out?
What happens if you add another ggplot option (don’t forget the + on the end of the previous line!) with coord_cartesian(xlim=c(-1.5,1.5),ylim=c(-1.5,1.5))
How could I colour the labels by the nationality of the nodes? By the location of the members?
How could I set it so that the size of the labels changes according to the age of the members of the network?
How could I limit the range of the size of the fonts from sizes 2 to 4?
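Behind the scenes, layout names such as "kk" and "fr" correspond to igraph's layout functions (layout_with_kk(), layout_with_fr(), and so on), each of which simply returns a matrix of x and y coordinates, one row per node. A minimal sketch on a hypothetical five-node graph, assuming only the igraph package:

```r
library(igraph)

# Hypothetical five-node directed graph
toy_edges <- data.frame(from = c("A", "B", "C", "C", "D"),
                        to   = c("B", "C", "A", "D", "E"))
toy_g <- graph_from_data_frame(toy_edges, directed = TRUE)

set.seed(1)  # force-directed layouts start from random positions
toy_layouts <- list(
  kk     = layout_with_kk(toy_g),    # Kamada-Kawai, as with layout = "kk"
  fr     = layout_with_fr(toy_g),    # Fruchterman-Reingold, as with layout = "fr"
  circle = layout_in_circle(toy_g)   # nodes placed evenly on a circle
)

# Each layout is a matrix with one (x, y) row per node
sapply(toy_layouts, dim)
```

Because force-directed layouts such as Fruchterman-Reingold start from random positions, calling set.seed() before drawing a graph is an easy way to get the same picture every time you run your code.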

Adding Some Network Analysis

Although we have been using the ggraph package to visualise our network, the graph itself is an igraph object and can take advantage of all the analytical tools in igraph:

Look how easy it is to use our trusty dplyr mutate() to add columns with the betweenness, closeness, and eigenvector centrality computed for our nodes, together with their total, in, and out degrees.

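# Each igraph function returns one value per vertex; this assumes my_network was
# built from nodes, so its vertices follow the same row order as nodes
nodes_analysis <- nodes %>%
  mutate(between = betweenness(my_network, directed = FALSE),
         degree_all = degree(my_network, mode = "all"),
         degree_in = degree(my_network, mode = "in"),
         degree_out = degree(my_network, mode = "out"),
         closeness = closeness(my_network),
         eigenvector = eigen_centrality(my_network)$vector)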
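To see what these numbers mean, it helps to compute them on a graph small enough to check by hand. The sketch below uses a hypothetical four-person chain A → B → C → D. Treated as undirected (as in the code above), B and C each lie on two of the shortest paths between other pairs, so each has a betweenness of 2, and every node's total degree is simply its in-degree plus its out-degree.

```r
library(igraph)

# Hypothetical chain of four people: A -> B -> C -> D
toy_chain <- graph_from_data_frame(
  data.frame(from = c("A", "B", "C"), to = c("B", "C", "D")),
  directed = TRUE
)

# Ignore edge direction, as in the nodes_analysis code above
toy_between <- betweenness(toy_chain, directed = FALSE)
toy_between                       # A 0, B 2, C 2, D 0

toy_in  <- degree(toy_chain, mode = "in")
toy_out <- degree(toy_chain, mode = "out")
toy_all <- degree(toy_chain, mode = "all")
all(toy_all == toy_in + toy_out)  # TRUE
```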

Note: If your graph is a tidygraph object, you can also use the wide variety of centrality_-prefixed functions.

We can do a quick comparison of the in, out, and total degree of each node: in-degree counts incoming edges, out-degree counts outgoing edges, and total degree is their sum. Notice I used the fct_reorder() function from the forcats package to re-sort the names by their total degree (degree_all). Comment out that line to see what happens to the graph.

nodes_analysis %>%
  mutate(person = fct_reorder(person, degree_all)) %>%
  ggplot() +
    geom_point(aes(x=degree_in,y=person, color="In Degree")) + # , color="khaki2"
    geom_point(aes(x=degree_out,y=person, color="Out Degree")) + # , color="cadetblue3"
    geom_point(aes(x=degree_all,y=person,size=eigenvector, color="Total Degree")) + # ,color="darkslategrey"
  scale_size(range=c(0.3,4)) +
  labs(
        x = "In, Out, and Total Degree",
        y = "Name",
        color = "Degrees",
        title = "Degrees and Total Degrees with Eigenvector Centrality",
        subtitle = "As Seen in Toilers and Gangsters Network",
        caption = "Total degree is the sum of in-degree and out-degree.",
        size = "Eigenvector\nCentrality"
      )

With this data we can easily plot the relationship between different kinds of centrality. Betweenness centrality measures the degree to which a node acts as a gatekeeper between other nodes: how many of the shortest paths between other pairs of nodes pass through it? Eigenvector centrality judges the importance of a node by the importance of its neighbors, so a node tied to well-connected nodes scores higher than one tied to peripheral nodes. Read more about it here. Let us compare the two in our own network:

nodes_analysis %>%
  ggplot() +
    geom_point(aes(x=between,y=eigenvector,size=degree_all)) +
    labs(
        x = "Betweenness",
        y = "Eigenvector Centrality",
        title = "Relationship between Betweenness and Eigenvector Centrality ",
        subtitle = "As Seen in Toilers and Gangsters Network",
        size = "Node\nDegree"
      )
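The two measures often agree on the most central node, and a star (one hub tied to several otherwise unconnected nodes) is the simplest case to check. In the minimal sketch below, which uses a toy graph built with igraph's make_star() rather than our dataset, the hub lies on the shortest path between every pair of the four leaves, giving it a betweenness of 6 (the number of leaf pairs), while eigen_centrality(), which igraph scales so that the largest score is 1, gives the hub 1 and each leaf 0.5.

```r
library(igraph)

# Toy star graph: vertex 1 is the hub, vertices 2-5 are leaves
toy_star <- make_star(5, mode = "undirected")

toy_bt <- betweenness(toy_star)
toy_ev <- eigen_centrality(toy_star)$vector

toy_bt           # 6 0 0 0 0
round(toy_ev, 3) # 1.0 0.5 0.5 0.5 0.5
```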

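How about the relationship between eigenvector centrality and another measure, closeness centrality? Closeness centrality measures how close a given node is to all the other nodes; igraph computes it as the inverse of the sum of the shortest-path distances from that node to the other nodes it can reach.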

nodes_analysis %>%
  ggplot() +
    geom_point(aes(x=closeness,y=eigenvector,size=degree_all)) +
    labs(
        x = "Closeness Centrality",
        y = "Eigenvector Centrality",
        title = "Relationship between Closeness and Eigenvector Centrality ",
        subtitle = "As Seen in Toilers and Gangsters Network",
        size = "Node\nDegree"
      )
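For an unweighted graph, igraph's closeness() is by default the inverse of the sum of a node's shortest-path distances to the other nodes, and normalized = TRUE multiplies this by the number of other nodes, turning it into the inverse of the average distance. A worked sketch on a hypothetical three-node chain A - B - C: A is 1 + 2 = 3 steps from the others (closeness 1/3), while B is 1 + 1 = 2 steps away (closeness 1/2).

```r
library(igraph)

# Hypothetical undirected chain: A - B - C
toy_path <- graph_from_literal(A - B - C)

toy_close      <- closeness(toy_path)                     # A 1/3, B 1/2, C 1/3
toy_close_norm <- closeness(toy_path, normalized = TRUE)  # A 2/3, B 1, C 2/3
toy_close
toy_close_norm
```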

Now that we have these measures, we can redraw our network graph using any of them. First, let us build a network graph that incorporates all the new variables we added to the node table:

my_analysed_network=graph_from_data_frame(d=edges,directed=TRUE,vertices=nodes_analysis)

For example, here is a graph diagram with the size of the node changed to indicate its betweenness.

ggraph(my_analysed_network, layout = "kk") +
      # Add an edge link geometry
      geom_edge_link(arrow = arrow(length = unit(1.5, 'mm')), 
                   end_cap = circle(1.5, 'mm'), aes(alpha = intensity)) +
    # What happens when you change the 1.5 in the length? or the 1.5 in the end_cap circle?
    scale_edge_width(range = c(0.2, 2.5), guide = "none") +
      # Add a node point geometry
      geom_node_point(aes(size=between,color=nationality)) +
    labs(
      title = "Toilers and Gangsters, 1860-1950",
      caption = "Data from the 'Toilers and Gangsters' public dataset",
      size = "Betweenness",
      color = "Nationality",
      edge_alpha = "Weight"
    ) +
    # Notice I used the edge_ prefix in order to give a legend title to the edge attribute
    geom_node_label(aes(label = name), family="serif",alpha=0.6, repel = TRUE) +
    # Try changing geom_node_label to geom_node_text above. What happens?
    network_theme()
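The size = between mapping above works because graph_from_data_frame() keeps the first column of the vertices data frame as each node's name and copies every other column into a vertex attribute, which ggraph can then map like any other variable. A minimal sketch with a hypothetical three-node table:

```r
library(igraph)

# Hypothetical node table with a precomputed score column
toy_nodes <- data.frame(person  = c("A", "B", "C"),
                        between = c(0, 1, 0))
toy_links <- data.frame(from = c("A", "B"), to = c("B", "C"))

toy_g2 <- graph_from_data_frame(d = toy_links, directed = TRUE, vertices = toy_nodes)

vertex_attr_names(toy_g2)  # "name" "between"
V(toy_g2)$between          # 0 1 0
```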

Questions 7

How would you change this to colour by location, but size by closeness centrality? Or eigenvector centrality?
How would you create a ggplot that showed the relationship between betweenness centrality and the mentions in the sources?
Challenge: How would you create a ggplot that compares the average betweenness of women in the network with that of men?
Challenge: What steps would you need to go through to compare the density (the ratio of the number of edges to the number of possible edges) among the members of the network in each of the three nationalities? What about in each location? How could you plot this in a simple bar graph? You may have to do some exploring in the documentation for igraph or ggraph.

Creating an Interactive Network Graph with visNetwork

There are a number of ways to make your network graph interactive, especially on a website. These include using a Shiny app, D3.js and its R connector networkD3, or the R package visNetwork. See Jesse Sadler’s network tutorial for a comparison of D3.js and visNetwork, as well as a demonstration of how you can use networkD3 to create what is known as a Sankey diagram.

To convert our simple network to an interactive visNetwork graph, we’ll switch from using given names as the key to using id numbers: visNetwork looks for an id column in the node table, and the from and to columns of the edge table refer to those ids. If you have been using id numbers in your node and edge tables from the start (recommended), you can skip this step entirely. To convert our tables, we’ll add an id number column to our nodes and then replace all the given names in the edges table with their corresponding id numbers. First, let us add an id column to the nodes and view the top ten rows of the resulting data frame:

nodes_wids<-nodes %>%
  mutate(id=seq.int(n())) %>%
  # add the id column with a sequence of integers from 1 to the total number of entries
  select(id,person,location,age,nationality,mentions,discuss,gender)
  # the select statement here just reorders the columns to put id first
head(nodes_wids,10)

id	person	location	age	nationality	mentions	discuss	gender
1	Tomohiko	Tokyo	22	Japan	14	1	m
2	Kyŏngmin	Seoul	55	Korea	12	0	m
3	Jiurong	Shanghai	44	China	3	0	f
4	Sangok	Pusan	33	Korea	5	0	m
5	Yoshinobu	Tokyo	66	Japan	67	1	m
6	Wei	Qingdao	57	China	30	1	f
7	Songbae	Seoul	36	Korea	26	0	m
8	Minjun	Pusan	55	Korea	4	0	m
9	Hayun	Pusan	22	Korea	2	0	m
10	Minjae	Pusan	30	Korea	12	0	m

Now let’s replace the names in the from and to columns of the edge table with the node id numbers, using the match() function. For each name in the from column of edges, match() finds the row of nodes_wids whose person column holds the same name, and we take the id from that row. We then do the same for the to column.

from_ids<- nodes_wids$id[match(edges$from,nodes_wids$person)]
# We replace the names in the from column in the edge table with the ids from nodes_wids
to_ids <-nodes_wids$id[match(edges$to,nodes_wids$person)]
# We replace the names in the to column in the edge table with the ids from nodes_wids
edges_wids <- tibble(from=from_ids,to=to_ids,kind=edges$kind,intensity=edges$intensity,year_start=edges$year_start,year_end=edges$year_end)
# We glue together the edges data frame again but this time with the id numbers.
head(edges_wids,10)

from	to	kind	intensity	year_start	year_end
14	10	3	3	1907	1921
9	10	3	1	1902	1943
3	1	3	1	1896	1947
2	3	3	1	1895	1920
10	14	3	3	1907	1921
25	22	3	2	1898	1934
1	3	3	1	1910	1936
6	19	3	3	1872	1920
5	1	3	2	1898	1915
14	15	2	1	1919	1931
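match() does the heavy lifting here: for each value in its first argument, it returns the position of the first matching value in its second argument, or NA when there is no match. A minimal base-R sketch with a hypothetical roster shows the lookup, and why it is worth checking for NA afterwards: a misspelled name in the edge table would otherwise silently become a missing id.

```r
# Hypothetical roster and edge list (base R only, no packages needed)
toy_roster <- data.frame(id = 1:3, person = c("Ana", "Bo", "Cy"))
toy_ties   <- data.frame(from = c("Bo", "Ana"), to = c("Cy", "Bo"))

toy_from_ids <- toy_roster$id[match(toy_ties$from, toy_roster$person)]
toy_to_ids   <- toy_roster$id[match(toy_ties$to, toy_roster$person)]
toy_from_ids  # 2 1
toy_to_ids    # 3 2

# A name missing from the roster comes back as NA
match("Nobody", toy_roster$person)  # NA

# Guard against silent NAs before building the visNetwork edge table
stopifnot(!anyNA(toy_from_ids), !anyNA(toy_to_ids))
```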

Now we can produce the visNetwork interactive plot with our new nodes_wids and edges_wids node and edge tables.

visNetwork(nodes_wids, edges_wids)

This is a very limited and boring graph, however. You can click on and drag the nodes, but the physics simulation lets you move things only a little before they spring back into place. The graph also lacks nearly everything else needed to communicate anything useful.

Now let us create a visNetwork object that communicates more information and that you can freely rearrange by clicking on and dragging nodes. It will also include navigation buttons for easy zooming and panning. To do this, we add columns to the data that visNetwork recognises: label, size, and color for the nodes, and width for the edges. Zoom in on the graph and you will see that the labels fade in and out depending on your zoom level.

nodes_wlabels <- nodes_wids %>%
  mutate(label = person) %>%
  # visNetwork looks for a "label" column to label the nodes, so I've added this column with the contents from the person column.
  mutate(size = rescale(mentions, to = c(10, 50))) %>%
  # adding a size column, scaled with the rescale function of the scales package to a number between 10 and 50, determines the size of each node
  mutate(color = case_when(
    nationality == "China" ~ "#0f9e45",
    nationality == "Korea" ~ "#4167a3",
    nationality == "Japan" ~ "#db310d"
  ))
  # visNetwork looks for a "color" column; case_when() maps China, Korea, and Japan to the colors we want to use.

edges_wformat<-edges_wids %>%
  mutate(width=rescale(intensity,to=c(1,13)))
  # here again it is looking for a "width" column, which I have added with the contents of the 
  # intensity column, but using the rescale function of the scales package to scale to a number
  # between 1 and 13

visNetwork(nodes_wlabels, edges_wformat, width = "100%") %>%
  visIgraphLayout(layout = "layout_with_kk") %>%
  visEdges(arrows = "to") %>% # Try "middle" for comparison
  visInteraction(navigationButtons = TRUE) # Adds zoom and pan buttons; try FALSE to compare

There are many more ways to customise the visNetwork options. For more on this, see the documentation for the visNetwork package.
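The rescale() calls used above for the node size and edge width columns perform a simple linear (min-max) mapping: the smallest value becomes the lower end of to, the largest becomes the upper end, and everything else falls proportionally in between. A minimal sketch with hypothetical intensity values, assuming the scales package is installed, reproduces the mapping by hand:

```r
# Hypothetical intensity values
toy_intensity <- c(1, 2, 3)

toy_width <- scales::rescale(toy_intensity, to = c(1, 13))
toy_width  # 1 7 13

# The same linear mapping written out by hand
toy_manual <- 1 + (toy_intensity - min(toy_intensity)) /
  (max(toy_intensity) - min(toy_intensity)) * (13 - 1)
all.equal(toy_width, toy_manual)  # TRUE
```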

Saving Your Plots

There are a number of ways of extracting the plots you produce. One convenient way is the ggsave() function in the ggplot2 package.

This is useful not only for embedding any of the graphs seen here in a separate document, but also for creating a crisp SVG version that stays sharp at any zoom level, or for saving a PNG version at a size much larger than those shown here, so that nodes are less likely to overlap.

The following ggsave() commands, for example, save the last plot you made to disk: first as plot.svg, whose resolution is independent of zoom level (recent versions of ggplot2 use the svglite package to write SVG files), and then as a PNG file with a fixed size:

# Both files are saved to R's current working directory (see getwd())
ggsave("plot.svg", plot = last_plot(), device = "svg")

## Saving 7 x 5 in image

ggsave("plot.png", plot = last_plot(), device = "png", width = 20, height = 20, units = "cm")
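If you are comparing layouts, it can help to save one large image per layout rather than resizing the plot window each time. The sketch below loops over a few layout names on a hypothetical four-node graph and calls ggsave() for each. It writes to tempdir() so that it leaves no files behind in your project; swap in your own folder (the file names are likewise just examples).

```r
library(igraph)
library(ggraph)
library(ggplot2)
library(grid)  # for arrow() and unit()

# Hypothetical four-node directed graph
toy_g3 <- graph_from_data_frame(
  data.frame(from = c("A", "B", "C", "C"), to = c("B", "C", "A", "D")),
  directed = TRUE
)

out_dir <- tempdir()  # replace with your own output folder
saved_files <- character(0)
set.seed(1)           # makes the force-directed layouts reproducible

for (lay in c("kk", "fr", "circle")) {
  p <- ggraph(toy_g3, layout = lay) +
    geom_edge_link(arrow = arrow(length = unit(2, "mm")),
                   end_cap = circle(2, "mm")) +
    geom_node_point(size = 4) +
    geom_node_text(aes(label = name), vjust = -1) +
    ggtitle(paste("Layout:", lay))
  f <- file.path(out_dir, paste0("network_", lay, ".png"))
  # A large, fixed-size image leaves more room between nodes
  ggsave(f, plot = p, width = 20, height = 20, units = "cm", dpi = 150)
  saved_files <- c(saved_files, f)
}
saved_files
```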

This should give you a good start at creating network graph diagrams using R. See some of these resources for more:

Toilers and Gangsters

Simple Network Visualization with R for Historians

Konrad M. Lawson