There are a number of user-friendly tools for visualizing networks that do not require any programming knowledge, including Cytoscape and Gephi. The language R, however, has become a popular and powerful platform for data analysis, data cleaning, and the visualization of texts, networks, and geographical information. R benefits from a large ecosystem of open source packages, and in recent years a collection of them known as the Tidyverse has made the process of exploring data significantly easier. On the network analysis front, mature packages like statnet and igraph are joined by newer ones, including the pair tidygraph and ggraph, to make it fairly easy to visualize and explore networks in R.
Network science is an active and well-developed field in the sciences and in social sciences such as sociology. In history, however, while the use of and reference to “networks” is now very common in publications, the use of formal network analysis has been rather more limited. Learn more about work in this area at the website dedicated to Historical Network Research.
Effective use of formal network analysis depends on strong familiarity with the science and mathematics of graphs. However, there are many contexts in which visualising a historical network is useful even without the more advanced techniques that tap the full analytical potential of network exploration. Illustrating historical research with a network graph diagram can help the reader better grasp the scope and connections of a group of individuals you are discussing. This illustrative value of social network visualisations depends in great part on the ability to craft visualisations that communicate well. Network visualisation is also a way to explore connections and patterns in your historical materials, especially as your collection of individuals and organisations (if you create a “bimodal” network, see below) grows beyond the scale at which you can easily derive patterns by browsing a table or spreadsheet. We might call this the heuristic value of social network visualisations. Such exploration may include some use of basic network analysis tools, but usually as a path to finding new questions to ask about your material, or as a way to cast the spotlight on possible patterns that you can explore in depth, perhaps by returning to other sources and methods. This tutorial is primarily for students and scholars in the humanities who are interested in network visualisations for their illustrative and heuristic potential, but who may also want to gain some familiarity with their analytical potential.
This tutorial was designed for history students in a master's-level skills module of the University of St Andrews MLitt programme in Global, Transnational, and Spatial History, to give them a first taste of how R might be used to explore historical networks. In this exercise we will practice creating some simple network visualisations using a fictional network of East Asian gangsters and revolutionaries.
Prerequisites: My students working with this tutorial R Notebook have done a little previous work with R and text analysis, using material from Text Mining with R by Julia Silge & David Robinson and Text Analysis with R for Students of Literature by Matthew Jockers, have read some of A User’s Guide to Network Analysis in R on igraph, and have completed the DataCamp module Introduction to the Tidyverse. I would suggest trying this tutorial if you have had at least some basic introduction to R and some familiarity with RStudio.
This tutorial is inspired by or adapts material by Jesse Sadler and Douglas A. Luke, among others (see the bibliography below). It was created as an “R Notebook”, which anyone can use directly by installing R and opening the file in RStudio. You can download the files used in this tutorial here in the github repository. If you open this notebook in RStudio, you will see the code and can run all of it in one go with cmd/ctrl-option-r. Alternatively, you can run the code from a single section using cmd/ctrl-shift-enter. Many of the questions below ask you to see what happens when you tweak some of the code found here.
For this exercise you need the packages:
readr dplyr tidyr stringr forcats tibble ggplot2 ggraph tidygraph igraph visNetwork scales
In RStudio, choose “Install Packages…” from the Tools menu, paste in the above list of packages, separated as they are by spaces, and press Install. After the packages are installed (you may have had some of them already), load them as follows:
library(readr)
library(igraph)
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)
library(forcats)
library(ggplot2)
library(ggraph)
library(tidygraph)
library(visNetwork)
library(scales)
Now we need to get our data into our network.
The raw data used for network analysis and visualisation usually takes the form of edge and node tables. When visualised, these are the lines and points of a graph diagram. The nodes of your network are very often individuals, along with any attributes tied to those individuals. You can style the nodes in your network graphs using these attributes.
Even more important than the nodes is the table of edges, which contains the relational information of your network: the relationships between your agents, or between agents and organisations in the case of bimodal graphs (see below). These relationships may also have attributes that can be visualised with styling. John Scott’s introductory text Social Network Analysis has a nice chapter, which you might want to consult, on considerations for collecting and organising your data for analysis and visualisation.
Put the nodes.csv and edges.csv files that I have shared with you into the working directory where you should be keeping this notebook file (or set the working directory to the right place in the Session menu).
We will now load the nodes into a nodes data frame and edges into an edges data frame. The head() command with 10 as a parameter will give you a peek at the contents of each file.
person | location | age | nationality | mentions | discuss | gender |
---|---|---|---|---|---|---|
Tomohiko | Tokyo | 22 | Japan | 14 | 1 | m |
Kyŏngmin | Seoul | 55 | Korea | 12 | 0 | m |
Jiurong | Shanghai | 44 | China | 3 | 0 | f |
Sangok | Pusan | 33 | Korea | 5 | 0 | m |
Yoshinobu | Tokyo | 66 | Japan | 67 | 1 | m |
Wei | Qingdao | 57 | China | 30 | 1 | f |
Songbae | Seoul | 36 | Korea | 26 | 0 | m |
Minjun | Pusan | 55 | Korea | 4 | 0 | m |
Hayun | Pusan | 22 | Korea | 2 | 0 | m |
Minjae | Pusan | 30 | Korea | 12 | 0 | m |
from | to | kind | intensity | year_start | year_end |
---|---|---|---|---|---|
Chŏngsu | Minjae | 3 | 3 | 1907 | 1921 |
Hayun | Minjae | 3 | 1 | 1902 | 1943 |
Jiurong | Tomohiko | 3 | 1 | 1896 | 1947 |
Kyŏngmin | Jiurong | 3 | 1 | 1895 | 1920 |
Minjae | Chŏngsu | 3 | 3 | 1907 | 1921 |
Takamasa | Kei | 3 | 2 | 1898 | 1934 |
Tomohiko | Jiurong | 3 | 1 | 1910 | 1936 |
Wei | Guoran | 3 | 3 | 1872 | 1920 |
Yoshinobu | Tomohiko | 3 | 2 | 1898 | 1915 |
Chŏngsu | Yŏngsik | 2 | 1 | 1919 | 1931 |
Notice that the nodes have a location, age, nationality, gender, and mentions (let us say this is the number of times they appear in some source or collection of sources). I have also added an arbitrary binary discuss column in which I have manually flagged a few important characters I might want to emphasise.
When preparing a collection of nodes and edges for network visualization, it is usually best to have a column in the nodes table with unique id numbers that serve as a reference key to all other information about that agent. Then, in the edges table, you would see only the relevant id numbers instead of the names. However, for this simple example, to increase the readability of the files as we learn the basics, I have chosen to use the given names of the fictional individuals without any special id column (just one or two real given names that fit the description of individuals in this network are included, to add to the fun for East Asian historians).
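For comparison, here is a minimal sketch of what an id-keyed layout might look like. The id numbers and the object names id_nodes, id_edges, and named_edges are invented for illustration; only the names and the two relationships are taken from the tables above.

```r
library(tibble)
library(dplyr)

# Hypothetical id column: the numbers are made up for this illustration,
# the names come from the node table above.
id_nodes <- tibble(
  id     = c(1, 2, 3),
  person = c("Tomohiko", "Jiurong", "Yoshinobu")
)

# The edges refer to people by id only
# (Jiurong -> Tomohiko and Yoshinobu -> Tomohiko in the edge table above).
id_edges <- tibble(
  from      = c(2, 3),
  to        = c(1, 1),
  intensity = c(1, 2)
)

# Look the names up again whenever they are needed for display.
named_edges <- id_edges %>%
  left_join(id_nodes, by = c("from" = "id")) %>%
  rename(from_name = person) %>%
  left_join(id_nodes, by = c("to" = "id")) %>%
  rename(to_name = person)

named_edges
```

The advantage of this layout is that two people who share a given name remain distinct, and a name can be corrected in one place without touching the edge table.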
Let us create a network object from our nodes and edges:
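A minimal sketch of this step, assuming a handful of rows copied from the tables above stand in for the full nodes.csv and edges.csv files. The names sample_nodes, sample_edges, and sample_network are only for illustration; graph_from_data_frame() is the same igraph function used for the affiliation network later in this tutorial, and the commented-out last call assumes that the network used in the rest of the tutorial, my_network, is built the same way from the full nodes and edges data frames.

```r
library(tibble)
library(igraph)

# A few rows copied from the node and edge tables above.
sample_nodes <- tibble(
  person      = c("Tomohiko", "Jiurong", "Yoshinobu", "Hayun", "Minjae"),
  location    = c("Tokyo", "Shanghai", "Tokyo", "Pusan", "Pusan"),
  nationality = c("Japan", "China", "Japan", "Korea", "Korea"),
  mentions    = c(14, 3, 67, 2, 12)
)
sample_edges <- tibble(
  from      = c("Hayun", "Jiurong", "Tomohiko", "Yoshinobu"),
  to        = c("Minjae", "Tomohiko", "Jiurong", "Tomohiko"),
  kind      = c(3, 3, 3, 3),
  intensity = c(1, 1, 1, 2)
)

# The first column of `vertices` supplies the node names; the first two
# columns of `d` are read as the source and target of each edge. All other
# columns become node or edge attributes.
sample_network <- graph_from_data_frame(
  d = sample_edges, directed = TRUE, vertices = sample_nodes
)

# Assumed equivalent call with the full data frames:
# my_network <- graph_from_data_frame(d = edges, directed = TRUE, vertices = nodes)
sample_network
```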
This creates an igraph network object, a format that ggraph and most of its features readily understand. Later in this exercise we will convert it to a tidygraph tibble graph (tbl_graph). For now, we can very easily create a simple graph diagram using the ggraph() command. It works in a very similar fashion to ggplot, which it extends: you tell it the network to use, assign a layout type, then add options. In this case we will simply add a geom_edge_link(), which will give us the edges, and a geom_node_point(), which will display the nodes as points.
ggraph(my_network, layout = "kk") +
# Add an edge link geometry
geom_edge_link() +
# Add a node point geometry
geom_node_point()
This is very simple. We can see that it is placed on x and y axes and looks like a kind of special ggplot diagram. There are lots of things we might want to do to improve this.
Let us start by adding labels to the graph. Under geom_node_point() we will add a geom_node_label(). Inside its aesthetics we map the label to the name column of our nodes; outside them we set the font family.
Then, still outside the aesthetics aes(), we will set the alpha (opacity) level to 60% (alpha=0.6). This may seem like something we would put inside the aesthetics, but because we are giving it a specific value, and not mapping it to our data, it goes outside. This will allow us to see any edge lines and nodes behind the label.
We also add repel = TRUE here to help keep the labels from overlapping.
ggraph(my_network, layout = "kk") +
# Add an edge link geometry
geom_edge_link() +
# Add a node point geometry
geom_node_point() +
geom_node_label(aes(label = name),family="serif", alpha=0.6, repel=TRUE)
Try the following:
What happens if you remove repel=TRUE (remember to cut out the trailing comma too)?
Both ggplot and ggraph can work with “themes” that store lots of custom settings that we can apply to our graphs. You can store a theme in a function and then add that function to any graph you create. You might, for example, create several themes to match different purposes. See the R for Data Science book, ggplot2: Elegant Graphics for Data Analysis, or the DataCamp class Communicating with Data in R (Tidyverse), or just run ?theme for more on themes.
Let us create a theme to use for our graphs. It will make the background a light grey, extend the margins, and remove the axis text, ticks, and titles. It will also remove all the grid lines.
network_theme<- function() {
theme(
text=element_text(family="serif",face="bold"),
plot.background = element_rect(fill="gray95"),
plot.margin = unit(c(20,20,15,10), units="mm"),
axis.text = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank()
)
}
We need only call network_theme() at the end of our graphs to apply these settings.
Now in our next graph, notice the changes caused by our theme and no other changes:
Now let us begin adding some more things to our graph. Using the labs() function, we can add a title and, at the bottom right, a caption on the source of the data. Also in labs() we can rename the legend titles. Notice that I use the escaped newline character \n in one case to create a two-line legend header, and that for the edge width I had to use an “edge_” prefix when naming the legend header.
We’ll also make some other additions to our graph diagram. In the geom_edge_link() aesthetics, we will tell it to vary the width of the edge by the intensity column of our data. Then, outside the aesthetics, we will fix the color of the lines to a mid-level grey.
We can control the scale of the width with the scale_edge_width() function, which here sets the range to a minimum width of 0.2 and a maximum of 2, rescaling the intensity values to fall within that range.
For our nodes, our aes() now scales the size of the node by the number of mentions in the sources, and sets the color of the nodes according to nationality.
ggraph(my_network, layout = "kk") +
# Add an edge link geometry
geom_edge_link(aes(width=intensity),colour="grey50") +
# What happens when you change grey50 to grey20 or grey90?
scale_edge_width(range = c(0.2, 2.0)) +
# Add a node point geometry
geom_node_point(aes(size=mentions,color=nationality)) +
labs(
title = "Toilers and Gangsters, 1860-1950",
caption = "Data from the 'Toilers and Gangsters' public dataset",
size = "Source\nMentions",
color = "Nationality",
edge_width = "Weight"
) +
geom_node_label(aes(label = name), family="serif", alpha=0.6, repel = TRUE) +
network_theme()
We have a directed network, meaning that a relationship between two individuals may go in only one direction or in both. You can add arrows to the geom_edge_link() as below. Notice that I have also switched from width to transparency to show varying intensity and, instead of showing nationality with node color, now show gender.
ggraph(my_network, layout = "kk") +
# Add an edge link geometry
geom_edge_link(arrow = arrow(length = unit(1.5, 'mm')),
end_cap = circle(1.3, 'mm'), aes(alpha = intensity)) +
# What happens when you change the 1.5 in the length, or the 1.3 in the end_cap circle?
scale_edge_width(range = c(0.2, 2.5), guide="none") +
# Add a node point geometry
geom_node_point(aes(size=mentions,color=factor(gender))) +
scale_colour_manual(values=c("f"="green4","m"="sandybrown")) +
labs(
title = "Toilers and Gangsters, 1860-1950",
caption = "Data from the 'Toilers and Gangsters' public dataset",
size = "Source\nMentions",
color = "Gender",
edge_alpha = "Weight"
) +
# Notice I used the edge_ prefix in order to give a legend title to the edge attribute
geom_node_label(aes(label = name), family="serif",alpha=0.6, repel = TRUE) +
# Try changing geom_node_label to geom_node_text above. What happens?
network_theme()
Change the code so that instead of varying the size of the node by the number of source mentions, it adjusts the size by the discuss column of the node table. This is a number, 0 or 1.
Then scale the node size from 2 to 8 by adding a scale_size() after the geom_node_point().
One of the variables that we haven’t used yet is the kind column in the edge data, which is a number from 1 to 3. What if we wanted to create a second diagram that only shows those relationships which are of kind 3?
The tidygraph library has a nice activate() method that allows you to manipulate nodes and edges or filter them in various ways. Instead of calling activate(edges) before manipulating edges, there is also a nice shortcut: use %E>% instead of the usual pipe to work with your edges, or %N>% to work with your nodes. For this we need to take our igraph network and convert it to a tbl_graph with as_tbl_graph(), and then we can use the filter() command to find just the edges which have kind==3. If we graphed this immediately, we would see the filtered edges, but also a number of isolated nodes no longer connected to the rest of the graph. We can activate the node layer and then filter out the isolated nodes with filter(!node_is_isolated()).
as_tbl_graph(my_network) %E>%
filter(kind==3) %N>%
filter(!node_is_isolated()) %>%
ggraph(layout = "kk") +
# Add an edge link geometry
geom_edge_link(arrow = arrow(length = unit(1.5, 'mm')),
end_cap = circle(1.5, 'mm'), aes(alpha = intensity)) +
# What happens when you change the 1.5 in the length, or the 1.5 in the end_cap circle?
scale_edge_width(range = c(0.2, 2.5), guide="none") +
# Add a node point geometry
geom_node_point(aes(size=mentions,color=nationality)) +
labs(
title = "Toilers and Gangsters, 1860-1950",
caption = "Data from the 'Toilers and Gangsters' public dataset",
size = "Source\nMentions",
color = "Nationality",
edge_alpha = "Weight"
) +
# Notice I used the edge_ prefix in order to give a legend title to the edge attribute
geom_node_label(aes(label = name), family="serif", alpha=0.4, show.legend=FALSE, repel = TRUE) +
# Try changing geom_node_label to geom_node_text above. What happens?
network_theme()
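The %E>% and %N>% shortcuts used above are shorthand for calling activate() before each step. A minimal sketch of the two spellings side by side, assuming five rows copied from the edge table earlier in this tutorial stand in for the full network (sample_edges, sample_graph, long_form, and short_form are names chosen only for illustration):

```r
library(igraph)
library(tidygraph)
library(dplyr)

# Five rows copied from the edge table: four of kind 3 and one of kind 2.
sample_edges <- data.frame(
  from = c("Hayun", "Jiurong", "Tomohiko", "Yoshinobu", "Chŏngsu"),
  to   = c("Minjae", "Tomohiko", "Jiurong", "Tomohiko", "Yŏngsik"),
  kind = c(3, 3, 3, 3, 2)
)
sample_graph <- as_tbl_graph(graph_from_data_frame(sample_edges, directed = TRUE))

# Long form: activate() chooses which table the following verbs act on.
long_form <- sample_graph %>%
  activate(edges) %>%
  filter(kind == 3) %>%
  activate(nodes) %>%
  filter(!node_is_isolated())

# Short form: %E>% and %N>% activate edges or nodes and pipe in one step.
short_form <- sample_graph %E>%
  filter(kind == 3) %N>%
  filter(!node_is_isolated())

short_form
```

Both versions keep the four kind 3 relationships and drop the two people who are only connected by the kind 2 relationship.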
How would you show only those relationships whose year_start was before 1890 and year_end after 1910? You can do this with two filter commands, or with a compound & statement.
Network scientists have developed a variety of algorithms to detect communities in a network. While the analytical value of this algorithmically derived grouping in the context of historical research may be limited, for larger networks it can help you identify clusters to explore. For more on this, read the chapter on “Subgroups” in the book A User’s Guide to Network Analysis in R. The tidygraph package inherits many of the community detection algorithms embedded in igraph and makes them available to us, including Edge-betweenness (group_edge_betweenness), Leading eigenvector (group_leading_eigen), Fast-greedy (group_fast_greedy), Louvain (group_louvain), Walktrap (group_walktrap), Label propagation (group_label_prop), InfoMAP (group_infomap), Spinglass (group_spinglass), and Optimal (group_optimal). Some community algorithms are designed to take into account direction or weight, while others ignore them. Below we try Walktrap, which is not, in fact, designed for directed networks; try comparing its results with those of other community detection algorithms and note the differences.
as_tbl_graph(my_network) %>%
# to_undirected is a tidygraph morpher, so it is applied through convert()
convert(to_undirected) %>%
mutate(community = as.factor(group_walktrap())) %>%
ggraph(layout = "kk") +
# Add an edge link geometry
geom_edge_link(aes(alpha = intensity), show.legend = FALSE) +
# No longer need the arrow because we have made our graph undirected
scale_edge_width(range = c(0.2, 2.5), guide="none") +
geom_node_point(aes(size=mentions, color=community)) +
labs(
title = "Toilers and Gangsters, 1860-1950",
caption = "Data from the 'Toilers and Gangsters' public dataset",
size = "Source\nMentions",
color = "Community"
) +
geom_node_label(aes(label = name),family="serif",alpha=0.6, repel = TRUE) +
network_theme()
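To compare algorithms side by side, one option is to store the result of each in its own node column and look at the node table. A minimal sketch, assuming three ties copied from the edge table earlier in this tutorial (treated as undirected, with the reciprocal Jiurong and Tomohiko pair collapsed into a single tie) stand in for the full network; the column names walktrap and edge_betweenness are chosen only for illustration:

```r
library(igraph)
library(tidygraph)
library(dplyr)

sample_edges <- data.frame(
  from = c("Hayun", "Jiurong", "Yoshinobu"),
  to   = c("Minjae", "Tomohiko", "Tomohiko")
)

communities <- graph_from_data_frame(sample_edges, directed = FALSE) %>%
  as_tbl_graph() %>%
  mutate(
    # Each group_ function returns one community number per node.
    walktrap         = group_walktrap(),
    edge_betweenness = group_edge_betweenness()
  ) %>%
  as_tibble()

# One row per node, one column per algorithm.
communities
```

On a network this small the algorithms have little to disagree about; the same pattern applied to the full network makes any differences easy to spot in a table before you plot them.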
Bimodal, bipartite, or affiliation networks have two different types of nodes and generally only link between the two types of nodes. As the term “affiliation network” suggests, this is often in the form of the affiliation of an individual to an organisation of some kind.
Let us import a list of edges between individuals and organisations.
From | To |
---|---|
Tomohiko | Toilers of the Great East |
Jiurong | Green Crane Society |
Minjun | Workers Alliance |
Hyejin | East Wind |
Yoshinobu | Kawakami-gumi |
Wei | Great Harmony Society |
Wei | Green Crane Society |
Hyejin | Toilers of the Great East |
Kyŏngmin | Toilers of the Great East |
Sangok | East Wind |
We now have a table with relationships between individuals and organisations, but it would be nice to create a merged node table which joins all the attribute information for organisations, which includes the location of the organisations’ headquarters, with all the attribute data for individuals. We can use full_join() for this.
# Load the organisation nodes and merge them with the individual nodes
org_nodes<-read_csv("orgs.csv") # Orgs have a name, but also a HQ location
merged_nodes<-full_join(nodes,org_nodes,by = c("person" = "Name"))
tail(merged_nodes,20)
person | location | age | nationality | mentions | discuss | gender | HQ |
---|---|---|---|---|---|---|---|
Yŏngsu | Tokyo | 31 | Korea | 34 | 0 | f | NA |
Yōsuke | Nagoya | 26 | Japan | 10 | 0 | m | NA |
Kei | Osaka | 24 | Japan | 3 | 0 | m | NA |
Senjūrō | Kagoshima | 35 | Japan | 1 | 0 | m | NA |
Masahirō | Kagoshima | 41 | Japan | 1 | 0 | m | NA |
Takamasa | Kōchi | 45 | Japan | 4 | 0 | m | NA |
Michiō | Niigata | 37 | Japan | 1 | 0 | m | NA |
Kanno | Osaka | 32 | Japan | 44 | 1 | f | NA |
Fumiko | Seoul | 29 | Japan | 31 | 1 | f | NA |
Kikue | Tokyo | 40 | Japan | 14 | 0 | f | NA |
Zhen | Yizheng | 30 | China | 29 | 1 | f | NA |
Jongmyung | Seoul | 23 | Korea | 10 | 0 | f | NA |
Toilers of the Great East | NA | NA | NA | NA | NA | NA | Pusan |
Green Crane Society | NA | NA | NA | NA | NA | NA | Beijing |
Workers Alliance | NA | NA | NA | NA | NA | NA | Seoul |
East Wind | NA | NA | NA | NA | NA | NA | Shanghai |
Kawakami-gumi | NA | NA | NA | NA | NA | NA | Tokyo |
Iwaguchi-gumi | NA | NA | NA | NA | NA | NA | Kagoshima |
Great Harmony Society | NA | NA | NA | NA | NA | NA | Beijing |
Red Wave Association | NA | NA | NA | NA | NA | NA | Tokyo |
Now we can create a network object from this merged information. In order to keep track of which nodes are part of each mode (individuals or organisations), we’ll add a type column to the node data that will get a TRUE value if the node is one of the organisations.
affiliation_network=graph_from_data_frame(d=affiliations,directed=FALSE,vertices=merged_nodes)
V(affiliation_network)$type<-V(affiliation_network)$name %in% org_nodes$Name
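A minimal sketch of the same two steps, assuming the ten affiliation rows shown in the table above stand in for the full data and skipping the merged node attributes. It also checks that every edge really does join an individual to an organisation. The names sample_affiliations, sample_affiliation_network, and type_of are chosen only for illustration.

```r
library(igraph)

# The ten affiliation rows shown in the table above.
sample_affiliations <- data.frame(
  From = c("Tomohiko", "Jiurong", "Minjun", "Hyejin", "Yoshinobu",
           "Wei", "Wei", "Hyejin", "Kyŏngmin", "Sangok"),
  To   = c("Toilers of the Great East", "Green Crane Society", "Workers Alliance",
           "East Wind", "Kawakami-gumi", "Great Harmony Society",
           "Green Crane Society", "Toilers of the Great East",
           "Toilers of the Great East", "East Wind")
)
sample_affiliation_network <- graph_from_data_frame(sample_affiliations, directed = FALSE)

# TRUE marks organisations, FALSE marks individuals.
V(sample_affiliation_network)$type <-
  V(sample_affiliation_network)$name %in% sample_affiliations$To

# Look up the type of both ends of every edge by node name.
type_of   <- setNames(V(sample_affiliation_network)$type, V(sample_affiliation_network)$name)
edge_ends <- ends(sample_affiliation_network, E(sample_affiliation_network))
all(type_of[edge_ends[, 1]] != type_of[edge_ends[, 2]])
```

If the last line returns FALSE for your own data, some edge connects two nodes of the same mode, which usually means a name appears both as an individual and as an organisation.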
Now we can create a graph diagram of our bimodal network. In the code, I have made a few customisations to our usual graphs above: I set the shape of the node to correspond to whether it is an individual or an organisation, choosing a circle (ggplot shape number 19) or a square (15). I increased the fig_width to make the chart wider, and used ifelse() conditionals to distinguish the organisations by color and to assign labels only to individuals.
Note: If you run this code in RStudio, note the difference between the appearance of the plots within RStudio and in the exported web page version.
ggraph(affiliation_network, layout = "kk") +
# Add an edge link geometry
geom_edge_link() +
geom_node_point(aes(size=type, shape=type, color=ifelse(type==1,as.character(name),NA))) +
scale_color_discrete(breaks=unique(affiliations$To)) +
scale_size_discrete(range=c(2,4), guide="none") +
scale_shape_manual(values=c(19,15), guide="none") +
labs(
title = "Toilers and Gangsters, 1860-1950",
caption = "Data from the 'Toilers and Gangsters' public dataset",
shape = "Person or\nOrganization",
color = "Organization"
) +
geom_node_label((aes(label = ifelse(type==0,as.character(name),""))), family="serif", alpha=0.7, show.legend=FALSE, repel = TRUE) +
network_theme()
You can also use a special bipartite layout for the graph that produces a hierarchical look. Sometimes the tree layout will also produce a desirable effect.
ggraph(affiliation_network, layout = "bipartite") +
# Add an edge link geometry
geom_edge_link() +
geom_node_point(aes(size=type, shape=type, color=ifelse(type==1,as.character(name),NA))) +
scale_color_discrete(breaks=unique(affiliations$To)) +
scale_size_discrete(range=c(2,4), guide="none") +
scale_shape_manual(values=c(19,15), guide="none") +
labs(
title = "Toilers and Gangsters, 1860-1950",
caption = "Data from the 'Toilers and Gangsters' public dataset",
shape = "Person or\nOrganization",
color = "Organization"
) +
geom_node_label((aes(label = ifelse(type==0,as.character(name),""))), family="serif", alpha=0.7, show.legend=FALSE, repel = TRUE) +
network_theme()
Bimodal graphs are nice for visualising the connections between two different types of things. As Scott Weingart has argued in several web posts, including his overview of bimodal networks, they are significantly more difficult to analyse using formal network analysis methods, including the challenge of exploring various forms of centrality or clustering coefficients.
They are valuable, however, as a heuristic visualisation to explore your network and discover new questions, or areas to focus in on for more research. They can also serve more simple illustrative purposes when you are exploring a historical network in your narrative and want to illustrate visually relationships between individuals and organisations or some other combination of two modes even without formal analysis being carried out.
One transformation of your bimodal networks that can be particularly useful, especially for larger networks than the one we are dealing with here, is to explore connections between the nodes in one mode by means of their connections to the other mode. In our historical example, we might explore the connectivity between organisations based on the members who tie them together, or the connections between individuals by virtue of their shared membership in an organisation. These are called projections of bimodal networks.
To create these projections we can use the igraph function bipartite_projection() (called bipartite.projection() in older igraph releases). This will create a list with two projections, proj1 and proj2, one for each mode. Let us assign each one to its own network object and then plot them.
network_projections<-bipartite_projection(affiliation_network)
# proj1 holds the nodes whose type is FALSE (individuals); proj2 holds the TRUE nodes (organisations)
member_projection<-network_projections$proj1
org_projection<-network_projections$proj2
ggraph(member_projection, layout = "kk") +
geom_edge_link(aes(alpha=weight, width=weight)) +
scale_edge_width(range = c(0.1, 1.5), name="Weight") +
scale_edge_alpha(range = c(0.3, 1), guide="none") +
geom_node_point(size=3) +
labs(
title = "Toilers and Gangsters, 1860-1950: Bipartite Projection of Members",
caption = "Data from the 'Toilers and Gangsters' public dataset",
shape = "Person or\nOrganization",
color = "Organization"
) +
geom_node_label((aes(label = name)), family="serif", alpha=0.7, show.legend=FALSE, repel = TRUE) +
network_theme()
ggraph(org_projection, layout = "kk") +
geom_edge_link(aes(alpha=weight, width=weight)) +
scale_edge_width(range = c(0.1, 1.5), name="Weight") +
scale_edge_alpha(range = c(0.3, 1), guide="none") +
geom_node_point(size=3) +
labs(
title = "Toilers and Gangsters, 1860-1950: Bipartite Projection of Organisations",
caption = "Data from the 'Toilers and Gangsters' public dataset",
shape = "Person or\nOrganization",
color = "Organization"
) +
geom_node_label(aes(label = name), family="serif", alpha=0.7, show.legend=FALSE, repel = TRUE) +
network_theme()
The lines here are thicker in the cases where members were more linked to each other by mutual membership in multiple organisations. In the second plot we see that four of the organisations each share two members. Not terribly revealing in this case, but with much larger networks, this may reveal interlocking organisations with overlapping memberships that might not be immediately obvious by perusing a table of membership data.
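To see where the edge weights in a projection come from, here is a minimal sketch that projects only the ten affiliation rows shown in the table earlier, rather than the full data. The object names are chosen only for illustration; bipartite_projection() is the underscore spelling of the igraph projection function, and older releases also accept bipartite.projection().

```r
library(igraph)

# The ten affiliation rows shown in the table earlier in this tutorial.
sample_affiliations <- data.frame(
  From = c("Tomohiko", "Jiurong", "Minjun", "Hyejin", "Yoshinobu",
           "Wei", "Wei", "Hyejin", "Kyŏngmin", "Sangok"),
  To   = c("Toilers of the Great East", "Green Crane Society", "Workers Alliance",
           "East Wind", "Kawakami-gumi", "Great Harmony Society",
           "Green Crane Society", "Toilers of the Great East",
           "Toilers of the Great East", "East Wind")
)
sample_bimodal <- graph_from_data_frame(sample_affiliations, directed = FALSE)
V(sample_bimodal)$type <- V(sample_bimodal)$name %in% sample_affiliations$To

sample_projections <- bipartite_projection(sample_bimodal)
sample_members <- sample_projections$proj1  # type FALSE: individuals
sample_orgs    <- sample_projections$proj2  # type TRUE: organisations

# Each projected edge carries a weight: the number of neighbours in the
# other mode that the two nodes share.
as_data_frame(sample_orgs, what = "edges")
```

In these ten rows, Hyejin belongs to both East Wind and Toilers of the Great East, and Wei to both Great Harmony Society and Green Crane Society, so the organisation projection has exactly two edges, each with a weight of 1.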
Bimodal networks include only connections between two different modes. But there is nothing preventing you from flattening a bimodal graph and including all the edges from our unimodal network. That is, you can create a visualisation, for illustrative or heuristic purposes, that depicts both relationships between individuals and between these individuals and the organisations. Please note that if formal analysis plays any role in your exploration of these networks, this is not methodologically sound for any number of reasons. Among the issues is that we are mixing a directed network (of individuals) with an undirected network (of affiliations).
To create our mega plot, we will use bind_rows() to merge the edge table of relationships between individuals and organisations with that of individuals to individuals. For simplicity, we will first assign an intensity of 1 and kind 4 to all affiliation relationships, and leave all date info as NA. We’ll also standardise the naming of the columns, as “From” and “To” are capitalised in one case and not in the other. mutate() makes it easy to create the lower-case columns, and select() then keeps only the columns we need.
We can then visualise all the edges together, and use various visual features to help make the plot more readable, but anyone who has used software such as Cytoscape, for example, will see that it is much easier to customise the visualisation of multiple networks together there than here, as far as I have been able to determine. Especially if the aim is just to explore your data as a part of the research and thinking process, then Cytoscape is a much easier alternative to R and igraph/ggraph.
prep<- affiliations %>%
mutate(from=From,to=To,kind=4,intensity=1,year_start=NA,year_end=NA) %>%
select(from,to,kind,intensity,year_start,year_end)
merged_edges<- bind_rows(edges,prep)
Now let us create a new network object with this merged edge table and our previously merged node table and plot the results:
fully_merged=graph_from_data_frame(d=merged_edges,directed=TRUE,vertices=merged_nodes)
V(fully_merged)$type<-V(fully_merged)$name %in% org_nodes$Name
ggraph(fully_merged, layout = "kk") +
# Add an edge link geometry
geom_edge_link(aes(width=intensity)) +
scale_edge_width(range=c(0.1,1.5), name="Intensity") +
geom_node_point(aes(size=type, shape=type, color=ifelse(type==1,as.character(name),NA))) +
scale_color_discrete(breaks=unique(affiliations$To)) +
scale_size_discrete(range=c(2,4), guide="none") +
scale_shape_manual(values=c(19,15), guide="none") +
labs(
title = "Toilers and Gangsters, 1860-1950",
caption = "Data from the 'Toilers and Gangsters' public dataset",
shape = "Person or\nOrganization",
color = "Organization"
) +
geom_node_label((aes(label = ifelse(type==0,as.character(name),""))), family="serif", alpha=0.7, show.legend=FALSE, repel = TRUE) +
network_theme()
This plot includes too much information to communicate its contents clearly at this size. If you plan on creating complex plots, I suggest you use ggsave() (see below) to export large versions of the graph after playing with the figure widths and heights.
Up until now we have mostly been using the Kamada-Kawai layout algorithm to determine the look of our network. There is a range of other layouts you can create by replacing the layout type.
Below see our graph with the Fruchterman-Reingold layout.
ggraph(my_network, layout = "fr") +
# Add an edge link geometry
geom_edge_link(arrow = arrow(length = unit(1.5, 'mm')),
end_cap = circle(1.5, 'mm'), aes(alpha = intensity)) +
# Intensity is mapped to alpha in this plot, so this width scale has no visible effect
scale_edge_width(range = c(0.2, 2.0)) +
geom_node_point(aes(size=mentions,color=nationality)) +
scale_size(range = c(2,10)) +
labs(
title = "Toilers and Gangsters, 1860-1950",
caption = "Data from the 'Toilers and Gangsters' public dataset",
size = "Source\nMentions",
color = "Nationality",
# The legend title uses edge_alpha because intensity is shown through transparency
edge_alpha = "Weight"
) +
geom_node_label(aes(label = name), family="serif",alpha=0.6, show.legend=FALSE, repel = TRUE) +
network_theme()
There is also a “circular” layout, which takes a bit more tweaking of the parameters and size to get it to fit well:
ggraph(my_network, layout = "linear", circular = TRUE) +
geom_edge_arc(aes(width = intensity), alpha = 0.8) +
scale_edge_width(range = c(0.2, 1)) +
geom_node_label(aes(label = name), size=2.2) +
labs(edge_width = "Weight") +
theme_graph() +
theme(legend.position="bottom")
Try changing layout="" to the following possible layouts: sugiyama, star, dh, gem, graphopt, drl and compare the results.
Notice the fig.width=5 option used in the case of the circular layout in the declaration of the r code section. What happens if you remove it?
Why is size=2.2 outside of the aes() for the geom_node_label()? What happens if you cut that out?
coord_cartesian(xlim=c(-1.5,1.5),ylim=c(-1.5,1.5))
Although we have been using the ggraph package to visualise our network, the graph itself is an igraph object and can take advantage of all the analytical tools in igraph.
Look how easy it is to use our trusty dplyr mutate() to add columns with the betweenness, closeness, and eigenvector centrality computed for our nodes, together with the total, in, and out degrees.
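# This assumes the rows of nodes are in the same order as the vertices of
# my_network, which is the case when the network was built with vertices=nodes
nodes_analysis<- nodes %>%
mutate(between=betweenness(my_network, directed=FALSE),
degree_all=degree(my_network,mode="all"),
degree_in=degree(my_network,mode="in"),
degree_out=degree(my_network,mode="out"),
closeness=closeness(my_network),
# eigen_centrality() is the current igraph name for the older evcent()
eigenvector=eigen_centrality(my_network)$vector)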
Note: If your graph is a tidygraph tbl_graph, you can also use the wide variety of centrality_ prefixed functions.
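A minimal sketch of that tidygraph route, assuming four relationships copied from the edge table earlier in this tutorial stand in for the full network. The centrality_ functions are called inside mutate() and add their results straight to the node table; the names sample_graph and node_measures are chosen only for illustration.

```r
library(igraph)
library(tidygraph)
library(dplyr)

# Four relationships copied from the edge table.
sample_edges <- data.frame(
  from = c("Hayun", "Jiurong", "Tomohiko", "Yoshinobu"),
  to   = c("Minjae", "Tomohiko", "Jiurong", "Tomohiko")
)
sample_graph <- as_tbl_graph(graph_from_data_frame(sample_edges, directed = TRUE))

node_measures <- sample_graph %>%
  mutate(
    degree_in  = centrality_degree(mode = "in"),
    degree_out = centrality_degree(mode = "out"),
    between    = centrality_betweenness(directed = FALSE)
  ) %>%
  as_tibble()

node_measures
```

Because the measures are stored as node columns of the graph itself, they can also be used directly in ggraph aesthetics without building a separate table.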
We can do a quick comparison of the in, out and total degree of the nodes, which count their incoming relationships, their outgoing relationships, and the sum of the two. Notice I used the fct_reorder() function from the forcats library to re-sort the names by their total degree (degree_all). Comment out that line to see what happens to the graph.
nodes_analysis %>%
mutate(person = fct_reorder(person, degree_all)) %>%
ggplot() +
geom_point(aes(x=degree_in,y=person, color="In Degree")) + # , color="khaki2"
geom_point(aes(x=degree_out,y=person, color="Out Degree")) + # , color="cadetblue3"
geom_point(aes(x=degree_all,y=person,size=eigenvector, color="Total Degree")) + # ,color="darkslategrey"
scale_size(range=c(0.3,4)) +
labs(
x = "In, Out, and Total Degree",
y = "Name",
color = "Degrees",
title = "Degrees and Total Degrees with Eigenvector Centrality",
subtitle = "As Seen in Toilers and Gangsters Network",
caption = "Total degree is sum of in & out minus double-counted edges.",
size = "Eigenvector\nCentrality"
)
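To see what fct_reorder() is doing in the plot above, here is a minimal sketch on a made-up vector of names and degree values (both hypothetical). It shows how the factor levels, which control the order of the y axis, change.

```r
library(forcats)

# Hypothetical names and total degree values, for illustration only
person     <- c("Ana", "Bo", "Cy")
degree_all <- c(3, 1, 2)

# Without reordering, factor levels (and so the y axis) are alphabetical
levels(factor(person))

# fct_reorder() sorts the levels by the second argument, smallest first
reordered <- fct_reorder(person, degree_all)
levels(reordered)
```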
With this data we could easily plot the relationship between various kinds of centrality. Betweenness centrality is a measure of the degree to which a node is a gatekeeper to other nodes. How many of the shortest paths between nodes must pass through a given node? Eigenvector centrality tries to judge the importance of a node by the relative connectivity of its neighbors. Read more about it here. Let us compare the two in our own network:
nodes_analysis %>%
ggplot() +
geom_point(aes(x=between,y=eigenvector,size=degree_all)) +
labs(
x = "Betweenness",
y = "Eigenvector Centrality",
title = "Relationship between Betweenness and Eigenvector Centrality",
subtitle = "As Seen in Toilers and Gangsters Network",
size = "Node\nDegree"
)
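The difference between the two measures is easier to see on a tiny example. The sketch below uses a made-up network of two triangles joined through a single bridge node, D. The graph and its node names are hypothetical. The bridge sits on every shortest path between the two triangles, so it has the highest betweenness, while its better-connected neighbours score higher on eigenvector centrality.

```r
library(igraph)

# Hypothetical network: triangle A-B-C and triangle E-F-G, joined through D
toy <- graph_from_literal(A-B, A-C, B-C, C-D, D-E, E-F, E-G, F-G)

toy_between <- betweenness(toy)
toy_eigen   <- eigen_centrality(toy)$vector

# D lies on all 3 x 3 = 9 shortest paths between the two triangles
toy_between

# C and E, each tied into a triangle, outrank the bridge node D here
round(toy_eigen, 2)
```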
How about the relationship between eigenvector centrality and another measure, closeness centrality? Closeness centrality is a measure of how close a given node is to all the other nodes.
nodes_analysis %>%
ggplot() +
geom_point(aes(x=closeness,y=eigenvector,size=degree_all)) +
labs(
x = "Closeness Centrality",
y = "Eigenvector Centrality",
title = "Relationship between Closeness and Eigenvector Centrality",
subtitle = "As Seen in Toilers and Gangsters Network",
size = "Node\nDegree"
)
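As a small worked example of closeness, consider a hypothetical chain of three nodes, A-B-C. By default igraph's closeness() returns the reciprocal of the sum of a node's distances to all other nodes: the middle node B is one step from each of the others and scores 1/2, while each end node scores 1/(1+2) = 1/3.

```r
library(igraph)

# Hypothetical three-node chain: A - B - C
chain <- graph_from_literal(A-B, B-C)

# Raw closeness: 1 / (sum of shortest-path distances to the other nodes)
chain_closeness <- closeness(chain)
chain_closeness

# Normalised closeness multiplies by (number of nodes - 1)
closeness(chain, normalized = TRUE)
```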
Now that we have all this information, we can redo our network graph using any of these measures. Let us get a network graph that incorporates all the new variables we have added to the node table.
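One way to build such a graph, assuming the network is created from edge and node tables with igraph's graph_from_data_frame(), is sketched below. The toy tables and object names are hypothetical. With the tutorial's own data you would presumably pass the edges table and the nodes_analysis table instead and store the result as my_analysed_network.

```r
library(igraph)

# Hypothetical node table that already carries a computed measure
toy_nodes <- data.frame(person      = c("A", "B", "C"),
                        nationality = c("Japan", "Korea", "China"),
                        between     = c(0, 1, 0))

# Hypothetical edge table keyed on the names in the first node column
toy_edges <- data.frame(from      = c("A", "B"),
                        to        = c("B", "C"),
                        intensity = c(1, 3))

# Every extra column in the node table becomes a vertex attribute
toy_analysed_network <- graph_from_data_frame(d = toy_edges,
                                              vertices = toy_nodes,
                                              directed = TRUE)

vertex_attr_names(toy_analysed_network)
```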
For example, here is a graph diagram with the size of the node changed to indicate its betweenness.
ggraph(my_analysed_network, layout = "kk") +
# Add an edge link geometry
geom_edge_link(arrow = arrow(length = unit(1.5, 'mm')),
end_cap = circle(1.5, 'mm'), aes(alpha = intensity)) +
# What happens when you change the 1.5 in the length? or the 1.5 in the end_cap circle?
scale_edge_width(range = c(0.2, 2.5), guide="none") +
# Add a node point geometry
geom_node_point(aes(size=between,color=nationality)) +
labs(
title = "Toilers and Gangsters, 1860-1950",
caption = "Data from the 'Toilers and Gangsters' public dataset",
size = "Betweenness",
color = "Nationality",
edge_alpha = "Weight"
) +
# Notice I used the edge_ prefix in order to give a legend title to the edge attribute
geom_node_label(aes(label = name), family="serif",alpha=0.6, repel = TRUE) +
# Try changing geom_node_label to geom_node_text above. What happens?
network_theme()
There are a number of ways to make your network graph interactive, especially in a website. These include using a Shiny app, D3.js and its R connector networkD3, or the R package visNetwork. See Jesse Sadler’s network tutorial for a comparison of D3.js and visNetwork, as well as a demonstration of how you can use networkD3 to create what is known as a Sankey diagram.
To convert our simple network to a visNetwork that will allow interaction, we’ll have to abandon our use of given names in place of id numbers as a key. If you have been using id numbers from the start (recommended) in your node and edge tables, you don’t need this step at all. To convert our tables, we’ll add an id number column to our nodes, and then replace all the given names in the edges table with their corresponding id number. First let us add an id column to the nodes and view the top ten rows of the resulting data frame:
nodes_wids<-nodes %>%
mutate(id=seq.int(n())) %>%
# add the id column with a sequence of integers from 1 to the total number of entries
select(id,person,location,age,nationality,mentions,discuss,gender)
# the select statement here just reorders the columns to put id first
head(nodes_wids,10)
id | person | location | age | nationality | mentions | discuss | gender |
---|---|---|---|---|---|---|---|
1 | Tomohiko | Tokyo | 22 | Japan | 14 | 1 | m |
2 | Kyŏngmin | Seoul | 55 | Korea | 12 | 0 | m |
3 | Jiurong | Shanghai | 44 | China | 3 | 0 | f |
4 | Sangok | Pusan | 33 | Korea | 5 | 0 | m |
5 | Yoshinobu | Tokyo | 66 | Japan | 67 | 1 | m |
6 | Wei | Qingdao | 57 | China | 30 | 1 | f |
7 | Songbae | Seoul | 36 | Korea | 26 | 0 | m |
8 | Minjun | Pusan | 55 | Korea | 4 | 0 | m |
9 | Hayun | Pusan | 22 | Korea | 2 | 0 | m |
10 | Minjae | Pusan | 30 | Korea | 12 | 0 | m |
Now let’s replace the names with the node id numbers in the from and to columns of the edge table using the match() function. For each row of our new from column, we ask it to supply us the node id for the row in which the name in the edges from column matches the name in the person column of the now id-equipped nodes_wids variable.
from_ids<- nodes_wids$id[match(edges$from,nodes_wids$person)]
# We replace the names in the from column in the edge table with the ids from nodes_wids
to_ids <-nodes_wids$id[match(edges$to,nodes_wids$person)]
# We replace the names in the to column in the edge table with the ids from nodes_wids
edges_wids <-tibble(from=from_ids,to=to_ids,kind=edges$kind,intensity=edges$intensity,year_start=edges$year_start,year_end=edges$year_end)
# We glue together the edges data frame again but this time with the id numbers.
head(edges_wids,10)
from | to | kind | intensity | year_start | year_end |
---|---|---|---|---|---|
14 | 10 | 3 | 3 | 1907 | 1921 |
9 | 10 | 3 | 1 | 1902 | 1943 |
3 | 1 | 3 | 1 | 1896 | 1947 |
2 | 3 | 3 | 1 | 1895 | 1920 |
10 | 14 | 3 | 3 | 1907 | 1921 |
25 | 22 | 3 | 2 | 1898 | 1934 |
1 | 3 | 3 | 1 | 1910 | 1936 |
6 | 19 | 3 | 3 | 1872 | 1920 |
5 | 1 | 3 | 2 | 1898 | 1915 |
14 | 15 | 2 | 1 | 1919 | 1931 |
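The match() lookup used above can be seen in isolation on a made-up example. The table and the names below are hypothetical. match() returns, for each name, the row position at which it is found in the lookup column, and those positions are then used to pull out the ids.

```r
# Hypothetical lookup table of people and their id numbers
people <- data.frame(id = c(10, 20, 30),
                     person = c("Wei", "Minjun", "Hayun"))

# Hypothetical "from" column of an edge table, using names as keys
from_names <- c("Hayun", "Wei", "Hayun")

# Row positions of each name within people$person
positions <- match(from_names, people$person)
positions

# The ids found at those positions
looked_up <- people$id[positions]
looked_up

# A name missing from the node table yields NA, a useful check for typos
match("Nobody", people$person)
```

Running anyNA() on the resulting id vectors is a quick way to catch names in the edge table that have no counterpart in the node table.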
Now we can produce the visNetwork interactive plot with our new nodes_wids and edges_wids node and edge tables.
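A minimal call might look like the sketch below. The toy tables are hypothetical stand-ins. With the tutorial's data the equivalent call would presumably be visNetwork(nodes_wids, edges_wids). visNetwork() expects an id column in the node table and from and to columns in the edge table.

```r
library(visNetwork)

# Hypothetical node and edge tables keyed on id numbers
toy_nodes <- data.frame(id = 1:3)
toy_edges <- data.frame(from = c(1, 2), to = c(2, 3))

# Build the interactive widget with all default settings
basic_widget <- visNetwork(toy_nodes, toy_edges)

# In RStudio, printing the object renders it in the Viewer pane
basic_widget
```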
This is a very limited and boring graph, however. You can click on and manipulate the nodes but its physics allows for very limited moving of things around before they spring back into place. It is also missing almost everything else useful to communicate anything.
Now let us create a visNetwork object that communicates more information and that you can freely manipulate by clicking on nodes. It will also include navigation buttons for easy manipulation of zoom levels and panning. To do this, we add columns to the data which indicate things like size (of the nodes), width (of the edges), and color (of the nodes). Zoom in on the graph and you will see that the labels fade in and out depending on your zoom level.
nodes_wlabels<-nodes_wids %>%
mutate(label=person) %>%
# visNetwork looks for a "label" column to label the nodes, so I've added this column with the contents from the person column.
mutate(size=rescale(mentions,to=c(10,50))) %>%
# adding a size column and scaling it with the rescale function of the scales package to a number between 10 and 50 will determine the size of our node
mutate(color=str_replace(str_replace(str_replace(nationality,"China","#0f9e45"),"Korea","#4167a3"),"Japan","#db310d"))
# Here I have done three string replaces on nationality, to replace China, Korea, and Japan with the colors we want to use.
edges_wformat<-edges_wids %>%
mutate(width=rescale(intensity,to=c(1,13)))
# here again it is looking for a "width" column, which I have added with the contents of the
# intensity column, but using the rescale function of the scales package to scale to a number
# between 1 and 13
visNetwork(nodes_wlabels, edges_wformat, width = "100%") %>%
visIgraphLayout(layout = "layout_with_kk") %>%
visEdges(arrows = "to") %>% # Try "middle" for comparison
visInteraction(navigationButtons = TRUE) # Remove this line, and the %>% before it, to see what it does
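The nested string replacements used for the color column above work, but they become hard to read as categories are added. A named vector used as a lookup table is one alternative, sketched here in base R. The nationality values are a hypothetical sample and the hex codes are the ones used above.

```r
# Hypothetical sample of the nationality column
nationality <- c("Korea", "Japan", "China", "Korea")

# Named vector acting as a lookup table from nationality to colour
nation_colors <- c(China = "#0f9e45", Korea = "#4167a3", Japan = "#db310d")

# Index the lookup table by name, then drop the names
color <- unname(nation_colors[nationality])
color
```

Inside a mutate() call the same idea would read color=unname(nation_colors[nationality]). Any nationality missing from the lookup table comes back as NA rather than being passed through unchanged.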
There are many more ways to customise the visNetwork options. For more on this, see the documentation for the visNetwork package.
There are a number of ways of extracting the plots you produce. One convenient way is the use of the ggsave() function in the ggplot2 package.
This is not only useful for you to embed any of the graphs seen here in a separate document but gives you the ability to create a crystal clear SVG version that is unpixelated at any zoom, or save a PNG version, for example, at a size much larger than those shown here, so that there is less chance of nodes overlapping.
The following ggsave() command, for example, will save the last plot you have made to the disk as plot.svg which is zoom independent in its resolution, and a second version saved as a png file but with a fixed size:
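# Will save the last plot to the working directory of R Studio as an SVG (needs the svglite package)
ggsave("plot.svg")
## Saving 7 x 5 in image
# Will save a second copy as a PNG with a fixed size
ggsave("plot.png",plot=last_plot(),device="png", width=20, height=20, units="cm")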
This should give you a good start at creating network graph diagrams using R. See some of these resources for more:
Luke, Douglas A. A User’s Guide to Network Analysis in R. 1st ed. Cham: Springer, 2015. - Much of the code here is adapted from examples in this volume. Uses statnet and igraph but also shows how to convert between them. Unfortunately, all plots are with base R plots, rather than ggplot.
Wickham, Hadley. ggplot2: Elegant Graphics for Data Analysis. 2nd ed. New York, NY: Springer, 2016. - ggraph is built on top of ggplot and many of the customisations to graphs benefit from understanding how ggplot works.
Scott, John. Social Network Analysis. 4th ed., 2017. - A great introductory text on the topic for humanities students.
Scott, John, and Peter J. Carrington, eds. The SAGE Handbook of Social Network Analysis. London ; Thousand Oaks, Calif: SAGE, 2011.
Wasserman, Stanley, and Katherine Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994. - The classic text in the field which delves into the mathematical foundations of graph theory.
This R Notebook was written with the help of various books and tutorials mentioned above, but mostly thanks to 40-60 Google searches, with the answers generally found on the websites above, Stack Overflow, and obscure online bulletin boards.