# working with spatial data
library(sf)
# data manipulation and visualisation
library(tidyverse)
# network manipulation and analysis
library(igraph)
2 Introduction to the basics of network analysis
2.1 Dependencies
We first clean the environment and load all the libraries required for this section.
2.2 Creating networks
Networks are used as a tool to conceptualise many real-life situations where a group of items display connections among themselves, such as the friendships among members of a school year group, airline flights between cities or hyperlinks between websites. Generally, a network (or a graph) consists of nodes (a.k.a. vertices) and edges (a.k.a. links) which represent the connections between the nodes of the network.
2.2.1 Starting from the basics
To understand the basic concepts and components of a network, we create a small example network in R using the igraph package. We create an undirected network of 5 nodes and 7 edges.
# create a graph object
<- graph(
network n = 5, # add number of nodes
edges = c(1,2,
2,3,
2,5,
3,1,
3,4,
4,5,
5,1), # add edges
directed = FALSE )
# plot
plot(network,
layout = layout.circle(network), # set layout
vertex.label = 1:5)
If the connections between the nodes of a network are non-reciprocal, the network is called directed.
# create a graph object
<- graph(
network n = 5, # add number of nodes
edges = c(1,2,
2,3,
2,5,
3,1,
3,4,
4,5,
5,1), # add edges
directed = TRUE )
# plot
plot(network,
layout = layout.circle(network), # set layout
vertex.label = 1:5)
The network can also be defined as a list containing pairs of nodes with labels. In that case we do not need to specify the number of nodes. We only specify the isolated nodes to be included.
We will describe the various arguments within plot which can be used to style your data visualisation
# create a graph object
<- graph(
network_isolated c("City 1","City 2",
"City 2","City 3",
"City 2","City 5",
"City 3","City 1",
"City 3","City 4",
"City 4","City 5",
"City 5","City 1"),
isolates=c("City 6") # isolated node
)
# plot
plot(network_isolated,
layout = layout.circle(network_isolated),
vertex.frame.color="red",
vertex.color="yellow",
vertex.size=15,
vertex.label.color="black",
vertex.label.cex=0.9,
vertex.label.dist=2.3,
edge.curved=0.3,
edge.arrow.size=.5,
edge.color = "blue"
)
2.2.2 Adding attributes
We can add attributes to individual components of the network. To add attributes to nodes, we need to access them:
V(network_isolated)
+ 6/6 vertices, named, from 4ae50a7:
[1] City 1 City 2 City 3 City 5 City 4 City 6
Node attribute names are automatically generated from the node labels we assigned.
V(network_isolated)$name
[1] "City 1" "City 2" "City 3" "City 5" "City 4" "City 6"
Additional node attributes can be added, such as population size:
V(network_isolated)$population <- c(92000, 549000, 1786000, 74000, 8000, 21000)
We can also access the edges:
E(network_isolated)
+ 7/7 edges from 4ae50a7 (vertex names):
[1] City 1->City 2 City 2->City 3 City 2->City 5 City 3->City 1 City 3->City 4
[6] City 4->City 5 City 5->City 1
We can add edge attributes, such as the number of people moving from an origin to a destination city. We call this attribute edge weight. The size of the connection between two places often represent their importance or “weight” in the network.
E(network_isolated)$weight <- c(2000, 5000, 1000, 1000, 4000, 3500, 7000)
We can examine the adjacency matrix of the network. The adjacency represents the presence of edges between different pairs of nodes. In our example, each row corresponds to an origin city and each column to a destination:
network_isolated[]
6 x 6 sparse Matrix of class "dgCMatrix"
City 1 City 2 City 3 City 5 City 4 City 6
City 1 . 2000 . . . .
City 2 . . 5000 1000 . .
City 3 1000 . . . 4000 .
City 5 7000 . . . . .
City 4 . . . 3500 . .
City 6 . . . . . .
We can also look at the existing node and edge attributes.
vertex_attr(network_isolated)
$name
[1] "City 1" "City 2" "City 3" "City 5" "City 4" "City 6"
$population
[1] 92000 549000 1786000 74000 8000 21000
And we can add network attributes:
$title <- "Network of human mobility between cities" network_isolated
2.2.3 Basic visualisation
You have already seen how we can visualise networks using the base R function plot()
. We will now elaborate on how individual components of the network can be visually adjusted using specific parameters in the plot function. More sophisticated data visualisations can be produced integrating geographic information, see Chapter 5 in Rowe, Cabrera-Arnau, and Piestrostefani (2023), or using other packages such as ggraph
in conjunction with ggplot2
as we will see in the section on Percolation.
plot(network_isolated,
layout = layout.circle(network_isolated),
# adjust nodes
vertex.frame.color = "gray90", # line colour
vertex.label.color = "black", # label colour
vertex.label.cex = 1, # label font size
vertex.label.dist = 3, # label distance
vertex.color = "yellow", # colour
vertex.size=10, # size
# adjust edges
edge.curved=0.3, # curvature
edge.arrow.size=1.5, # size
edge.arrow.width=.5,
edge.width=E(network_isolated)$weight/1000,
edge.color = "gray") # colour
2.3 Network metrics
Next we will provide a brief description of commonly used metrics to analyse the basic structure of networks.
2.3.1 Density
The density of a network refers to the proportion of existing edges over all possible edges. In a network with \(n\) nodes, the total number of possible edges is \(n \times (n-1)\). A density equal to \(1\) corresponds to a situation where \(n \times (n-1)\) edges are present. A network with no edges at all would have density equal to 0. We can obtain the density of a network by running the following code:
edge_density(network_isolated,
loops=FALSE)
[1] 0.2333333
2.3.2 Reciprocity
The reciprocity in a directed network is the proportion of reciprocated connections between nodes (i.e. number of pairs of nodes with edges in both directions) from all the existing edges.
reciprocity(network_isolated)
[1] 0
2.3.3 Degree
The degree of a node is the number of edges connected to it. The in-degree of a node in a directed network is the number of edges ending in a given node. The out-degree is the number of edges starting from a given node. The degree()
function allows computing the degree of one or more nodes, and specifying the measures of interest i.e. the total degree, in-degree or out-degree.
# compute in-degree
<- degree(network_isolated,
net_degree v=V(network_isolated),
mode="in")
# display histogram of in-degree
hist(net_degree,
breaks = 30,
main = "Node in-degree")
2.3.4 Distance
A path in a network between node \(A\) and node \(B\) is a sequence of edges joining distinct nodes, starting at node \(A\) and ending at node \(B\). In a directed path all edges follow the same direction.
The path length of a path between nodes \(A\) and \(B\) is generally defined as the number of edges forming this path. The shortest path is the minimum count of edges present to travel from \(A\) to \(B\).
The path length can be defined in alternative ways. For example, the path length can be defined as the sum of the weights of the edges forming a path if the edges are weighted.
We can use the function shortest_paths()
to find the shortest path between a given pair of nodes:
shortest_paths(network_isolated,
from = V(network_isolated)$name == "City 1",
to = V(network_isolated)$name == "City 5",
weights=NA, # if weights = NULL and a weight edge attribute is present, this is used.
output = "both", # outputs are produced for both path nodes and edges
predecessors = FALSE,
inbound.edges = FALSE,
)
$vpath
$vpath[[1]]
+ 3/6 vertices, named, from 4ae50a7:
[1] City 1 City 2 City 5
$epath
$epath[[1]]
+ 2/7 edges from 4ae50a7 (vertex names):
[1] City 1->City 2 City 2->City 5
$predecessors
NULL
$inbound_edges
NULL
Of all shortest paths in a network, the longest path length is the network diameter.
diameter(network_isolated,
directed=TRUE,
weights=NA)
[1] 4
The mean distance is the average length of all shortest paths in the network. The mean distance will always be smaller or equal than the diameter.
mean_distance(network_isolated,
directed=TRUE,
weights=NA)
[1] 2
2.3.5 Centrality
Centrality metrics assign scores to nodes (and sometimes also edges) according to their position within a network. These metrics can be used to identify the most influential nodes.
We can measure the degree of a node as we did before, but we can also measure its weighted degree. This is known as the strength of a node and it is computed as the sum of edge weights linked to adjacent nodes. Both degree and strength are considered to be centrality metrics.
# compute strength of network nodes
<- strength(network_isolated, # network object
net_strength vids = V(network_isolated), # vertices
mode = c("in"), # “in” for in-degree
loops = FALSE, # whether to loop the edge count
weights = NULL # if weights = NULL and a weight edge attribute is present, this is used.
)
# histogram
hist(net_strength,
breaks = 50,
main="Histogram of node strength")
Another two important centrality metrics are (1) closeness centrality, and (2) betweenness centrality. Closeness centrality is a measure of the shortest path length between a node and all the other nodes. For a given node, it is computed as the inverse of the average shortest paths between that node and every other node in the network. A node with closeness centrality close to 1 indicates that on average the node is very close to the other nodes in the network. A closeness centrality of 0 represents an isolated node.
<- closeness(network_isolated,
close_centr mode="in",
weights=NA) # using unweighted edges
hist(close_centr,
breaks = 50,
main = "Histogram of closeness centrality")
Betweenness centrality is a measure of the number of shortest paths going through a node. Nodes with high values of betweenness centrality indicates that they play a very important role in the overall connectivity of the network. Betweenness can also be computed for edges.
<- betweenness(network_isolated,
between_centr v = V(network_isolated),
directed = TRUE,
weights = NA)
hist(between_centr,
breaks = 50,
main = "Histogram of betweenness centrality")