Transport Analytics Training Series - Last Revision: October 2022

Network Analytics

Now that we have a good handle on how basic pathfinding operations work, we can explore some further properties of our network.

As this is a new notebook, we will have to load our dataset again:

In [1]:
import networkx as nx
import pandas as pd

# Load the Sioux Falls link and node tables
links = pd.read_csv('data-sioux-falls/siouxfalls_links.csv')
nodes = pd.read_csv('data-sioux-falls/siouxfalls_nodes.csv')
# Each link becomes a (from_node, to_node) tuple
new_links = list(zip(links['from_node'], links['to_node']))

# Build the graph and a node -> (x, y) position mapping for plotting
G = nx.Graph()
G.add_nodes_from(nodes['node'])
G.add_edges_from(new_links)
coords = list(zip(nodes['x'], nodes['y']))
pos = dict(zip(nodes['node'], coords))
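
As a quick sanity check - just a sketch, assuming the CSVs loaded as expected - we can confirm the size of the network; the Sioux Falls network should come out as 24 nodes and 38 undirected edges:

# Quick sanity check: 24 nodes and 38 undirected edges expected
print(G.number_of_nodes(), G.number_of_edges())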

Part 1 - Degrees and centralities

Let's now see the degrees of our nodes:

In [2]:
degrees = [G.degree(n) for n in G.nodes()]
degrees
Out[2]:
[2, 2, 3, 3, 3, 3, 2, 4, 3, 5, 4, 3, 2, 3, 4, 4, 3, 3, 3, 4, 3, 4, 3, 3]
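
As a side note, calling G.degree() with no argument returns a view of (node, degree) pairs. The sketch below is merely an equivalent alternative to the list comprehension above, not something needed for what follows:

# Equivalent view: a dict mapping each node to its degree
degree_by_node = dict(G.degree())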

We can plot the distribution of degrees using matplotlib. To do this, we need to import it - as with the other libraries that we used, there happens to be yet another frequently used alias for this library: plt.

In [3]:
import matplotlib.pyplot as plt

We can plot a histogram using the plt.hist() command, passing the list in question as a parameter.

In [4]:
plt.hist(degrees)
plt.show()
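
Since the degrees are small integers, the default bins can look uneven. A possible refinement - a sketch, with bin edges and axis labels chosen by us rather than prescribed by the dataset - is to use one bin per integer degree:

# One bin per integer degree, with bars centred on the integers
bins = range(min(degrees), max(degrees) + 2)
plt.hist(degrees, bins=bins, align='left', rwidth=0.8)
plt.xlabel('Node degree')
plt.ylabel('Number of nodes')
plt.show()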

We can also obtain the various centrality values. Let's start with the degree centrality:

In [5]:
nx.degree_centrality(G)
Out[5]:
{1: 0.08695652173913043,
 2: 0.08695652173913043,
 3: 0.13043478260869565,
 4: 0.13043478260869565,
 5: 0.13043478260869565,
 6: 0.13043478260869565,
 7: 0.08695652173913043,
 8: 0.17391304347826086,
 9: 0.13043478260869565,
 10: 0.21739130434782608,
 11: 0.17391304347826086,
 12: 0.13043478260869565,
 13: 0.08695652173913043,
 14: 0.13043478260869565,
 15: 0.17391304347826086,
 16: 0.17391304347826086,
 17: 0.13043478260869565,
 18: 0.13043478260869565,
 19: 0.13043478260869565,
 20: 0.17391304347826086,
 21: 0.13043478260869565,
 22: 0.17391304347826086,
 23: 0.13043478260869565,
 24: 0.13043478260869565}
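
These values follow directly from the degrees: the degree centrality of a node is its degree divided by n - 1, the maximum possible number of neighbours. A minimal sanity check (a sketch, not part of the original workflow):

# Degree centrality should equal degree / (n - 1) for every node
n = G.number_of_nodes()
dc = nx.degree_centrality(G)
assert all(abs(dc[v] - G.degree(v) / (n - 1)) < 1e-12 for v in G.nodes())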

As we would like to compare the various centrality values, it might be useful to store them in a new DataFrame.

We can use the pd.DataFrame() command to create an empty DataFrame.

In [6]:
centralities = pd.DataFrame()

We can now start populating it by simply assigning values to new columns.

In [7]:
centralities['ID'] = G.nodes()
centralities['degree_centr'] = nx.degree_centrality(G).values()
centralities
Out[7]:
ID degree_centr
1 1 0.086957
2 2 0.086957
3 3 0.130435
4 4 0.130435
5 5 0.130435
6 6 0.130435
7 7 0.086957
8 8 0.173913
9 9 0.130435
10 10 0.217391
11 11 0.173913
12 12 0.130435
13 13 0.086957
14 14 0.130435
15 15 0.173913
16 16 0.173913
17 17 0.130435
18 18 0.130435
19 19 0.130435
20 20 0.173913
21 21 0.130435
22 22 0.173913
23 23 0.130435
24 24 0.130435
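
Note that the assignment above relies on the dictionary returned by nx.degree_centrality() iterating over nodes in the same order as G.nodes() - which holds here, as both follow insertion order. A sketch of an equivalent construction that makes the pairing explicit:

# Build the columns directly from the centrality dict, pairing keys with values
dc = nx.degree_centrality(G)
centralities_alt = pd.DataFrame({'ID': list(dc.keys()),
                                 'degree_centr': list(dc.values())})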

Let's now compute the rest of the centrality measures that we covered in class.

In [8]:
centralities['closeness_centr'] = nx.closeness_centrality(G).values()
centralities['betweenness_centr'] = nx.betweenness_centrality(G).values()
centralities['eigenvector_centr'] = nx.eigenvector_centrality(G).values()
centralities
Out[8]:
ID degree_centr closeness_centr betweenness_centr eigenvector_centr
1 1 0.086957 0.264368 0.035244 0.034586
2 2 0.086957 0.267442 0.036797 0.041609
3 3 0.130435 0.306667 0.095285 0.078701
4 4 0.130435 0.348485 0.100838 0.128455
5 5 0.130435 0.333333 0.065152 0.128600
6 6 0.130435 0.310811 0.097666 0.110150
7 7 0.086957 0.306667 0.030665 0.117212
8 8 0.173913 0.343284 0.138553 0.212955
9 9 0.130435 0.365079 0.075878 0.208734
10 10 0.217391 0.425926 0.239977 0.384541
11 11 0.173913 0.403509 0.226379 0.239538
12 12 0.130435 0.348485 0.131799 0.110723
13 13 0.086957 0.306667 0.060277 0.066920
14 14 0.130435 0.353846 0.086665 0.209532
15 15 0.173913 0.370968 0.122476 0.317121
16 16 0.173913 0.389831 0.124630 0.304680
17 17 0.130435 0.359375 0.031973 0.267580
18 18 0.130435 0.343284 0.097956 0.194776
19 19 0.130435 0.333333 0.034064 0.241578
20 20 0.173913 0.328571 0.113175 0.255652
21 21 0.130435 0.315068 0.063439 0.185475
22 22 0.173913 0.328571 0.071888 0.267482
23 23 0.130435 0.315068 0.042478 0.172218
24 24 0.130435 0.302632 0.070422 0.122064
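
Since the point is to compare the measures, one quick way to quantify how much they agree - a sketch using pandas' built-in .corr(), which defaults to Pearson correlation - is:

# Pairwise correlations between the four centrality measures
centralities[['degree_centr', 'closeness_centr',
              'betweenness_centr', 'eigenvector_centr']].corr()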

To understand the values a bit better, it might be useful to sort them. For this, we can use the .sort_values() function provided with all DataFrames.

In [9]:
centralities.sort_values(by='betweenness_centr', ascending=False)
Out[9]:
ID degree_centr closeness_centr betweenness_centr eigenvector_centr
10 10 0.217391 0.425926 0.239977 0.384541
11 11 0.173913 0.403509 0.226379 0.239538
8 8 0.173913 0.343284 0.138553 0.212955
12 12 0.130435 0.348485 0.131799 0.110723
16 16 0.173913 0.389831 0.124630 0.304680
15 15 0.173913 0.370968 0.122476 0.317121
20 20 0.173913 0.328571 0.113175 0.255652
4 4 0.130435 0.348485 0.100838 0.128455
18 18 0.130435 0.343284 0.097956 0.194776
6 6 0.130435 0.310811 0.097666 0.110150
3 3 0.130435 0.306667 0.095285 0.078701
14 14 0.130435 0.353846 0.086665 0.209532
9 9 0.130435 0.365079 0.075878 0.208734
22 22 0.173913 0.328571 0.071888 0.267482
24 24 0.130435 0.302632 0.070422 0.122064
5 5 0.130435 0.333333 0.065152 0.128600
21 21 0.130435 0.315068 0.063439 0.185475
13 13 0.086957 0.306667 0.060277 0.066920
23 23 0.130435 0.315068 0.042478 0.172218
2 2 0.086957 0.267442 0.036797 0.041609
1 1 0.086957 0.264368 0.035244 0.034586
19 19 0.130435 0.333333 0.034064 0.241578
17 17 0.130435 0.359375 0.031973 0.267580
7 7 0.086957 0.306667 0.030665 0.117212

That's quite interesting! We can see a wide range of centrality values - the measure clearly regards some nodes as far more important than others.

What if we wanted to obtain a Top 10 table? We can do this using the head() function:

In [10]:
centralities.sort_values(by='betweenness_centr', ascending=False).head(10)
Out[10]:
ID degree_centr closeness_centr betweenness_centr eigenvector_centr
10 10 0.217391 0.425926 0.239977 0.384541
11 11 0.173913 0.403509 0.226379 0.239538
8 8 0.173913 0.343284 0.138553 0.212955
12 12 0.130435 0.348485 0.131799 0.110723
16 16 0.173913 0.389831 0.124630 0.304680
15 15 0.173913 0.370968 0.122476 0.317121
20 20 0.173913 0.328571 0.113175 0.255652
4 4 0.130435 0.348485 0.100838 0.128455
18 18 0.130435 0.343284 0.097956 0.194776
6 6 0.130435 0.310811 0.097666 0.110150

But this is supposed to be a Betweenness Top 10 - can we get rid of the other columns?

In [11]:
centralities.sort_values(by='betweenness_centr', ascending=False).head(10)[['ID','betweenness_centr']]
Out[11]:
ID betweenness_centr
10 10 0.239977
11 11 0.226379
8 8 0.138553
12 12 0.131799
16 16 0.124630
15 15 0.122476
20 20 0.113175
4 4 0.100838
18 18 0.097956
6 6 0.097666

But the row numbers in the leftmost column look a bit off... These are the labels of the original index, which survived the sort. Could we perhaps renumber them from zero? The .reset_index() function does exactly that:

In [12]:
centralities.sort_values(by='betweenness_centr', ascending=False).head(10).reset_index()[['ID','betweenness_centr']]
Out[12]:
ID betweenness_centr
0 10 0.239977
1 11 0.226379
2 8 0.138553
3 12 0.131799
4 16 0.124630
5 15 0.122476
6 20 0.113175
7 4 0.100838
8 18 0.097956
9 6 0.097666
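
Incidentally, pandas can sort, truncate and renumber in fewer steps. The sketch below is an equivalent alternative to the chain above, producing the same table:

# nlargest() sorts by the given column and keeps the top rows in one step
centralities.nlargest(10, 'betweenness_centr')[['ID', 'betweenness_centr']].reset_index(drop=True)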

We have tinkered with the table enough for now. Let's now try to visualise the centralities.

We can do that easily by simply passing the values as node_color - networkx hands them over to matplotlib, which maps each value onto a colormap to produce an actual color.


Part 2 - Visualising node properties

Let's see what the various centrality measures look like:

In [13]:
nx.draw(G, pos, with_labels = True, node_color = list(centralities['degree_centr']))
In [14]:
nx.draw(G, pos, with_labels = True, node_color = list(centralities['closeness_centr']))
In [15]:
nx.draw(G, pos, with_labels = True, node_color = list(centralities['betweenness_centr']))
In [16]:
nx.draw(G, pos, with_labels = True, node_color = list(centralities['eigenvector_centr']))

We can observe that there is quite a lot of similarity in the relative distribution of colors, but with some notable differences in the central nodes. That is to be expected, as each centrality measure has its own distinct definition (and purpose!).
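
With the default colormap it can be hard to tell which color corresponds to which value. A possible refinement - a sketch that assumes the G, pos and centralities objects built above - is to draw the nodes explicitly with a named colormap and attach a colorbar as a legend:

# Draw nodes with an explicit colormap and attach a colorbar as a legend
fig, ax = plt.subplots()
node_colors = list(centralities['betweenness_centr'])
drawn = nx.draw_networkx_nodes(G, pos, node_color=node_colors,
                               cmap=plt.cm.viridis, ax=ax)
nx.draw_networkx_edges(G, pos, ax=ax)
nx.draw_networkx_labels(G, pos, ax=ax)
fig.colorbar(drawn, ax=ax, label='Betweenness centrality')
ax.set_axis_off()
plt.show()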