Transport Analytics Training Series - Last Revision: October 2022

Network Analytics

Now that we have a good handle on how basic pathfinding operations work, we can explore some further properties of our network.

As this is a new notebook, we will have to load our dataset again:

In [1]:
import networkx as nx
import pandas as pd

# Load the Sioux Falls link and node tables
links = pd.read_csv('data-sioux-falls/siouxfalls_links.csv')
nodes = pd.read_csv('data-sioux-falls/siouxfalls_nodes.csv')
# Each link becomes a (from_node, to_node) tuple
new_links = list(zip(links['from_node'], links['to_node']))

# Build the graph and a node -> (x, y) position mapping for plotting
G = nx.Graph()
G.add_nodes_from(nodes['node'])
G.add_edges_from(new_links)
coords = list(zip(nodes['x'], nodes['y']))
pos = dict(zip(nodes['node'], coords))
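
As a quick sanity check - just a sketch, assuming the CSVs loaded as expected - we can confirm the size of the network; the Sioux Falls network should come out as 24 nodes and 38 undirected edges:

# Quick sanity check: 24 nodes and 38 undirected edges expected
print(G.number_of_nodes(), G.number_of_edges())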

Part 1 - Degrees and centralities

Let's now see the degrees of our nodes:

In [2]:
degrees = [G.degree(n) for n in G.nodes()]
degrees
Out[2]:
[2, 2, 3, 3, 3, 3, 2, 4, 3, 5, 4, 3, 2, 3, 4, 4, 3, 3, 3, 4, 3, 4, 3, 3]
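
As a side note, calling G.degree() with no argument returns a view of (node, degree) pairs. The sketch below is merely an equivalent alternative to the list comprehension above, not something needed for what follows:

# Equivalent view: a dict mapping each node to its degree
degree_by_node = dict(G.degree())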

We can plot the distribution of degrees using matplotlib. To do this, we need to import it - as with the other libraries that we used, there happens to be yet another frequently used alias for this library: plt.

In [3]:
import matplotlib.pyplot as plt

We can plot a histogram using the plt.hist() command, passing the list in question as a parameter.

In [4]:
plt.hist(degrees)
plt.show()
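
Since the degrees are small integers, the default bins can look uneven. A possible refinement - a sketch, with bin edges and axis labels chosen by us rather than prescribed by the dataset - is to use one bin per integer degree:

# One bin per integer degree, with bars centred on the integers
bins = range(min(degrees), max(degrees) + 2)
plt.hist(degrees, bins=bins, align='left', rwidth=0.8)
plt.xlabel('Node degree')
plt.ylabel('Number of nodes')
plt.show()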

We can also obtain the various centrality values. Let's start with the degree centrality:

In [5]:
nx.degree_centrality(G)
Out[5]:
{1: 0.08695652173913043,
 2: 0.08695652173913043,
 3: 0.13043478260869565,
 4: 0.13043478260869565,
 5: 0.13043478260869565,
 6: 0.13043478260869565,
 7: 0.08695652173913043,
 8: 0.17391304347826086,
 9: 0.13043478260869565,
 10: 0.21739130434782608,
 11: 0.17391304347826086,
 12: 0.13043478260869565,
 13: 0.08695652173913043,
 14: 0.13043478260869565,
 15: 0.17391304347826086,
 16: 0.17391304347826086,
 17: 0.13043478260869565,
 18: 0.13043478260869565,
 19: 0.13043478260869565,
 20: 0.17391304347826086,
 21: 0.13043478260869565,
 22: 0.17391304347826086,
 23: 0.13043478260869565,
 24: 0.13043478260869565}
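
These values follow directly from the degrees: the degree centrality of a node is its degree divided by n - 1, the maximum possible number of neighbours. A minimal sanity check (a sketch, not part of the original workflow):

# Degree centrality should equal degree / (n - 1) for every node
n = G.number_of_nodes()
dc = nx.degree_centrality(G)
assert all(abs(dc[v] - G.degree(v) / (n - 1)) < 1e-12 for v in G.nodes())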

As we would like to compare the various centrality values, it might be useful to store them in a new DataFrame.

We can use the pd.DataFrame() command to create an empty DataFrame.

In [6]:
centralities = pd.DataFrame()

We can now start populating it by simply assigning values to new columns.

In [7]:
centralities['ID'] = G.nodes()
centralities['degree_centr'] = nx.degree_centrality(G).values()
centralities
Out[7]:
ID degree_centr
1 1 0.086957
2 2 0.086957
3 3 0.130435
4 4 0.130435
5 5 0.130435
6 6 0.130435
7 7 0.086957
8 8 0.173913
9 9 0.130435
10 10 0.217391
11 11 0.173913
12 12 0.130435
13 13 0.086957
14 14 0.130435
15 15 0.173913
16 16 0.173913
17 17 0.130435
18 18 0.130435
19 19 0.130435
20 20 0.173913
21 21 0.130435
22 22 0.173913
23 23 0.130435
24 24 0.130435
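
Note that the assignment above relies on the dictionary returned by nx.degree_centrality() iterating over nodes in the same order as G.nodes() - which holds here, as both follow insertion order. A sketch of an equivalent construction that makes the pairing explicit:

# Build the columns directly from the centrality dict, pairing keys with values
dc = nx.degree_centrality(G)
centralities_alt = pd.DataFrame({'ID': list(dc.keys()),
                                 'degree_centr': list(dc.values())})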

Let's now compute the rest of the centrality measures that we covered in class.

In [8]:
centralities['closeness_centr'] = nx.closeness_centrality(G).values()
centralities['betweenness_centr'] = nx.betweenness_centrality(G).values()
centralities['eigenvector_centr'] = nx.eigenvector_centrality(G).values()
centralities
Out[8]:
ID degree_centr closeness_centr betweenness_centr eigenvector_centr
1 1 0.086957 0.264368 0.035244 0.034586
2 2 0.086957 0.267442 0.036797 0.041609
3 3 0.130435 0.306667 0.095285 0.078701
4 4 0.130435 0.348485 0.100838 0.128455
5 5 0.130435 0.333333 0.065152 0.128600
6 6 0.130435 0.310811 0.097666 0.110150
7 7 0.086957 0.306667 0.030665 0.117212
8 8 0.173913 0.343284 0.138553 0.212955
9 9 0.130435 0.365079 0.075878 0.208734
10 10 0.217391 0.425926 0.239977 0.384541
11 11 0.173913 0.403509 0.226379 0.239538
12 12 0.130435 0.348485 0.131799 0.110723
13 13 0.086957 0.306667 0.060277 0.066920
14 14 0.130435 0.353846 0.086665 0.209532
15 15 0.173913 0.370968 0.122476 0.317121
16 16 0.173913 0.389831 0.124630 0.304680
17 17 0.130435 0.359375 0.031973 0.267580
18 18 0.130435 0.343284 0.097956 0.194776
19 19 0.130435 0.333333 0.034064 0.241578
20 20 0.173913 0.328571 0.113175 0.255652
21 21 0.130435 0.315068 0.063439 0.185475
22 22 0.173913 0.328571 0.071888 0.267482
23 23 0.130435 0.315068 0.042478 0.172218
24 24 0.130435 0.302632 0.070422 0.122064
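
Since the point is to compare the measures, one quick way to quantify how much they agree - a sketch using pandas' built-in .corr(), which defaults to Pearson correlation - is:

# Pairwise correlations between the four centrality measures
centralities[['degree_centr', 'closeness_centr',
              'betweenness_centr', 'eigenvector_centr']].corr()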

To understand the values a bit better, it might be useful to sort them. For this, we can use the .sort_values() function provided with all DataFrames.

In [9]:
centralities.sort_values(by='betweenness_centr', ascending=False)
Out[9]:
ID degree_centr closeness_centr betweenness_centr eigenvector_centr
10 10 0.217391 0.425926 0.239977 0.384541
11 11 0.173913 0.403509 0.226379 0.239538
8 8 0.173913 0.343284 0.138553 0.212955
12 12 0.130435 0.348485 0.131799 0.110723
16 16 0.173913 0.389831 0.124630 0.304680
15 15 0.173913 0.370968 0.122476 0.317121
20 20 0.173913 0.328571 0.113175 0.255652
4 4 0.130435 0.348485 0.100838 0.128455
18 18 0.130435 0.343284 0.097956 0.194776
6 6 0.130435 0.310811 0.097666 0.110150
3 3 0.130435 0.306667 0.095285 0.078701
14 14 0.130435 0.353846 0.086665 0.209532
9 9 0.130435 0.365079 0.075878 0.208734
22 22 0.173913 0.328571 0.071888 0.267482
24 24 0.130435 0.302632 0.070422 0.122064
5 5 0.130435 0.333333 0.065152 0.128600
21 21 0.130435 0.315068 0.063439 0.185475
13 13 0.086957 0.306667 0.060277 0.066920
23 23 0.130435 0.315068 0.042478 0.172218
2 2 0.086957 0.267442 0.036797 0.041609
1 1 0.086957 0.264368 0.035244 0.034586
19 19 0.130435 0.333333 0.034064 0.241578
17 17 0.130435 0.359375 0.031973 0.267580
7 7 0.086957 0.306667 0.030665 0.117212

That's quite interesting! We can see a wide range of centrality values - the measure clearly regards some nodes as far more important than others.

What if we wanted to obtain a Top 10 table? We can do this using the head() function:

In [10]:
centralities.sort_values(by='betweenness_centr', ascending=False).head(10)
Out[10]:
ID degree_centr closeness_centr betweenness_centr eigenvector_centr
10 10 0.217391 0.425926 0.239977 0.384541
11 11 0.173913 0.403509 0.226379 0.239538
8 8 0.173913 0.343284 0.138553 0.212955
12 12 0.130435 0.348485 0.131799 0.110723
16 16 0.173913 0.389831 0.124630 0.304680
15 15 0.173913 0.370968 0.122476 0.317121
20 20 0.173913 0.328571 0.113175 0.255652
4 4 0.130435 0.348485 0.100838 0.128455
18 18 0.130435 0.343284 0.097956 0.194776
6 6 0.130435 0.310811 0.097666 0.110150

But this is supposed to be a Betweenness Top 10 - can we get rid of the other columns?

In [11]:
centralities.sort_values(by='betweenness_centr', ascending=False).head(10)[['ID','betweenness_centr']]
Out[11]:
ID betweenness_centr
10 10 0.239977
11 11 0.226379
8 8 0.138553
12 12 0.131799
16 16 0.124630
15 15 0.122476
20 20 0.113175
4 4 0.100838
18 18 0.097956
6 6 0.097666

But the row numbers in the leftmost column look a bit off... These are the labels of the original index, which survived the sort. Could we perhaps renumber them from zero? The .reset_index() function does exactly that:

In [12]:
centralities.sort_values(by='betweenness_centr', ascending=False).head(10).reset_index()[['ID','betweenness_centr']]
Out[12]:
ID betweenness_centr
0 10 0.239977
1 11 0.226379
2 8 0.138553
3 12 0.131799
4 16 0.124630
5 15 0.122476
6 20 0.113175
7 4 0.100838
8 18 0.097956
9 6 0.097666
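
Incidentally, pandas can sort, truncate and renumber in fewer steps. The sketch below is an equivalent alternative to the chain above, producing the same table:

# nlargest() sorts by the given column and keeps the top rows in one step
centralities.nlargest(10, 'betweenness_centr')[['ID', 'betweenness_centr']].reset_index(drop=True)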

We have tinkered with the table enough for now. Let's now try to visualise the centralities.

We can do that easily by simply passing the values as node_color - networkx hands them over to matplotlib, which maps each value onto a colormap to produce an actual color.


Part 2 - Visualising node properties

Let's see what the various centrality measures look like:

In [13]:
nx.draw(G, pos, with_labels = True, node_color = list(centralities['degree_centr']))
In [14]:
nx.draw(G, pos, with_labels = True, node_color = list(centralities['closeness_centr']))
In [15]:
nx.draw(G, pos, with_labels = True, node_color = list(centralities['betweenness_centr']))
In [16]:
nx.draw(G, pos, with_labels = True, node_color = list(centralities['eigenvector_centr']))

We can observe that there is quite a lot of similarity in the relative distribution of colors, but with some notable differences in the central nodes. That is to be expected, as each centrality measure has its own distinct definition (and purpose!).
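
With the default colormap it can be hard to tell which color corresponds to which value. A possible refinement - a sketch that assumes the G, pos and centralities objects built above - is to draw the nodes explicitly with a named colormap and attach a colorbar as a legend:

# Draw nodes with an explicit colormap and attach a colorbar as a legend
fig, ax = plt.subplots()
node_colors = list(centralities['betweenness_centr'])
drawn = nx.draw_networkx_nodes(G, pos, node_color=node_colors,
                               cmap=plt.cm.viridis, ax=ax)
nx.draw_networkx_edges(G, pos, ax=ax)
nx.draw_networkx_labels(G, pos, ax=ax)
fig.colorbar(drawn, ax=ax, label='Betweenness centrality')
ax.set_axis_off()
plt.show()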