Now that we have a good handle on how basic pathfinding operations work, we can explore some further properties of our network.
As this is a new notebook, we will have to load our dataset again:
import networkx as nx
import pandas as pd
links = pd.read_csv('data-sioux-falls/siouxfalls_links.csv')
nodes = pd.read_csv('data-sioux-falls/siouxfalls_nodes.csv')
new_links = list(zip(links['from_node'], links['to_node']))
G = nx.Graph()
G.add_nodes_from(nodes['node'])
G.add_edges_from(new_links)
coords = list(zip(nodes['x'], nodes['y']))
pos = dict(zip(nodes['node'], coords))
Let's now see the degrees of our nodes:
degrees = [G.degree(n) for n in G.nodes()]
degrees
[2, 2, 3, 3, 3, 3, 2, 4, 3, 5, 4, 3, 2, 3, 4, 4, 3, 3, 3, 4, 3, 4, 3, 3]
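As a quick sanity check on these degrees, we can compute a few summary figures by hand (a small sketch; the list below is hard-coded from the output above):

```python
# Degree sequence copied from the output above (24 nodes)
degrees = [2, 2, 3, 3, 3, 3, 2, 4, 3, 5, 4, 3, 2, 3, 4, 4,
           3, 3, 3, 4, 3, 4, 3, 3]

avg_degree = sum(degrees) / len(degrees)   # 76 / 24, roughly 3.17
n_edges = sum(degrees) // 2                # each edge contributes 2 to the sum
print(avg_degree, n_edges)
```

The degree sum is 76, so by the handshake lemma our undirected graph has 38 edges.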
We can plot the distribution of degrees using matplotlib. To do this, we need to import it; as with the other libraries we have used, there is a conventional alias for it: plt.
import matplotlib.pyplot as plt
We can plot a histogram using the plt.hist() command, passing the list in question as a parameter.
plt.hist(degrees)
plt.show()
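By default, plt.hist() picks 10 automatic bins, which can split integer degree values awkwardly across bars. A sketch with bin edges centred on the integers instead (the degrees list is hard-coded from the output above):

```python
import matplotlib.pyplot as plt

degrees = [2, 2, 3, 3, 3, 3, 2, 4, 3, 5, 4, 3, 2, 3, 4, 4,
           3, 3, 3, 4, 3, 4, 3, 3]

# Edges at 1.5, 2.5, ... so each bar covers exactly one integer degree
edges = [d - 0.5 for d in range(min(degrees), max(degrees) + 2)]
counts, _, _ = plt.hist(degrees, bins=edges)
plt.xlabel('Degree')
plt.ylabel('Number of nodes')
plt.show()
```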
We can also obtain the various centrality values. Let's start with the degree centrality:
nx.degree_centrality(G)
{1: 0.08695652173913043, 2: 0.08695652173913043, 3: 0.13043478260869565, 4: 0.13043478260869565, 5: 0.13043478260869565, 6: 0.13043478260869565, 7: 0.08695652173913043, 8: 0.17391304347826086, 9: 0.13043478260869565, 10: 0.21739130434782608, 11: 0.17391304347826086, 12: 0.13043478260869565, 13: 0.08695652173913043, 14: 0.13043478260869565, 15: 0.17391304347826086, 16: 0.17391304347826086, 17: 0.13043478260869565, 18: 0.13043478260869565, 19: 0.13043478260869565, 20: 0.17391304347826086, 21: 0.13043478260869565, 22: 0.17391304347826086, 23: 0.13043478260869565, 24: 0.13043478260869565}
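Degree centrality is simply each node's degree divided by n - 1, the maximum degree possible in a graph of n nodes. We can check this against the output above: node 10 has degree 5, and 5/23 ≈ 0.2174. A minimal sketch of the same check on a toy graph:

```python
import networkx as nx

# Toy graph: node 1 connects to all three other nodes,
# so its degree centrality should be exactly 1.0
G = nx.Graph([(1, 2), (1, 3), (1, 4), (2, 3)])
dc = nx.degree_centrality(G)
n = G.number_of_nodes()

for node in G.nodes():
    assert abs(dc[node] - G.degree(node) / (n - 1)) < 1e-12
print(dc)  # node 1 -> 1.0, node 4 -> 1/3
```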
As we would like to compare the various centrality values, it might be useful to store them in a new DataFrame. We can use the pd.DataFrame() command to create an empty DataFrame.
centralities = pd.DataFrame()
We can now start populating it by simply storing values to new columns.
centralities['ID'] = G.nodes()
centralities['degree_centr'] = nx.degree_centrality(G).values()
centralities
| | ID | degree_centr |
---|---|---|
1 | 1 | 0.086957 |
2 | 2 | 0.086957 |
3 | 3 | 0.130435 |
4 | 4 | 0.130435 |
5 | 5 | 0.130435 |
6 | 6 | 0.130435 |
7 | 7 | 0.086957 |
8 | 8 | 0.173913 |
9 | 9 | 0.130435 |
10 | 10 | 0.217391 |
11 | 11 | 0.173913 |
12 | 12 | 0.130435 |
13 | 13 | 0.086957 |
14 | 14 | 0.130435 |
15 | 15 | 0.173913 |
16 | 16 | 0.173913 |
17 | 17 | 0.130435 |
18 | 18 | 0.130435 |
19 | 19 | 0.130435 |
20 | 20 | 0.173913 |
21 | 21 | 0.130435 |
22 | 22 | 0.173913 |
23 | 23 | 0.130435 |
24 | 24 | 0.130435 |
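One caveat: assigning `.values()` to a column relies on `G.nodes()` and the centrality dict listing nodes in the same order. That holds here, since both preserve insertion order, but a more defensive sketch builds the frame directly from the dict items (shown on a small stand-in graph):

```python
import networkx as nx
import pandas as pd

G = nx.path_graph([1, 2, 3])  # small stand-in graph for illustration
dc = nx.degree_centrality(G)

# Each (node, value) pair stays together, whatever the iteration order
centralities = pd.DataFrame(list(dc.items()), columns=['ID', 'degree_centr'])
print(centralities)
```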
Let's now obtain the rest of the centrality values that we discussed in class.
centralities['closeness_centr'] = nx.closeness_centrality(G).values()
centralities['betweenness_centr'] = nx.betweenness_centrality(G).values()
centralities['eigenvector_centr'] = nx.eigenvector_centrality(G).values()
centralities
| | ID | degree_centr | closeness_centr | betweenness_centr | eigenvector_centr |
---|---|---|---|---|---|
1 | 1 | 0.086957 | 0.264368 | 0.035244 | 0.034586 |
2 | 2 | 0.086957 | 0.267442 | 0.036797 | 0.041609 |
3 | 3 | 0.130435 | 0.306667 | 0.095285 | 0.078701 |
4 | 4 | 0.130435 | 0.348485 | 0.100838 | 0.128455 |
5 | 5 | 0.130435 | 0.333333 | 0.065152 | 0.128600 |
6 | 6 | 0.130435 | 0.310811 | 0.097666 | 0.110150 |
7 | 7 | 0.086957 | 0.306667 | 0.030665 | 0.117212 |
8 | 8 | 0.173913 | 0.343284 | 0.138553 | 0.212955 |
9 | 9 | 0.130435 | 0.365079 | 0.075878 | 0.208734 |
10 | 10 | 0.217391 | 0.425926 | 0.239977 | 0.384541 |
11 | 11 | 0.173913 | 0.403509 | 0.226379 | 0.239538 |
12 | 12 | 0.130435 | 0.348485 | 0.131799 | 0.110723 |
13 | 13 | 0.086957 | 0.306667 | 0.060277 | 0.066920 |
14 | 14 | 0.130435 | 0.353846 | 0.086665 | 0.209532 |
15 | 15 | 0.173913 | 0.370968 | 0.122476 | 0.317121 |
16 | 16 | 0.173913 | 0.389831 | 0.124630 | 0.304680 |
17 | 17 | 0.130435 | 0.359375 | 0.031973 | 0.267580 |
18 | 18 | 0.130435 | 0.343284 | 0.097956 | 0.194776 |
19 | 19 | 0.130435 | 0.333333 | 0.034064 | 0.241578 |
20 | 20 | 0.173913 | 0.328571 | 0.113175 | 0.255652 |
21 | 21 | 0.130435 | 0.315068 | 0.063439 | 0.185475 |
22 | 22 | 0.173913 | 0.328571 | 0.071888 | 0.267482 |
23 | 23 | 0.130435 | 0.315068 | 0.042478 | 0.172218 |
24 | 24 | 0.130435 | 0.302632 | 0.070422 | 0.122064 |
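The three assignments above all follow the same pattern, so they can also be collapsed into a loop over the centrality functions (a sketch on a stand-in graph; the column names are the ones used above):

```python
import networkx as nx
import pandas as pd

G = nx.karate_club_graph()  # stand-in graph for illustration
centralities = pd.DataFrame({'ID': list(G.nodes())})

measures = {
    'degree_centr': nx.degree_centrality,
    'closeness_centr': nx.closeness_centrality,
    'betweenness_centr': nx.betweenness_centrality,
    'eigenvector_centr': nx.eigenvector_centrality,
}
for column, func in measures.items():
    centralities[column] = func(G).values()

print(centralities.head())
```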
To understand the values a bit better, it might be useful to sort them. For this, we can use the .sort_values() method provided by all DataFrames.
centralities.sort_values(by='betweenness_centr', ascending=False)
| | ID | degree_centr | closeness_centr | betweenness_centr | eigenvector_centr |
---|---|---|---|---|---|
10 | 10 | 0.217391 | 0.425926 | 0.239977 | 0.384541 |
11 | 11 | 0.173913 | 0.403509 | 0.226379 | 0.239538 |
8 | 8 | 0.173913 | 0.343284 | 0.138553 | 0.212955 |
12 | 12 | 0.130435 | 0.348485 | 0.131799 | 0.110723 |
16 | 16 | 0.173913 | 0.389831 | 0.124630 | 0.304680 |
15 | 15 | 0.173913 | 0.370968 | 0.122476 | 0.317121 |
20 | 20 | 0.173913 | 0.328571 | 0.113175 | 0.255652 |
4 | 4 | 0.130435 | 0.348485 | 0.100838 | 0.128455 |
18 | 18 | 0.130435 | 0.343284 | 0.097956 | 0.194776 |
6 | 6 | 0.130435 | 0.310811 | 0.097666 | 0.110150 |
3 | 3 | 0.130435 | 0.306667 | 0.095285 | 0.078701 |
14 | 14 | 0.130435 | 0.353846 | 0.086665 | 0.209532 |
9 | 9 | 0.130435 | 0.365079 | 0.075878 | 0.208734 |
22 | 22 | 0.173913 | 0.328571 | 0.071888 | 0.267482 |
24 | 24 | 0.130435 | 0.302632 | 0.070422 | 0.122064 |
5 | 5 | 0.130435 | 0.333333 | 0.065152 | 0.128600 |
21 | 21 | 0.130435 | 0.315068 | 0.063439 | 0.185475 |
13 | 13 | 0.086957 | 0.306667 | 0.060277 | 0.066920 |
23 | 23 | 0.130435 | 0.315068 | 0.042478 | 0.172218 |
2 | 2 | 0.086957 | 0.267442 | 0.036797 | 0.041609 |
1 | 1 | 0.086957 | 0.264368 | 0.035244 | 0.034586 |
19 | 19 | 0.130435 | 0.333333 | 0.034064 | 0.241578 |
17 | 17 | 0.130435 | 0.359375 | 0.031973 | 0.267580 |
7 | 7 | 0.086957 | 0.306667 | 0.030665 | 0.117212 |
That's quite interesting! We can see a wide range of centrality values - evidently, the centrality measures regard some nodes as more important than others.
What if we wanted to obtain a Top 10 table? We can do this using the head() function:
centralities.sort_values(by='betweenness_centr', ascending=False).head(10)
| | ID | degree_centr | closeness_centr | betweenness_centr | eigenvector_centr |
---|---|---|---|---|---|
10 | 10 | 0.217391 | 0.425926 | 0.239977 | 0.384541 |
11 | 11 | 0.173913 | 0.403509 | 0.226379 | 0.239538 |
8 | 8 | 0.173913 | 0.343284 | 0.138553 | 0.212955 |
12 | 12 | 0.130435 | 0.348485 | 0.131799 | 0.110723 |
16 | 16 | 0.173913 | 0.389831 | 0.124630 | 0.304680 |
15 | 15 | 0.173913 | 0.370968 | 0.122476 | 0.317121 |
20 | 20 | 0.173913 | 0.328571 | 0.113175 | 0.255652 |
4 | 4 | 0.130435 | 0.348485 | 0.100838 | 0.128455 |
18 | 18 | 0.130435 | 0.343284 | 0.097956 | 0.194776 |
6 | 6 | 0.130435 | 0.310811 | 0.097666 | 0.110150 |
But this is supposed to be a Betweenness Top 10 - can we get rid of the other columns?
centralities.sort_values(by='betweenness_centr', ascending=False).head(10)[['ID','betweenness_centr']]
| | ID | betweenness_centr |
---|---|---|
10 | 10 | 0.239977 |
11 | 11 | 0.226379 |
8 | 8 | 0.138553 |
12 | 12 | 0.131799 |
16 | 16 | 0.124630 |
15 | 15 | 0.122476 |
20 | 20 | 0.113175 |
4 | 4 | 0.100838 |
18 | 18 | 0.097956 |
6 | 6 | 0.097666 |
But the row numbers in the leftmost column look a bit off... Could we perhaps reset them to a clean ascending order?
centralities.sort_values(by='betweenness_centr', ascending=False).head(10).reset_index()[['ID','betweenness_centr']]
| | ID | betweenness_centr |
---|---|---|
0 | 10 | 0.239977 |
1 | 11 | 0.226379 |
2 | 8 | 0.138553 |
3 | 12 | 0.131799 |
4 | 16 | 0.124630 |
5 | 15 | 0.122476 |
6 | 20 | 0.113175 |
7 | 4 | 0.100838 |
8 | 18 | 0.097956 |
9 | 6 | 0.097666 |
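As an aside, pandas offers nlargest(), which combines the sort and the head() in a single call; and passing drop=True to reset_index() discards the old index instead of keeping it as a column. A sketch on a stand-in graph:

```python
import networkx as nx
import pandas as pd

G = nx.karate_club_graph()  # stand-in graph for illustration
centralities = pd.DataFrame({'ID': list(G.nodes())})
centralities['betweenness_centr'] = nx.betweenness_centrality(G).values()

# Equivalent to sort_values(..., ascending=False).head(10), with a clean 0..9 index
top10 = (centralities.nlargest(10, 'betweenness_centr')
                     .reset_index(drop=True))
print(top10)
```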
We have tinkered enough with the table for now. Let's try to visualise the centralities. We can do that easily by simply passing the values as the node_color parameter; networkx/matplotlib will translate these into actual colors. Let's see what the various centrality measures look like:
nx.draw(G, pos, with_labels = True, node_color = list(centralities['degree_centr']))
nx.draw(G, pos, with_labels = True, node_color = list(centralities['closeness_centr']))
nx.draw(G, pos, with_labels = True, node_color = list(centralities['betweenness_centr']))
nx.draw(G, pos, with_labels = True, node_color = list(centralities['eigenvector_centr']))
We can observe that there is quite a lot of similarity in the relative distribution of colors, but with some notable differences in the central nodes. That is to be expected, as each centrality measure has its own distinct definition (and purpose!).
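By default, nx.draw() maps the numeric node_color values through matplotlib's default colormap. If we want control over the colours, and a legend relating them back to the centrality values, we can pass an explicit cmap and attach a colorbar to the returned node collection. A sketch on a stand-in graph (our own G, pos and centralities could be substituted directly):

```python
import networkx as nx
import matplotlib.pyplot as plt

G = nx.karate_club_graph()           # stand-in graph for illustration
pos = nx.spring_layout(G, seed=42)   # deterministic layout
values = list(nx.betweenness_centrality(G).values())

# draw_networkx_nodes returns the node collection that plt.colorbar needs
nodes = nx.draw_networkx_nodes(G, pos, node_color=values, cmap=plt.cm.viridis)
nx.draw_networkx_edges(G, pos)
nx.draw_networkx_labels(G, pos)
plt.colorbar(nodes, label='betweenness centrality')
plt.axis('off')
plt.show()
```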