Now that we have experimented with a few small networks, we are ready to look at a more substantial dataset - the London Underground!
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
We have placed a simplified dataset of the LU network structure under the data-london-underground folder, which we will load using pandas:
stations = pd.read_csv('data-london-underground/lu_stations.csv')
stations
144 rows × 5 columns
This is about half the number of stations in the real-world network. We reduced the dataset in order to keep things simple for the purposes of this analysis.
links = pd.read_csv('data-london-underground/lu_links.csv')
links
169 rows × 4 columns
No surprises here - let's now convert these into a graph.
G = nx.Graph()
G.add_nodes_from(stations['id'])
G.add_edges_from(list(zip(links['station1'], links['station2'])))
nx.draw(G)
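Before moving on, it can be worth checking that a graph built this way has the shape we expect. The sketch below repeats the same construction on a tiny made-up dataset and inspects it. The three stations and two links are hypothetical and are not taken from the CSV files; only the column names match.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy data, using the same column names as the CSV files.
toy_stations = pd.DataFrame({"id": [1, 2, 3]})
toy_links = pd.DataFrame({"station1": [1, 2], "station2": [2, 3]})

toy_G = nx.Graph()
toy_G.add_nodes_from(toy_stations["id"])
toy_G.add_edges_from(zip(toy_links["station1"], toy_links["station2"]))

print(toy_G.number_of_nodes())  # one node per row of the stations table
print(toy_G.number_of_edges())  # one edge per row of the links table
print(nx.is_connected(toy_G))   # True if every station can reach every other
```

Running the same three checks on G should give node and edge counts that match the row counts of the two dataframes, provided no link is listed twice.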
We have such a large number of nodes that this ends up being a very busy graph. We can amend the way that the nodes are plotted, so that it looks a bit nicer. We can do this using the node_size parameter:
nx.draw(G, node_size = 6)
But it remains a bit difficult to see - what if we could make it a bit bigger?
This is possible using a few more advanced matplotlib features. You see, in every new cell we are creating a new instance of a matplotlib chart. Thanks to the pyplot library within matplotlib, chart creation is quite similar to that found in Matlab - so some concepts might look familiar.
To modify the size of the figure, we simply have to initialise the chart ourselves, using the plt.figure() command, and then specify its size using the figsize parameter.
If you want more help with the transition from Matlab to Python, you can read this very helpful guide, or follow this DataCamp course.
plt.figure(figsize=(16,10))
nx.draw(G, node_size = 40)
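To see what the figure size setting actually does, here is a minimal sketch that creates a figure, draws a small graph onto it, and reads the size back. The path graph is a hypothetical stand-in for the Underground graph, and the Agg backend line is only there so that the sketch also runs outside a notebook; inside a notebook it can be left out.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no display is needed
import matplotlib.pyplot as plt
import networkx as nx

toy_G = nx.path_graph(5)  # hypothetical stand-in for G

fig = plt.figure(figsize=(16, 10))  # width and height, both in inches
nx.draw(toy_G, node_size=40)        # with no axes given, nx.draw uses the current figure

print(fig.get_size_inches())  # the size we asked for
plt.close(fig)                # release the figure once we are done with it
```

Because nx.draw picks up the current figure, the plt.figure() call has to come before it in the same cell.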
Much better, but now that we have a better look at it, this certainly doesn't look anything like the London Tube.
Ah! But of course! We forgot to add the coordinates.
plt.figure(figsize=(16,10))
coords = list(zip(stations['longitude'],stations['latitude']))
pos = dict(zip(stations['id'], coords))
nx.draw(G,pos,node_size = 40)
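The pos argument is simply a dictionary that maps each node ID to an (x, y) pair. The sketch below builds one from a made-up two-station table so that its structure is easy to inspect; the IDs and coordinates are hypothetical.

```python
import pandas as pd

# Hypothetical stations; only the column names follow the real dataset.
toy_stations = pd.DataFrame({
    "id": [1, 2],
    "longitude": [-0.10, -0.20],
    "latitude": [51.50, 51.60],
})

# Longitude goes first because it plays the role of x, latitude the role of y.
coords = list(zip(toy_stations["longitude"], toy_stations["latitude"]))
pos = dict(zip(toy_stations["id"], coords))

print(pos[1])  # the (longitude, latitude) pair for station 1
```

Swapping the two columns would still draw a graph, but it would be mirrored along the diagonal compared with a map.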
What if we wanted to illustrate only the subgraph of the network that lies within Zone 1?
We can do that easily using the zone column in the stations dataframe. Note that the authors of that list chose to use "half" values to denote stations that lie in two zones at the same time. For example, Archway station is described as being in zone 2.5, whereas official maps place it on the boundary of zones 2 and 3.
Therefore, if we want to obtain all the nodes that are found in zone 1, we really have to select the stations with a zone value of <2. If we used <=1 to filter the list, we would exclude stations that lie on the zone boundary, such as Earls Court.
stations_z1 = pd.read_csv('data-london-underground/lu_stations.csv')
stations_z1 = stations_z1[stations_z1['zone']<2]
len(stations_z1)
We filtered the stations using a condition applied on the zone column. This effectively says:
"Look at the zone column within the stations_z1 dataframe, and select the rows where its value is less than 2. Now return a new dataframe that contains only these rows".
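The effect of the two possible conditions is easiest to see on a small example. The sketch below uses three made-up stations, one of which sits on the boundary of zones 1 and 2 and so carries a "half" value of 1.5; the names and values are hypothetical.

```python
import pandas as pd

# Hypothetical stations: A is in zone 1, B is on the 1/2 boundary, C is in zone 2.
toy = pd.DataFrame({"name": ["A", "B", "C"], "zone": [1.0, 1.5, 2.0]})

mask = toy["zone"] < 2             # a boolean Series, one True/False per row
inclusive = toy[mask]              # keeps A and the boundary station B
strict = toy[toy["zone"] <= 1]     # keeps A only, the boundary station is lost

print(list(inclusive["name"]))
print(list(strict["name"]))
```

The condition inside the square brackets is evaluated first and produces the boolean mask; indexing the dataframe with that mask then returns only the rows marked True.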
We can now proceed to filter the links. The links dataframe does not contain any information on zones, but we can filter it by checking whether both stations in each edge are found within our filtered list of Zone 1 stations.
To do this, we first create a list of all "allowed" node IDs. We then filter the links by excluding any link whose endpoints do not both belong to Zone 1.
allowed_stations = list(stations_z1['id'])
links_z1 = pd.read_csv('data-london-underground/lu_links.csv')
links_z1 = links_z1.loc[links_z1['station1'].isin(allowed_stations)]
len(links_z1)
We now have the list down to 57 links, all of which have an allowed station in the station1 column. Let's now apply the filter to the station2 column:
links_z1 = links_z1.loc[links_z1['station2'].isin(allowed_stations)]
len(links_z1)
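The two filtering steps can also be written as a single condition, by combining the two isin() checks with the & operator. This is a minimal sketch on made-up links, where station 4 is assumed to lie outside the allowed list.

```python
import pandas as pd

# Hypothetical links and allowed IDs; station 4 is not in the allowed list.
toy_links = pd.DataFrame({"station1": [1, 1, 3], "station2": [2, 3, 4]})
allowed = [1, 2, 3]

# A link is kept only when BOTH of its endpoints are allowed.
both_ends = toy_links["station1"].isin(allowed) & toy_links["station2"].isin(allowed)
toy_links_z1 = toy_links.loc[both_ends]

print(len(toy_links_z1))
```

The result is the same as applying the two filters one after the other; the two-step version has the advantage of showing the intermediate count.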
Let's now visualise this part of the network:
G_z1 = nx.Graph()
G_z1.add_nodes_from(stations_z1['id'])
G_z1.add_edges_from(list(zip(links_z1['station1'], links_z1['station2'])))
plt.figure(figsize=(16,10))
coords = list(zip(stations_z1['longitude'],stations_z1['latitude']))
pos = dict(zip(stations_z1['id'], coords))
nx.draw(G_z1, pos, node_size = 60)
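As an alternative to filtering the dataframes by hand, networkx can extract the same kind of subgraph directly from an existing graph, given the list of nodes to keep. The sketch below shows this on a made-up four-node line; the graph and the allowed list are hypothetical.

```python
import networkx as nx

# Hypothetical line of four stations: 1 - 2 - 3 - 4.
toy_G = nx.Graph([(1, 2), (2, 3), (3, 4)])
allowed = [1, 2, 3]

# subgraph() keeps the listed nodes and every edge whose two ends are both listed.
toy_G_z1 = toy_G.subgraph(allowed)

print(sorted(toy_G_z1.nodes()))
print(toy_G_z1.number_of_edges())
```

Note that subgraph() returns a read-only view of the original graph; calling .copy() on the result gives an independent graph that can be modified.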
We can now build a table of centrality measures for each station.
I am going to use a lambda function to add station names into a column, based on a dictionary and the value of the ID column. There are much easier ways to achieve this, but I wanted to take this opportunity to show you the lambda feature in action.
dict_names = dict(zip(stations['id'],stations['name']))
centralities = pd.DataFrame()
centralities['ID'] = list(G.nodes())
centralities['Names'] = centralities['ID'].map(lambda x: dict_names[x])
# Each function returns a dict keyed by node; this relies on the dicts following the node order of G
centralities['degree_centr'] = list(nx.degree_centrality(G).values())
centralities['closeness_centr'] = list(nx.closeness_centrality(G).values())
centralities['betweenness_centr'] = list(nx.betweenness_centrality(G).values())
centralities['eigenvector_centr'] = list(nx.eigenvector_centrality(G).values())
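For comparison, one of the simpler ways to do the name lookup is to pass the dictionary itself to map(), without a lambda. The sketch below shows both versions side by side on made-up IDs and names.

```python
import pandas as pd

# Hypothetical ID-to-name dictionary and a column of IDs.
dict_names = {1: "Alpha", 2: "Beta"}
ids = pd.Series([2, 1, 2])

via_lambda = ids.map(lambda x: dict_names[x])  # calls the function once per value
via_dict = ids.map(dict_names)                 # looks each value up in the dictionary

print(list(via_lambda))
print(list(via_dict))
```

The two differ in how they treat an ID that is missing from the dictionary: the lambda version raises a KeyError, whereas the dictionary version fills that row with NaN.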
Let us now obtain our "Top 10" lists.
|9||Tottenham Court Road||0.027972|
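One way to produce such a ranking is the nlargest() method of a dataframe, which returns the rows with the highest values in a given column. This is a minimal sketch on a small star-shaped graph rather than the Underground data, so the graph, the variable names, and the choice of degree centrality are all assumptions made for illustration.

```python
import networkx as nx
import pandas as pd

# Hypothetical graph: node 0 is a hub connected to the four leaves 1-4.
toy_G = nx.star_graph(4)

degree = nx.degree_centrality(toy_G)  # dict of node -> centrality
toy_centralities = pd.DataFrame({
    "ID": list(degree.keys()),
    "degree_centr": list(degree.values()),
})

# Keep the two highest-scoring rows; for a "Top 10" list we would pass 10 instead.
top = toy_centralities.nlargest(2, "degree_centr")
print(top)
```

The hub comes first, since it is connected to every other node. The same call can be repeated with each of the other centrality columns to obtain one ranking per measure.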