In [1]:

```
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
```

The data for this example is stored in the `data-london-underground` folder, which we will load using `pandas`.

In [2]:

```
stations = pd.read_csv('data-london-underground/lu_stations.csv')
stations
```

Out[2]:

| | id | latitude | longitude | name | zone |
|---|---|---|---|---|---|
| 0 | 1 | 51.5028 | -0.2801 | Acton Town | 3.0 |
| 1 | 8 | 51.5653 | -0.1353 | Archway | 2.5 |
| 2 | 9 | 51.6164 | -0.1331 | Arnos Grove | 4.0 |
| 3 | 10 | 51.5586 | -0.1059 | Arsenal | 2.0 |
| 4 | 11 | 51.5226 | -0.1571 | Baker Street | 1.0 |
| ... | ... | ... | ... | ... | ... |
| 139 | 296 | 51.5120 | -0.2239 | White City | 2.0 |
| 140 | 297 | 51.5492 | -0.2215 | Willesden Green | 2.5 |
| 141 | 303 | 51.5975 | -0.1097 | Wood Green | 3.0 |
| 142 | 301 | 51.6070 | 0.0341 | Woodford | 4.0 |
| 143 | 302 | 51.6179 | -0.1856 | Woodside Park | 4.0 |

144 rows × 5 columns

In [3]:

```
links = pd.read_csv('data-london-underground/lu_links.csv')
links
```

Out[3]:

| | station1 | station2 | line | time |
|---|---|---|---|---|
| 0 | 1 | 234 | 10 | 4 |
| 1 | 1 | 265 | 10 | 4 |
| 2 | 8 | 124 | 9 | 3 |
| 3 | 8 | 264 | 9 | 2 |
| 4 | 9 | 31 | 10 | 3 |
| ... | ... | ... | ... | ... |
| 164 | 257 | 258 | 9 | 2 |
| 165 | 261 | 302 | 9 | 3 |
| 166 | 266 | 303 | 10 | 2 |
| 167 | 279 | 285 | 7 | 2 |
| 168 | 288 | 302 | 9 | 1 |

169 rows × 4 columns

No surprises here - let's now convert these into a graph.

In [4]:

```
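G = nx.Graph()
# Add every station as a node, including any station without links
G.add_nodes_from(stations['id'])
# Each row of links becomes an undirected edge between two station IDs
G.add_edges_from(zip(links['station1'], links['station2']))
nx.draw(G)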
```
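networkx can also build a graph straight from an edge-list dataframe with `nx.from_pandas_edgelist`, optionally keeping other columns as edge attributes. A minimal sketch using the first four rows of `links` shown above; note that, unlike the cell above, it only creates nodes that appear in a link, which is why the original `add_nodes_from(stations['id'])` step is still useful for isolated stations. The second half uses a hypothetical pair of rows to show that `nx.Graph` keeps a single edge per pair of stations, while `nx.MultiGraph` keeps one edge per row (for example, one per line).

```python
import networkx as nx
import pandas as pd

# First four rows of lu_links.csv, as shown in the output above
links = pd.DataFrame({
    'station1': [1, 1, 8, 8],
    'station2': [234, 265, 124, 264],
    'line': [10, 10, 9, 9],
    'time': [4, 4, 3, 2],
})

# Build the graph from the edge list, keeping line and time as edge attributes
G = nx.from_pandas_edgelist(links, source='station1', target='station2',
                            edge_attr=['line', 'time'])
print(G.number_of_nodes(), G.number_of_edges())  # 6 4

# Hypothetical: the same two stations linked by two different lines
dup = pd.DataFrame({'station1': [1, 1], 'station2': [2, 2],
                    'line': [10, 4], 'time': [3, 3]})
simple = nx.from_pandas_edgelist(dup, 'station1', 'station2',
                                 edge_attr=['line', 'time'])
multi = nx.from_pandas_edgelist(dup, 'station1', 'station2',
                                edge_attr=['line', 'time'],
                                create_using=nx.MultiGraph())
print(simple.number_of_edges(), multi.number_of_edges())  # 1 2
```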

The default node size makes the first plot quite cluttered, so let's shrink the nodes using the `node_size` parameter.

In [5]:

```
nx.draw(G, node_size = 6)
```

But it is still a bit difficult to see. What if we could make it bigger?

This is possible using a few more advanced `matplotlib` features. Every cell that draws a plot creates a new `matplotlib` figure. Thanks to the `pyplot` module within `matplotlib`, creating charts works much like it does in Matlab, so some concepts might look familiar.

To change the size of the figure, we simply create the figure ourselves using `plt.figure()` and specify its size (width and height, in inches) with the `figsize` parameter.
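The same idea can also be written with matplotlib's object-oriented interface: create the figure and axes explicitly with `plt.subplots`, then pass the axes to `nx.draw` through its `ax` argument. A minimal sketch, assuming a headless environment (hence the non-interactive `Agg` backend, which you would leave out in a notebook) and a small stand-in graph instead of the Tube network.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import networkx as nx

G = nx.path_graph(5)  # small stand-in graph, not the Underground network

# Create the figure ourselves so we control its size (width, height in inches)
fig, ax = plt.subplots(figsize=(16, 10))
nx.draw(G, ax=ax, node_size=40)

print(fig.get_size_inches())
```

Holding on to `fig` and `ax` makes it easy to adjust the chart later, for example to add a title or save it to a file.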

If you want more help with the transition from Matlab to Python, you can read this very helpful guide, or follow this DataCamp course.

In [6]:

```
plt.figure(figsize=(16,10))
nx.draw(G, node_size = 40)
```

Much better, but now that we have a better look at it, this certainly doesn't look anything like the London Tube.

Ah! But of course! We forgot to add the coordinates.

In [7]:

```
plt.figure(figsize=(16,10))
coords = list(zip(stations['longitude'],stations['latitude']))
pos = dict(zip(stations['id'], coords))
nx.draw(G,pos,node_size = 40)
```
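networkx expects `pos` to be a dictionary mapping each node to an `(x, y)` pair, which is why longitude (east-west) comes first and latitude (north-south) second. A small sketch using three stations from the table above; the link to station `999` is hypothetical and only there to show a useful sanity check, since `nx.draw` raises an error for any node that has no entry in `pos`.

```python
import networkx as nx
import pandas as pd

# Three rows from lu_stations.csv, as shown in the output above
stations = pd.DataFrame({
    'id': [1, 8, 11],
    'latitude': [51.5028, 51.5653, 51.5226],
    'longitude': [-0.2801, -0.1353, -0.1571],
    'name': ['Acton Town', 'Archway', 'Baker Street'],
})

# x = longitude, y = latitude, keyed by station ID
pos = {row.id: (row.longitude, row.latitude) for row in stations.itertuples()}

G = nx.Graph()
G.add_nodes_from(stations['id'])
G.add_edge(8, 999)  # hypothetical link to a station missing from the table

# Nodes without coordinates would make nx.draw fail, so list them first
missing = set(G.nodes()) - set(pos)
print(missing)  # {999}
```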

What if we wanted to illustrate only the subgraph of the network that lies within Zone 1?

We can do that easily using the `zone` column in the `stations` dataframe. Note that the authors of that list chose to use "half" values to denote stations that lie in two zones at the same time. For example, `Archway` station is listed as being in zone `2.5`, since official maps place it on the boundary between zones 2 and 3.

Therefore, to obtain all the nodes found in Zone 1, we have to select the stations with a `zone` value `< 2`. If we used `<= 1` to filter the list, we would exclude stations that lie on the boundary of zones 1 and 2, such as Earls Court.
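To see the difference between the two thresholds, here is a small sketch with three rows from the table above plus one hypothetical station in zone `1.5` standing in for a Zone 1/2 boundary station.

```python
import pandas as pd

# Rows from lu_stations.csv shown above, plus one hypothetical
# zone 1/2 boundary station (zone 1.5) for illustration
stations = pd.DataFrame({
    'name': ['Baker Street', 'Boundary station (hypothetical)', 'Arsenal', 'Archway'],
    'zone': [1.0, 1.5, 2.0, 2.5],
})

strict = stations[stations['zone'] <= 1]    # only stations wholly in zone 1
inclusive = stations[stations['zone'] < 2]  # also keeps the 1/2 boundary station

print(list(strict['name']))
print(list(inclusive['name']))
```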

In [8]:

```
stations_z1 = pd.read_csv('data-london-underground/lu_stations.csv')
stations_z1 = stations_z1[stations_z1['zone']<2]
len(stations_z1)
```

Out[8]:

36

We filtered the stations using a condition applied to the `zone` column. This effectively says:

"Look at the `zone` column within the `stations_z1` dataframe, and select the rows where its value is less than 2. Now return a new dataframe that contains only these rows."
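Under the hood, the comparison produces a boolean Series (a "mask") with one `True` or `False` per row, and indexing the dataframe with it keeps only the `True` rows. A small sketch using three station IDs and zones from the table above; `DataFrame.query` is shown as an equivalent, string-based spelling of the same filter.

```python
import pandas as pd

# Baker Street (11), Arsenal (10) and Archway (8), from the table above
df = pd.DataFrame({'id': [11, 10, 8], 'zone': [1.0, 2.0, 2.5]})

mask = df['zone'] < 2        # boolean Series aligned with df's index
print(mask.tolist())         # [True, False, False]

filtered = df[mask]          # keep only the rows where the mask is True
same = df.query('zone < 2')  # equivalent string-based filter
```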

We can now proceed to filter the links. The `links` dataframe does not contain any information on zones, but we can filter it by checking whether both stations in each edge are found within our filtered list of Zone 1 stations.

To do this, we first create a list of all "allowed" node IDs. We then filter the list by excluding any link with an endpoint outside Zone 1.

In [9]:

```
allowed_stations = list(stations_z1['id'])
links_z1 = pd.read_csv('data-london-underground/lu_links.csv')
links_z1 = links_z1.loc[links_z1['station1'].isin(allowed_stations)]
len(links_z1)
```

Out[9]:

57

This first step only filtered on the `station1` column, leaving 57 links. Let's now apply the same filter to `station2`.

In [10]:

```
links_z1 = links_z1.loc[links_z1['station2'].isin(allowed_stations)]
len(links_z1)
```

Out[10]:

54
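The two filtering steps can also be combined into a single condition with the element-wise `&` operator, which keeps a link only if both of its endpoints are allowed. A minimal sketch on made-up station IDs:

```python
import pandas as pd

# Hypothetical toy data: stations 1, 2 and 3 are "allowed", 4 is not
allowed_stations = [1, 2, 3]
links = pd.DataFrame({'station1': [1, 1, 2, 4], 'station2': [2, 4, 3, 3]})

# Keep a link only if BOTH endpoints are in the allowed list
both_allowed = (links['station1'].isin(allowed_stations)
                & links['station2'].isin(allowed_stations))
links_filtered = links.loc[both_allowed]
print(links_filtered)
```

The parentheses matter: `&` binds more tightly than comparison operators in Python, so wrapping each condition keeps the expression unambiguous.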

Let's now visualise the Zone 1 part of the network:

In [11]:

```
G_z1 = nx.Graph()
G_z1.add_nodes_from(stations_z1['id'])
G_z1.add_edges_from(list(zip(links_z1['station1'], links_z1['station2'])))
plt.figure(figsize=(16,10))
coords = list(zip(stations_z1['longitude'],stations_z1['latitude']))
pos = dict(zip(stations_z1['id'], coords))
nx.draw(G_z1, pos, node_size = 60)
```
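Keeping exactly the links whose two endpoints are both Zone 1 stations gives what graph theory calls the *induced subgraph* on those stations, which networkx can also build directly with `G.subgraph`. A minimal sketch on a toy graph with made-up node IDs, checking that it matches the manual edge filtering used above. `subgraph` returns a read-only view, so `.copy()` is used to get an independent graph. Because `nx.Graph` keeps a single edge per pair of stations, the edge count of such a graph can be smaller than the number of rows in the filtered links table.

```python
import networkx as nx

# Toy graph: 1-2-3-4 in a line, plus a branch 2-5
G = nx.Graph([(1, 2), (2, 3), (3, 4), (2, 5)])
allowed = [1, 2, 3]  # hypothetical "Zone 1" nodes

# Induced subgraph: the allowed nodes plus every edge with both endpoints allowed
G_sub = G.subgraph(allowed).copy()

# Same result built by filtering the edge list manually
G_manual = nx.Graph()
G_manual.add_nodes_from(allowed)
G_manual.add_edges_from((u, v) for u, v in G.edges() if u in allowed and v in allowed)

print(sorted(G_sub.edges()))  # [(1, 2), (2, 3)]
```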

We can now compute a table of centrality measures for the full network `G`.

I am going to use a `lambda` function to add station names into a column, based on a dictionary and the value of the `ID` column. There are much easier ways to achieve this, but I wanted to take this opportunity to show you the `lambda` feature in action.
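One of those easier ways: `Series.map` also accepts a dictionary directly, so the `lambda` can be dropped. A small sketch with three ID/name pairs from the stations table above. The two versions differ only when an ID is missing from the dictionary: the `lambda` raises a `KeyError`, while mapping with the dictionary produces `NaN`.

```python
import pandas as pd

# Three ID -> name pairs from the stations table above
dict_names = {1: 'Acton Town', 8: 'Archway', 9: 'Arnos Grove'}
ids = pd.Series([8, 1, 9])

with_lambda = ids.map(lambda x: dict_names[x])  # the lambda approach
with_dict = ids.map(dict_names)                 # passing the dict directly

print(with_dict.tolist())  # ['Archway', 'Acton Town', 'Arnos Grove']
```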

In [12]:

```
dict_names = dict(zip(stations['id'],stations['name']))
```

In [13]:

```
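centralities = pd.DataFrame()
centralities['ID'] = list(G.nodes())
# Look up each station's name from its ID with a lambda function
centralities['Names'] = centralities['ID'].map(lambda x: dict_names[x])
# Each centrality function returns a dict keyed by station ID;
# mapping by ID keeps every score on the right station's row
centralities['degree_centr'] = centralities['ID'].map(nx.degree_centrality(G))
centralities['closeness_centr'] = centralities['ID'].map(nx.closeness_centrality(G))
centralities['betweenness_centr'] = centralities['ID'].map(nx.betweenness_centrality(G))
centralities['eigenvector_centr'] = centralities['ID'].map(nx.eigenvector_centrality(G))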
```
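Since each networkx centrality function returns a dictionary keyed by node, another option is to pass the dictionaries straight to `pd.DataFrame` and let pandas line the values up by node. A small sketch on a hypothetical three-station line `A-B-C`, where the middle station should come out on top for every measure; it also checks degree centrality against its definition, the node's degree divided by n − 1.

```python
import networkx as nx
import pandas as pd

# Hypothetical three-station line: A-B-C
G = nx.path_graph(['A', 'B', 'C'])

# Dicts keyed by node become columns aligned on a shared node index
centralities = pd.DataFrame({
    'degree_centr': nx.degree_centrality(G),
    'closeness_centr': nx.closeness_centrality(G),
    'betweenness_centr': nx.betweenness_centrality(G),
    'eigenvector_centr': nx.eigenvector_centrality(G),
})
print(centralities)

# Degree centrality is the node's degree divided by (n - 1)
n = G.number_of_nodes()
manual_degree = {node: deg / (n - 1) for node, deg in G.degree()}
```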

Let us now obtain our "Top 10" lists.

In [14]:

```
centralities.sort_values(by='degree_centr', ascending=False).head(10).reset_index()[['Names','degree_centr']]
```

Out[14]:

| | Names | degree_centr |
|---|---|---|
| 0 | Green Park | 0.041958 |
| 1 | Oxford Circus | 0.034965 |
| 2 | Waterloo | 0.034965 |
| 3 | Leicester Square | 0.027972 |
| 4 | Bond Street | 0.027972 |
| 5 | Euston | 0.027972 |
| 6 | Finsbury Park | 0.027972 |
| 7 | Piccadilly Circus | 0.027972 |
| 8 | Stockwell | 0.027972 |
| 9 | Tottenham Court Road | 0.027972 |
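Because networkx normalises degree centrality by n − 1, these scores can be turned back into raw degrees. Assuming `G` has 144 nodes, one per row of `stations`, multiplying by 143 shows that Green Park has 6 neighbouring stations, Oxford Circus and Waterloo have 5, and the other seven stations have 4. With so many stations tied on the same score, any other stations with 4 neighbours may be cut from the "Top 10" purely by sort order. A sketch of both points; the toy table in the second half is made up, and `keep='all'` in `DataFrame.nlargest` keeps every row tied at the cut-off.

```python
import pandas as pd

n_stations = 144  # assumes one node per row of the stations table
for name, score in [('Green Park', 0.041958), ('Oxford Circus', 0.034965),
                    ('Stockwell', 0.027972)]:
    # degree_centrality = degree / (n - 1), so degree = score * (n - 1)
    print(name, round(score * (n_stations - 1)))

# Hypothetical table with a tie at the cut-off
df = pd.DataFrame({'Names': ['A', 'B', 'C', 'D'],
                   'degree_centr': [0.3, 0.2, 0.2, 0.1]})
top2 = df.nlargest(2, 'degree_centr')                   # drops one tied row
top2_ties = df.nlargest(2, 'degree_centr', keep='all')  # keeps all tied rows
```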

In [15]:

```
centralities.sort_values(by='closeness_centr', ascending=False).head(10).reset_index()[['Names','closeness_centr']]
```

Out[15]:

Names | closeness_centr | |
---|---|---|