Transport Analytics Training Series - Last Revision: October 2022

The Nearest Neighbour Heuristic¶

This notebook contains a simple implementation of a genetric algorithm (GA) to solve the Travelling Salesman Problem.

GAs belong to the family of evolutionary metaheuristics, which are based on the "survival of the fittest". Each solution is represented by a chromosome, which consists of a sequence of genes that represent a solution to the problem.

GAs operate upon a population of solutions, which are manipulated over several iterations, which are called generations.

Better solutions are progressively identified, as the algorithm pairs parent chromosomes to produce offspring, or applies random mutations on previously generated chromosomes.

There are 5 main steps to the GA process after we initialise the parameters:

  1. Population generation - where a randomised population of chromosomes (a one-dimensional string of real values) is generated.
  2. Fitness evaluation - where the fitness of all chromosomes in the population is calculated
  3. Parent selection - where certain chromosomes are selected to be carried over to the next generation based on elitism
  4. Crossover - where sections of the best parent chromosomes are combines to create offspring
  5. Mutation - where the chromosomes are tweaked to get a new solution

When the population has converged (where the offspring are not significantly different to the parents), the algorithm can be terminated.

We will be using Distributed Evolutionary Algorithms in Python (DEAP) library.

In [1]:
!pip install deap
Requirement already satisfied: deap in c:\users\hc6118\anaconda3\envs\tsenv\lib\site-packages (1.3.1)
Requirement already satisfied: numpy in c:\users\hc6118\anaconda3\envs\tsenv\lib\site-packages (from deap) (1.19.2)
In [2]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
from sklearn.datasets import make_blobs

%matplotlib inline

We will first create a map of the 15 homes with two centres using make_blobs. The function of make_blob from sklearn (or scikit-learn) allows to create spatially normally distributed points with given number of centres.

For this example, we will set two clusters and the standard deviation of the cluster to be 20 based on the random generator seed of 2. By setting a value for the random_state parameter, the results can be reproduced for future comparison of the results.

In [3]:
plot_size   = 15
plot_width  = 16
plot_height = 8

params = {'legend.fontsize': 'large',
          'figure.figsize': (plot_width,plot_height),
          'axes.labelsize': plot_size,
          'axes.titlesize': plot_size,
          'xtick.labelsize': plot_size*0.75,
          'ytick.labelsize': plot_size*0.75,
          'axes.titlepad': 25}
plt.rcParams.update(params)
plt.rcParams.update(params)
In [4]:
num_homes = 15
In [5]:
center_box = (100, 200) 

homes_coord, _ = make_blobs(n_samples=num_homes, 
                           centers=2, 
                           cluster_std=20, 
                           center_box=center_box, 
                           random_state = 2)

homes_names = [i for i in range(num_homes)]
homes_coord_dict = {name: coord for name,coord in zip(homes_names, homes_coord)}
In [6]:
plt.scatter(homes_coord[:, 0], homes_coord[:, 1], s=plot_size*2, cmap='viridis');

We will then calculate the distances between the homes within the created map. The Euclidean distance gives us the length of the line segment between two points. These distances will form as an input to our genetic algorithm.

In [7]:
from scipy.spatial import distance

dist_matrix = distance.cdist(homes_coord, homes_coord, 'euclidean')

We now have:

  • The number of homes (num_homes): 15
  • The IDs of the homes (homes_names)
  • The coordinates of the homes (homes_coord_dict)

Using the DEAP library, we will solve our travelling salesman problem using genetic algorithm.

In [8]:
from deap import base, creator, tools

We will create the necessary functions to solve our problem.

Firstly, we will need to create our population of chromosomes. The function to achieve this, is the following:

In [9]:
import copy

np.random.seed(3)  # we are setting the same random seed to be able to reproduce the answers.

def chromo_create(_homes_names):
    chromo = copy.deepcopy(_homes_names)
    np.random.shuffle(chromo)    
    return chromo

We also need a generic function that will evaluate the overall distance the travelling salesperson will be covering based on the current route to be able to compare between each iteration.

In [10]:
def chromo_eval(_dist_matrix, _chromo):
    dist = 0
    for p in range(len(_chromo) - 1):
        _i = _chromo[p]
        _j = _chromo[p+1]
        dist += _dist_matrix[_i][_j]
        
    dist += dist_matrix[_chromo[-1], _chromo[0]]
    return dist,

The second step of GA is to evaluate the fitness of all chromosomes in the population. We can use the embedded functions from the DEAP library.

In [11]:
tb = base.Toolbox()
creator.create('Fitness_Func', base.Fitness, weights=(-1.0,))
creator.create('Individual', list, fitness=creator.Fitness_Func)

Now that we have set up the necessary functions, we will start with our solver.

Before we start, we need to initialise the parameters for the genetic algorithm to function. We are specifying the solution population size, number of generations, the crossover probability, and the mutation probability.

In [12]:
num_population = 200
num_generations = 1000
prob_crossover = .4
prob_mutation = .6

Step 1: We will generate our population. Using the function chromo_create we wrote above, we will create the population based on the home IDs we created in the beginning. We then will need to register all this information to our Toolbox.

In [13]:
tb.register('indexes', chromo_create, homes_names)
tb.register('individual', tools.initIterate, creator.Individual, tb.indexes)
tb.register('population', tools.initRepeat, list, tb.individual)
tb.register('evaluate', chromo_eval, dist_matrix)
tb.register('select', tools.selTournament)
tb.register('mate', tools.cxPartialyMatched)
tb.register('mutate', tools.mutShuffleIndexes)
In [14]:
population = tb.population(n=num_population)

Step 2: We will calculate the fitness of all chromosomes in the population.

In [15]:
fitness_set = list(tb.map(tb.evaluate, population))
for ind, fit in zip(population, fitness_set):
    ind.fitness.values = fit
  • Step 3: We will select the parent chromosomes selected to be carried over to the next generation.
  • Step 4: We wil create offspring chromosomes from the elite parent chromosomes based on the crossover probability we set in our initialisation.
  • Step 5: We will mutate the offspring chromosomes based on the mutation probability we set in our initialisation.

We create a for-loop to iterate for the number of generations we set in our initialisation.

In [16]:
best_fit_list = []
best_sol_list = []

best_fit = np.Inf

for gen in range(0, num_generations):
    
    if (gen % 50 == 0):
        print(f'Generation: {gen:4} | Fitness: {best_fit:.2f}' )   # print the generation and their fitness level
    
    offspring = tb.select(population, len(population), tournsize=3)
    offspring = list(map(tb.clone, offspring))
    
    for child1, child2 in zip(offspring[0::2], offspring[1::2]):
        if np.random.random() < prob_crossover:
            tb.mate(child1, child2)
            del child1.fitness.values
            del child2.fitness.values

    for chromo in offspring:
        if np.random.random() < prob_mutation:
            tb.mutate(chromo, indpb=0.01)
            del chromo.fitness.values

    invalid_ind = [ind for ind in offspring if not ind.fitness.valid]
    fitness_set = map(tb.evaluate, invalid_ind)
    for ind, fit in zip(invalid_ind, fitness_set):
        ind.fitness.values = fit
    
    population[:] = offspring
    
    curr_best_sol = tools.selBest(population, 1)[0]
    curr_best_fit = curr_best_sol.fitness.values[0]
    
    if curr_best_fit < best_fit:
        best_sol = curr_best_sol
        best_fit = curr_best_fit

    best_fit_list.append(best_fit)
    best_sol_list.append(best_sol)
Generation:    0 | Fitness: inf
Generation:   50 | Fitness: 327.09
Generation:  100 | Fitness: 327.09
Generation:  150 | Fitness: 327.09
Generation:  200 | Fitness: 327.09
Generation:  250 | Fitness: 327.09
Generation:  300 | Fitness: 327.09
Generation:  350 | Fitness: 327.09
Generation:  400 | Fitness: 327.09
Generation:  450 | Fitness: 327.09
Generation:  500 | Fitness: 327.09
Generation:  550 | Fitness: 327.09
Generation:  600 | Fitness: 327.09
Generation:  650 | Fitness: 327.09
Generation:  700 | Fitness: 327.09
Generation:  750 | Fitness: 327.09
Generation:  800 | Fitness: 327.09
Generation:  850 | Fitness: 327.09
Generation:  900 | Fitness: 327.09
Generation:  950 | Fitness: 327.09

We will plot the fitness level for each generation to see if the population has converged after the limited number of generations we have set.

In [17]:
plt.plot(best_fit_list)
plt.show()

We can see the population has converged!

So we will now print the best solution for the tour route ofor Santa Claus to cover all 15 homes the quickest and return to his origin.

In [18]:
print(best_sol)
[1, 14, 2, 6, 9, 8, 13, 11, 12, 0, 3, 4, 7, 5, 10]

We will create a plot to see what such route would look like.

In [19]:
final_sol = best_sol + best_sol[0:1]

plt.scatter(homes_coord[:, 0], 
            homes_coord[:, 1], 
            s=plot_size*2, 
            cmap='viridis',
            zorder = 10000);

for i, txt in enumerate(homes_names):
    plt.annotate(txt, (homes_coord[i, 0]+1, homes_coord[i, 1]+1))

lines = []
for p in range(len(final_sol) - 1):
    i = final_sol[p]
    j = final_sol[p+1]
    colour = 'black'       
    plt.arrow(homes_coord[i][0], 
              homes_coord[i][1],
              homes_coord[j][0] - homes_coord[i][0], 
              homes_coord[j][1] - homes_coord[i][1], 
              color=colour)