Transport Analytics Training Series - Last Revision: October 2022

Single Facility Location Problems¶

In [7]:
!pip install pulp
Collecting pulp
  Downloading PuLP-2.7.0-py3-none-any.whl (14.3 MB)
     |████████████████████████████████| 14.3 MB 2.5 MB/s eta 0:00:01
Installing collected packages: pulp
Successfully installed pulp-2.7.0
In [8]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
from sklearn.datasets import make_blobs

%matplotlib inline

plot_size   = 14
plot_width  = 5
plot_height = 5

params = {'legend.fontsize': 'large',
          'figure.figsize': (plot_width,plot_height),
          'axes.labelsize': plot_size,
          'axes.titlesize': plot_size,
          'xtick.labelsize': plot_size*0.75,
          'ytick.labelsize': plot_size*0.75,
          'axes.titlepad': 25}
plt.rcParams.update(params)
In [9]:
num_customer = 40
num_facility = 5

We set here the number of customers and facilities that will be used by the data generators and later in the model.

In [10]:
center_box = (100, 200) 

cust_coord, _ = make_blobs(n_samples=num_customer, 
                           centers=2, 
                           cluster_std=20, 
                           center_box=center_box, 
                           random_state = 2)

We are again using make_blobs to generate a random set of customers.

You might notice that we are saving predicted cluster indices (which we don't need) to a variable named _. This is a real variable name, which is commonly used whenever we want to disregard a certain value.

In [11]:
facil_coord, _ = make_blobs(n_samples=num_facility, 
                            centers=1, 
                            cluster_std=30, 
                            center_box=center_box, 
                            random_state = 50)

facil_coord
Out[11]:
array([[110.99138668,  82.98394341],
       [153.25029374, 148.67412191],
       [126.04608814, 154.9163425 ],
       [130.83231184,  78.87089589],
       [191.81854821, 108.50634614]])

To solve the facility location problem, we also need to provide a set of candidate locations. We use make_blobs again for this.

In [12]:
plt.scatter(cust_coord[:, 0], cust_coord[:, 1], s=plot_size*2, cmap='viridis');

plt.scatter(facil_coord[:, 0], facil_coord[:, 1], s=plot_size*5, cmap='viridis');

image.png

In [13]:
from pulp import *

prob = LpProblem('prob', LpMinimize)

We are back again in using PuLP.

In [14]:
from scipy.spatial import distance

dist_matrix = distance.cdist(facil_coord, facil_coord, 'euclidean').round(2)

dist_matrix
Out[14]:
array([[ 0.  , 78.11, 73.49, 20.26, 84.76],
       [78.11,  0.  , 27.91, 73.31, 55.69],
       [73.49, 27.91,  0.  , 76.2 , 80.5 ],
       [20.26, 73.31, 76.2 ,  0.  , 67.81],
       [84.76, 55.69, 80.5 , 67.81,  0.  ]])

We are going to use the distance.cdist() command to determine the euclidean distances between our sets of customers and facility coordinates.

In a real application, we will instead by using a pathfinder, or a travel time database to obtain realistic values that reflect the structure of the network.

The cell above calculates a distance matrix between all facilitiy locations. You might also notice that we are also using the round() function, as the original values will have a very large number of decimals, and would not render well when printed out.

In [15]:
dist_matrix = distance.cdist(cust_coord, facil_coord, 'euclidean')

dist_matrix.shape
Out[15]:
(40, 5)

With the above command we have now calculated the distance matrix between the set of customers and facilities.

The shape property is used check the size of the matrix. This appears to be 40 x 5, as expected.

In [16]:
S = [('S'+str(i+1)) for i in range(num_facility)]
D = [('C'+str(i+1)) for i in range(num_customer)]

With the above in-line for loop, we are generating the indices for each customer and facility location element, to be saved in D and S, respectively.

You can check out their contents int he cell below:

In [17]:
print(f'The S array contains the following: \n{S}\n')

print(f'While D, contains: \n{D}')
The S array contains the following: 
['S1', 'S2', 'S3', 'S4', 'S5']

While D, contains: 
['C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 'C9', 'C10', 'C11', 'C12', 'C13', 'C14', 'C15', 'C16', 'C17', 'C18', 'C19', 'C20', 'C21', 'C22', 'C23', 'C24', 'C25', 'C26', 'C27', 'C28', 'C29', 'C30', 'C31', 'C32', 'C33', 'C34', 'C35', 'C36', 'C37', 'C38', 'C39', 'C40']
In [18]:
d = {customer:{facility: round(dist_matrix[i][j],2) for j,facility in enumerate(S)} for i,customer in enumerate(D)}

With another inline for loop (a nested one, in this case), we have generated a dictionary of distances between customers i and facilities j. For visualisation purposes, we have rounded again the values to two decimals.

In [19]:
d
Out[19]:
{'C1': {'S1': 104.73, 'S2': 26.92, 'S3': 47.4, 'S4': 98.4, 'S5': 63.35},
 'C2': {'S1': 83.04, 'S2': 25.2, 'S3': 52.49, 'S4': 72.12, 'S5': 32.17},
 'C3': {'S1': 53.66, 'S2': 42.74, 'S3': 58.85, 'S4': 39.57, 'S5': 32.6},
 'C4': {'S1': 43.83, 'S2': 34.55, 'S3': 38.59, 'S4': 39.68, 'S5': 55.31},
 'C5': {'S1': 69.77, 'S2': 15.1, 'S3': 14.4, 'S4': 68.75, 'S5': 66.12},
 'C6': {'S1': 78.67, 'S2': 1.4, 'S3': 29.31, 'S4': 73.52, 'S5': 54.57},
 'C7': {'S1': 53.82, 'S2': 24.71, 'S3': 33.55, 'S4': 48.82, 'S5': 51.69},
 'C8': {'S1': 33.55, 'S2': 69.01, 'S3': 76.91, 'S4': 13.67, 'S5': 55.18},
 'C9': {'S1': 90.97, 'S2': 33.25, 'S3': 61.02, 'S4': 78.86, 'S5': 29.29},
 'C10': {'S1': 4.28, 'S2': 77.66, 'S3': 71.54, 'S4': 24.11, 'S5': 87.11},
 'C11': {'S1': 11.54, 'S2': 71.27, 'S3': 70.6, 'S4': 10.06, 'S5': 73.44},
 'C12': {'S1': 44.62, 'S2': 36.53, 'S3': 45.04, 'S4': 36.85, 'S5': 48.47},
 'C13': {'S1': 61.2, 'S2': 48.4, 'S3': 67.4, 'S4': 44.97, 'S5': 23.63},
 'C14': {'S1': 83.34, 'S2': 16.31, 'S3': 16.77, 'S4': 82.26, 'S5': 71.99},
 'C15': {'S1': 29.81, 'S2': 53.39, 'S3': 58.06, 'S4': 19.92, 'S5': 56.01},
 'C16': {'S1': 17.06, 'S2': 62.17, 'S3': 56.45, 'S4': 25.92, 'S5': 77.17},
 'C17': {'S1': 37.74, 'S2': 66.01, 'S3': 75.58, 'S4': 18.33, 'S5': 50.16},
 'C18': {'S1': 94.83, 'S2': 33.39, 'S3': 21.46, 'S4': 96.38, 'S5': 88.95},
 'C19': {'S1': 26.52, 'S2': 52.08, 'S3': 52.21, 'S4': 24.0, 'S5': 63.44},
 'C20': {'S1': 44.06, 'S2': 58.02, 'S3': 70.22, 'S4': 26.34, 'S5': 41.47},
 'C21': {'S1': 100.93, 'S2': 22.91, 'S3': 40.18, 'S4': 96.01, 'S5': 66.83},
 'C22': {'S1': 71.39, 'S2': 7.0, 'S3': 28.26, 'S4': 66.31, 'S5': 52.66},
 'C23': {'S1': 77.33, 'S2': 12.07, 'S3': 16.27, 'S4': 75.7, 'S5': 67.06},
 'C24': {'S1': 61.58, 'S2': 19.02, 'S3': 35.42, 'S4': 54.77, 'S5': 46.02},
 'C25': {'S1': 69.41, 'S2': 8.81, 'S3': 24.98, 'S4': 65.32, 'S5': 55.61},
 'C26': {'S1': 68.06, 'S2': 15.25, 'S3': 15.65, 'S4': 66.84, 'S5': 64.87},
 'C27': {'S1': 20.28, 'S2': 73.34, 'S3': 76.23, 'S4': 0.03, 'S5': 67.81},
 'C28': {'S1': 22.32, 'S2': 56.23, 'S3': 55.45, 'S4': 21.14, 'S5': 66.4},
 'C29': {'S1': 42.99, 'S2': 70.99, 'S3': 82.02, 'S4': 22.86, 'S5': 49.05},
 'C30': {'S1': 59.44, 'S2': 51.94, 'S3': 27.63, 'S4': 69.22, 'S5': 95.97},
 'C31': {'S1': 67.96, 'S2': 10.25, 'S3': 24.87, 'S4': 63.98, 'S5': 55.63},
 'C32': {'S1': 79.77, 'S2': 1.66, 'S3': 28.55, 'S4': 74.91, 'S5': 56.06},
 'C33': {'S1': 54.44, 'S2': 48.39, 'S3': 26.56, 'S4': 63.32, 'S5': 90.27},
 'C34': {'S1': 71.76, 'S2': 7.04, 'S3': 23.54, 'S4': 68.02, 'S5': 57.45},
 'C35': {'S1': 83.32, 'S2': 14.56, 'S3': 42.45, 'S4': 75.02, 'S5': 43.6},
 'C36': {'S1': 42.73, 'S2': 62.97, 'S3': 74.5, 'S4': 23.81, 'S5': 44.47},
 'C37': {'S1': 92.89, 'S2': 15.5, 'S3': 39.55, 'S4': 86.59, 'S5': 56.29},
 'C38': {'S1': 43.98, 'S2': 35.94, 'S3': 43.01, 'S4': 37.4, 'S5': 50.79},
 'C39': {'S1': 61.56, 'S2': 39.32, 'S3': 15.83, 'S4': 67.67, 'S5': 85.58},
 'C40': {'S1': 27.14, 'S2': 72.97, 'S3': 58.96, 'S4': 44.34, 'S5': 96.98}}
In [20]:
x = LpVariable.dicts('x', (D,S), lowBound = 0, upBound = 1, cat = LpInteger)
y = LpVariable.dicts('y', S, lowBound = 0, upBound = 1, cat = LpInteger)

With the above we have created our sets of decision variables, both defined as Booleans (integers with values between 0 and 1).

In [21]:
prob += lpSum([lpSum([x[i][j]*d[i][j] for j in S]) for i in D])

The above expression defines our objective, which minimises the product of distances with the decision variables that indicate whether a facility and customer are connected.

In [22]:
for i in D:
    prob += lpSum([x[i][j] for j in S]) == 1

Contraint 1: Ensure that only one facility serves each customer.

In [23]:
prob += lpSum([y[j] for j in S]) == 1

Contraint 2: Ensure that only one facility is open.

In [24]:
for i in D:
    for j in S:
        prob += x[i][j] <= y[j]

Contraint 3: A facility will not serve a customer if it has not been opened.

In [25]:
status = prob.solve()

print(f'STATUS\n{LpStatus[status]}\n')
Welcome to the CBC MILP Solver 
Version: 2.10.3 
Build Date: Dec 15 2019 

command line - /Users/pa01/opt/anaconda3/lib/python3.9/site-packages/pulp/solverdir/cbc/osx/64/cbc /var/folders/kk/14m2mg8s0w9c_tln749srnrw0000gp/T/6e769006fd6547e694d78c7a4a868341-pulp.mps timeMode elapsed branch printingOptions all solution /var/folders/kk/14m2mg8s0w9c_tln749srnrw0000gp/T/6e769006fd6547e694d78c7a4a868341-pulp.sol (default strategy 1)
At line 2 NAME          MODEL
At line 3 ROWS
At line 246 COLUMNS
At line 1462 RHS
At line 1704 BOUNDS
At line 1910 ENDATA
Problem MODEL has 241 rows, 205 columns and 605 elements
Coin0008I MODEL read with 0 errors
Option for timeMode changed from cpu to elapsed
Continuous objective value is 1494.27 - 0.00 seconds
Cgl0005I 41 SOS with 205 members
Cgl0004I processed model has 241 rows, 205 columns (205 integer (205 of which binary)) and 605 elements
Cutoff increment increased from 1e-05 to 0.00999
Cbc0038I Initial state - 0 integers unsatisfied sum - 0
Cbc0038I Solution found of 1494.27
Cbc0038I Before mini branch and bound, 205 integers at bound fixed and 0 continuous
Cbc0038I Mini branch and bound did not improve solution (0.01 seconds)
Cbc0038I After 0.01 seconds - Feasibility pump exiting with objective of 1494.27 - took 0.00 seconds
Cbc0012I Integer solution of 1494.27 found by feasibility pump after 0 iterations and 0 nodes (0.01 seconds)
Cbc0001I Search completed - best objective 1494.27, took 0 iterations and 0 nodes (0.01 seconds)
Cbc0035I Maximum depth 0, 0 variables fixed on reduced cost
Cuts at root node changed objective from 1494.27 to 1494.27
Probing was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Gomory was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Knapsack was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Clique was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
MixedIntegerRounding2 was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
FlowCover was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
TwoMirCuts was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
ZeroHalf was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)

Result - Optimal solution found

Objective value:                1494.27000000
Enumerated nodes:               0
Total iterations:               0
Time (CPU seconds):             0.01
Time (Wallclock seconds):       0.01

Option for printingOptions changed from normal to all
Total time (CPU seconds):       0.01   (Wallclock seconds):       0.01

STATUS
Optimal

In [26]:
for j,facility in enumerate(S):
    if y[facility].varValue == 1:
        chosen_facil = j
        print(f'We will be opening a facility in {facility}')
We will be opening a facility in S2

With the above loop, we are iterating through all decision variables in y to establish whether a facility has been opened.

In [27]:
tran_cost = 0;

for j,facility in enumerate(S):
    for i,customer in enumerate(D):
        if x[customer][facility].varValue == 1:
            tran_cost += dist_matrix[i][j]

print(f'The objective value of the problem (total transportation cost) is {value(prob.objective)}');
The objective value of the problem (total transportation cost) is 1494.27

Summary¶

from pulp import *
from scipy.spatial import distance

# Calculate distance matrix 
dist_matrix = distance.cdist(cust_coord, facil_coord, 'euclidean')

# Initialise problem
prob = LpProblem('prob', LpMinimize)

S = [('S'+str(i+1)) for i in range(num_facility)]
D = [('C'+str(i+1)) for i in range(num_customer)]

# Prepare distance matrix parameters
d = {customer:{facility:dist_matrix[i][j] for j,facility in enumerate(S)} for i,customer in enumerate(D)}

# Define decision variables
x = LpVariable.dicts('x', (D,S), lowBound = 0, upBound = 1, cat = LpInteger)
y = LpVariable.dicts('y', S, lowBound = 0, upBound = 1, cat = LpInteger)

# Objective function
prob += lpSum([lpSum([x[i][j]*d[i][j] for j in S]) for i in D])

# Constraint 1: Each customer will be served by 1 facility
prob += lpSum([y[j] for j in S]) == 1

# Constraint 2: Only 1 facility will be established
prob += lpSum([y[j] for j in S]) == 1

# Constraint 3: Only established facilities can serve customers
for i in D:
    for j in S:
        prob += x[i][j] <= y[j]

# Solve the model
status = prob.solve()
In [28]:
plt.scatter(cust_coord[:, 0], 
            cust_coord[:, 1], 
            s=plot_size*2, 
            cmap='viridis');

plt.scatter(facil_coord[:, 0], 
            facil_coord[:, 1], 
            s=plot_size*5, 
            cmap='viridis');

plt.scatter(facil_coord[chosen_facil, 0], 
            facil_coord[chosen_facil, 1], 
            s=plot_size*10, 
            cmap='viridis');

In green color, the above graph demonstrates the location of the chosen facility.