At a Nextmv tech talk a couple weeks ago, I showed a least absolute deviations (LAD) regression model using OR-Tools. This isn’t new – I pulled the formulation from Rob Vanderbei’s “Local Warming” paper, and I’ve shown similar models at conference talks in the past using other modeling APIs and solvers.

There are a couple reasons I keep coming back to this problem. One is that it’s a great example of how to build a machine learning model using an optimization solver. Unless you have an optimization background, it’s probably not obvious you can do this. Building a regression or classification model with a solver directly is a great way to understand the model better. And you can customize it in interesting ways, like adding epsilon insensitivity.

Another is that least squares, while most commonly used regression form, has a fatal flaw: it isn’t robust to outliers in the input data. This is because least squares minimize the sum of squared residuals, as shown in the formulation below. Here, $A$ is an $m \times n$ matrix of feature data, $b$ is a vector of observations to fit, and $x$ is a vector of coefficients the optimizer must find.

$$ \min f(x) = \Vert Ax-b \Vert^2 $$

Since the objective function minimizes squared residuals, outliers have a much bigger impact than other data. LAD regression solves this by simply summing the values of the residuals as they are.

$$ \min f(x) = \vert Ax-b \vert $$

So why isn’t this used more? Simple – least squares has a convenient analytical solution, while LAD requires an algorithm to solve. For instance, you can formulate LAD regression as a linear program, but now you need a solver.

$$ \begin{align*} \min \quad & 1’z \\ \text{s.t.}\ \quad & z \ge Ax - b \\ & z \ge b - Ax \end{align*} $$

While I like using this example, it paints a rather negative picture of squaring. If it does funny things to solvers, is there any good reason to square? Thus I’ve been on the lookout for a practical example where squaring a variable or expression makes a model more useful.

Luckily for me, Erwin Kalvelagen recently posted about using optimization to schedule team meetings. This is an application where minimizing squared values of overbooking can be beneficial – it may be worse to be triple booked than double booked.

I won’t recreate the reasoning behind Erwin’s post here. You can read his blog for that. What we’ll do is look at both the formulations in his post, along with a couple extras using Julia for code, JuMP for modeling, SCIP for optimization, and Gadfly for visualization. All model code and data are linked in the resources section at the end.

Maximize attendance

To start off, I built a new data set, which you can find in the resources section. This differentiates team membership between two types of employees: individual contributors (starting with ic in the data), who attend meetings for 1 or 2 teams, and managers (prefixed with mgr), who attend meetings to coordinate across multiple teams. We schedule meetings for 10 teams (prefix t) into 3 time slots (s).

The first model in Erwin’s post maximizes attendance. This means it tries to schedule team members for as many unique time slots as possible. It doesn’t consider overbooking.

$$ \begin{align*} \max\quad & \sum_{i,s} y_{i,s} \\ \text{s.t.}\quad& \sum_{s} x_{t,s} = 1 &\quad\forall&\ t & \text{schedule each team meeting once}\\ & y_{i,s} \le \sum_{t} m_{i,t}\ x_{t,s} &\quad\forall&\ i,s & \text{individuals attend team meetings}\\ & x_{t,s} \in \{0,1\} &\quad\forall&\ t,s\\ & y_{i,s} \in \{0,1\} &\quad\forall&\ i,s \end{align*} $$

This yields the following team schedule, with red representing a scheduled team meeting.

Maximize attendance - team schedules

If we look at the manager schedules, we’ll see that every manager is completely booked. This makes sense. That’s what managers do, right? Go to meetings?

Maximize attendance - manager attendance

Minimize overbooking

The model gets more interesting once we account for overbooking. Erwin’s post has a model that minimizes overbooking, where overbooking is the number of additional meetings in a time slot. If a team member is double booked, that’s 1 overbooking. If they are triple booked, that’s 2 overbookings.

Sum of overbooking

The second model in Erwin’s post minimizes the sum of all overbookings. He does this by adding a continuous c vector that only incurs value once a team member goes over a single meeting in a given time slot.

$$ \begin{align*} \min\quad & \sum_{i,s} c_{i,s} \\ \text{s.t.}\quad& \sum_{s} x_{t,s} = 1 &\quad\forall&\ t & \text{schedule each team meeting once}\\ & c_{i,s} \ge \sum_{t} m_{i,t}\ x_{t,s} - 1 &\quad\forall&\ i,s & \text{measure overbooking}\\ & x_{t,s} \in \{0,1\} &\quad\forall&\ t,s\\ & c_{i,s} \ge 0 &\quad\forall&\ i,s \end{align*} $$

Given our data this results in the following team schedule, which is probably not all that interesting. I’ll leave this visualization out from now on.

Minimize overbooking - team schedules

Where it gets interesting is plotting overbookings for the managers. Here we see that 3 manager time slots are triple booked (red), while 8 are double booked (gray).

Minimize overbooking - manager overbooking

Sum of squared overbooking

Let’s say it’s worse to triple book (or, gasp, quadruple book) than to double book. How can the model account for this? One answer, if you have a MIQP-enabled solver, is to simply square the c values.

$$ \begin{align*} \min\quad & \sum_{i,s} c_{i,s}^2 \\ \text{s.t.}\quad& \sum_{s} x_{t,s} = 1 &\quad\forall&\ t & \text{schedule each team meeting once}\\ & c_{i,s} \ge \sum_{t} m_{i,t}\ x_{t,s} - 1 &\quad\forall&\ i,s & \text{measure overbooking}\\ & x_{t,s} \in \{0,1\} &\quad\forall&\ t,s\\ & c_{i,s} \ge 0 &\quad\forall&\ i,s \end{align*} $$

This completely eliminates triple booking, as shown below. No manager is worse off than being double booked, which seems normal given my experiences.

Minimize squared overbooking - manager overbooking

The problem with this is that the solver now takes a lot longer. It’s not bad for the data in this example, but if you try it with something larger you’ll see what I mean. You can find the data generator code in the resources section.

Constrained bottleneck

So how can we do something similar without the computational cost? One option is to continue using MILP formulations, but in the context of hierarchical optimization. This means splitting the model into two. First, we try to minimize the maximum overbookings for any team member (the bottleneck, if you will). This involves adding a variable $b$ representing that maximum.

$$ b = \max\Bigl\{\sum_{t} m_{i,t}\ x_{t,s} - 1 : i \in I, s \in S \Bigr\} $$

Now we can simply minimize $b$ using a MILP instead of a MIQP.

$$ \begin{align*} \min\quad & b \\ \text{s.t.}\quad& \sum_{s} x_{t,s} = 1 &\quad\forall&\ t & \text{schedule each team meeting once}\\ & b \ge \sum_{t} m_{i,t}\ x_{t,s} - 1 &\quad\forall&\ i,s & \text{maximum overbooking}\\ & x_{t,s} \in \{0,1\} &\quad\forall&\ t,s \end{align*} $$

Once we solve the first model, we get the minimal value of $b$, which we call $b^*$. We can simply use $b^*$ as an upper bound for overbookings in the second original model.

$$ \begin{align*} \min\quad & \sum_{i,s} c_{i,s} \\ \text{s.t.}\quad& \sum_{s} x_{t,s} = 1 &\quad\forall&\ t & \text{schedule each team meeting once}\\ & c_{i,s} \ge \sum_{t} m_{i,t}\ x_{t,s} - 1 &\quad\forall&\ i,s & \text{measure overbooking}\\ & x_{t,s} \in \{0,1\} &\quad\forall&\ t,s\\ & 0 \le c_{i,s} \le b^* &\quad\forall&\ i,s \end{align*} $$

As we see below, this model also eliminates triple bookings, and it’s quite a bit faster to solve than the MIQP.

Minimize bottleneck - manager overbooking

Resources