🐍 Monte Carlo Simulation in Python

Note: This post was updated to work with Python 3.

One of the most useful tools one learns in an Operations Research curriculum is Monte Carlo Simulation. Its utility lies in its simplicity: one can learn vital information about nearly any process, be it deterministic or stochastic, without wading through the grunt work of finding an analytical solution. It can be used for off-the-cuff estimates or as a proper scientific tool. All one needs to know is how to simulate a given process and its appropriate probability distributions and parameters if that process is stochastic.

Here’s how it works:

Construct a simulation that, given input values, returns a value of interest. This could be a pure quantity, like time spent waiting for a bus, or a boolean indicating whether or not a particular event occurs.
Run the simulation a, usually large, number of times, each time with randomly generated input variables. Record its output values.
Compute sample mean and variance of the output values.

In the case of time spent waiting for a bus, the sample mean and variance are estimators of mean and variance for one’s wait time. In the boolean case, these represent probability that the given event will occur.

One can think of Monte Carlo Simulation as throwing darts. Say you want to find the area under a curve without integrating. All you must do is draw the curve on a wall and throw darts at it randomly. After you’ve thrown enough darts, the area under the curve can be approximated using the percentage of darts that end up under the curve times the total area.

This technique is often performed using a spreadsheet, but that can be a bit clunky and may make more complex simulations difficult. I’d like to spend a minute showing how it can be done in Python. Consider the following scenario:

Passengers for a train arrive according to a Poisson process with a mean of 100 per hour. The next train arrives exponentially with a rate of 5 per hour. How many passers will be aboard the train?

We can simulate this using the fact that a Poisson process can be represented as a string of events occurring with exponential inter-arrival times. We use the sim() function below to generate the number of passengers for random instances of the problem. We then compute sample mean and variance for these values.

import random

PASSENGERS = 100.0
TRAINS     =   5.0
ITERATIONS = 10000

def sim():
    passengers = 0.0

    # Determine when the train arrives
    train = random.expovariate(TRAINS)

    # Count the number of passenger arrivals before the train
    now = 0.0
    while True:
        now += random.expovariate(PASSENGERS)
        if now >= train:
            break
        passengers += 1.0

    return passengers

if __name__ == '__main__':
    output = [sim() for _ in range(ITERATIONS)]

    total = sum(output)
    mean = total / len(output)

    sum_sqrs = sum(x*x for x in output)
    variance = (sum_sqrs - total * mean) / (len(output) - 1)

    print('E[X] = %.02f' % mean)
    print('Var(X) = %.02f' % variance)