adventures in optimization

The charming adventures of an analyst and his solver.

« Thursday, October 8, 2009 »

Easy Monte Carlo Simulation in Python

A quick introduction to writing and interpreting Monte Carlo simulations in Python.

One of the most useful tools one learns in an Operations Research curriculum is Monte Carlo Simulation. Its utility lies in its simplicity: one can learn vital information about nearly any process, be it deterministic or stochastic, without wading through the grunt work of finding an analytical solution. It can be used for off-the-cuff estimates or as a proper scientific tool. All one needs to know is how to simulate a given process and its appropriate probability distributions and parameters if that process is stochastic.

Here's how it works:

  • Construct a simulation that, given input values, returns a value of interest. This could be a pure quantity, like time spent waiting for a bus, or a boolean indicating whether or not a particular event occurs.
  • Run the simulation a, usually large, number of times, each time with randomly generated input variables. Record its output values.
  • Compute sample mean and variance of the output values.
In the case of time spent waiting for a bus, the sample mean and variance are estimators of mean and variance for one's wait time. In the boolean case, these represent probability that the given event will occur.

One can think of Monte Carlo Simulation as throwing darts. Say you want to find the area under a curve without integrating. All you must do is draw the curve on a wall and throw darts at it randomly. After you've thrown enough darts, the area under the curve can be approximated using the percentage of darts that end up under the curve times the total area.

This technique is often performed using a spreadsheet, but that can be a bit clunky and may make more complex simulations difficult. I'd like to spend a minute showing how it can be done in Python. Consider the following scenario:

Passengers for a train arrive according to a Poisson process with a mean of 100 per hour. The next train arrives exponentially with a rate of 5 per hour. How many passers will be aboard the train?

We can simulate this using the fact that a Poisson process can be represented as a string of events occurring with exponential inter-arrival times. We use the sim() function below to generate the number of passengers for random instances of the problem. We then compute sample mean and variance for these values.

import random

TRAINS     =   5.0

def sim():
    passengers = 0.0

    # Determine when the train arrives
    train = random.expovariate(TRAINS)

    # Count the number of passenger arrivals before the train
    now = 0.0
    while True:
        now += random.expovariate(PASSENGERS)
        if now >= train:
        passengers += 1.0

    return passengers

if __name__ == '__main__':
    output = [sim() for _ in xrange(ITERATIONS)]

    total = sum(output)
    mean = total / len(output)

    sum_sqrs = sum(x*x for x in output)
    variance = (sum_sqrs - total * mean) / (len(output) - 1)

    print 'E[X] = %.02f' % mean
    print 'Var(X) = %.02f' % variance

Running this code yields the intuitive result: we expect 20 passengers on the train. While this is a very simple example, we could easily replace the sim() function with a simulator of any system of arbitrary complexity and get a pretty good estimation of its behavior.