adventures in optimization

The charming adventures of an analyst and his solver.

« Thursday, June 9, 2011 »

A Few Notes on Deterministic vs. Stochastic Simulation

Mental meanderings about simulation and how to get stakeholders to understand the importance of randomness.

I find I have to build simulations with increasing frequency in my work and life. Usually this indicates I'm faced with one of the following situations:

  1. The need for a quick estimate regarding the quantitative behavior of some situation.
  2. The desire to verify the result of a computation or assumption.
  3. A situation which is too complex or random to effectively model or understand.

Anyone familiar at all with simulation will recognize the last item as the motivating force of the entire field. Simulation models tend to take over when systems become so complex that understanding them is prohibitive in cost and time or entirely infeasible. In a simulation, the modeler can focus on individual interactions between entities while still hoping for useful output in the form of descriptive statistics.

As such, simulations are nearly always stochastic. The output of a simulation, whether it be the mean time to service upon entering a queue or the number of fish alive in a pond, is determined by a number of random inputs. It is estimated by looking at a sample of the entire, often infinite, problem space and therefore must be described in terms of mean and variance.

For me, simulation building usually follows a process roughly like this:

  1. Work with a domain expert to understand the process under study.
  2. Convert this process into a deterministic simulation (no randomness).
  3. Verify the output of the deterministic simulation.
  4. Anlyze the inputs of the simulation to determine their probability distributions.
  5. Convert the deterministic simulation to a stochastic simulation.

The reason for creating a simulation without randomness first is that it can be difficult or impossible to verify its correctness otherwise. Thus one may focus on the simulation logic first before analyzing and adding sources of randomness.

Where the procedure breaks down is after the third step. Domain experts are often happy to share their knowledge about systems to aid in designing simulations, and typically can understand the resulting abstractions. They are also invaluable in verifying simulation output. However, they are unlikely to understand why it is necessary to add randomness to a system that they already perceive as functional. Further, doing so can be just as difficult and time consuming as the initial model development and therefore requires justification.

This can be a quandary for the model builder. How does one communicate the need to incorporate randomness to decision makers who lack understanding of probability? It is trivially easy to construct simulations that use the same input parameters but yield drastically different outputs. Consider the code below, which simulates two events occurring and counts the number of times event b happens before event a.

import random

def sim_stochastic(event_a_lambda, event_b_lambda):
    # Returns 0 if event A arrives first, 1 if event B arrives first

    # Calculate next arrival time for each event randomly.
    event_a_arrival = random.expovariate(event_a_lambda)
    event_b_arrival = random.expovariate(event_b_lambda)

    return 0.0 if event_a_arrival <= event_b_arrival else 1.0

def sim_deterministic(event_a_lambda, event_b_lambda):
    # Returns 0 if event A arrives first, 1 if event B arrives first

    # Calculate next arrival time for each event deterministically.
    event_a_arrival = 1.0 / event_a_lambda
    event_b_arrival = 1.0 / event_b_lambda

    return 0.0 if event_a_arrival <= event_b_arrival else 1.0

if __name__ == '__main__':
    event_a_lambda = 0.3
    event_b_lambda = 0.5

    repetitions = 10000

    for sim in (sim_stochastic, sim_deterministic):
        output = [
            sim(event_a_lambda, event_b_lambda)
            for _ in range(repetitions)
        ]
        event_b_first = 100.0 * (sum(output) / len(output))
        print('event b is first %0.1f%% of the time' % event_b_first)

Both simulations use the same input parameter, but the second one is essentially wrong as b will always happen first. In the stochastic version, we use exponential distributions for the inputs and obtain an output that verifies our basic understanding of these distributions.

event b is first 63.0% of the time
event b is first 100.0% of the time

How about you? How do you discuss the need to model a random world with decision makers?