(This is the first of two articles about normal distributions. The second article is, How to Create Monte Carlo Models and Forecasts Using Excel Data Tables.)
Back when I created models and forecasts for employers, I KNEW that my results were going to be wrong, and I had no practical way to express any degree of uncertainty in my results.
For example, if I were forecasting profits for a period, my spreadsheet would use one number for my forecast of sales in a period, another number for my forecast of operating expenses, another number for my cost-of-goods-sold percentage, and so on.
I knew that each predicted number in my forecast probably would land somewhere between a best-case and worst-case value…probably as determined by a normal distribution as shown in the Excel chart above.
But my models offered no way for me to include such probabilities. And they offered no way to assess or reduce my degree of uncertainty about my final forecast.
And also, during my years as an onsite Excel consultant, I saw many models and forecasts created by spreadsheet users from around the world. And in all that time, I never saw a model, or forecast that could have done any better.
Since then, I’ve come to realize that most Excel models and forecasts are actually a form of simulation.
Wikipedia tells us that a simulation is the imitation of the operation of a real-world process or system over time.
The act of simulating something, Wikipedia explains, first requires that a model be developed. This model represents the key characteristics or behaviors of a system or process. The model represents the system itself, and the simulation represents the operation of the system over time.
In general, there are two types of models: deterministic and probabilistic.
- Deterministic models use specific numbers for assumed values, including ratios.
- Stochastic (or “probabilistic”) models rely on random numbers, typically drawn from a normal distribution.
From this perspective, the type of simulation I prepared for my employers, and the type I saw in other companies was a deterministic simulation.
On the other hand, what we should have been using was a stochastic simulation. This would have allowed us to specify our degree of uncertainty about each of our assumptions, and it also would have helped us to evaluate our uncertainty about our final results.
It’s way past time for Excel users to reduce our reliance on deterministic simulations. It’s time to bring stochastic modeling and forecasting in our work.
Two Types of Stochastic Simulations
Loosely speaking, we can divide stochastic simulations into two types, tabular and expanded.
With tabular simulations, you create your entire model in several cells in one row of a spreadsheet. And some of these cells include random numbers.
To create the simulation, you copy your row of formulas down their columns to many thousands of rows. And then, to analyze your simulation, you analyze those many different results generated by those random numbers in each row of your table of simulations.
For example, suppose that your company places a variable number of online ads each month, which generate a variable number of visitors to your web site. And suppose that a variable number of visitors is needed to generate each sale of a variable amount.
So in several cells in a row, you could simulate one month of activity, using random numbers to define each degree of variability. You could copy those formulas down their columns to create a table with thousands of possible results. And then, you’d analyze the table to determine what your average sales would be for a period, and how variable that estimate might be.
The expanded approach is different. In this approach, you create a model that can be as detailed as you would like it to be. You could use as many rows in as many worksheets as you want for your model. And then you run that model many times and summarize the results.
Typically, this is known as a Monte Carlo analysis.
But unlike most models created in Excel, the Monte Carlo analysis uses random numbers to generate key assumptions.
For example, if your best-possible sales one month would be 120, and your worst-possible sales would be 80, you’d use a random number to choose between those limits. Or if your highest-possible cost of sales would be 40% and your lowest-possible cost would be 30%, you’d use another random number to choose between those limits.
With this approach, your model gives you a revised forecast each time you recalculate your workbook.
You easily can recalculate this model as many times as you want and capture the results from each calculation…automatically. And you can do it without programming. I’ll show you how to do it in my next article, How to Create Monte Carlo Models and Forecasts Using Excel Data Tables.
But for now, let’s dig into a critical ingredient of both types of simulations: random numbers. First, I’ll show you the obvious method, which you seldom should use for your simulations. Then I’ll show you the method you should use most of the time.
Excel’s Two Random Number Functions
Excel offers two functions that generate random numbers…
- RAND() returns a random number between 0 and 1.
- RANDBETWEEN(bottom, top) returns a random integer between the bottom and top arguments.
Both functions return results that have the same chance of appearing anywhere between the bottom and top boundaries.
To illustrate, I copied the RAND() function and pasted it to 10,000 cells in one column. I grouped its values into 10 groups of equal-size bins, then I created a histogram that shows how many times a value appeared in each bin.
The chart on the left shows the results after I pressed F9 the first time to recalculate my workbook. The first column in the first chart shows that RAND generated a value between 0 and .10 about 1010 times in those 10,000 rows. And the last column in that chart shows that RAND generated a value between .9 and 1.0 about 990 times. And the other chart shows the results after I pressed F9 again.
As you can see, the results for each column are grouped around 1000, which is each column’s equal share of the results.
If I had used RANDBETWEEN rather than RAND, I would have seen a similar pattern.
The problem with using either RAND or RANDBETWEEN for your simulations is that we want results that look about like this Excel chart.
That is, once we define the boundaries of our assumptions, we typically want the random numbers to be center-weighted.
So how can we do this? How can we return a random number from a normal distribution?
Random Numbers from a Normal Distribution
To return a random number from a normal distribution, we rely mostly on the NORM.INV function, which uses this syntax:
=NORM.INV(probability, Mean, standard_dev)
The probability argument is easy to specify. Probabilities have values from zero to one, just like the RAND function generates. Therefore, if we can figure out how to calculate the Mean and the standard deviation, we can use this formula to return a random number from a normal distribution:
=NORM.INV(RAND(), Mean, standard_dev)
If you’re a statistical superstar, you’ll have no problem calculating the Mean and standard deviation of your data for use in this formula. But if you’re like most of us Excel users, you need an easier approach. Here it is…
Take another look at this Excel chart. The light blue area shows one standard deviation on each side of the mean.
As the first label below the chart illustrates, that light-blue area represents about 68% of the total area.
Together, the light- and medium-blue areas show what’s within two standard deviations of the mean. And the second label below the chart shows that this represents about 95% of the total area.
So think about what that means. Suppose your original forecast shows that sales next month will be 100. But if your job depended on it, what do you think your best-case and worst-case estimates might be for those sales? After you really think about it, let’s say you estimate that sales could be as high as 120 and as low as 70.
Those two numbers define the outer boundaries of the medium blue areas in the figure. That is, your estimate says that there’s about a 95% chance that actual sales will be between 70 and 120.
- Your mean is the average of 120 and 70, which is 95 in this case.
- Your standard deviation is the difference between 120 and 70, divided by 4 standard deviations, which is 12.5 in this case.
Therefore, here’s your formula to return a random number from a normal distribution with a mean of 95 and a standard deviation of 12.5:
=NORM.INV(RAND(), 95, 12.5)
Now let’s check whether this formula gives us the results we expect.
Checking the Results with a Histogram
The following two figures aren’t fancy, but they tell a story that you need to understand if you create Excel models or forecasts.
This figure calculates how successfully the previous formula returns numbers from a normal distribution. Here’s how to set it up…
Enter this formula in the cell shown:
A1: =NORM.INV(RAND(), 95, 12.5)
We want to copy this cell down the column to cell A10000. Here’s a fast way to do this:
- Copy cell A1.
- Press the F5 function key to launch the GoTo dialog.
- In the Reference edit box, enter A10000, then press Enter to select cell A10000.
- Hold down your Ctrl and Shift keys, then press the up-arrow key, which will select all cells from A10000 through A1.
- Press Ctrl+v to paste cell A1 to the selected cells.
The cells in column C show the min and max values in column A. Enter the formulas in the cells shown:
Column D shows nine numbers to use as a visual guide. Enter them any way you want.
The first formula in cell E2 merely references the min value:
The next formula calculates the minimum value plus one-ninth of the distance between the min and max values:
Copy the formula in cell E3 downward as shown. Notice that cell E11 equals the max value in cell C3.
Set Up the Data for the Histogram
Column F contains the data that we’ll display in our new histogram. We’ll use the FREQUENCY function to generate it.
To do so, first select the range F3:F11, then type this formula into your formula bar:
Now array-enter this formula. That is, hold down your Ctrl and Shift keys and then press Enter.
After you do so, you should have a pattern similar to the data shown in the figure above. That is, the cells in the middle of the formulas in column F should display much larger numbers than the cells at each end.
Create the Histogram
The final step of the test is to create the histogram, shown here.
To do so, make sure that the range F3:F11 is still selected, then choose Insert, Charts, Column Chart, Clustered Column.
After you do so, Excel will create a chart somewhat like this one. If you’re using Excel 2010 or before, your chart might not include a title.
You certainly could make this chart look better, but that’s not necessary for the test. The chart easily demonstrates that we’ve done what we wanted by combining the NORM.INV and RAND functions: We now have a way to return random numbers from a normal distribution.
In How to Create Monte Carlo Models and Forecasts Using Excel Data Tables, I show you how to build on this foundation to create probabilistic simulations using the Monte Carlo method.