(This is the second of two articles about normal distributions. The first article is, How to Return Random Numbers from a Normal Distribution for Your Excel Forecasts.)
During more than three decades of working with spreadsheets in business, most of the spreadsheet models and forecasts I’ve seen have used what statisticians call a deterministic method.
To illustrate, if I were forecasting profits for a period, a deterministic model would use one number for my forecast of sales, another number for my forecast of operating expenses, and so on.
Instead, a more useful method would take a probabilistic approach, supported by the Monte Carlo method.
In the past, without thinking much about it, I wrote macros to support the Monte Carlo method.
But I was wrong back then. There’s a much easier way.
The Monte Carlo Method
The Monte Carlo method supports a “stochastic” or “probabilistic” system.
According to Wikipedia, “a purely stochastic system is one whose state is randomly determined, having a random probability distribution or pattern that may be analyzed statistically but may not be predicted precisely.”
This figure illustrates a normal probability distribution, which probably is the best approach for most business use.
To use this method, you first set up your key assumptions to be defined by this curve. The models in your workbook then randomly select values from this normal distribution, use those values in their calculations, record key results, repeat the process many times, and then summarize your record of results. This process is known as the Monte Carlo method.
In this article, I’m going to show you how to do this using Excel Data Tables to record the results produced by each calculation.
Before I get into the details, however, I need to mention two things.
First, whenever you open a Monte Carlo analysis that uses data tables, make sure that the Monte Carlo workbook is the only workbook open. This is because it will need to recalculate many times, and if you have other workbooks open they also will recalculate, needlessly. And that could make your simulation VERY SLOW.
Second, you might grumble about my examples.
If you work in Finance, you’ll probably grumble that the following examples vastly oversimplify what is typically a complex financial-modeling process. In this case, I’ll respond that I’m making the model very simple so you can understand the changes I’m suggesting to your standard modeling process.
And if you don’t work in Finance, you’ll probably grumble that I should have used an example from your own specialty, not Finance. In this case, I’ll respond that I’m using a simple income statement as the example because even if you work in engineering, or operations, or marketing, or wherever, you understand a simple income statement. Therefore, you’ll be able to understand what the model is doing, and you’ll be able to adapt my techniques to your own models and forecasts.
The Standard Forecasting Method
This figure illustrates a deterministic forecast…the standard method in my experience. It essentially asks, “If we have Sales of 100, Cost of Goods Sold of 35%, Operating Expenses of 45, and Taxes of 25%, what will our Profits be?”
Unlike most such forecasts, this one states its assumptions explicitly…in column F.
In real life, of course, each assumption might be supported by separate analyses and forecasts.
Again, the problem with this approach is that we know the forecast will be incorrect, because most forecasts are incorrect, and we have no way to express how far wrong the profit forecast might reasonably be.
So let’s fix this forecast…
Begin the Stats Table
This figure illustrates the Stats Table, which I’ve set up in a new workbook.
This table translates our four key assumptions into five results that we can use for each iteration of our forecast.
Let’s look at the Sales assumptions…
To calculate a random number from a normal curve of potential sales, we need to know the mean and standard deviation of our sales curve. If you can calculate those values directly, you could enter them into cells E5 and F5 directly.
As the numbers at the bottom of this figure illustrate, one standard deviation from the mean represents about 68% of the potential results from a normal curve. And two standard deviations represent about 95%.
Therefore, if we estimate the highest-feasible amount of sales, we could say that the number represents the second standard deviation above the mean, and enter it in cell C5 in the Stats Table, repeated below. And we could say that our estimate of the lowest-feasible amount of sales represents the second standard deviation below the mean, and enter that number in cell D5 in the table.
And just to double-check our assumptions, we’re saying that there’s a 95% chance that our actual sales will be somewhere between these two numbers.
So now, the average of the max and min values is the mean, as calculated by this formula:
And the standard deviation is merely one-fourth the range between the max and min values, as calculated by this formula:
And now, we need Excel to return a random number from the normal distribution that’s defined by the mean value in cell E5 and the standard deviation in cell F5. To do so, we use this formula:
Now copy the range E5:H5 downward, as shown in the table. Then enter the labels shown in column I.
To assign these labels as names for the adjacent cells in column H, first select the range H5:I8. Choose Formulas, Defined Names, Create from Selection, or press Ctrl+Shift+F3. In the Create Names dialog, make sure that only Right Column is checked, then choose OK.
For convenience, these four names begin with “c.”, which stands for Current Results. Because Excel sorts names alphabetically in most lists, this will group those four names together in those lists, so you can find them easily. Also, when we use these names in formulas, we’ll have no doubt that they came from the Current Results section of the Stats Table.
Before we can set up the last two columns of the Stats Table, we need to complete the model.
Set Up the Stochastic Model
This figure illustrates the model we’ll use. Each of the four formulas in the Assumptions data column references one of the four values you just named in the Stats Table.
To create this figure, add a new worksheet to your Monte Carlo workbook, and name the worksheet Model.
Now enter these formulas in column D:
The formulas in the Amounts section rely on the assumptions in column D:
Finally, to finish the figure, use the Create Names dialog to assign the labels in columns E to the cells on their left, and then do the same for column H.
Notice that each time you recalculate your workbook, the model generates different results. We now need to capture those results automatically for many recalculations.
Set Up the Data Table
We’re now going to set up a Data Table.
This table will automatically recalculate Excel, return the values for the items named in row 2 of this figure, record those values in row 4, recalculate, record the current items in row 5, and so on…through the last row of the table.
To begin the Data Table, add a new sheet to your Monte Carlo workbook and name it Data. Then enter the labels, which are shown in bold in the preceding figure.
The Seq (sequence) column is convenient for several reasons. To create the column…
- Enter the value 1 in cell B4.
- With cell B4 selected, press Ctrl+Shift+DownArrow to select cell B4 through the last cell in column B.
- Choose Home, Editing, Fill, Series to launch the Series dialog.
- Accept the default Step Value of 1, but enter 5000 as the Stop Value.
- Choose OK.
You now have the beginnings of a table with 5000 rows.
Now enter the following formulas in the cells shown:
It’s now time to set up the Data Table. After you do so, Excel will calculate the workbook 5000 times, because the Data Table will contain 5000 rows. Each row of the completed table will contain the values returned to row 3 of the figure after each calculation.
Before you start your Data Table, however, set your calculation option to manual. To do so, choose Formulas, Calculation, Calculation Options, Manual. By taking this step, you’ll make sure that your Data Table will calculate only when you press the F9 key or save your workbook.
Now, to set up the table so that it displays results that start in cell C4 of the figure below…
- Select cell B3.
- Press Ctrl+Shift+DownArrow to slide down the Seq column to the 5000th row of your table.
- Press Ctrl+. (period), which won’t appear to do anything…yet.
- Hold down your Shift key and then press your right arrow once. When you do so, Excel will extend your selection by one column and also will display the top of your table, as shown here, not the bottom. (This was why you pressed Ctrl plus the period key.) Now tap Shift+RightArrow several times, until all calculations in row 3 are selected, through cell G3.
- Make sure that all other workbooks are closed, because otherwise, the next few steps could take a LONG time.
- Choose Data, Data Tools, What-If Analysis, Data Table.
- In the Data Table dialog, click in the Column Input Cell edit box; click on any empty cell outside of the area of your table; and then choose OK.
- If you forgot to close other workbooks, you can press Esc to interrupt the Data Table’s calculations.
- After a delay of perhaps ten seconds (assuming you don’t press Esc), your Data Table should be completed.
Here’s what’s happened…
Strictly speaking, Data Tables are intended to enter a value into either one or two cells that your model uses for each calculation. But in this case, your model ignores the values entered in the empty cell. Instead, it merely recalculates the model each time a new value is written. The model’s values change each time because of the random numbers that the model contains. And then the Data Table captures the results we’ve specified using the formulas in row 3.
Name the Data Table Columns
Here’s the top of the completed Data Table. So that we can reference the columns of this table easily, we need to name them.
To name the columns, we can’t use the Create Names dialog because we need to exclude row 3 from the defined names. Therefore…
- Select cell C4.
- Press Ctrl+Shift+DownArrow to extend the selection through the bottom row of the table. When you do so, Excel will display the bottom of the table.
- Press Ctrl+Backspace to display your active cell, which still is cell C4.
- Press Ctrl+Alt+F3, or choose Formulas, Defined Names, Define Name, to launch the New Name dialog, which should show the selected range as the default Refers To range.
- To name that range Profits, type the name into the Name box, and then choose OK.
Now repeat this process for the other columns shown in the figure above, from Sales through the TaxRate.
Summarize the Results of the Data Table
We’ll summarize our results in a section to the right of the Data Table, in the area shown here.
To begin, enter the labels shown in this figure. Then use the Create Names dialog to assign the labels in the range I1:I3 to the cells on their right. Then do the same with the labels in the range M1:M2.
The Seq (Sequence) columns, beginning in cells I7 and M7, count the number of bins into which we want to summarize our data. Use the Series dialog (Home, Editing, Fill, Series) to set up a series from 1 through 21 for both sets of sequences.
Now enter these formulas found in the first three rows:
The two Frequency sections at the bottom of the figure above generate the data we’ll use for the histograms in our report.
The ProfBins column contains data that defines the beginning and ending values for the profit bins. Here are the first two formulas in this column:
Copy the formula in cell J8 downward to cell J27. Notice that the value in cell J27 approximately equals the MaxProfits value.
Now select the range J6:K27 and use the Create Names dialog to assign the labels at the top of this range as the names for the two columns.
Similarly, here are the first two formulas in the SalesBins column:
Copy the formula in cell N8 downward to cell N27. The value in cell N27 should approximately equal the MaxSales value.
And as before, select the range N6:O27 and assign the labels at the top of the selection to the two columns below the top row in the selection.
We now can use the FREQUENCY function to return the number of items found in each Profit Bin.
First, select the range K7:K27. Then type in this formula:
Now hold down Ctrl and Shift and then press Enter. This key combination array-enters the formula in the selected range. And this formula returns the number of values in the Profits range that fall into each bin in the ProfBins range.
Similarly, select the range O7:O27, then array-enter this formula:
Now, with our results summarized, we must take one more step before we can create our Monte Carlo forecast.
Complete the Stats Table
You now can complete columns K and L of the Stats Table, shown here.
To complete this table, first enter the following formulas in the cells shown:
(All of these formulas return values from named ranges in the Data Table in the Data worksheet.)
Now enter the labels shown in column L, next to the values. After you’ve done so, use the Create Names dialog to assign the five labels in column L to the cells with the five formulas.
And now, finally, we can set up the Monte Carlo forecast.
An Overview of the Monte Carlo Forecast
This figure shows our forecast generated by the work done so far. Each time you recalculate your workbook, it might change slightly…but it shouldn’t change significantly.
The forecast is divided into four sections…
- The Projected Net Profit shows the simple income statement, which uses the average values for each item shown.
- The Key Percentiles section shows the likely values for both sales and profits. It shows that there’s a 25% chance that sales and profits will be equal to the values shown in the 25% row, or less. It shows that there’s a 50% chance that sales and profits will be equal to the values shown in the 50% row, or less…and so on.
Also, the Loss Percentile value shows that there’s a 17% chance that a loss will be incurred.
- The Sales Histogram shows the distribution of sales results among the 5000 sales forecasts that our model performed.
- The Profits Histogram shows the distribution of the 5000 profits forecasts.
Now let’s create this report…
Create the Projected Net Profit Section
To begin, add another worksheet to your workbook and name the sheet Report. And then enter the labels shown in this figure.
Now enter the formulas shown here:
Whenever you press the F9 key, Excel recalculates 5000 times and calculates new averages, which this forecast displays. But notice that this simple forecast changes very little each time you recalculate.
However, in a more complex model, you’ll probably get more variation. Therefore, you might need to use more than 5000 rows in your Data Table of the Data worksheet.
Create the Key Percentiles Section
This section is easy to set up. To begin, just enter the few labels in the figure, and then enter the percentages shown in the range B15:B18.
Now enter these formulas:
Now copy each of these formulas down their column as illustrated in the figure.
In these formulas, the PERCENTILE.INC function…
…returns the Sales and Profit percentiles for each of the k values (the percentages) shown. For example, 75% of the time profits were 14 or lower.
And from another perspective, when the PERCENTILE.INC function uses a k value of 50%, it returns the median value. That is, like the MEDIAN worksheet function, it returns the number in the middle of a set of numbers.
Finally for this section, the Loss Percentile calculation is basically the mirror image of the PERCENTILE.INC function.
An exact mirror image would begin with a particular value in our column of profit results, and then return the percentile for that value. To illustrate, its formula for profits of zero or less would be…
…where the COUNTIFS function’s arguments are:
However, in this case, we want to know the percentile for profits LESS than zero. So its formula is:
That is, the formula in cell D21 tells us that 15% of the numbers in the Profits range have values that are less than zero.
Create the Histograms
Here are the two histograms again.
These two charts summarize the number of calculations (out of 5000) that produced the sales results shown in the top chart and the profit results shown in the bottom chart.
You’ll find the data for these charts in the Data worksheet in the Sales Frequency and Profits Frequency sections.
Logically, you’d create the Sales Histogram by first selecting the range N6 through the bottom of the SalesItems column, as shown here…
…and then choosing Insert, Charts, Column Chart, 2D, Clustered Column. But if you do so, Excel won’t treat the SalesBins column as data for the X axis. Instead, it treats SalesBins as a second data series and plots both columns of data.
The easiest way I’ve found to get around this problem is to give Excel a clue that you want the SalesBins column to be used as the X axis, and not as a separate data series. To do so, temporarily remove the label in cell N6 from your worksheet. The way I do it is to select cell N6, select the text in my formula bar, press Ctrl+X to cut it, then press Enter.
Now you can select the range N6 through the last cell in the SalesItems column, and choose Insert, Charts, Column Chart, 2D, Clustered Column. When you do so, Excel will generate a histogram as you expect. And then, once your chart is created, select cell N6 again and press Ctrl+v to paste the SalesBins text back into that cell.
Now you can select your chart, press Ctrl+X to cut it, select the Report worksheet, and then paste it where you want it. You then can format the chart to make it pretty.
Wrapping Up the Monte Carlo Analysis
As you think about ways to apply Monte Carlo analysis to your own data, you might wonder what you could do to reduce the amount of uncertainty in your forecast. Another look at the Stats Table points the way.
The yellow cells show our honest opinion about the maximum and minimum values that we expect for each of our key assumptions. Therefore, if you want to reduce the uncertainty in your forecast, you’ll need to find realistic ways to narrow the distance between each set of those Max and Min values.
But to do that, unfortunately, you’ll need to spend a lot more time and effort to understand what has influenced performance for those items in the past and what’s likely to affect performance in the future…and to what degree.
Good luck with that!