An Excel Tutorial
An Introduction to Excel's Normal Distribution Functions
Excel provides several worksheet functions for working with normal distributions or 'bellshaped curves.' This introduction to Excel's Normal Distribution functions offers help for the statistically challenged.
When a visitor asked me how to generate a
random number from a Normal
distribution she set me to thinking about doing statistics
with Excel.
Many of us were introduced to statistics in school and then
forgot what little we learned...often within seconds of the
final exam. Also, when we took statistics, many of us weren't
taught how to use it with Excel. This is unfortunate, because in
business it's often useful to have some grasp of that topic.
For all these reasons, I thought it would be worthwhile to
briefly explore normal  or "bellshaped"  curves in Excel.
This is a commonly used area of statistics, and one for which
Excel provides several useful functions.
One interesting thing about the normal curve is that it
occurs frequently in many different settings:
 The height of each gender in a population is normally
distributed.
 The measure of LDL cholesterol is normally distributed
in adults.
 The width of stripes on a zebra is said to be normally
distributed.
 Most measurement errors are assumed to be normally
distributed.
 Many SixSigma calculations assume normal distribution.
 Etc.
As a final example, here's a surprising occurrence of the
normal curve: Take any population, whether it's normally
distributed or not. Randomly select at least 30 members from
that population, measure them for some characteristic, and then
find the average of those measures. That average is one data
point. Return the samples, select another random sample of the
same number, and find the average of their measures. Do
the same again and again. The Central Limit Theorem says that
those averages tend to have a normal distribution.
Normal distributions are all around us. Therefore, as
painlessly as possible, let's take a closer look at how we work
with them using Excel.
Brief Definitions
We
need to get some brief definitions out of the way so that we can
start to describe data using Excel functions.
From cholesterol to zebra stripes, the normal probability
distribution describes the proportion of a population having
a specific range of values for an attribute. Most members have
amounts that are near the average; some have amounts that are
farther away from the average; and some have amounts extremely
distant from the average.
For example, a population could be all the stripes on all the
zebras in the world. The normal curve would show the proportion
of stripes that have various widths.
The standard deviation of a sample is a measure of the
spread of the sample from its mean. (We're taking about many
items in a "sample," of course, not just a single item.) In a
normal distribution, about 68% of a sample is within one
standard deviation of the mean. About 95% is within two standard
deviations. And about 99.7% is within three standard deviations.
The numbers in the figure above mark standard deviations from
the mean.
The z value is the distance between a value and the
mean in terms of standard deviations. In the figure above, each
number is a z value.
New Excel's Improved Statistics Functions
Beginning with Excel 2007, Microsoft updated many of their
statistics functions. To provide backward compatibility, they
changed the names of their updated functions by adding periods
within the name.
I show both versions in this article, but Microsoft
recommends that you use the new version if you use New Excel.
Calculating or Estimating the Standard Deviation
Several of the following functions require a value for the
standard deviation. There are at least two ways to find that
value.
First, if you have a sample of the data, you can estimate the
standard deviation from the sample using one of these formulas:
=STDEV.S(range_of_values)
=STDEV(range_of_values)
On the other hand, if you're working with the entire
population, you calculate the standard deviation using:
=STDEV.P(range_of_values)
However, if you're working with rough estimates, you must
take a different approach, because you don't have actual data to
support your estimates.
In this case, first calculate the range. This is the
smallest likely value subtracted from the largest likely value.
By likely, let's use the assumption that all possible
values will be within that range about 95% of the time.
Remember that about 95% of a sample is within two standard
deviations on each side of the mean. (This is a total of four
standard deviations, of course.) Therefore, if we divide the
range by four we should have the approximate standard deviation.
Merely dividing the range by four might seem to be a slipshod
approach. But consider the way this calculation often is used.
Suppose you're forecasting sales for next year. You think
sales will be about 1,000, but the number could be as high as
1,200 and as low as 800. With that information, you can put a
normal curve around your estimated sales and begin to generate a
variety of forecasts for profits and cash flow.
To emphasize, these numbers are only your best estimates.
Therefore, using an estimated standard deviation doesn't seem
quite as sloppy as it otherwise might.
Based on these estimates, your mean sales will be about 1,000
and your standard deviation will be about (1200  800) / 4 =
100. With this information, you can use the following functions
to perform many of the calculations you will need in your
analysis.
NORM.DIST(x, mean, standard_dev ,cumulative)
NORMDIST(x, mean, standard_dev, cumulative)
NORM.DIST
gives the probability that a number falls at or below a given
value of a normal distribution.
 x  The value you want to test.
 mean  The average value of the distribution.
 standard_dev  The standard deviation of the
distribution.
 cumulative  If FALSE or zero, returns the probability
that x will occur; if TRUE or nonzero, returns the
probability that the value will be less than or equal to x.
Example: The distribution of heights of American women aged
18 to 24 is approximately normally distributed with a mean of
65.5 inches (166.37 cm) and a standard deviation of 2.5 inches
(6.35 cm). What percentage of these women is taller than 5' 8",
that is, 68 inches (172.72 cm)?
The percentage of women less than or equal to 68 inches is:
=NORM.DIST(68, 65.5, 2.5, TRUE) = 84.13%
=NORMDIST(68, 65.5, 2.5, TRUE) = 84.13%
Therefore, the percentage of women taller than 68 inches is 1
 84.13%, or approximately 15.87%. This value is represented by
the shaded area in the chart above.
NORM.S.DIST(z, cumulative)
NORMSDIST(z)
NORM.S.DIST translates the number of standard deviations (z)
into cumulative probabilities.
 z  The value for which you want the distribution.
 cumulative  Cumulative is a logical value that
determines the form of the function. If cumulative is TRUE,
NORM.S.DIST returns the cumulative distribution function; if
FALSE, it returns the probability mass function.
(The probability mass function, PMF, gives the probability
that a discrete  that is, noncontinuous  random variable is
exactly equal to some value.)
To illustrate:
=NORM.S.DIST(1, TRUE) = 84.13%
=NORMSDIST(1) = 84.13%
=NORM.S.DIST(1, TRUE) = 15.87%
=NORMSDIST(1) = 15.87%
Therefore, the probability of a value being within one
standard deviation of the mean is the difference between these
values, or 68.27%. This range is represented by the shaded area
of the chart.
NORM.INV(probability, mean, standard_dev)
NORMINV(probability, mean, standard_dev)
NORM.INV
is the inverse of the NORM.DIST function. It calculates the x
variable given a probability.
To illustrate, consider the heights of the American women
used in the illustration of the NORM.DIST function above. How
tall would a woman need to be if she wanted to be among the
tallest 75% of American women?
Using NORM.INV, she would learn that she needs to be at least
63.81 inches tall, as shown by this formula:
=NORM.INV(0.25, 65.5, 2.5) = 63.81 inches
=NORMINV(0.25, 65.5, 2.5) = 63.81 inches
The figure shows the area represented by the 25% of the
American women who are shorter than this height.
NORM.S.INV(probability)
NORMSINV(probability)
NORM.S.INV
is the inverse of the NORM.S.DIST function. Given the
probability that a variable is within a certain distance of the
mean, it finds the z value.
To illustrate, suppose you care about the half of the sample
that's closest to the mean. That is, you want the z values that
mark the boundary that is 25% less than the mean and 25% more
than the mean.
The following two formulas provide those boundaries of .674
and +.674, as illustrated by the figure.
=NORM.S.INV(0.25)
=NORM.S.INV(0.75)
=NORMSINV(0.25)
=NORMSINV(0.75)
STANDARDIZE(x, mean, standard_dev)
STANDARDIZE
returns the z value for a specified value, mean, and standard
deviation.
To illustrate, in the NORM.INV example above, we found that a
woman would need to be at least 63.81 inches tall to avoid the
bottom 25% of the population, by height. The STANDARDIZE
function tells us that the z value for 63.81 inches is:
=STANDARDIZE(63.81, 65.5, 2.5) = 0.6745
We can check this number by using the NORM.S.DIST function:
=NORM.S.DIST(0.6745, TRUE) = 25%
=NORMSDIST(0.6745) = 25%
That is, a z value of .6745 has a probability of 25%.
Two Ways to Calculate a Random
Number from a Normal Distribution
Remember that the NORM.INV function returns a value given a
probability:
NORM.INV(probability, mean, standard_dev)
NORMINV(probability, mean, standard_dev)
Also, remember that the RAND() function returns a random
number between 0 and 1. That is, RAND() generates random
probabilities. Therefore, you can use the NORM.INV function to
calculate a random number from a normal distribution, using this
formula:
=NORM.INV(RAND(), mean, standard_dev)
=NORMINV(RAND(), mean, standard_dev)
However, if you use Classic Excel with a large number of
standard deviations, you might want to use a different approach.
Nearly ten years ago, Jerry W. Lewis  a former Excel MVP and a
professional statistician  offered a stern warning. Prior to
Excel XP (2002), he wrote, NORMINV "produced a very unnormal
fraction of values around 6 million standard deviations from the
mean."
Instead, Jerry recommended the BoxMuller method described
here:
http://mathworld.wolfram.com/BoxMullerTransformation.html
This method uses this formula to calculate a random number from
a normal distribution:
=SQRT(2*LN(RAND()))*SIN(2*PI()*RAND())
The BoxMuller method is mathematically exact, Jerry writes,
if implemented with a perfect uniform random number generator
and infinite precision.
A Note About The Charts
I created all of the figures for this article in Excel. If
you would like to know how, see
How to Create Normal Curves with Shaded Areas in New Excel.
References
I must have at least 15 statistics books gathering dust on
bookshelves in the basement. Even so, these two books offered
clear explanations that you might find useful:
Statistical Analysis with Excel for Dummies, by Joseph
Schmuller, PhD
Excel Data Analysis for Dummies, by Stephen L. Nelson, MBA,
CPA
