Because the Texas and California governors have been bickering over the Texan’s attempt to poach California employers, I got curious about the distribution of the unemployment rates in the two states.
So I downloaded data from the Federal Reserve Bank of St. Louis. Then I summarize the data and created this chart, technically a histogram, which I originally had set up as swipe file.
The source data consists of 361 unemployment rates for each state, extending from December, 1982, through December, 2012.
The chart shows that the unemployment rates in Texas are clumped in the upper part of the chart. That is, over the past 30 years, Texas has tended to maintain a lower unemployment rate than California.
You could apply the logic of this chart to calculate frequency distributions for your own data. The distributions would compare the number of instances for nearly any criteria you can count: orders by product by month for two sales people, failure rates by month for two products, the number of sick days taken by age by gender in the past year, and so on.
Using Excel’s Frequency Function
The obvious way to generate these frequency distributions is to use Excel’s FREQUENCY function, which has this syntax:
In this figure, for example, the data_array is the range B3:B9, and the bins_array is the range D4:D6.
The FREQUENCY function has three unusual characteristics.
First, you must enter the formulas as a multi-cell array. Here for example, I selected the range E3:E6, typed the formula…
…held down the Ctrl+Shift keys, then pressed Enter to enter the formula as an array.
Later, if you decide to change the number of bins, you first must change the array-entered function into a normal function, add or delete the rows you need, then re-enter the array. To change the array into a normal function, select the cells in the array (here, E3:E6), press F2 to edit the function, then hold down Ctrl and press Enter to enter the formula in each cell as a normal function.
Second, you must enter one more FREQUENCY formula than the number of bins. Here, for example, I entered the FREQUENCY formula in a four-cell range to reference the three Bins cells.
The extra cell, cell E3 in this case, contains the number of values less than or equal to the lowest bin number. So in the figure, the value of 3 in cell E3, is the count of the numbers 10, 16, and 20.
Third, the FREQUENCY function returns values that are greater than one bin and less than or equal to the next-higher bin. Notice, for example, that although the Data column contains the value of 20, the Freq column shows zero for the bin value of 20.
In short, using Excel’s FREQUENCY function seems a little strange. There’s got to be a better way.
Using the COUNTIFS Function to Calculate Frequency Distributions
With the addition of the COUNTIFS function in Excel 2007, we now have an easier and more-powerful way to generate frequency distributions.
Here, we use three formulas to generate the same results as with the FREQUENCY function.
The formula at the top of the frequency range is:
The formula at the bottom of the frequency range is:
The remaining formulas are like this:
Copy this formula downwards as necessary.
So why is it easier and more powerful to use three formulas rather than one? There are four reasons.
First, we eliminate the multi-cell array formula, which allows us to add and delete rows easily.
Second, we make the logic obvious for each cell in the Frequency range. In cell E3, for example, we don’t have to figure out what the number 3 represents; the formula shows us what’s happening.
Third, if we don’t care about values outside the range of the bins, we don’t have to show them. That is, if we don’t care about values less than 20, we could erase the formula in cell E3.
Fourth, we can change how the frequency distribution is calculated. For example, it’s confusing that although our data contains the value 20, our frequency distribution shows zero results for a bin value of 20.
Therefore, we could change our distribution so that it shows values greater than or equal to the bin values, as shown in this figure. This makes the frequency values easier to understand.
That is, we now can say that there are two values less than 20, one value in the 20s, two values in the 30s, and two values in the fourties and above.
The relevant formulas are:
Because I used this COUNTIFS method in the workbook, the unemployment rates are grouped in bins of 3.0–3.9, 4.0–4.9, and so on.
On the other hand, if I had used the FREQUENCY function to summarize the data, the rates would have been grouped in bins of 2.1–3.0, 3.1–4.0, and so on.