Walt captures blood-pressure readings and wants to find how often the readings fall into various ranges of values.
This is a common need for a wide variety of data.
- In sales, you might want to find the distribution of the number of products purchased per order.
- In accounting, you might want to find the distribution of the ages of all outstanding invoices.
- In manufacturing operations, you might want to find the distribution of production times for a particular assembly.
This figure shows my simplified view of Walt’s data. As I usually do, I added the gray borders shown in rows 2 and 13, and I anchored the formulas that look at this data in those empty rows. By doing so, I always can insert new rows of data between the shaded rows.
I also created my own artificial data in column C. To do so, I entered this formula in the cell shown and copied it down the column of data:
C3: =INT(RAND()*70+50)
This formula returns a random integer between 50 and 119.
This table shows the frequency distribution for the data above. That is, there is one item that is 60 or less; three items from 81 through 85, and so on.
I didn’t include border rows in this figure, because the FREQUENCY function, which I’ll describe shortly, needs to reference only the actual Bin data. This function would give different answers if it were to reference empty border rows.
After I set up this data I assigned range names to the Value and Bin ranges. To do so in any version of Excel:
- Select the range C1:C13; press Ctrl+Shift+F3; make sure that only Top row is checked; then choose OK.
- Select the range C16:C24 and follow the same approach.
I assigned these range names for two reasons. First, replacing cell references with names makes my formulas much easier to understand. Second, using range names reduces the chance that I’ll reference an incorrect range in a formula.
With these preparations out of the way, I thought of five ways to calculate the counts in column D. Although Walt only needed to use one of these, it’s useful to have the other methods in the back of your mind.
1. Excel’s FREQUENCY Function, with the Standard Method
Take a look at Excel’s help topic for the FREQUENCY function. To do so, type…
=frequency(
…into a cell, and then press Ctrl+A to launch the Function Arguments dialog. The bottom-left corner of this dialog has a link titled Help on this function. When you click that link, Excel launches the help topic about the function.
The FREQUENCY help topic explains that the function has this syntax:
FREQUENCY(data_array,bins_array)
Here, data_array is the data in the range C2:C13, and the bins_array is the range C18:C25. The topic explains that because FREQUENCY returns an array, you would need to select the range D18:D25, type in…
=FREQUENCY(Value,Bin)
…and then press Ctrl+Shift+Enter, to enter the formula as a multi-cell array.
The problem with this approach is that you can’t modify the number of bins in your distribution easily. To illustrate, if you select row 21 and then try to insert (Ctrl++) or delete (Ctrl+-) a row, Excel launches an error dialog that says, “You can not change part of an array.”
However, we can use a different approach with the FREQUENCY function.
2. Excel’s FREQUENCY Function, with INDEX
We know that the FREQUENCY function returns an array. But consider the syntax for the INDEX function:
=INDEX(array,row_num,column_num)
That is, INDEX accepts arrays. To illustrate, this formula…
=INDEX(FREQUENCY(Value,Bin),6)
…entered normally (not array-entered) returns the value 3. This corresponds with the value in cell D23 in the results table repeated here.
Therefore, we can enter this formula in the cell shown…
D17: =INDEX(FREQUENCY(Value, Bin),$A17)
…and copy the formula down the column as needed. (You enter the formula in one line, of course.)
The value of this approach is that if you want to adjust your distribution of bins by inserting or deleting rows, you can do so easily.
3. Excel’s SUM-IF Array Formulas
After I first set up the counts using method 1, I wanted to find a quick way to check the results. The SUM-IF approach came immediately to mind.
To use this approach in the figure above, array-enter these formulas in the cells shown:
D17: =SUM(IF((Value<>””)*(Value<=$C17),1,0))
The first test checks whether the source Value contains an empty cell. For each cell that’s not empty, the test returns TRUE, otherwise it returns FALSE. The second test returns TRUE when values in the source data are less than or equal to the amount in cell C17. Multiplying the tests returns an array of TRUEs and FALSEs that is TRUE whenever both test results are TRUE.
The rest of the IF function returns an array of 1s and 0s; and the SUM function returns the sum of these results.
Be sure to array-enter this formula. That is, with only cell D17 selected, type in the formula shown and then array-enter the formula by pressing Ctrl+Shift+Enter. Follow the same approach for the two formulas that follow.
D18: =SUM(IF((Value>$C17)*(Value<=$C18),1,0))
When the two test results are multiplied in this array formula, we get an array with TRUE whenever numbers in the VALUE range need to be counted, the IF replaces the TRUE values with 1s, and the SUM returns the sum of those 1s. After you array-enter this formula, copy it to the range D19 through D23.
D24: =SUM(IF((Value>$C23),1,0))
Array enter this simple formula.
4. Excel’s SUMPRODUCT Function
Excel’s SUMPRODUCT function works much like the SUM-IF approach. But it doesn’t need to be array entered when you use the syntax I’ll demonstrate. (SUMPRODUCT does need to be array-entered if you use other syntax, however.)
To use this approach, enter the following formulas in the cells shown:
D17: =SUMPRODUCT((Value>0)*(Value<=$C17))
D18: =SUMPRODUCT((Value>$C17)*(Value<=$C18))
D24: =SUMPRODUCT((Value>$C23)+0)
At this point only the formula in cell D24 needs explanation. The test in this formula returns TRUE and FALSE, not 1 and 0. Before the results can be summed, they need to be convereted to 1 and 0. By adding zero to those results, Excel converts TRUE and FALSE to 1 and 0, as required. We would get the same result if we multiplied the test by 1, using this formula:
D24: =SUMPRODUCT((Value>$C23)*1)
As with the SUM-IF approach, the first and the last formulas need to be different from the interior formulas. Copy cell D18 down the column to cell D23.
5. Excel’s COUNTIFS Function
New Excel (Excel 2007) adds a fifth way to find the same results, the COUNTIFS function.
Classic Excel offers the SUMIF and COUNTIF functions, which accept only one test criteria. But you should use the SUMIFS and COUNTIFS functions instead, because accept any number of criteria. They use this syntax:
=COUNTIFS(criteria_range1,criteria1,criteria_range2,criteria2,…)
=SUMIFS(sum_range,criteria_range1,criteria1,criteria_range2,criteria2,…)
To use the COUNTIFS function in New Excel, enter these formulas for the cells shown:
D17: =COUNTIFS(Value,”<>0″,Value,”<=”&Bin)
D18: =COUNTIFS(Value,”<=”&Bin,Value,”>”&$C17)
D24: =COUNTIFS(Value,”>”&$C23)
As before, copy cell D18 down the column through cell D23.
Which Version Should You Use?
There’s no good answer to this question. Each version has advantages over the other versions. Each version can be used in circumstances where the other versions probably wouldn’t be recommended.
However, if you’re somewhat familiar with them all, you’ll recognize what’s happening when you read workbooks created by others. And you’ll have alternatives when you’re not able to make one approach work.