(Note: I wrote this before Microsoft introduced Excel Tables or SUMIFS. This post is scheduled for an update.)
Excel users often need to summarize data in Excel formulas. Unfortunately, few Excel users understand the real power that Excel offers for doing this.
One of the problems that Excel users have is that the best-known way to summarize data probably is the least powerful. And the two least-known approaches probably are the most powerful.
The Problems With Database Functions
The best-known way to summarize spreadsheet data probably is to use database functions. At the very least, these functions have been used for the longest time.
When Lotus 1-2-3 was introduced in 1983, it offered five such functions: @DMAX, @DMIN, @DSTD, @DSUM, and @DVAR. (1-2-3’s spreadsheet functions all began with the “@” symbol.)
When Excel was introduced five years later, it offered 1-2-3’s database functions and more. Today, Excel offers a dozen database functions.
Although the number of database functions has grown over time, their power has not. Excel database functions still are linked to their Lotus roots in one key regard: We Excel users can’t include summarization criteria within our spreadsheet formulas. Instead, our formulas must reference a criteria range in our spreadsheets.
This limitation severely limits our power to summarize Excel data with these functions. It also makes the functions more difficult to use. Therefore, I’ll ignore Excel database functions like DSUM and DMAX in this article.
Instead, many of the following examples will use both array formulas and spreadsheet functions that are new to Excel 2007.
Entering Array Formulas
From its very early days, Excel has provided this powerful feature for summarizing data. Unfortunately, most Excel users either avoid array formulas or aren’t even aware that this technology exists.
In Excel, you normally enter a formula by typing it into the formula bar and then pressing the Enter key. You array enter a formula by typing it into your formula bar, holding down the Ctrl and Shift keys, and then pressing Enter.
After you array-enter a formula, the formula bar shows that it begins and ends with braces, like this:
{=A1}
To be clear, you do not type in those “{” and “}” characters yourself. Instead, Excel displays them after you array-enter a formula by pressing Ctrl+Shift+Enter.
In the following examples, I’ll use those braces to indicate the formulas that you must array-enter.
Set Up Your Data
This figure shows a database of sales data, data that I’ll use in this article. To set up the database yourself, enter the data as shown.
The shaded areas in rows 3 and 17 mark the boundaries of your data. Using those border rows is optional, but useful.
The date values are for 2007. To format them, select the range A3:A17; in the Home tab, choose the Number group’s dialog launcher; in the Number tab of the Format Cells dialog, choose the Custom category; enter mmm-d for the Type; then choose OK.
(You also can choose Ctrl+1 to launch the Format Cells dialog.)
The following formulas make extensive use of range names. To assign these names, select the range A2:E17; choose Formulas, Defined Names, Create from Selection; in the Create Names dialog make sure that only Top Row is checked; then choose OK.
(You also can choose Ctrl+Shift+F3 to launch the Create Names dialog.)
Now, let’s summarize this data…
Using the Data As the Criteria
At times you’ll want to apply your criteria to the data you’re summarizing.
If you need to apply only one criteria, you can do this with arrays and with the SUMIF, COUNTIF, and AVERAGEIF functions:
=SUMIF(range,criteria,[sum_range])
=COUNTIF(range,criteria)
=AVERAGEIF(range,criteria,[average_range]))
The AVERAGEIF function is new with Excel 2007.
The brackets indicate that the ranges are optional. If these arguments are missing, the SUMIF and AVERAGEIF functions work with the range area.
Below, the left cell shows the result of the calculation and the right cell shows one or more alternative formulas.
15 | =SUMIF(Units,”=5″) =SUMIF(Units,”=5″, Units) |
Return the total number of Units sold for sales of exactly 5 units. |
3 | =COUNTIF(Units,”=5″) |
Return the number of times we’ve sold exactly 5 units. |
5 | =AVERAGEIF(Units,”=5″) |
Return the average units sold for sales of exactly 5 units. |
Above, the criteria are fully contained with the formulas. But suppose you enter the value 5 in cell J3. You then could use these formulas:
15 | =SUMIF(Units,”=”&J3) =SUMIF(Units,”=”&J3, Units) |
Return the total number of Units sold for sales of exactly the number of units entered in cell J3. |
3 | =COUNTIF(Units,”=”&J3) |
Return the number of orders for exactly the number of units entered into cell J3. |
5 | =AVERAGEIF(Units,”=”&J3) |
Return the average of the orders when sales were exactly the number of units entered into cell J3. |
You also could use array formulas to return the same results:
15 | {=SUM(IF(Units=5,Units,0))} {=SUM(IF(Units=J3,Units,0))} |
Return the total number of Units sold for sales of the specified number of units. |
3 | {=SUM(IF(Units=5,1,0))} {=SUM(IF(Units=J3,1,0))} |
Return the number of times we’ve sold the specified number of units. |
Notice that the two array formulas above sum ones and zeros rather than using a count function.
5 | {=AVERAGE(IF(Units=5,Units,””))} {=AVERAGE(IF(Units=J3,Units,””))} |
Return the average number of units sold for sales of the specified number of units. |
Notice that the two formulas above use null strings (“”) rather than zero. If you were to use zero, the AVERAGE function would include those zeros in the average.
Using Non-Data and a Single Criteria
Until now, the examples have summarized data using criteria that applied to the data itself.
More frequently, however, we summarize data by applying our criteria to related columns of information that describe the data.
Here, for example, we find the number of units of Hats that have been sold, the number of hat orders, and the average order:
41 | =SUMIF(Prods,”=Hats”, Units) |
7 | =COUNTIF(Prods,”=Hats”) |
5.9 | =AVERAGEIF(Prods,”=Hats”) |
We could use array formulas to return the same results:
41 | {=SUM(IF(Prods=”Hats”,Units,0))} |
7 | {=SUM(IF(Prods=”Hats”,1,0))} |
5.9 | {=AVERAGE(IF(Prods=”Hats”,Units,””))} |
As before, “Hats” could be in a cell that the formulas could reference.
And again, notice that to count the number of hat sales, we we SUM an array of ones and zeros rather than using a COUNT function. And we also use a null string (“”) rather than zeros for the AVERAGE-IF array formula.
Introducing SUMIFS, COUNTIFS, and AVERAGEIFS
So far, the examples have used only one criteria. But frequently, Excel users need to summarize data using multiple criteria. Excel 2007 introduced three new summary functions that allow for multiple criteria:
=SUMIFS(sum_range,criteria_range,criteria,…)
=COUNTIFS(criteria_range,criteria,…)
=AVERAGEIFS(average_range,criteria_range,criteria,…)
These functions work like SUMIF, COUNTIF, and AVERAGEIF, but they allow you to add any number of criteria. Here are some examples:
15 | =SUMIFS(Units,Clients,”=Smith”,Colors,”=Blue”) |
Return the total number of blue products sold to Smith. |
4 | =COUNTIFS(Clients,”=Smith”,Colors,”=Blue”) |
Return the number of times we’ve sold blue products to Smith. |
3.8 | =AVERAGEIFS(Units,Clients,”=Smith”,Colors,”=Blue”) |
Return the average number of blue products sold to Smith. |
Array formulas also allow you to use multiple criteria.
Introducing Array Formulas With Multiple Criteria
Before you can use arrays successfully, you need to understand how they perform their calculations.
Array-enter the following formula in some cell. Then, in the formula bar, select the highlighted part of the formula:
{=SUM(IF(Prods=”Hats”,Units,0))}
With this piece of the formula selected, press the F9 key. When you do so, the formula changes to:
=SUM(IF({FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;
FALSE;FALSE;FALSE;FALSE;FALSE;FALSE},Units,0))
(You can use a similar approach to calculate one or more sections of any formula in your formula bar. After checking your formula, press Esc to return to your original formula. If you accidentally press Enter rather than Esc, just click on Undo in QAT.)
Now select Units in your formula bar and press F9 again. After you do so, the formula expands to:
=SUM(IF({FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;
FALSE;FALSE;FALSE;FALSE;FALSE;FALSE},{0;7;5;4;7;4;8;6;1;1;5;2;5;3;0},0))
Including the gray borders in the table, the database shown above and below has 15 rows of data. As a consequence, the green section of the formula has 15 TRUE and FALSE results, and the brown section has 15 numbers. Each TRUE in the green section marks where it’s true that the product equals “Hats”.
Press Esc to dismiss your changes in the formula bar. Now select the highlighted part of this formula:
{=SUM(IF(Prods=”Hats”,Units,0))}
Again, press F9. When you do so your formula bar displays:
=SUM({0;7;5;4;7;4;8;6;0;0;0;0;0;0;0})
Now compare the final orange section with the final green and brown sections above.
Notice that whenever the green section is TRUE, the orange section contains the corresponding value from the brown section; otherwise, the orange section contains zero.
From another perspective, notice that whenever the product is Hats in this figure, its number of units appears in the orange section above.
Finally, the SUM function returns the sum of the numbers in the orange section.
In general, this is how all array formulas work. They apply the logic of your array formula to each corresponding value in the multi-cell ranges that your formula references.
Don’t worry if this idea isn’t too clear yet, the following examples should help to give you a better understanding of how this works.
Using Formulas With Multiple Criteria
Let’s take a simple example of how array functions can summarize multiple criteria:
15 | {=SUM(IF((Clients=”Smith”)*(Colors=”Blue”),Units,0))} |
Return the total number of blue products sold to Smith. |
4 | {=SUM(IF((Clients=”Smith”)*(Colors=”Blue”),1,0))} |
Return the number of times we’ve sold blue products to Smith. |
From the previous discussion you already know that the Excel formula converts both Clients=”Smith” and Colors=”Blue” into a series of TRUE and FALSE values. But the gray section of the formula above multiplies these two arrays. You can see the results of this process if you calculate the gray section in your formula bar.
That is, after you array-enter the formula by pressing Ctrl+Shift+Enter, select the gray section shown above and press F9. When you do so, your formula bar will show this result:
=SUM(IF({0;0;0;0;0;1;0;1;0;0;0;1;0;1;0},Units,0))
Here, the array displays 1 only where the client equals “Smith” and where the color equals “Blue”. Otherwise, the formula returns 0. The following display shows how this works.
(Clients=”Smith”) | {FALSE;TRUE;FALSE;TRUE;FALSE;TRUE;FALSE;TRUE; FALSE;TRUE;FALSE;TRUE;FALSE;TRUE;FALSE} |
(Colors=”Blue”) | {FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE; FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;FALSE} |
(Clients=”Smith”) *(Colors=”Blue”) |
{0;0;0;0;0;1;0;1; 0;0;0;1;0;1;0} |
Here, I’ve highlighted the four results in each array where the TRUEs are in sync. These results correspond with the four values of 1 found in the third row of the table.
Notice in the original array formula that each logical test is enclosed in parentheses. These parentheses are required. That is, within a multi-criteria array formula:
This works: | (Clients=”Smith”)*(Colors=”Blue”) |
This fails: | Clients=”Smith”*Colors=”Blue” |
Here are some more examples of array formulas in action:
7 | {=SUM(IF((Prods=”Coats”)*(Colors=”White”),Units,0))} |
Total number of white coats sold |
6 | =SUM(IF((Prods=”Coats”)*(Colors=”White”)* (Clients=”Jones”),Units,0)) |
Total number of white coats sold to Jones. |
6 | =SUM(IF((Prods=”Coats”)*(Colors=”White”)* (Clients=”Jones”)*(MONTH(Dates)=12),Units,0)) |
Total number of white coats sold to Jones in any December. |
Notice in the last example above that you can perform operations on values in a column, and then compare the results to other values. Specifically, the green section of the formula finds the month value of each date and then compares it to 12.
However, this particular date test isn’t very practical, because we typically need the month and year in a test like this…not merely the month. Here is a more practical date test:
19 | =SUM(IF((Dates>=DATE(2007,12,1))*(Dates<=DATE(2008,1,0))* (Clients=”Jones”),Units,0)) |
Total number of units sold to Jones in December, 2007. |
(Notice that DATE(2008,1,0) returns the last day of December, 2007.)
In actual practice, you probably wouldn’t include the DATE function in your formula, for two reasons.
First, the formula forces Excel to calculate the same date value for each cell in the array. This slows calculation slightly. Second, you would need to change the formula to change the dates you want to report.
Instead, you probably would enter the date functions in two cells of your spreadsheet. You could name the first cell FirstDate and the second cell LastDate. Your array formula then could be:
19 | =SUM(IF((Dates>=FirstDate)*(Dates<=LastDate)* (Clients=”Jones”),Units,0)) |
Total number of units sold to Jones within the specified dates. |
Using Array Formulas in a Report
So far, I haven’t demonstrated the real value of using array formulas with multiple criteria: They allow you to set up standard reports that work much like PivotTables, but with more flexibility in many ways.
Here, for example, the actual report is in the range A7:D11. The area at the top of the page contains setup values and side calculations.
The two values highlighted in yellow act like PivotTable page fields. You can change their values to change the report.
Cell B5 checks the report for errors. If the grand total in cell D11 doesn’t match the total of all sales for the specified client and month, the Error value turns TRUE, and the report title changes to “ERROR!!”.
The key formulas for this report are shown with their cell addresses below. These formulas rely on names assigned to the range B1:B5. To assign these names, select the range A1:B5; choose Formulas, Defined Names, Create from Selection; in the Create Names dialog make sure that only Left Column is checked; then choose OK.
(You also can choose Ctrl+Shift+F3 to launch the Create Names dialog.)
Here are the key values and formulas:
B1: 12/1/2007
B2: Jones
Enter these values as shown. To format the date in cell B1, select that cell; in the Home tab, choose the Number group’s dialog launcher; in the Number tab of the Format Cells dialog enter this format string as the Type value: mmm-yy; then choose OK.
(You also can choose Ctrl+1 to launch the Format Cells dialog.)
J3: =DATE(YEAR(ReportMonth),MONTH(ReportMonth),1)
B4: =DATE(YEAR(ReportMonth),MONTH(ReportMonth)+1,0)
The array formulas that follow use the values returned by these standard formulas, as I described above.
B5: {=SUM(IF((Dates>=FirstDate)*(Dates<=LastDate)*
(Clients=Client),Units,0))<>$D$11}
This array formula finds total unit sales for the specified month and client, and then compares that value to the grand total in cell D11. If the values don’t match — if there’s an error — the formula returns TRUE; otherwise, it returns FALSE.
A7: =IF(Error,”ERROR!!”,”Unit Sales, “&Client)
Normally, this formula displays the current client name with the report title. But if there’s an error, this formula returns “ERROR!!”.
D7: =TEXT(ReportMonth,”mmmm yyyy”)
This formula returns the date specified in cell B1. It uses the TEXT function to format that date as shown in the report. The cell is right-justified.
B9: {=SUM(IF((Dates>=FirstDate)*(Dates<=LastDate)*
(Clients=Client)*(Prods=$A9)*(Colors=B$8),Units,0))}
This array formula uses five criteria to return the correct value to the cell.
When you copy this formula to the rest of the report, you must paste it in two steps. This is because Excel doesn’t allow a cell with an array to be copied to a range that includes the copied cell.
Therefore, copy cell B9 and paste it to cell D9. Then copy the range B9:C9 and paste it to the range B10:C10.
D9: =SUM(B9:C9)
B11: =SUM(B9:B10)
These formulas provide the row and column totals. Copy them down or to the right as needed.
Other Uses of Multi-Criteria Array Formulas
In old Excel, array formulas became really useful when you needed to use multiple criteria. But because new Excel offers three multi-criteria summary functions, you have a choice of whether to use the new functions or array functions.
But even if you choose to use the new functions, you should keep the power of arrays in the back of your mind. This is because you can use array functions for other types of calculations. Here are some examples:
6 | {=MAX(IF((Clients=”Smith”)*(Colors=”Blue”),Units,””))} |
2 | {=MIN(IF((Clients=”Smith”)*(Colors=”Blue”),Units,””))} |
The largest and the smallest number of blue products sold to Smith. |
5 | {=LARGE(IF((Clients=”Jones”)*(Colors=”Blue”),Units,””),2)} |
The second largest number of blue products sold to Jones. |
7 | {=SMALL(IF((Clients=”Jones”)*(Prods=”Hats”),Units,””),2)} |
The second smallest number of hats sold to Jones. |
6 | {=MEDIAN(IF((Colors=”White”)*(Prods=”Hats”),Units,””))} |
The median number of white hats sold. |
Potential Problems with Arrays
Array formulas are very powerful. But they can have several problems:
- Only a small number of Excel users use them regularly. So if you’ll be using them for the first time, you might not be able to find a co-worker who can help you with them.
- You can’t copy a cell with an array and then paste it to a range that includes the cell. Therefore, if you want to copy the cell to a range with several rows and columns you typically will do so in two steps. First, you copy the top-left cell to the rest of the row that needs an array formula. Second, you copy that row of formulas downwards as needed.
- Array formulas do a lot of work. Depending on the number of rows of data and on the speed of your computer, arrays can slow your workbook’s calculation to a noticeable degree. However, they are so powerful that you probably won’t mind if they require a few extra seconds to calculate.
- If you forget to array-enter an array formula, you’ll probably get a #VALUE! error. And in rare cases, you might get an incorrect answer. Therefore, it’s always a good idea to set up an error test as I explained above. (Error testing is a good idea in any case, of course.)
Even with these potential problems, using arrays to summarize your data can reduce Spreadsheet Hell. This is because much of Spreadsheet Hell involves sorting and summarizing data. By using formulas like the ones shown here to summarize your data, you can begin to reduce the more time-consuming chores associated with Excel reporting.