An Excel 2003 Tutorial
How to Summarize Data for Reporting and Analysis in Excel 2003
Excel offers several ways to summarize data quickly and easily. Here are the most powerful and flexible approaches, which
include using Excel arrays.
(Follow this link for the
version for Excel 2007 and above.)
Excel users often need to summarize data in Excel formulas.
Unfortunately, few Excel users understand the real power that Excel
offers for doing this.
One
of the problems Excel users have is that the bestknown way to
summarize data probably is the least powerful...and the leastused
approach probably is the most powerful.
The Problems With Database Functions
The bestknown way to summarize spreadsheet data probably is to use
database functions. At the very least, these functions have been used
for the longest time.
When Lotus 123 was introduced in 1983, it offered five such functions: @DMAX, @DMIN, @DSTD, @DSUM, and @DVAR. (123's spreadsheet
functions all began with the "@" symbol.)
When Excel was introduced five
years later, it offered 123's database functions and more. Today, Excel offers a
dozen database functions.
Although the number of database functions has grown over time, their
power has not. Excel database functions still are linked to their Lotus
roots in one key regard: We Excel users can't include summarization criteria within our
spreadsheet formulas. Instead, our formulas must reference a criteria
range in our spreadsheets.
This limitation severely limits our power to summarize Excel data
with these functions. It also makes the functions more difficult to use.
Therefore, I'll ignore Excel database functions like DSUM and DMAX in
this article.
Instead, many of the following examples will use array formulas.
Entering Array Formulas
From its very early days, Excel has provided this powerful feature
for summarizing data. Unfortunately, most Excel users either avoid
array formulas or aren't even aware that this technology exists.
In Excel, you normally enter a formula by typing it into the formula
bar and then pressing the Enter key. You array enter a formula by
typing it into your formula bar, holding down the Ctrl and Shift keys,
and then pressing Enter.
After you arrayenter a formula, the formula bar shows that it begins
and ends with braces, like this:
{=A1}
To be clear, you do not type in those "{" and "}" characters
yourself. Instead, Excel displays them after you arrayenter a formula
by pressing Ctrl+Shift+Enter.
In the following examples, I'll use those braces to indicate the
formulas that you must arrayenter.
Set Up Your Data
This
figure shows a GrayRow Table used as a database of sales data, data I'll use in this
article. You can
download the
workbook and examples using this link.
To set up the database yourself, enter the data as shown.
The shaded areas in rows 3 and 17 mark the boundaries of your data.
Using those border rows is optional, but useful.
The dates are for 2007. To format these date values, select the
range A3:A17; choose Format, Cells; in the Number tab of the Format
Cells dialog, choose the Custom
category; enter mmmd for the Type; then choose OK.
(You also can choose Ctrl+1 to launch the Format Cells dialog.)
The following formulas make extensive use of range names. To assign
these names, select the range A2:E17; choose Insert, Name,
Create; make sure that only Top Row is checked; then choose OK.
(You also can choose Ctrl+Shift+F3 to launch the Create Names
dialog.)
Now, let's summarize this data...
Using the Data As the Criteria
At times you'll want to apply your criteria to the data you're
summarizing. If you need to apply only one criteria, you can do this
with arrays and with the SUMIF and COUNTIF functions:
=SUMIF(range,criteria,[sum_range])
=COUNTIF(range,criteria)
The brackets indicate that the sum_range is optional. If this
argument is missing, the SUMIF function returns a sum from the range
area.
Below, the left cell shows the result of the calculation and the right cell shows one or
more alternative formulas.
15 
=SUMIF(Units,"=5")
=SUMIF(Units,"=5", Units) 
Return the total number of Units sold for sales
of exactly 5 units. 
3 
=COUNTIF(Units,"=5") 
Return the number of times we've sold exactly 5 units. 
Above, the criteria are fully contained with the formulas. But
suppose you enter the value 5 in cell J3. You then could use these
formulas:
15 
=SUMIF(Units,"="&J3)
=SUMIF(Units,"="&J3, Units) 
Return the total number of Units sold for sales
of exactly the number of units entered in cell J3. 
3 
=COUNTIF(Units,"="&J3) 
Return the number of orders for exactly the
number of units entered into cell J3. 
You also could use array formulas to return the same results:
15 
{=SUM(IF(Units=5,Units,0))}
{=SUM(IF(Units=J3,Units,0))} 
Return the total number of Units sold for sales
of the specified number of units. 
3 
{=SUM(IF(Units=5,1,0))}
{=SUM(IF(Units=J3,1,0))} 
Return the number of times we've sold the
specified number of units. 
Notice that the two array formulas above sum ones
and zeros rather than using a count function.
Array formulas also allow you to perform other summary calculations.
For example, Excel 2007 introduced the AVERAGEIF function, but if you
use an earlier version of Excel you can use an array formula:
5 
{=AVERAGE(IF(Units=5,Units,""))}
{=AVERAGE(IF(Units=J3,Units,""))} 
Return the average number of units sold for
sales of the
specified number of units. 
Notice that the two formulas above use null strings ("") rather than
zero. If you were to use zero, the AVERAGE function would
include those zeros in the average.
Using NonData and a Single Criteria
So far, the examples have summarized data using criteria that applied to the data
itself.
More frequently, however, we summarize data by applying our criteria to
related columns of information that describe the data.
Here, for example, we find the number of units of Hats that have been
sold, the number of hat orders, and the average order:
41 
=SUMIF(Prods,"=Hats", Units) 
7 
=COUNTIF(Prods,"=Hats") 
We could use array formulas to return the same results:
41 
{=SUM(IF(Prods="Hats",Units,0))} 
7 
{=SUM(IF(Prods="Hats",1,0))} 
As before, "Hats" could be in a cell that the formulas could
reference.
And again, notice that to count the number of hat sales, we we SUM an
array of ones and zeros rather than using a COUNT function. And we also
use a null string ("") rather than zeros for the AVERAGEIF array
formula.
Introducing Array Formulas With Multiple Criteria
So far, the examples have used only one criteria. But frequently,
Excel users need to summarize data using multiple criteria. This is
where array formulas really become useful. But before you can use this
method successfully, you need to understand how array formulas perform
their calculations.
Arrayenter the following formula in some cell. Then, in the formula
bar, select the highlighted part of the formula:
{=SUM(IF(Prods="Hats",Units,0))}
With this piece of the formula selected, press the F9 key. When you
do so, the formula changes to:
=SUM(IF({FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;
FALSE;FALSE;FALSE;FALSE;FALSE;FALSE},Units,0))
(You can use a similar approach to calculate one or more sections of any
formula in your formula bar. After checking your formula, press Esc
to return to your original formula. If you accidentally press Enter
rather than Esc, just choose Edit, Undo.)
Now select Units in your formula bar and press F9 again. After you do so, the formula
expands to:
=SUM(IF({FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;
FALSE;FALSE;FALSE;FALSE;FALSE;FALSE},{0;7;5;4;7;4;8;6;1;
1;5;2;5;3;0},0))
Including the gray borders in the table, the database shown above and
below has 15 rows of data.
As a consequence, the green section of the formula has 15 TRUE and FALSE
results, and the brown section has 15 numbers. Each TRUE in the green
section marks where it's true that the product equals "Hats".
Press Esc to dismiss your changes in the formula bar. Now select the
highlighted part of this formula:
{=SUM(IF(Prods="Hats",Units,0))}
Again, press F9. When you do so your formula bar displays:
=SUM({0;7;5;4;7;4;8;6;0;0;0;0;0;0;0})
Now compare the final orange section with the final green and brown
sections above.
Notice that whenever the green section is TRUE, the orange
section contains the corresponding value from the brown section;
otherwise, the orange section contains zero.
From
another perspective, notice that whenever the product is Hats in this
figure, its number of units appears in the orange section above.
Finally, the SUM function
returns the sum of the numbers in the orange section.
In general, this is how all array formulas work. They apply the logic
of your array formula to each corresponding value in the multicell ranges
that your formula references.
Don't worry if this idea isn't too clear yet, the following examples
should help to give you a better understanding of how this works.
Using Array Formulas With Multiple Criteria
Let's take a simple example of how array functions can summarize
multiple criteria:
15 
{=SUM(IF((Clients="Smith")*(Colors="Blue"),Units,0))} 
Return the total number of blue products sold to
Smith. 
4 
{=SUM(IF((Clients="Smith")*(Colors="Blue"),1,0))} 
Return the number of times we've sold blue
products to Smith. 
From the previous discussion you already know that the Excel formula
converts both Clients="Smith" and Colors="Blue" into a series of TRUE
and FALSE values. But the gray section of the formula above multiplies
these two arrays. You can see the results of this process if you
calculate the gray section in your formula bar.
That is, after you arrayenter the formula by pressing Ctrl+Shift+Enter, select the gray section shown above and press F9. When
you do so, your formula bar will show this result:
=SUM(IF({0;0;0;0;0;1;0;1;0;0;0;1;0;1;0},Units,0))
Here, the array displays 1 only where the client equals "Smith" and
where the color equals "Blue". Otherwise, the formula returns 0. The
following display shows how this works.
(Clients="Smith") 
{FALSE;TRUE;FALSE;TRUE;FALSE;TRUE;FALSE;TRUE;
FALSE;TRUE;FALSE;TRUE;FALSE;TRUE;FALSE} 
(Colors="Blue") 
{FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;
FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;FALSE} 
(Clients="Smith")
*(Colors="Blue") 
{0;0;0;0;0;1;0;1;
0;0;0;1;0;1;0} 
Here, I've highlighted the four results in each array where the TRUEs
are in sync. These results correspond with the four values of 1 found in
the third row of the table.
Notice in the original array formula that each logical test is enclosed in parentheses.
These parentheses are
required. That is, within a multicriteria array formula:
This works: 
(Clients="Smith")*(Colors="Blue") 
This fails: 
Clients="Smith"*Colors="Blue" 
Here are some more examples of array formulas in action:
7 
{=SUM(IF((Prods="Coats")*(Colors="White"),Units,0))} 
Total number of white
coats sold 
6 
=SUM(IF((Prods="Coats")*(Colors="White")*
(Clients="Jones"),Units,0)) 
Total number of white
coats sold to Jones. 
6 
=SUM(IF((Prods="Coats")*(Colors="White")*
(Clients="Jones")*(MONTH(Dates)=12),Units,0)) 
Total number of white
coats sold to Jones in any December. 
Notice in the last example above that you can perform operations on
values in a column, and then compare the results to other values.
Specifically, the green section of the formula finds the month value of each
date and then compares it to 12.
However, this particular date test isn't very practical, because we typically need the month and year in a test like this...not
merely the month. Here is a
more practical date test:
19 
=SUM(IF((Dates>=DATE(2007,12,1))*(Dates<=DATE(2008,1,0))*
(Clients="Jones"),Units,0)) 
Total number of units
sold to Jones in December, 2007. 
(Notice that DATE(2008,1,0) returns the last day of
December, 2007.)In actual practice, you probably wouldn't include the DATE
function in your formula, for two reasons.
First, the formula forces Excel to calculate the same date value for each cell in
the array. This slows calculation slightly. Second, you would need to change
the formula to change the dates you want to report.
Instead, you probably would enter the date functions in two
cells of your spreadsheet. You could name the first cell FirstDate and
the second cell LastDate. Your array formula then could be:
19 
=SUM(IF((Dates>=FirstDate)*(Dates<=LastDate)*
(Clients="Jones"),Units,0)) 
Total number of units
sold to Jones within the specified dates. 
Using Array Formulas in a Report
So far, I haven't demonstrated the real value of using array formulas
with multiple criteria: They allow you to set up standard reports that
work much like PivotTables, but with more flexibility in many ways.
Here,
for example, the actual report is in the range A7:D11. The area at the
top of the page contains setup values and side calculations.
The two values highlighted in yellow act like PivotTable page fields.
You can change their values to change the report.
Cell B5 checks the report for errors. If the grand total in cell D11
doesn't match the total of all sales for the specified client and month,
the Error value turns TRUE, and the report title changes to "ERROR!!".
The key formulas for this report are shown with their cell addresses
below. These formulas rely on names assigned to the range B1:B5. To
assign these names, select the range A1:B5; choose Insert, Name, Create;
in the Create Names dialog make sure
that only Left Column is checked; then choose OK.
(You also can choose Ctrl+Shift+F3 to launch the Create Names
dialog.)
Here are the key values and formulas:
B1: 12/1/2007
B2: Jones
Enter these values as shown. To format the date in cell B1, select
that cell; choose Format, Cells; in the Number tab choose the Custom
category; enter this format string as the Type value:
mmmyy; then choose OK.
J3: =DATE(YEAR(ReportMonth),MONTH(ReportMonth),1)
B4: =DATE(YEAR(ReportMonth),MONTH(ReportMonth)+1,0)
The array formulas that follow use the values returned by these
standard formulas, as I described above.
B5: {=SUM(IF((Dates>=FirstDate)*(Dates<=LastDate)*
(Clients=Client),Units,0))<>$D$11}
This array formula finds total unit sales for the specified month and
client, and then compares that value to the grand total in cell D11. If the
values don't match  if there's an error  the formula returns TRUE; otherwise, it returns
FALSE.
A7: =IF(Error,"ERROR!!","Unit Sales, "&Client)
Normally, this formula displays the current client name with the
report title. But if there's an error, this formula returns "ERROR!!".
D7: =TEXT(ReportMonth,"mmmm yyyy")
This formula returns the date specified in cell B1. It uses the TEXT
function to format that date as shown in the report. The cell is
rightjustified.
B9: {=SUM(IF((Dates>=FirstDate)*(Dates<=LastDate)*
(Clients=Client)*(Prods=$A9)*(Colors=B$8),Units,0))}
This array formula uses five criteria to return the correct value to
the cell.
When you copy this formula to the rest of the report, you must paste
it in two steps. This is because Excel doesn't allow a cell with an
array to be copied to a range that includes the copied cell.
Therefore, copy cell B9 and paste it to cell D9. Then copy the range
B9:C9 and paste it to the range B10:C10.
D9: =SUM(B9:C9)
B11: =SUM(B9:B10)
These formulas provide the row and column totals. Copy them down or to the right as needed.
Other Uses of MultiCriteria Array Formulas
Array formulas aren't limited to merely SUM and AVERAGE. In fact, you
can use them for many different summary functions. Here are some examples:
6 
{=MAX(IF((Clients="Smith")*(Colors="Blue"),Units,""))} 
2 
{=MIN(IF((Clients="Smith")*(Colors="Blue"),Units,""))} 
The largest and the
smallest number of blue products sold to Smith. 
5 
{=LARGE(IF((Clients="Jones")*(Colors="Blue"),Units,""),2)} 
The second largest number of blue products sold to
Jones. 
7 
{=SMALL(IF((Clients="Jones")*(Prods="Hats"),Units,""),2)} 
The second smallest number of
hats sold to
Jones. 
6 
{=MEDIAN(IF((Colors="White")*(Prods="Hats"),Units,""))} 
The median number of
white hats sold. 
Potential Problems with Arrays
Array formulas are very powerful. But they can have several problems:
 Only a small number of Excel users use them regularly. So if
you'll be using them for the first time, you might not be able to
find a coworker who can help you with them.
 You can't copy a cell with an array and then paste it to a range
that includes the cell. Therefore, if you want to copy the cell to a
range with several rows and columns you typically will do so in two steps. First,
you copy the topleft cell to the rest of the row that needs an
array formula. Second, you copy that row of formulas downwards as
needed.
 Array formulas do a lot of work. Depending on the number
of rows of data and on the speed of your computer, arrays can slow
your workbook's calculation to a noticeable degree. However, they
are so powerful that you probably won't mind if they require a few
extra seconds to calculate.
 If you forget to arrayenter an array formula,
you'll probably get a #VALUE! error. And in rare cases, you might
get an incorrect answer. Therefore, it's always a good idea to set
up an error test as I explained above. (Error testing is a good idea
in any case, of course.)
Even with these potential problems, using arrays to summarize your
data can reduce Spreadsheet Hell. This is because much of Spreadsheet Hell involves sorting and summarizing data.
By using formulas like the ones shown here to summarize your data, you can begin to reduce
the more timeconsuming chores associated with Excel reporting.
The sample workbook with the data also includes the report described
above. You can
download the
workbook using this link.
Also, for more about arrays, check out
Summarize Spreadsheet Data
With Excel's Array Formulas.
