Excel Data Queries
Use MS Query with Text Files for Dynamic Excel Reporting
You can treat text files in a folder as relational tables in a database. You can join them by their common fields.
Query them using SQL. And use the queries with PivotTables. Here's how.
by Charley Kyd, MBA
Microsoft Excel MVP, 2005-2014
The Father of Spreadsheet Dashboard Reports
In the article, Use MS Query to
Treat Excel As a Relational Data Source, Marty Ryerson
introduced a mini-database
of three tables, and used Excel as the database container.
This time, let’s look at the same three tables,
only this time they'll be stored as text files. For this example,
they're tab-delimited text files, but they could just as easily be
comma-delimited or fixed length.
As in Marty's article, we rely heavily on the book that Timothy Zapawa
published this year,
Excel Advanced Report Development.
If you want to follow along with this description, you can
download the sample text files
using this link.
Create a Connection
The first order of business is to create a connection to your text file database.
This is more involved than the connection to an Excel workbook,
for two reasons.
First, an Excel workbook is capable of storing all of the information
about the data that you're connecting to. With a text file, you will need
to tell Excel how to interpret each field.
Second, the MS Query Engine is based on Access version 1.0, and it
has a bug that has never been corrected. Therefore, we need to
work around the bug.
The first step is to create a new connection -- a "data source" -- to
add to the list in the Choose Data Source dialog below.
In Excel, open a new workbook. Choose Data, Import External
Data, New Database Query, which displays the Choose Data Source dialog.
Then select the <New Data
Source> item, and click OK.
Excel displays the Create New Data Source
dialog. In the first box, enter a name for your connection. This is the name that will appear in the Choose Data Source list later
After you enter a name, Excel enables the second box where you will
use the drop-down list to select a driver. Since we intend to connect to a text file,
choose the Microsoft Text Driver (*.txt, *.csv).
After you select the driver, Excel enables the Connect button. When you click on it,
Excel launches the ODBC Text Setup dialog.
The default behavior is to use the current
directory to store your text files, but in practice, we almost always
will want to navigate to the directory where the text file resides. This could be on your computer or on a shared network drive. Deselect
the checkbox for Use Current Directory. This will enable the Select
Directory button, which you should click next.
Navigate to the folder where your text files are
located. After you do so, you should
see the names of your text files listed in the left pane.
Choose OK. After you do so, you will be returned to the ODBC Text Setup dialog.
Click the Options>> button in
the lower-right corner. This will expand the dialog, as shown here.
Click on the Define Format button at the bottom of the dialog. This
displays the Define Text Format dialog, shown below.
In the left pane of the dialog, select the first text file you will
use and then make three changes to the dialog's settings:
- If the text file begins with a line of headers (column names),
place a check in the box next to Column Name Header.
- Select the type of delimiter or "Fixed Length" from the Format
dropdown list box.
- Enter a number, if necessary, for the Rows to Scan. This
specifies the number of rows scanned when you preview data. At
times, Excel sets this number to 25 by default, which is fine.
The setting for ANSI or OEM seldom matters. You can ignore it.
Click the Guess button in the upper right of the dialog. You should see
the headers from your text file, if it has them. Otherwise, you will see
"F1", "F2", "F3", etc. for Field One, Field Two, Field Three, and so forth.
In this dialog you often must correct the Data Types generated by Excel's
One reason for this is that many number codes have leading zeros, like "000123", "000234", etc.
Excel typically guesses that fields with these codes contain integer
data. But you
must change them to Char to preserve these zeros when you export to
Second, Excel often guesses that dates are integers or characters.
You must change them to dates.
After you modify any field in the right page, you must click the Modify button
to force Excel to remember that change.
After you define each field associated with each file in the left
panel, click OK until you return to the Choose Data Source
Modify the Schema, If Necessary
If Excel didn't have a certain bug, you could create your query
immediately. However, you might need to take one more step before you create the
query. Therefore, choose Cancel to dismiss the Choose Data Source
When you create a connection, Excel saves that information in the schema.ini
file, which is saved in the folder that contains your
text files. Unfortunately, when Excel creates this file, it fails to
label the Char (character) fields on some computers.
To illustrate, on Charley's computer, Excel does not include
the Char label in the ini file. But on Marty's computer, Excel does
include the Chart label.
Therefore, before you can continue, you must label the Char fields
manually, if necessary. To do so, use Notepad or some other text editor to open schema.ini. The file will look something like the Before column shown
Here, we've formatted the column definitions you don't need to
modify with a green font. Those that you do need to modify have a
brown font. You can see that the green items in the left pane include a
data type and the brown items don't.
Therefore, as shown in the right pane, simply add a space plus "Char"
after each field that should be a character field, if necessary. Also, if you notice
any corrections that you need to make to the other data types for the
other columns, you can make those corrections here.
Once you make these simple corrections, save schema.ini.
Create a Query
In Excel, activate a new workbook. Choose Data, Import External Data,
New Database Query, which displays the Choose Data Source dialog, shown below. Make sure the
checkbox at its bottom, labeled Use the Query Wizard to create/edit
queries, is not checked. Select the name you assigned to your connection. Then click OK.
Excel displays the Microsoft Query graphical interface,
where you can begin to build your query. Excel also displays the Add Tables
dialog, as shown below.
Click on each table you want to add, and then click on the Add
button. Repeat this process for each table -- text file -- you
want to add. (You also can double-click on each table you want to add.)
The trick of holding down the CTRL key to select multiple files doesn't
For our example, we'll add all three of these
tables. After doing so, close the Add Tables dialog.
The tables you added will be displayed in the top pane of the
MS Query interface, the Table Pane. In each
table box, there will be a list of fields.
If you've selected more than one table for your
report, as in this example, you'll need to “join” the tables. To do so, you'll
need to know which fields each pair of tables have in common. Often -- but by no
means always -- the two common fields will have the same name. What is
much more important is that they both have the same data format
(including width) and that they contain the same “kind” of information.
In our case, we'll join CUSNUM (Customer Number) in the CUST
(Customer) table to the CUSNUM field in the ORD (Orders) table. The
easiest way to do this is to click on one of them and drag it to
the other. This will draw a line between the tables.
We'll also join SALESREP (Sales Rep ID#) in the ORD table to
the SALID (Sales Rep ID#) field in the SREP (Sales Reps) table. Note
that, although these last two fields do not have the same name, they do
have the same data format and contain the same domain of data; that is,
the numbers in the two tables match.
Add Fields to the Report
Our next step is to specify the fields to add to our report.
At the top of the list in each table box is an
asterisk. Double-click on the asterisk if you want to add all
of the fields in this table to your report. Otherwise, double-click each
field you want in your report. This will cause
the field to appear in the column heading of the Data Pane, at the
bottom of the display.
For this example, we want the fields illustrated in this figure:
You'll find it easier to select the fields in the order
that you want them to appear (left to right) in the report. If you decide
you want to rearrange them after you've selected them, you can select
a column heading and then drag it to the position you want. The program
is a little fussy at that point, but it can be done.
You'll see two buttons with exclamation
points in the toolbar. The right-most one includes two semi-circular arrows,
as shown here. Make
sure this button does not appear to be pushed in. If it is, MS Query
will attempt to run the query every time you make a change to it. This
can be annoying.
Add Criteria to the Report
For this example, we want to add criteria.
To display the Criteria Pane, click on the Show/Hide Criteria
button shown here. After you do so, the Criteria Pane will appear between the Table
Pane and the Data Pane.
For this example, we want to run a separate report for each
state. Therefore, we'll add a criterion to restrict the returned
data set by state. To do so, drag the “ST” field from the CUST
table box to the Criteria Field of the Criteria Pane.
If this were a one-time report, and you only wanted the records for,
say, West Virginia, you could just type "WV" in the Value line of the
Criteria Pane. But in this example, we want a refreshable report that
will allow us to choose each state as needed. This means we must add
a parameter instead of a value in the value line.
To indicate a
parameter place a "prompt" string -- which is enclosed in square brackets
the value field. In this example, we've added "[State]" in the value
Entering a prompt string will cause a prompt box to come up when we run this query. The string between the brackets will be displayed in the box to remind
us what to type in the text box. The query will then run with that
value as its criterion.
In MS Query you can have multiple, complex criteria, and
more than one of them can be parameterized. The only requirement is that
the query must be
generated using the graphic interface. More complex queries require a
different method, beyond our scope.
Test Your Query
At this point, we've finished our query, and MS Query has constructed
an SQL statement for us. We can see it by clicking on the "SQL" button
in the toolbar. Notice that the parameter is represented by a question
mark. After you view the query, Cancel this box.
test your query, click on the exclamation point icon in the toolbar,
shown here. MS Query will display a dialog titled Enter Parameter Value.
This dialog will contain a text box for each prompt string you've
specified. For our example, it will say "State". Enter "WV", "OH", or
"PA". When you click on the OK button, the data will be displayed in the
Save the Query
To save the query, choose File and Save As. This works pretty much as
you would expect, with one exception: When you use a prompt, MS Query
prompts you at inappropriate times. Specifically, when you save your
query or when you close it, Excel issues the prompt. When this happens,
just give Excel the information it asks for and then save the file as
you normally would.
It's a good idea to save your query in the default directory. If you
keep all your queries in this directory, MS Query can find them on its
own without your having to search for them.
Create an Excel Report
that the query is done, it's time to return our data to Excel. Click on
the "open door" icon shown here. When you do so, MS Query activates
Excel and launches the Import Data dialog.
You now can select where you want your data to appear. After you
click OK, the top of your range of data looks something like this:
One way to refresh your report is to right-click anywhere within it
and select Refresh Data from the menu. You also can choose Data, Refresh
Better yet, you can use the External Data toolbar shown here. To
launch the toolbar, right-click on any toolbar and choose External Data
from the list. Then, to refresh the data, click the tool with the red
Working With Your Report
MS Query provides several useful features for working with Excel
One valuable feature is that MS Query maintains a named range where
the data is displayed in a spreadsheet. This range is dynamic,
which means that it re-sizes automatically each time you refresh your
Because the default range name tends to be quite long, you probably
will want to change it. You can right-click on the data range and select
Data Range Properties. Then, in the External Data Range Properties
dialog, you can rename the range and perform many other customizations.
The range name is sheet-level, so if you want to refer to it from
another sheet, you will need to include the sheet name.
To illustrate, we changed the name to MyData. Then, we use this
formula to return the grand total of the AMT column:
Here, the OFFSET function returns a reference to the column with the
AMT data and SUM returns the total. The arguments for the
OFFSET function are: OFFSET( reference, rows, cols, height, width)
Another useful feature is that you can change the query parameter.
You can do so in three ways, as shown in the Parameters dialog below. To
launch this dialog, click the third tool from the left in the External
Data toolbar shown above, or right-click your data range and choose
The third choice in this dialog offers a particularly convenient
solution. You can specify a cell that contains your parameter value. You
can enter this value manually, return it by formula, return it by using
a dropdown list box, and so on.
Finally, you can use MS Query to feed a PivotTable. However, you can
not use a PivotTable with a parameterized query. Therefore, click the
first button (the Edit Query button) in the External Data toolbar.
Modify your query to either hard-code a desired criteria or to remove it
entirely. Then save the query with a new name.
Then, to use that query with a PivotTable, follow the general
instructions that Marty provided in the last page of
Use MS Query to Treat Excel As a Relational
Data Source. However, when Excel launches Microsoft Query, close the
Add Tables window as shown here.
Next, in the Microsoft Query window, choose File, Open to open the
query you created for use with PivotTables. (Instead, of course, you
could at this point create and save a new query for your PivotTable to
use.) Close your Microsoft Query window. You then can create a
PivotTable the way you normally would.
Zapawa's book offers additional ways to bring external data into Excel.
You'll find a link to it at the beginning of this article.
And again, you can
download the sample text files
using this link.