ExcelUser logo Free guides and templates
Work In Progress...After nearly ten years, I'm redesigning ExcelUser.com. This is the new design. You can learn more here. Also, if you find something wrong with the site, please tell me about the problems. And thanks for your patience.--Charley Kyd

Excel Data Queries

Use MS Query with Text Files for Dynamic Excel Reporting

You can treat text files in a folder as relational tables in a database. You can join them by their common fields. Query them using SQL. And use the queries with PivotTables. Here's how.


Charley Kyd is a Microsoft Excel MVP by Charley Kyd, MBA
Microsoft Excel MVP
The Father of Spreadsheet Dashboards

In the article, Use MS Query to Treat Excel As a Relational Data Source, Marty Ryerson introduced a mini-database of three tables, and used Excel as the database container.

This time, let’s look at the same three tables, only this time they'll be stored as text files. For this example, they're tab-delimited text files, but they could just as easily be comma-delimited or fixed length.

As in Marty's article, we rely heavily on the book that Timothy Zapawa published this year, Excel Advanced Report Development.

If you want to follow along with this description, you can download the sample text files using this link.

Create a Connection

The first order of business is to create a connection to your text file database. This is more involved than the connection to an Excel workbook, for two reasons.

First, an Excel workbook is capable of storing all of the information about the data that you're connecting to. With a text file, you will need to tell Excel how to interpret each field.

Second, the MS Query Engine is based on Access version 1.0, and it has a bug that has never been corrected. Therefore, we need to work around the bug.

The first step is to create a new connection -- a "data source" -- to add to the list in the Choose Data Source dialog below.

In Excel, open a new workbook. Choose Data, Import External Data, New Database Query, which displays the Choose Data Source dialog. Then select the <New Data Source> item, and click OK.

Excel displays the Create New Data Source dialog. In the first box, enter a name for your connection. This is the name that will appear in the Choose Data Source list later on.

After you enter a name, Excel enables the second box where you will use the drop-down list to select a driver.  Since we intend to connect to a text file, choose the Microsoft Text Driver (*.txt, *.csv).

After you select the driver, Excel enables the Connect button. When you click on it, Excel launches the ODBC Text Setup dialog.

The default behavior is to use the current directory to store your text files, but in practice, we almost always will want to navigate to the directory where the text file resides. This could be on your computer or on a shared network drive. Deselect the checkbox for Use Current Directory. This will enable the Select Directory button, which you should click next.

Navigate to the folder where your text files are located. After you do so, you should see the names of your text files listed in the left pane.

Choose OK. After you do so, you will be returned to the ODBC Text Setup dialog.

Click the Options>> button in the lower-right corner. This will expand the dialog, as shown here.

Click on the Define Format button at the bottom of the dialog. This displays the Define Text Format dialog, shown below.

In the left pane of the dialog, select the first text file you will use and then make three changes to the dialog's settings:

  1. If the text file begins with a line of headers (column names), place a check in the box next to Column Name Header.
     
  2. Select the type of delimiter or "Fixed Length" from the Format dropdown list box.
     
  3. Enter a number, if necessary, for the Rows to Scan. This specifies the number of rows scanned when you preview data. At times, Excel sets this number to 25 by default, which is fine.

The setting for ANSI or OEM seldom matters. You can ignore it.

Click the Guess button in the upper right of the dialog. You should see the headers from your text file, if it has them. Otherwise, you will see "F1", "F2", "F3", etc. for Field One, Field Two, Field Three, and so forth.

In this dialog you often must correct the Data Types generated by Excel's guessing logic.

One reason for this is that many number codes have leading zeros, like "000123", "000234", etc. Excel typically guesses that fields with these codes contain integer data. But you must change them to Char to preserve these zeros when you export to Excel.

Second, Excel often guesses that dates are integers or characters. You must change them to dates.

After you modify any field in the right page, you must click the Modify button to force Excel to remember that change.

After you define each field associated with each file in the left panel, click OK until you return to the Choose Data Source dialog.

Modify the Schema, If Necessary

If Excel didn't have a certain bug, you could create your query immediately. However, you might need to take one more step before you create the query. Therefore, choose Cancel to dismiss the Choose Data Source dialog.

When you create a connection, Excel saves that information in the schema.ini file, which is saved in the folder that contains your text files. Unfortunately, when Excel creates this file, it fails to label the Char (character) fields on some computers.

To illustrate, on Charley's computer, Excel does not include the Char label in the ini file. But on Marty's computer, Excel does include the Chart label.

Therefore, before you can continue, you must label the Char fields manually, if necessary. To do so, use Notepad or some other text editor to open schema.ini. The file will look something like the Before column shown below.

Before After
[cust.txt]
ColNameHeader=True
Format=TabDelimited
MaxScanRows=25
CharacterSet=OEM
Col1=CUSNUM
Col2=FNAME
Col3=LNAME
Col4=ST
Col5=TYPE

[ord.txt]
ColNameHeader=True
Format=TabDelimited
MaxScanRows=25
CharacterSet=OEM
Col1=ORDNUM
Col2=ORDDATE Date
Col3=DELDATE Date

Col4=CUSNUM
Col5=SALESREP
Col6=AMT Currency
[srep.txt]
ColNameHeader=True
Format=TabDelimited
MaxScanRows=25
CharacterSet=OEM
Col1=SALID
Col2=SALNAME
[cust.txt]
ColNameHeader=True
Format=TabDelimited
MaxScanRows=25
CharacterSet=OEM
Col1=CUSNUM Char
Col2=FNAME Char
Col3=LNAME Char
Col4=ST Char
Col5=TYPE Char

[ord.txt]
ColNameHeader=True
Format=TabDelimited
MaxScanRows=25
CharacterSet=OEM
Col1=ORDNUM Char
Col2=ORDDATE Date
Col3=DELDATE Date

Col4=CUSNUM Char
Col5=SALESREP Char
Col6=AMT Currency
[srep.txt]
ColNameHeader=True
Format=TabDelimited
MaxScanRows=25
CharacterSet=OEM
Col1=SALID Char
Col2=SALNAME Char

Here, we've formatted the column definitions you don't need to modify with a green font. Those that you do need to modify have a brown font. You can see that the green items in the left pane include a data type and the brown items don't.

Therefore, as shown in the right pane, simply add a space plus "Char" after each field that should be a character field, if necessary. Also, if you notice any corrections that you need to make to the other data types for the other columns, you can make those corrections here.

Once you make these simple corrections, save schema.ini. 

Create a Query

In Excel, activate a new workbook. Choose Data, Import External Data, New Database Query, which displays the Choose Data Source dialog, shown below. Make sure the checkbox at its bottom, labeled Use the Query Wizard to create/edit queries, is not checked. Select the name you assigned to your connection. Then click OK.

Excel displays the Microsoft Query graphical interface, where you can begin to build your query. Excel also displays the Add Tables dialog, as shown below.

Click on each table you want to add, and then click on the Add button. Repeat this process for each table -- text file -- you want to add. (You also can double-click on each table you want to add.) The trick of holding down the CTRL key to select multiple files doesn't work here.

For our example, we'll add all three of these tables. After doing so, close the Add Tables dialog.

The tables you added will be displayed in the top pane of the MS Query interface, the Table Pane. In each table box, there will be a list of fields.

If you've selected more than one table for your report, as in this example, you'll need to “join” the tables. To do so, you'll need to know which fields each pair of tables have in common. Often -- but by no means always -- the two common fields will have the same name. What is much more important is that they both have the same data format (including width) and that they contain the same “kind” of information.

In our case, we'll join CUSNUM (Customer Number) in the CUST (Customer) table to the CUSNUM field in the ORD (Orders) table. The easiest way to do this is to click on one of them and drag it to the other. This will draw a line between the tables.

We'll also join SALESREP (Sales Rep ID#) in the ORD table to the SALID (Sales Rep ID#) field in the SREP (Sales Reps) table. Note that, although these last two fields do not have the same name, they do have the same data format and contain the same domain of data; that is, the numbers in the two tables match.

Add Fields to the Report

Our next step is to specify the fields to add to our report.

At the top of the list in each table box is an asterisk. Double-click on the asterisk if you want to add all of the fields in this table to your report. Otherwise, double-click each field you want in your report. This will cause the field to appear in the column heading of the Data Pane, at the bottom of the display.

For this example, we want the fields illustrated in this figure:

You'll find it easier to select the fields in the order that you want them to appear (left to right) in the report. If you decide you want to rearrange them after you've selected them, you can select a column heading and then drag it to the position you want. The program is a little fussy at that point, but it can be done.

You'll see two buttons with exclamation points in the toolbar. The right-most one includes two semi-circular arrows, as shown here. Make sure this button does not appear to be pushed in. If it is, MS Query will attempt to run the query every time you make a change to it. This can be annoying.

Add Criteria to the Report

For this example, we want to add criteria. To display the Criteria Pane, click on the Show/Hide Criteria button shown here. After you do so, the Criteria Pane will appear between the Table Pane and the Data Pane.

For this example, we want to run a separate report for each state. Therefore, we'll add a criterion to restrict the returned data set by state. To do so, drag the “ST” field from the CUST table box to the Criteria Field of the Criteria Pane.

If this were a one-time report, and you only wanted the records for, say, West Virginia, you could just type "WV" in the Value line of the Criteria Pane. But in this example, we want a refreshable report that will allow us to choose each state as needed. This means we must add a parameter instead of a value in the value line.

To indicate a parameter place a "prompt" string -- which is enclosed in square brackets -- in the value field. In this example, we've added "[State]" in the value field.

Entering a prompt string will cause a prompt box to come up when we run this query. The string between the brackets will be displayed in the box to remind us what to type in the text box. The query will then run with that value as its criterion.

In MS Query you can have multiple, complex criteria, and more than one of them can be parameterized. The only requirement is that the query must be generated using the graphic interface. More complex queries require a different method, beyond our scope.

Test Your Query

At this point, we've finished our query, and MS Query has constructed an SQL statement for us. We can see it by clicking on the "SQL" button in the toolbar. Notice that the parameter is represented by a question mark. After you view the query, Cancel this box.

To test your query, click on the exclamation point icon in the toolbar, shown here. MS Query will display a dialog titled Enter Parameter Value. This dialog will contain a text box for each prompt string you've specified. For our example, it will say "State". Enter "WV", "OH", or "PA". When you click on the OK button, the data will be displayed in the Data Pane.

Save the Query

To save the query, choose File and Save As. This works pretty much as you would expect, with one exception: When you use a prompt, MS Query prompts you at inappropriate times. Specifically, when you save your query or when you close it, Excel issues the prompt. When this happens, just give Excel the information it asks for and then save the file as you normally would.

It's a good idea to save your query in the default directory. If you keep all your queries in this directory, MS Query can find them on its own without your having to search for them.

Create an Excel Report

Now that the query is done, it's time to return our data to Excel. Click on the "open door" icon shown here. When you do so, MS Query activates Excel and launches the Import Data dialog.

 

You now can select where you want your data to appear. After you click OK, the top of your range of data looks something like this:

One way to refresh your report is to right-click anywhere within it and select Refresh Data from the menu. You also can choose Data, Refresh Data.

Better yet, you can use the External Data toolbar shown here. To launch the toolbar, right-click on any toolbar and choose External Data from the list. Then, to refresh the data, click the tool with the red exclamation symbol.

Working With Your Report

MS Query provides several useful features for working with Excel reports.

One valuable feature is that MS Query maintains a named range where the data is displayed in a spreadsheet. This range is dynamic, which means that it re-sizes automatically each time you refresh your report.

Because the default range name tends to be quite long, you probably will want to change it. You can right-click on the data range and select Data Range Properties. Then, in the External Data Range Properties dialog, you can rename the range and perform many other customizations. The range name is sheet-level, so if you want to refer to it from another sheet, you will need to include the sheet name.

To illustrate, we changed the name to MyData. Then, we use this formula to return the grand total of the AMT column:

=SUM(OFFSET(MyData,0,6,,1))

Here, the OFFSET function returns a reference to the column with the AMT data and SUM returns the total. The arguments for the OFFSET function are: OFFSET( reference, rows, cols, height, width)

Another useful feature is that you can change the query parameter. You can do so in three ways, as shown in the Parameters dialog below. To launch this dialog, click the third tool from the left in the External Data toolbar shown above, or right-click your data range and choose Parameters.

The third choice in this dialog offers a particularly convenient solution. You can specify a cell that contains your parameter value. You can enter this value manually, return it by formula, return it by using a dropdown list box, and so on.

Finally, you can use MS Query to feed a PivotTable. However, you can not use a PivotTable with a parameterized query. Therefore, click the first button (the Edit Query button) in the External Data toolbar. Modify your query to either hard-code a desired criteria or to remove it entirely. Then save the query with a new name.

Then, to use that query with a PivotTable, follow the general instructions that Marty provided in the last page of Use MS Query to Treat Excel As a Relational Data Source. However, when Excel launches Microsoft Query, close the Add Tables window as shown here.

Next, in the Microsoft Query window, choose File, Open to open the query you created for use with PivotTables. (Instead, of course, you could at this point create and save a new query for your PivotTable to use.) Close your Microsoft Query window. You then can create a PivotTable the way you normally would.


Zapawa's book offers additional ways to bring external data into Excel. You'll find a link to it at the beginning of this article.

 

Use MS Query to Treat Excel as a Relational Data Source




Dashboard Reporting With Excel


Charley's SwipeFile charts