I’ve used Excel long enough to recognize when I’ve stumbled across an unusual formula. And it happened again a few minutes ago.
In the past, I made mental notes to tell you about other such formulas—sometime in the future. But honestly, my mental notes are like writing on fog. So today, I decided to stop what I’m doing and tell you about my new formula.
My data is from the OECD, the Organisation for Economic Co-operation and Development, which is an intergovernmental economic organization with 38 member countries.
Oversimplified, my data series looks like this Table, but with more than 4,000 rows and with many more columns of data about each series.
My challenge was to write a formula in an adjacent column, a formula that would list the name of each country buried somewhere in each title in that list of more than 4,000 titles.
I started about the same way that you probably would. That is, I wrote very long formulas that extracted the text that followed words like “from”, “in”, and “for”. But in the actual data, I found many exceptions, including some titles that followed no particular pattern I could use to determine where the name of a country might appear in each title.
However, the good thing about my initial efforts was that I had developed a list of 46 names of countries, a list that began like the one in this table.
And when I looked at that list of countries, something occurred to me: Rather than trying to extract the name of a country from a long title, why not switch the process around?
That is, why not search for each of the 46 names of countries in each title? Logically, only one country name should be found in each. So if I could figure out how to identify the one name that was found by those 46 searches, I could list that name in a new column of country names.
I started with a formula like this…
…where t.Country was the name of my list of 46 countries, and A was the name of the worksheet that contained the list.
Do you see something weird about that formula? Let me give you a hint. Here’s the syntax for SEARCH:
=SEARCH(find_text, within_text, [start_num])
Here, the argument in brackets is optional. So, because we won’t need it, I’ll ignore that third argument in the rest of this article.
The first argument in a SEARCH function expects one value: the text for which we’re searching. But my weird formula isn’t searching for one text value; it’s searching for an entire list of them.
Here’s what’s happening…
Some arguments in Excel functions expect to receive single values, and other arguments expect to receive arrays or ranges of values. And as it turns out, all the arguments for the SEARCH function expect to receive single values.
But if we pass an array of values—which typically would be a row or column of values—to an argument that’s expecting a single value, Excel will calculate that function for EACH value in the array. And then it will return an array of results.
So to see what my SEARCH formula returned, I highlighted the SEARCH part of my formula in my formula bar, like this:
Then I tapped the F9 key. And that returned this array to my formula bar:
That is, it returned error values for the names of 45 countries that it couldn’t find, and a number for the beginning position in the title for the one country that it could find.
What a mess!
To clean it up, I first tapped on the Esc key to restore my original formula and return Excel to the Ready mode.
The first change I needed to make to the formula, I realized, was to get rid of those error values. That was easy, I just wrapped an IFERROR function around the SEARCH function, like this:
D3: =IFERROR( SEARCH(A!t.Country,[@Title]), 0 )
Because I told IFERROR to return a zero for any error, the formula returned this array to my formula bar:
That’s an improvement. But how, precisely, could I use this array to return the name of the country that’s represented by that one non-zero value?
The answer depends on which version of Excel we’re using.
Versions of Excel Without the FILTER Function
If you don’t have Excel 365, and if you don’t have a version of Excel later than Excel 2019, then your version of Excel probably doesn’t support the FILTER function. But that’s no problem, because you DO have the SIGN function.
The SIGN function has this syntax:
SIGN returns +1 if number is positive, -1 if number is negative, and 0 if it’s zero. And because SIGN is expecting one value for number, we can wrap the SIGN function around the formula in cell D3 like this:
D3: =SIGN( IFERROR(SEARCH(A!t.Country,[@Title]),0) )
And that gives us this array of results in the formula bar:
Believe it or not, that’s a huge improvement!
That array now contains a known value that we can look up!
So, if we wrap a MATCH function around the formula, like this…
D3: =MATCH(1, SIGN(IFERROR(SEARCH(A!t.Country,[@Title]),0)), 0)
…we’re asking MATCH to look up the value of 1 in our array, and to return the row index number within the t.Country list where that value is found.
So our Table of results now looks like this:
Here, in cell D3, we have the index number of the position within the t.Country list where Australia—in this case—can be found. That is, cell D3 tells us that Australia is the second item in the list.
So, to complete the formula, all we need to do is to wrap an INDEX function around it, like this…
D3: =INDEX( A!t.Country,
MATCH(1, SIGN(IFERROR(SEARCH(A!t.Country,[@Title]),0)), 0) )
…to return “Australia” from its second index position within the t.Country list.
And that gives us our column of countries, like this:
As you can see, we get the names of the countries as expected. INDEX returns the name of the country from the t.Country range for the index position returned by the MATCH function.
And in our new test in row 9, we get an #N/A value. This is because neither “Test” nor “Montana” is listed in the t.Country list as the name of a country, of course.
Versions of Excel With the FILTER Function
Here’s our beginning formula again:
D3: =IFERROR( SEARCH(A!t.Country,[@Title]), 0 )
And again, it returns this result in your formula bar:
That array is exactly what we need for the FILTER function! Here’s why…
The FILTER function has this syntax:
=FILTER(my_array, include_array, [if_empty])
As before, I’ll ignore the optional [if_empty] argument.
Here’s how FILTER works:
FILTER filters my_array based on the values in the include_array. Specifically, it returns an array of values from my_array only for each row in which include_array has a value of TRUE.
Now, here’s the key point about how FILTER does this:
When Excel evaluates values in the include_array, it treats zero values as FALSE and all other numeric values as TRUE.
Therefore, we can copy the previous formula for cell D3 to cell E3, and then wrap it with a FILTER function, like this:
E3: =FILTER( A!t.Country, IFERROR(SEARCH(A!t.Country,[@Title]),0) )
That is, if we use the t.Country column in sheet A as my_array, and we use the results from our previous formula as the include_array, we get this result in column E:
As you can see, the values in column D and E match, with the exception of those in row 9. In cell E9, Excel has returned the #CALC! error for our Montana test, rather than #N/A.
However, we do have one last issue to deal with. If include_array returns more than one result, we would get a #SPILL! error in column E. In the Table above, that would occur if both “Russia” and “the Russian Federation” are in our list, or if both “the Republic of Korea” and “Korea” are in our list.
The way I’m currently dealing with that issue is to add both versions of the name for Russia, Korea, and so on, and then sort my t.Country list by the number of characters in its name, from highest to lowest. (To do so, I added a column of LEN function formulas, and then sorted on that column.) And then, I wrap the previous formula for E3 with an INDEX function, like this:
E3: =INDEX( FILTER( A!t.Country, IFERROR(SEARCH(A!t.Country,[@Title]),0) ), 1)
By returning the first INDEX value, this formula always returns the longest name. If that ever becomes a problem, I can deal with it easily as a second step.
(The Not365 version won’t have this problem because MATCH with a zero for its third argument always returns the row index for the first match it finds.)
And finally, both the Not365 and 365 formulas in the workbook are surprisingly fast. In my actual workbook, which has 4209 rows of data, both formulas returned their results in less than a second.
Just to cover all bases, I tried the same method with the FIND function, and it worked without a hitch.
Keep this steroidal SEARCH example in mind. I’m sure you’ll find other reasons to use either the actual formulas or the logic behind them.