A Two-Part Excel Puzzle, with Arrays, Text, ASCII, and Concatenation

It should have been a simple Excel formula. But it became a two-part puzzle needing arrays, ASCII codes, and text concatenation to solve.

3894

It should have been a simple Excel formula. But it became a two-part puzzle needing arrays, ASCII, and text concatenation to answer.Recently, I needed to compare two long columns of text of various lengths to make sure that each row of text in column A and column B were identical.

That is, only “Ant” = “Ant” and “ant” = “ant” should pass the test. But “Ant” = “ant” should fail the test.

I knew that a formula like =A2=B2 wouldn’t work because the formula =”Ant”=”ant” returns TRUE. That is, Excel’s normal formula is case-insensitive.

So what formula would give me a case-sensitive comparison?

For some reason, the EXACT function had slipped my mind:

=EXACT( text1, text2 )

This function would have given me the results I needed immediately.

Instead, to solve this puzzle, I took a journey through arrays, concatenation, ASCII, and the differences between the current version of Excel and an older version, like Excel 2016 or earlier.

Solving those two puzzles turned out to be great fun for me. If you enjoy puzzles like this, before you read on, try to think of one solution for Excel 365 or 2019, and another solution for Excel 2016.

Getting Started

Suppose I want to test whether “Ant” is exactly the same as “ant”—without using the EXACT function.

The only way that I could think to do that would be to convert both text values into their ASCII equivalents, and then compare those two sets of values somehow. I couldn’t compare the sums of those values, because the sums could turn out to be the same for non-identical text—like “Aa” and “aA”. Instead, I had to compare their ASCII values in some other way.

Puzzle 1: The Excel 365 Solution

Because I had to convert each character in a cell to its ASCII value, I had to address each character one at a time. That meant that I needed to use these two functions:

MID( text, start_num, num_chars )

CODE( text )

The MID function returns the specified character or characters from the text. Therefore, to get started, I would need to use these two functions somewhat like this:

=CODE( MID( text, 1, 1 ) )
=CODE( MID( text, 2, 1 ) )
=CODE( MID( text, 3, 1 ) )
Etc.

However, writing all those formulas would be a painful task. I needed to simplify the process. To do that, I decided to use the SEQUENCE function:

SEQUENCE( rows, [columns], [start], [step] )

Excel 365 introduced the SEQUENCE function when it introduced dynamic arrays in 2018. The function returns an array or range of numbers as defined by its arguments. That is, the formula…

=SEQUENCE(3)

…returns:

1
2
3

In worksheet functions, some arguments expect a range of values and other arguments expect just one value. For example, the second argument of the MID function…

=MID( text, start_num, num_chars )

…expects only one value: the start_num.

With that in mind, here’s why the SEQUENCE function was important to solving my puzzle:

In a function, if we pass an array of values to an argument that expects only one value, Excel calculates the result for each of the values in the array, and then the function returns an array of results.

In this case, I needed the SEQUENCE function to return an array of sequential numbers so that CODE and MID could return an ASCII value for each character in the text I wanted to test.

But how many sequential numbers would I need each SEQUENCE function to return for each row of text? To answer that question, I needed to use:

LEN( text )

This function returns the number of characters in text. And therefore this formula would return the sequential array I needed in each row of text:

=SEQUENCE( LEN( text ) )

I now needed a function to concatenate the characters in that array, but I had three functions to choose from in Excel 2019 and 365:

CONCATENATE( text1, [text2], … )
CONCAT( text1, … )
TEXTJOIN( delimiter, ignore_empty, text1, … )

Which should I use?

First, I knew right away that CONCATENATE would be useless. This is because that function won’t combine text in a range of values, and that’s essentially what I needed. Even so, I tested it using an array constant…

=CONCATENATE({65,97})

…where those two numbers in the array are the ASCII values for “A” and “a” respectively. But as I expected, CONCATENATE returned: {65,97}

That is, CONCATENATE didn’t concatenate the two values in my array.

(Honestly, I’ve never found a use for the CONCATENATE function. Where it CAN be used, it’s much easier to use the ampersand instead, like this: =A1 & B1 )

Second, we have the CONCAT function, which was introduced in Excel 365 and included in Excel 2019. This function combines text in one or more ranges, or in an array, while adding no delimiters. Therefore, this formula…

=CONCAT({65,97})

…returns this result as text:

6597

This would probably work, but I thought it would be safer to add a delimiter between each chunk of ASCII values.

Third, therefore, I went with…

=TEXTJOIN(“-“,TRUE,{65,97})

…which returned the text…

65-97

…which was exactly what I needed.

And so, if I enter “Aa” in cell A1, I can use this formula…

C1:  =TEXTJOIN(“-“,TRUE,CODE( MID( A1, SEQUENCE(LEN(A1)), 1 ) ))

…to return 65-97.

And if I entered “aA” in cell B1, I’d get 97-65 in cell D1, after I copied the formula there.

And therefore, I had a formula that would allow me to compare text in two different columns to see if they were exactly the same.

Puzzle 2: A Solution for Excel 2016 and Earlier

In Excel 2016, I still could use a formula pattern like this:

=CODE( MID( text, start_num, 1 ) )

But because SEQUENCE isn’t available, I needed to find another way to use an array for the start_num values. The only way that I could think of was to use an array constant, like this:

=CODE( MID( text,{1;2;3;4;5;6;7;8;9;10}, 1 ) )

Here, the maximum length of any string I wanted to test was only ten characters long. And shorter strings would be no problem for the MID function. That’s because when MID tries to return, say, the tenth character of a four-character string, it returns a null string.

However, when CODE tries to return the ASCII value for a null string, Excel returns that pesky #VALUE! error. So that meant that I had to wrap the formula in an IFERROR function, like this:

=IFERROR(CODE( MID( “Aa”,{1;2;3;4;5;6;7;8;9;10}, 1 ) ),””)

To confirm what that formula was returning, I pressed F2, which activated my formula bar, and then F9 to calculate the formula in my formula bar. And that gave me this result:

{65;97;””;””;””;””;””;””;””;””}

That is, the formula returned an array that consisted of two numbers and eight null strings…as I had expected.

Then I pressed the Esc key, which restored my original formula in my formula bar and returned me to Excel’s Ready mode.

But now what? I couldn’t think of an easy way to concatenate those ten items as I could in a modern version of Excel. So that meant that I’d need to compare the two arrays in their original form.

That was no problem, however. Here’s how I did that:

When we enter a formula like this…

={1;2;3}<>{3;2;1}

…into a cell, Excel compares each array row-by-row. Depending on which version of Excel we have, Excel will DISPLAY the results differently. But the underlying calculation is the same.

In all versions of Excel, however, we can test the formula in our formula bar. That is, if we press F2 then F9, we’ll get:

={TRUE;FALSE;TRUE}

And then, if we modify the formula like this…

=({1;2;3}<>{3;2;1})+0

…we’ll get:

={1;0;1}

(That ending zero, of course, converts the TRUE and FALSE values generated by the “<>” test to 1 and 0.)

And THAT result means that if this formula…

=SUM(({1;2;3}<>{3;2;1})+0)

…returns a value greater than zero, the two arrays don’t match.

And that’s the formula pattern I would need to use in earlier versions of Excel.

And so, if the columns of text I wanted to compare were in columns A and B, then this formula…

C1:  =SUM((IFERROR(CODE(MID(A1,{1;2;3;4;5;6;7;8;9;10}, 1)),””)
<>IFERROR(CODE(MID(B1,{1;2;3;4;5;6;7;8;9;10}, 1)),””))+0)

…will return a non-zero value whenever the text in cells A1 and B1 don’t match. And this formula should work in any version of Excel that supports the IFERROR function—as long as none of the text is longer than 10 characters.

Of course, using the EXACT function would have been a lot faster to set up. But solving those two puzzles turned out to be a lot more fun!