How to Split and Index Delimited Text with Excel Formulas

Here's how you can split delimited text into an array with a short Excel formula, and then wrap the formula with the INDEX function to return any value from the array.

1190

Here's how you can split delimited text into an array with a short Excel formula, and then wrap the formula with the INDEX function to return any value from the array. For the longest time I’ve wished that Microsoft would add a SPLIT function to Excel. The function would have a syntax like this…

=SPLIT(text, delimiter, [index])

…where text is the text to split, delimiter is the delimiter that separates each chunk of text, and index is the optional number that returns the specified chunk of text.

Maybe someday.

But in the meantime, there are at least two other ways to split text. One of them has been floating around the web for a long time. It’s kind of complex and difficult to explain, but I’ll explain it in detail soon.

The other way works only with versions of Excel that have the FILTERXML function. If you have that function, here’s how to do it…

Splitting Text with FILTERXML

Suppose that the text A:B:C:D is in cell A1 and you want to split it into four cells of a dynamic array that begins in cell A3. You could do this easily by using FILTERXML.

The first step is to convert the text into XML. This is easy to do with this formula…

A3: =”<p><c>” & SUBSTITUTE($A$1, “:”, “</c><c>” ) & “</c></p>”

…where SUBSTITUTE’s second argument contains the delimiter and where “p” stands for parent and “c” stands for child. The formula returns:

<p><c>A</c><c>B</c><c>C</c><c>D</c></p>

Or, in a structured format, the return value would look like this:

XML Showing a Parent-Child Relationship

This looks very similar to HTML. But XML describes data and HTML describes how data is displayed. And unlike HTML, we can use any tag values we want. So, rather than “p” and “c” I could have used “parent” and “child”…or even “dog” and “cat”.

To filter that XML, we wrap a FILTERXML function around the formula. The function has this syntax:

=FILTERXML( xml, xpath )

And the full formula looks like this:

A3: = FILTERXML( “<p><c>” & SUBSTITUTE( $A$1, “:”, “</c><c>” ) & “</c></p>”,
“//c” )

What’s that second XPath argument about? I’ll get to that in a second. But first, let’s complete the formula.

The formula above returns the text in cell A1 split into multiple rows in one column. To return one row of results split into multiple columns, you would wrap TRANSPOSE around it like this:

A3: = TRANSPOSE(
FILTERXML( “<p><c>” & SUBSTITUTE( $A$1, “:”, “</c><c>” ) & “</c></p>”,
“//c” )  )

And if you want to return, say, the third value from the array of values, you could, instead, wrap INDEX around the original formula, like this:

A3: = INDEX(
FILTERXML( “<p><c>” & SUBSTITUTE( $A$1, “:”, “</c><c>” ) & “</c></p>”,
“//c” ), 3 )

And in all cases, if you copy and paste any of the formulas for cell A3 from this page, you’ll need to delete the carriage return characters before Excel will accept the formulas.

Because I know little about XML, and because Microsoft’s documentation about FILTERXML provides little help about using the XPath argument, I searched for that help online. I found a long page with the title A Really, Really, Really Good Introduction to XML. I also found that W3 Schools has a free tutorial about XML, which includes an extensive XPath Tutorial.

In the W3 tutorial, I learned that the XPath expression “//book” would select all book elements no matter where they are in a document. And sure enough, the XPath “//c” selected all c elements in the formula.

If you’d like to learn more about XML, both the “Really, Really” page and the W3 tutorial would be good places to begin. On the other hand, if you just want to split text by a delimiter, you already have what you need—if your version of Excel has the FILTERXML function.