Excel’s CLEAN Function is More Powerful Than You Think

Excel's CLEAN function does more than Excel's help topic says. In fact, it cleans all but two nonprinting characters in the ASCII and Unicode character sets. Here's how to deal with nonprinting characters.

7598

Excel's CLEAN function does more than Excel's help topic says. In fact, it cleans all but two nonprinting characters in the ASCII and Unicode character sets. Here's how to deal with nonprinting characters.The Excel 2016 help file for the CLEAN function provides more information than earlier versions:

“Removes all nonprintable characters from text. Use CLEAN on text imported from other applications that contains characters that may not print with your operating system. For example, you can use CLEAN to remove some low-level computer code that is frequently at the beginning and end of data files and cannot be printed.

“Important:  The CLEAN function was designed to remove the first 32 nonprinting characters in the 7-bit ASCII code (values 0 through 31) from text. In the Unicode character set … there are additional nonprinting characters (values 127, 129, 141, 143, 144, and 157). By itself, the CLEAN function does not remove these additional nonprinting characters.”

Actually, CLEAN is more powerful than this information describes. In fact, it might be more powerful than you want.

To prove this for yourself, start by using Excel’s Data Series command to create a column of values from 1 to 255.  To do so, first enter the value 1 in cell A1; hold down Ctrl and Shift and then press the down arrow to select the rest of column A with cell A1 as the active cell.

Next launch the Series dialog. To do so, in the Home, Editing group, choose Fill, Series; or else press Alt, H, F, I, S. Then, in the Series dialog, specify a Step value of 1 and a Stop value of 255. Then press OK.

Enter a formula to return the ASCII character associated with the value in cell A1:

B1:    =CHAR(A1)

Now enter a formula to clean the text in cell B1 and to return the length of the cleaned text. That is, if CLEAN removes the character, the value in column C will be zero; otherwise it will be 1.

C1:    =LEN(CLEAN(B1))

Similarly, enter two more formulas to do the same thing for the Unicode character set:

D1:    =UNICHAR(A1)
E1:    =LEN(CLEAN(D1))

Now copy the range B1:E1 down the column to row 255. And then compare the ASCII and Unicode columns.

When you do so, you’ll see that the CLEAN function certainly does clean characters 1-31 in both character sets. But from there, CLEAN often cleans different ASCII and Unicode characters, and often, characters that are different from what Excel’s documentation describes.

For example, CLEAN actually cleans all Unicode characters from 128 through 159, which is a good thing, because they’re all non-printing.

In fact, if the CHAR or UNICHAR functions return a non-printing character, CLEAN removes it—with two exceptions: CHAR(127) and CHAR(160).

To remove CHAR(127), you can use one of these formulas (assuming your text is in cells B127 and D127):

=SUBSTITUTE(B127,CHAR(127),””)
=SUBSTITUTE(D127,UNICHAR(127),””)

CHAR(160) is the non-breaking-space character, which is used extensively on the Web. To handle this character, we don’t want to remove it. Instead, we want to replace it with a normal space character and then remove any extra spaces that might accompany it.

Therefore, assuming your original ASCII or Unicode text is in cell G1, we can use one of these formulas:

=TRIM(SUBSTITUTE(G1,CHAR(160),” “))
=TRIM(SUBSTITUTE(G1,UNICHAR(160),” “))

Also, of course, you can use Excel’s Find and Replace command to replace either of these characters. To do so, you need to generate the character so you can use it in the dialog. One way to do that is first to type…

=CHAR(160)
…or…
=UNICHAR(160)

…in any cell; and then press F9 without pressing Enter.

When you do so, Excel changes the formula to a value in your formula bar and selects that value. Now press Ctrl+C to copy that selection. Then press ESC to return to the Ready mode.

Launch Excel’s Find and Replace dialog; paste your copied character into the Find What edit box; enter a space in the Replace With edit box; then choose Replace All.

Finally, to replace multiple spaces with one space, use Find and Replace to replace two spaces with one space. Run this command several times until Excel tells you that it can’t find multiple spaces.