Programming Solutions

Your Source for Information

Finding Invisible Characters in TSQL String and Remove Invisible Characters with Space

by Maeenul 24. April 2010 14:32

This post shows the invisible characters that we might encounter in tsql string.

We often face problem in data matching when invisible characters are available in string. When we select the string we see something that seems to us as space. But sometimes they are not really spaces. So when we compare strings having these invisible characters, we think that they should be same but sql server does not find them equal and does not match. Often in our queries we face problems due to this reason.

It often occurs when we load data from some flat file or csv file and from some other sources in relational tables. Most often the invisible characters are newline, carriage return, line feed or tab. But I have faced a problem where the invisible character was a character whose Unicode value is 160. So first of all we will see how we can find which the invisible characters are.

We can use loop through from 1 to 255 and see the character using sql CHAR() function.

declare @loop_counter int= 1, @all_unicode_characters nvarchar(max)=''

declare @loop_counter int = 1, @all_unicode_characters nvarchar(max) = ''

 

while @loop_counter<256

begin

      set @all_unicode_characters = @all_unicode_characters + ',' +

                                    cast(@loop_counter as varchar) + '=' + char(@loop_counter)

 

      set @loop_counter = @loop_counter + 1

end

 

select stuff(@all_unicode_characters, 1, 1, '')

we will get the following output that shows us the characters together with their values.

1=, 2=, 3=, 4=, 5=, 6=, 7=, 8=, 9= , 10= , 11=, 12=, 13= , 14=, 15=, 16=, 17=, 18=, 19=, 20=, 21=, 22=, 23=, 24=, 25=, 26=, 27=, 28=, 29=, 30=‑, 31=­, 32= , 33=!, 34=", 35=#, 36=$, 37=%, 38=&, 39=', 40=(, 41=), 42=*, 43=+, 44=,, 45=-, 46=., 47=/, 48=0, 49=1, 50=2, 51=3, 52=4, 53=5, 54=6, 55=7, 56=8, 57=9, 58=:, 59=;, 60=<, 61==, 62=>, 63=?, 64=@, 65=A, 66=B, 67=C, 68=D, 69=E, 70=F, 71=G, 72=H, 73=I, 74=J, 75=K, 76=L, 77=M, 78=N, 79=O, 80=P, 81=Q, 82=R, 83=S, 84=T, 85=U, 86=V, 87=W, 88=X, 89=Y, 90=Z, 91=[, 92=\, 93=], 94=^, 95=_, 96=`, 97=a, 98=b, 99=c, 100=d, 101=e, 102=f, 103=g, 104=h, 105=i, 106=j, 107=k, 108=l, 109=m, 110=n, 111=o, 112=p, 113=q, 114=r, 115=s, 116=t, 117=u, 118=v, 119=w, 120=x, 121=y, 122=z, 123={, 124=|, 125=}, 126=~, 127=, 128=€, 129=, 130=‚, 131=ƒ, 132=„, 133=…, 134=†, 135=‡, 136=ˆ, 137=‰, 138=Š, 139=‹, 140=Œ, 141=, 142=Ž, 143=, 144=, 145=‘, 146=’, 147=“, 148=”, 149=•, 150=–, 151=—, 152=˜, 153=™, 154=š, 155=›, 156=œ, 157=, 158=ž, 159=Ÿ, 160= , 161=¡, 162=¢, 163=£, 164=¤, 165=¥, 166=¦, 167=§, 168=¨, 169=©, 170=ª, 171=«, 172=¬, 173=­, 174=®, 175=¯, 176=°, 177=±, 178=², 179=³, 180=´, 181=µ, 182=¶, 183=·, 184=¸, 185=¹, 186=º, 187=», 188=¼, 189=½, 190=¾, 191=¿, 192=À, 193=Á, 194=Â, 195=Ã, 196=Ä, 197=Å, 198=Æ, 199=Ç, 200=È, 201=É, 202=Ê, 203=Ë, 204=Ì, 205=Í, 206=Î, 207=Ï, 208=Ð, 209=Ñ, 210=Ò, 211=Ó, 212=Ô, 213=Õ, 214=Ö, 215=×, 216=Ø, 217=Ù, 218=Ú, 219=Û, 220=Ü, 221=Ý, 222=Þ, 223=ß, 224=à, 225=á, 226=â, 227=ã, 228=ä, 229=å, 230=æ, 231=ç, 232=è, 233=é, 234=ê, 235=ë, 236=ì, 237=í, 238=î, 239=ï, 240=ð, 241=ñ, 242=ò, 243=ó, 244=ô, 245=õ, 246=ö, 247=÷, 248=ø, 249=ù, 250=ú, 251=û, 252=ü, 253=ý, 254=þ, 255=ÿ

From this we see that there are a lot of characters that seem to us as space but they are not actually space. Unicode 32 is space only.

So here we see a tsql function that will take a string and return whether the string has any invisible characters other than space.

We will see here several utility functions related to this.

First we see a table valued function that returns a table containing all the characters in a string.

The output table has 2 columns: one contains the character and the other contains the Unicode value of that character.

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

-- =============================================

-- Author:        Maeenul

-- Create date:

-- Description:   This table - valued function returns a table containing the individual characters of a string

-- and corresponding unicode value

-- =============================================

CREATE FUNCTION AllCharactersInString

(    

      @str nvarchar(max)

)

RETURNS TABLE

AS

RETURN

(

      select substring(B.main_string, C.int_seq, 1) as character,

      Unicode(substring(B.main_string, C.int_seq, 1)) as unicode_value

      from

      ( select @str as main_string ) B,

      (

            select A.int_seq

            from

            (

               select row_number() over (order by name) as int_seq

               from sys.all_objects

            ) A

            where A.int_seq <= len(@str)

      ) C

)

GO

 

select *

from AllCharactersInString('PRESCRIPTION STATUS')

This will give us the following output:

The Following function we can use for checking whether our string contains any invisible character or not. This function uses the previous function.

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

-- =============================================

-- Author:        Maeenul

-- Create date:

-- Description:   This function returns 1 if the string contains invisible character other than space.

-- Returns 0 is it does not contain invisible character

-- =============================================

CREATE FUNCTION ContainsInvisibleCharacter

(

      @str nvarchar(max)

)

RETURNS int

AS

BEGIN

 

      DECLARE @Result int

 

      if exists(

            select *

            from AllCharactersInString(@str)

            where unicode_value in (1, 9, 10, 11, 12, 13, 14, 28, 29, 31, 129, 141, 143, 144, 157, 160)

      )

      select @Result = 1

      else

      select @Result = 0

 

      RETURN @Result

 

END

GO

 

select dbo.ContainsInvisibleCharacter('This string has invisible  character')

n  Returns 1

select dbo.ContainsInvisibleCharacter('This string does not have invisible    character')

n  Returns 0

Now we will see a function that we can use to remove our invisible characters with space. Afterwards we can decide whether we want to keep the spaces or want to trim. When all of our invisible characters are replaced with space, we can use tsql ltrim or rtrim functions or take other steps to fit our requirements.

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

-- =============================================

-- Author:        Maeenul

-- Create date:

-- Description:   This function replaces the invisible characters from a string with space (unicode 32)

-- =============================================

CREATE FUNCTION ReplaceInvisibleCharactersWithSpace

(

      @str nvarchar(max)

)

RETURNS nvarchar(max)

AS

BEGIN

     

      SELECT @str = replace(@str, char(invisible_char), ' ')

      from (

            select 1 as invisible_char

            union

            select 9

            union

            select 10

            union

            select 11

            union

            select 12

            union

            select 13

            union

            select 14

            union

            select 28

            union

            select 31

            union

            select 129

            union

            select 141

            union

            select 143

            union

            select 144

            union

            select 157

            union

            select 160

      ) A

 

      RETURN @str

 

END

GO

 

select dbo.ReplaceInvisibleCharactersWithSpace('Testing '+CHAR(10)+' and '+CHAR(160))

Tags: , , ,

Category: TSQL



Comments (11) -

  1. Good article! keep blogging Smile
    One feedback though, could you please add a blog archive, so that we can browse through the old articles by month or so?
  1. TierHeltern TierHeltern says:
  2. Just want to say what a great blog you got here!
    I've been around for quite a lot of time, but finally decided to show my appreciation of your work!

    Thumbs up, and keep it going!

    Cheers
    Christian, http://www.iwatchsouthparkonline.net" rel="nofollow">iwspo.net
  1. excellent!
  1. I usually do not leave comment on website or blog. But your program is awesome. You are a genious!!!

    Hats off to you!!!

    Keep it coming
  1. Ahmed Rahim Ahmed Rahim says:
  2. Just excellent! Thanks so much. That saved my day.
  1. Thanks dear,

    This may help me out in solving my issue. I had an issue to remove some junk character from certain fields before it gets displayed in site.

    Good efforts.
  1. Hi,

    I have a regional language which is called telugu in that one invisible character is the like square box. Using this function i am not able to remove it.

    Please help me how can i remove it..?
  1. Okay, maybe I am dense.  I see that this works well enough with a static string that is fed into the function, but how can I get it to cycle through the values of a field in one of my tables?  I need to check all instances of a note field to make sure there are no invisible characters lurking in the field.
  1. Hi Nici, I think you can have a very simple solution for this. Once you have a function that takes a string and can remove all the invisible characters from it, you can just use this function for you table column like this.

    select dbo.ReplaceInvisibleCharactersWithSpace(my_table_column_name)
    from my_table

    It should be this simple.
  1. Hi there Smile Very nice article, it helped me a lot.
    I have modified Your function like so:
    ALTER FUNCTION ContainsInvisibleCharacter  (@str nvarchar(max))
       RETURNS int
    AS
       BEGIN
         DECLARE @Result Int
         IF exists
    (SELECT
       *
    FROM
       AllCharactersInString(@str)
    WHERE
           unicode_value NOT IN (32,81,87,69,82,84,89,85,73,79,80,113,119,101,114,116,121,117,105,111,112            ,97,115,100,102,103,104,106,107,108,65,83,68,70,71,72,74,75,76
            ,122,120,99,118,98,110,109,90,88,67,86,66,78,77
            ,49,50,51,52,53,54,55,56,57,48,45,61,33,64,35,36,37,94,38,42,40,41,95,43
            ,281,243,261,347,322,380,378,263,324,280,211,260,346,321,379,377,262,323
            ,126,96,91,123,93,125,92,124,59,58,39,34,44,60,46,62,47,63))
            BEGIN SET @Result = 1
            END
         ELSE
            BEGIN SET @Result = 0
            END
          RETURN @Result
       END

    So that is is checking if there are any characters outside my list.

    No I am wondering how to modify ReplaceInvisibleCharactersWithSpace to replace all characters outside my list.
    Could You help me with that?
  1. Very helpful. Thanks a lot...

Add comment

biuquote
  • Comment
  • Preview
Loading

Alpha Tags