Re: Unicode Normalization

On Tue, 3 Feb 2009, Robert J Burns wrote:

> Unicode depends on two canonically equivalent but byte-wise different strings 
> matching. We cannot hope to eliminate such strings from the internet, so this 
> is something that implementations have to deal with. I think most everyone 
> here is on the same page on that, but I want you to understand too.

well, you have convinced me :)

since programming and all modern software are based on abstract data types
and structural equivalence, without knowledge of the semantics of particular
data, your "normalization" is worthless.
there's no way to detect every point in a project where an integer expression
semantically turns into a code point, or where a vector of code points
semantically turns into "Unicode text", making it incomparable with
its peers after any mutation.
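
to make that concrete, here is a minimal Python sketch. it is not from the
thread: the names (composed, decomposed, canonically_equal) are made up for
illustration, and it assumes only the standard unicodedata module. it shows
two canonically equivalent strings failing a structural comparison, and a
plain list of integers that carries no hint it holds code points.

```python
import unicodedata

# Two spellings of the same character: precomposed U+00E9 versus
# "e" followed by the combining acute accent U+0301.
composed = "\u00e9"
decomposed = "e\u0301"

# Structural comparison looks only at the sequence of code points,
# so the two canonically equivalent strings do not match.
structurally_equal = composed == decomposed


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Generic container code sees only integers; nothing in the type says
# these are code points that would need normalization-aware comparison.
code_points = [ord(ch) for ch in decomposed]
rebuilt = "".join(chr(cp) for cp in code_points)
```

only code that knows it is comparing text can call canonically_equal; a
generic comparison of code_points with another list of integers never will.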

looks like this topic is just seeking a workaround for keyboard/IME
developer bugs.


>
> Take care,
> Rob
>

Received on Wednesday, 4 February 2009 14:48:43 UTC