RE: Link-6: Addressing at the sub-element level

>>If you count characters of the source content, why won't everyone count them
>>the same way? The only problem I see would be with a system that silently
>>"normalized" data somehow, and lost track of what the original data was.
>
>I think a system might quite reasonably choose to normalize precomposed
>characters into combining sequences or vice versa, since there's no real
>difference in the semantics. If I've got a base character followed by a
>non-spacing character, and the combination of the two is available in
>Unicode as a precomposed character, and I've got a system that doesn't have
>the machinery to render non-spacing characters, then it might well be
>desirable for it to do the precomposition.

We could get around this by requiring that data be normalised per the
Unicode standard, and by defining a "word" quantum as the result of the
word-boundary discovery algorithm in Unicode 2.0. Not that I think this
is a good idea, though...
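
Just to make the counting problem concrete, here is a rough Python sketch
(only an illustration of the point, nothing from the draft) showing how two
equally valid forms of the same text give different character counts:

    import unicodedata

    # "cafe" spelled with a base 'e' plus COMBINING ACUTE ACCENT (U+0301)
    decomposed = "cafe\u0301"

    # The same text folded to the precomposed form (U+00E9)
    precomposed = unicodedata.normalize("NFC", decomposed)

    print(len(decomposed))    # 5 code points
    print(len(precomposed))   # 4 code points

A character offset computed against one form simply doesn't point at the
same place in the other, which is why the normalisation question matters
for sub-element addressing.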

Received on Wednesday, 21 May 1997 08:57:23 UTC