RE: Link-6: Addressing at the sub-element level from Gavin Nicol on 1997-05-21 (w3c-sgml-wg@w3.org from May 1997)

From: Gavin Nicol <gtn@eps.inso.com>
Date: Wed, 21 May 1997 08:55:55 -0400
To: jjc@jclark.com
CC: w3c-sgml-wg@w3.org
Message-Id: <199705211255.IAA01945@nathaniel.ebt>

>>If you count characters of the source content, why won't everyone count them
>>the same way? The only problem I see would be with a system that silently
>>"normalized" data somehow, and lost track of what the original data was.
>
>I think a system might quite reasonably choose to normalize precomposed
>characters into composite characters or vice-versa, since there's no real
>difference in the semantics. If I've got a base character followed by a
>non-spacing character, and the combination of the two is available in
>Unicode as a precomposed character, and I've got a system that doesn't have
>the machinery to render non-spacing characters, then it might well be
>desirable for it to do the precomposition.

We could get around this by requiring that data be normalised per the
Unicode standard, and by defining a "word" quantum to be the result of
the boundary discovery algorithm in Unicode 2.0. Not that I think this
is a good idea though...
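
To make the counting problem concrete, here is a rough sketch of how the
same visible text yields different character counts until both sides agree
on a normalisation. It assumes the NFC and NFD normalisation forms as later
Unicode versions define them, as exposed by Python's standard unicodedata
module. The sample string and the count_after helper are made up for
illustration and are not part of any proposal in this thread.

```python
import unicodedata

# The same visible word, encoded two ways.
precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT


def count_after(form: str, text: str) -> int:
    """Count code points after normalising text to the given Unicode form."""
    return len(unicodedata.normalize(form, text))


# Counting the raw source gives different answers for the two encodings.
raw_counts = (len(precomposed), len(decomposed))

# Once both sides normalise to the same form, the counts agree.
nfc_counts = (count_after("NFC", precomposed), count_after("NFC", decomposed))
nfd_counts = (count_after("NFD", precomposed), count_after("NFD", decomposed))

print(raw_counts, nfc_counts, nfd_counts)  # (4, 5) (4, 4) (5, 5)
```

Note that the agreed count still depends on which form is chosen (4 under
composition, 5 under decomposition), so an addressing scheme would have to
name the form and not just say "normalised".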

Received on Wednesday, 21 May 1997 08:57:23 UTC