Clarification of I18N issues (was RE: Weekly Call for Agenda Items)

> I propose that:
>
> - The Unicode strings within RDF literals are required to be in NFC.
> - We note that literals whose unicode strings start with a combining
> character may not be serializable in an XML document that conforms with
> forthcoming Character Model Recommendations.
> - We include a test case of such a literal as legal, to be reviewed if
> Charmod reaches rec before we do.
>
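
To make the second and third bullets concrete, here is a minimal Python sketch of such a test-case literal. It assumes that "combining character" can be approximated by a non-zero canonical combining class; Charmod's own definition may be broader. The function names are hypothetical, not from any spec.

```python
import unicodedata


def starts_with_combining_character(literal: str) -> bool:
    """True if the first code point has a non-zero canonical combining class.

    Assumption: this approximates "starts with a combining character";
    it is not Charmod's normative definition.
    """
    return bool(literal) and unicodedata.combining(literal[0]) != 0


def is_nfc(literal: str) -> bool:
    """True if the literal is unchanged by normalization to NFC."""
    return unicodedata.normalize("NFC", literal) == literal


# COMBINING ACUTE ACCENT followed by 'e': there is no preceding base
# character for the accent to compose with, so the string is already NFC.
candidate = "\u0301e"
```

Such a literal satisfies the NFC requirement in the first bullet while still starting with a combining character, which is why it is worth a test case of its own.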


In talking with Dave, it became clear that I had omitted some discussion,
particularly about:
- early uniform normalization
- normalizing transcoders, SHOULD language

I note that in M&S para 219:

http://lists.w3.org/Archives/Public/www-archive/2001Jun/att-0021/00-part#219
[[[
Note: The W3C I18N WG is working on a definition for string identity
matching. This definition will most probably be based on canonical
equivalences according to the Unicode standard and on the principle of early
uniform normalization. Users of RDF should not rely on any applications
matching using the canonical equivalents, but should try to make sure that
their data is in the normalized form according to the upcoming definitions.
]]]

Early uniform normalization is now clear from charmod.

Taking non-normal strings and making them NFC is the responsibility of the
first Unicode component in the pipeline; later components should reject
anything that is not NFC.
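
As a rough illustration of that division of labour, here is a Python sketch using the standard unicodedata module. The two function names are hypothetical, and raising ValueError is just one way to "reject".

```python
import unicodedata


def normalize_at_entry(text: str) -> str:
    """First Unicode component in the pipeline: bring the text into NFC."""
    return unicodedata.normalize("NFC", text)


def require_nfc(text: str) -> str:
    """Later component: reject, rather than repair, text that is not NFC."""
    if unicodedata.normalize("NFC", text) != text:
        raise ValueError("string is not in Unicode Normalization Form C")
    return text


decomposed = "e\u0301"                      # 'e' + COMBINING ACUTE ACCENT
composed = normalize_at_entry(decomposed)   # single code point U+00E9
```

Here require_nfc(composed) returns its argument, while require_nfc(decomposed) raises, since a later component is not supposed to repair the data silently.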

Thus for a UTF-8 or UTF-16 RDF/XML document, or an N-triple document, it is
the responsibility of the document author.

For a foobar character set RDF/XML document it is the responsibility of the
transcoder that converts into Unicode. A transcoder that meets that
responsibility is called a normalizing transcoder. The existence of a
sufficient number of such transcoders should, in my opinion, be an exit
criterion for charmod from CR. It should not be an exit criterion for RDF
from CR.
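
A normalizing transcoder, in the sense used above, might look like the following Python sketch. The function name is hypothetical. Windows-1258 (Python codec "cp1258") is my choice of example legacy charset, because it encodes Vietnamese accents as separate combining marks, so a plain decode can yield non-NFC text.

```python
import unicodedata


def normalizing_transcode(raw: bytes, source_encoding: str) -> str:
    """Decode bytes from a legacy charset and return the text in NFC."""
    return unicodedata.normalize("NFC", raw.decode(source_encoding))


# Sample input built by encoding, so the byte values are not hand-written:
# 'e' followed by COMBINING ACUTE ACCENT, as stored in Windows-1258.
legacy_bytes = "e\u0301".encode("cp1258")

plain = legacy_bytes.decode("cp1258")                      # decode only
normalized = normalizing_transcode(legacy_bytes, "cp1258")
```

The plain decode keeps the two-code-point sequence, while the normalizing transcoder hands NFC to the next component, which then has only to check the text.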

Hence I feel happier with SHOULD language for that part.

I'll send another message with proposed text.

Dave also expressed worry about the code footprint required for NFC
checking.

A small-footprint RDF/XML implementation should, in my view, not implement
NFC checking (unless it fits easily in the available space). It could expect
UTF-8 input, in which case the responsibility is the document author's. The
RDF spec does not divide up responsibilities, and we can remain silent on
this case.

Jeremy

Received on Thursday, 14 March 2002 09:04:30 UTC