RE: Outstanding Issues - rdf-charmod-literals from Jeremy Carroll on 2002-02-20 (w3c-rdfcore-wg@w3.org from February 2002)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Wed, 20 Feb 2002 13:27:40 -0000
To: "Brian McBride" <bwm@hplb.hpl.hp.com>, "RDF Core" <w3c-rdfcore-wg@w3.org>
Cc: <w3c-i18n-ig@w3.org>
Message-ID: <JAEBJCLMIFLKLOJGMELDGEBACDAA.jjc@hplb.hpl.hp.com>

> rdf-charmod-literals: Does the treatment of literals conform to charmod ?

> We need an owner to check this.


While I would prefer not to own this, our earlier analysis did arrive at a
conclusion.

My earlier analysis was (again) in the (still too long) message:


http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Sep/0378.html

[[[
[1a]
The Unicode String in an RDF Literal is normalized according
to Unicode Normalization Form C [NFC, NFC-Corrigendum], using
a framework of early uniform normalization.
]]]

This amounts to:
- all RDF syntaxes are constrained to only permit Unicode strings that are
normalized.
- So a graph with a node labelled with a Unicode string containing a
character c followed by a non-spacing cedilla character is not an RDF
graph.
- RDF/XML syntax includes this restriction on literals
- N-Triple syntax includes this restriction.
- implementations that read documents in non-Unicode character sets (e.g.
many XML implementations) need to use a normalizing transcoder when
converting into Unicode.
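A minimal sketch of the restriction, in Python using only the standard
`unicodedata` module (`is_normalized` needs Python 3.8+). The graph
representation (a list of string triples whose objects are all treated as
literals) and the function names are hypothetical, chosen only for
illustration.

```python
import unicodedata
from typing import Iterable, Tuple

# Hypothetical graph representation: (subject, predicate, object) string
# triples, where every object is treated as a literal's Unicode string.
Triple = Tuple[str, str, str]

def literal_is_nfc(text: str) -> bool:
    """True if the literal's Unicode string is already in NFC."""
    return unicodedata.is_normalized("NFC", text)

def is_rdf_graph(triples: Iterable[Triple]) -> bool:
    """Under [1a], a graph with any non-NFC literal is not an RDF graph."""
    return all(literal_is_nfc(obj) for _subj, _pred, obj in triples)

precomposed = "\u00e7"  # LATIN SMALL LETTER C WITH CEDILLA
decomposed = "c\u0327"  # 'c' followed by COMBINING CEDILLA (non-spacing)

print(is_rdf_graph([("ex:s", "ex:p", precomposed)]))
print(is_rdf_graph([("ex:s", "ex:p", decomposed)]))
```

The first call prints True and the second False: NFC composes c plus the
combining cedilla into U+00E7, so the decomposed spelling is not normalized.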

"Early uniform normalization" is simply another way of saying that. Early
uniform normalization means that it is the responsibility of the original
document's author not to create non-normalized strings.
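To illustrate the split of responsibility, a hypothetical sketch: the
authoring side normalizes before writing, and the reading side only checks,
rejecting input rather than repairing it (consistent with such input not
being RDF at all). `author_literal`, `consume_literal` and
`NotNormalizedError` are illustrative names, not part of any RDF API.

```python
import unicodedata

class NotNormalizedError(ValueError):
    """Hypothetical error a consumer might raise for non-NFC input."""

def author_literal(text: str) -> str:
    # Producer side: under early uniform normalization the author (or the
    # authoring tool) is responsible for emitting NFC text in the first place.
    return unicodedata.normalize("NFC", text)

def consume_literal(text: str) -> str:
    # Consumer side: only *check*; do not silently repair, since a non-NFC
    # string means the input was not well-formed to begin with.
    if not unicodedata.is_normalized("NFC", text):
        raise NotNormalizedError(f"literal is not in NFC: {text!r}")
    return text

typed = "fac\u0327ade"          # as typed, with a combining cedilla
stored = author_literal(typed)  # now uses the precomposed U+00E7
print(consume_literal(stored))
```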

I could provide some test cases of ill-formed RDF/XML under this proposal.

My understanding of how this differs from requiring an RDF/XML document to
be fully normalized XML is as follows:

A] Fully normalized XML permits the sequence character c, XML comment,
non-spacing cedilla.
  This proposal prohibits that sequence.

B] Within an XML comment fully normalized XML prohibits the sequence
character c, non-spacing cedilla. This proposal does not prohibit this.
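The difference between [A] and [B] can be sketched with Python's
`xml.dom.minidom`. Two assumptions: the literal's value is taken to be the
concatenation of the element's text children with comments skipped, and
"fully normalized XML" is approximated by checking that the raw document text
is NFC, which is only a rough stand-in for the charmod definition. The helper
names are hypothetical.

```python
import unicodedata
from typing import Tuple
from xml.dom import minidom

def nfc(text: str) -> bool:
    return unicodedata.is_normalized("NFC", text)

def literal_text(element) -> str:
    """Concatenate the element's direct text children, skipping comments
    (the assumed way the literal's value is built)."""
    return "".join(
        node.data for node in element.childNodes if node.nodeType == node.TEXT_NODE
    )

def check(doc: str) -> Tuple[bool, bool]:
    """Return (raw document text is NFC, literal value is NFC)."""
    root = minidom.parseString(doc).documentElement
    return nfc(doc), nfc(literal_text(root))

# [A] character c, XML comment, non-spacing cedilla.
case_a = "<lit>c<!--x-->\u0327</lit>"
# [B] c followed by a non-spacing cedilla, but only inside a comment.
case_b = "<lit>plain<!--c\u0327--></lit>"

print("A:", check(case_a))
print("B:", check(case_b))
```

Case A passes the raw-text check but yields a non-NFC literal, while case B
fails the raw-text check but yields an NFC literal, which is the contrast
described in [A] and [B].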


I think [A] is a defect in the current concept of fully normalized XML.
Either there should be more acknowledgement that XML documents are read in
different ways depending on what sort of XML document they are, or
minimally it should be required that all string values within the XPath node
set are normalized.

I think [B] could be added as an additional requirement on RDF/XML, i.e.
that the XML as XML is fully normalized as per charmod.


I do not believe that this presents implementors with too much work.

The thorough implementor will need:
- normalizing transcoders.
  These should be off-the-shelf components, since before long all W3C specs
will require their use. (Are any available?)

- ability to detect Unicode strings that are not NFC.
  I thought I had seen some fairly simple open source code to do this.
Anyone have a pointer?
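Both components can be approximated in a few lines of Python with the
standard library, assuming the input encoding is already known.
`normalizing_decode` and `first_non_nfc_offset` are hypothetical helpers; the
offset reported is merely a diagnostic aid, not anything charmod defines.

```python
import unicodedata

def normalizing_decode(data: bytes, encoding: str) -> str:
    """Decode bytes with any codec Python knows, then normalize to NFC.

    A plain transcoder can yield decomposed sequences (e.g. from a legacy
    charset such as windows-1258 that encodes some accents as separate
    combining marks); normalizing straight away gives NFC text.
    """
    return unicodedata.normalize("NFC", data.decode(encoding))

def first_non_nfc_offset(text: str) -> int:
    """Return -1 if text is NFC; otherwise the index of the first character
    where text and its NFC form differ."""
    if unicodedata.is_normalized("NFC", text):
        return -1
    normalized = unicodedata.normalize("NFC", text)
    for i, (a, b) in enumerate(zip(text, normalized)):
        if a != b:
            return i
    return min(len(text), len(normalized))

print(normalizing_decode(b"c\xcc\xa7", "utf-8"))  # bytes for c + U+0327
print(first_non_nfc_offset("fac\u0327ade"))
```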



The DPH (less thorough implementor) can:
- ignore the whole issue if they are only interested in a US market, and get
reasonable interoperability.
[This, as I understand it, is one of the goals of early uniform
normalization].
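A sketch of why the shortcut works, under the assumption that a "US market"
implementation only ever handles ASCII text: every ASCII string is already
NFC, so such an implementation can skip the check entirely. The helper names
are hypothetical.

```python
import unicodedata

def needs_nfc_check(text: str) -> bool:
    # ASCII characters have no decompositions and no pair of them composes,
    # so an ASCII-only string is always in NFC and needs no check.
    return not text.isascii()

def is_nfc(text: str) -> bool:
    # Fast path for ASCII; full check only for non-ASCII text.
    return not needs_nfc_check(text) or unicodedata.is_normalized("NFC", text)

print(is_nfc("Hello, world"), is_nfc("c\u0327"))
```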



Jeremy

Received on Wednesday, 20 February 2002 08:28:09 UTC