draft response to pfps re nfc from Jeremy Carroll on 2003-09-09 (w3c-rdfcore-wg@w3.org from September 2003)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Tue, 09 Sep 2003 09:32:57 +0100
To: w3c-rdfcore-wg@w3.org, w3c-i18n-ig@w3.org
Message-ID: <3F5D9039.1040203@hplb.hpl.hp.com>
Copying to i18n to request help on correct application of charmod. See the 
two paragraphs between ****.


This is a proposed draft. Note that I suggest additional text for concepts, 
and still need additional text for syntax -

[[

Dear Peter

Thanks for your comments concerning NFC:
http://lists.w3.org/Archives/Public/www-rdf-comments/2003JulSep/0283
http://lists.w3.org/Archives/Public/www-rdf-comments/2003JulSep/0225


These comments also apply to XSD datatypes derived from xsd:string and 
xsd:anyURI, so we will respond in full generality.

(e.g. The two character string { e, NON SPACING ACUTE } is a legal 
xsd:string that can be 'written' in RDF/XML (in XML 1.0) but does not 
correspond to a legal RDF graph.)
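
The example above can be checked mechanically. The following is a minimal 
sketch in Python, assuming that NON SPACING ACUTE refers to U+0301 
COMBINING ACUTE ACCENT; the helper name is_nfc is hypothetical and is not 
taken from any RDF specification.

```python
import unicodedata

def is_nfc(text: str) -> bool:
    """Return True if text is already in Unicode Normal Form C.

    Hypothetical helper for illustration only.
    """
    # A string is in NFC exactly when NFC normalization leaves it unchanged.
    return unicodedata.normalize("NFC", text) == text

# Assumption: the two-character string { e, NON SPACING ACUTE } is
# 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
# The single precomposed character U+00E9 LATIN SMALL LETTER E WITH ACUTE.
precomposed = "\u00e9"

print(is_nfc(decomposed))   # False
print(is_nfc(precomposed))  # True
```

Both strings are legal xsd:string values in XML 1.0; only the second would 
satisfy the NFC constraint discussed here.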

We also agree that there are XML 1.0 fragments that can be written within 
an rdf:parseType="Literal" element in an XML 1.0 document that conforms to 
the RDF/XML syntax except that this fragment is not in NFC.
However, this would not be an RDF/XML document, since there is no 
corresponding RDF graph.

You are correct to point out that this constraint is not made explicit in 
the syntax document, and this is a bug.


Concepts places a similar constraint on the lexical form of all datatypes, 
e.g. xsd:string, whereas syntax suggests that there is no such constraint, 
e.g. in 7.2.16
http://www.w3.org/TR/rdf-syntax-grammar/#literalPropertyElt
[[
If the rdf:datatype attribute d is given then o := 
typed-literal(literal-value := t.string-value, literal-datatype := 
d.string-value) otherwise t.string-value MUST be a Unicode[UNICODE] string 
in Normal Form C[NFC], o := literal(literal-value := t.string-value, 
literal-language := e.language)
]]

This text needs modifying.



1. NFC constraint in general

You suggest that RDF should drop the NFC constraint completely.
This would clearly solve the problems you raise.

However, the RDF Core WG has endeavoured to follow charmod
(http://www.w3.org/TR/charmod)
as much as possible, as one of the key inputs from the I18N community.

See
4.4 Responsibility for Normalization
http://www.w3.org/TR/charmod/#sec-NormalizationApplication

[[
[S]  Specifications of text-based formats and protocols SHOULD, as part of 
their syntax definition, require that the text be in normalized form.
]]
[[
[S]  Specifications of text-based languages and protocols SHOULD define 
precisely the construct boundaries necessary to obtain a complete 
definition of full-normalization. These definitions SHOULD include at least 
the boundaries between markup and character data as well as entity 
boundaries (if the language has any include mechanism) and SHOULD include 
any other boundary that may create denormalization when instances of the 
language are processed.
]]

****
The RDF Core WG has previously identified the lexical form of literals as 
the relevant construct, around which NFC should be required.
While we have been aware of transitional issues, since the specs we build 
on (XML 1.0 and XSD) do not require NFC, we do not see those issues as 
sufficient reason not to migrate the RDF recommendation.

It is clear that applications working with XML 1.0 and the current version 
of XSD datatypes may choose to be more lenient than this part of our 
specification; what they should then do is also clarified in charmod, 
i.e. they must not normalize. Since the recommendation is clear that these 
are errors, the responsibility for fixing them is clear.
****
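
One way a lenient application might behave, as read from the two paragraphs 
above, is to detect non-NFC lexical forms and report them without altering 
the text. The following Python sketch illustrates that reading only; the 
function name check_lexical_form and the wording of the diagnostic are 
hypothetical, and charmod remains the authority on what is actually required.

```python
import unicodedata
from typing import Optional

def check_lexical_form(lexical_form: str) -> Optional[str]:
    """Return a diagnostic message if lexical_form is not in NFC, else None.

    Hypothetical helper: it only inspects the string. It never returns or
    stores a normalized copy, so the caller keeps the original text.
    """
    if unicodedata.normalize("NFC", lexical_form) != lexical_form:
        return "lexical form %r is not in Normal Form C" % lexical_form
    return None

# Usage: warn about the input, but pass it on unchanged.
literal = "e\u0301"  # assumed example: 'e' + U+0301 COMBINING ACUTE ACCENT
problem = check_lexical_form(literal)
if problem is not None:
    print("warning:", problem)
```

The design choice in this sketch is that the check is separate from any 
repair: fixing the text is left to whoever produced it.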

2. Clarity of RDF Concepts document

We have made the following changes to concepts:

In section 5
[[
The lexical space of a datatype is a set of Unicode [UNICODE] strings.
]]
to
[[
The lexical space of a datatype is a set of Unicode [UNICODE] strings in 
Normal Form C [NFC].
]]

and in 5.1
[[
The lexical space
is the set of all strings:
]]
to

[[
The lexical space
is the set of all strings:
- in Normal Form C [NFC].
]]



3. Syntax document

[TBD]


]]
Received on Tuesday, 9 September 2003 05:53:02 UTC