[Fwd: draft response to pfps re nfc] from Jeremy Carroll on 2003-09-17 (www-rdf-comments@w3.org from July to September 2003)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Wed, 17 Sep 2003 10:22:23 +0100
To: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>, www-rdf-comments@w3.org, Dave Beckett <dave.beckett@bristol.ac.uk>
Message-ID: <3F6827CF.90502@hplb.hpl.hp.com>

Dear Peter

thanks for your comments concerning NFC
http://lists.w3.org/Archives/Public/www-rdf-comments/2003JulSep/0283
http://lists.w3.org/Archives/Public/www-rdf-comments/2003JulSep/0225


These comments also apply to XSD datatypes derived from xsd:string and
xsd:anyURI, so we will respond in full generality.

(E.g. the two-character string { e, NON SPACING ACUTE } is a legal
xsd:string that can be 'written' in RDF/XML (in XML 1.0) but does not
correspond to a legal RDF graph.)
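
As an illustration only, the example string can be checked with a short
Python sketch. It assumes that NON SPACING ACUTE is the character now
named U+0301 COMBINING ACUTE ACCENT; the helper name is_nfc is
hypothetical and not taken from any RDF specification.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT (assumed to be the
# character the message calls NON SPACING ACUTE)
decomposed = "e\u0301"

# NFC composes the pair into the single character U+00E9
composed = unicodedata.normalize("NFC", decomposed)

def is_nfc(s):
    """Hypothetical helper: True if s is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", s) == s

print(len(decomposed), is_nfc(decomposed))
print(len(composed), is_nfc(composed))
```

Under that assumption the two-character string is not in NFC, while its
one-character composed form is.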

We agree that there are XML 1.0 fragments that can be written within
an rdf:parseType="Literal" element in an XML 1.0 document that conforms to
the RDF/XML syntax except that this fragment is not in NFC.
However, this would not be an RDF/XML document, since there is no
corresponding RDF graph.

You are correct to point out that this constraint is not stated explicitly
in the syntax document, and this is a bug.


The Concepts document places a similar constraint on the lexical form of
all datatypes (e.g. xsd:string), whereas the Syntax document suggests that
there is no such constraint, e.g. in section 7.2.16
http://www.w3.org/TR/rdf-syntax-grammar/#literalPropertyElt
[[
If the rdf:datatype attribute d is given then o :=
typed-literal(literal-value := t.string-value, literal-datatype :=
d.string-value) otherwise t.string-value MUST be a Unicode[UNICODE] string
in Normal Form C[NFC], o := literal(literal-value := t.string-value,
literal-language := e.language)
]]

This text needs modifying.
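
The mismatch in the quoted rule can be sketched roughly as follows. This
is only a reading of the quoted text, not the grammar's actual
definition: the function name, the tuple return values and the use of
ValueError are all hypothetical.

```python
import unicodedata

def is_nfc(s):
    """Hypothetical helper: True if s is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", s) == s

def literal_property_object(string_value, datatype=None, language=None):
    """Illustrative reading of the quoted rule from 7.2.16."""
    if datatype is not None:
        # Typed literal: the quoted text states no NFC requirement here.
        return ("typed-literal", string_value, datatype)
    # Plain literal: the quoted text says the string MUST be in NFC.
    if not is_nfc(string_value):
        raise ValueError("string-value is not in Normal Form C")
    return ("literal", string_value, language)
```

On this reading, a string that is not in NFC is accepted when
rdf:datatype is given and rejected otherwise.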



1. NFC constraint in general

You suggest that RDF should drop the NFC constraint completely.
This would clearly solve the problems you raise.

However, the RDF Core WG has endeavoured to follow charmod
(http://www.w3.org/TR/charmod)
as much as possible, as one of the key inputs from the I18N community.

See
4.4 Responsibility for Normalization
http://www.w3.org/TR/charmod/#sec-NormalizationApplication

[[
[S]  Specifications of text-based formats and protocols SHOULD, as part of
their syntax definition, require that the text be in normalized form.
]]
[[
[S]  Specifications of text-based languages and protocols SHOULD define
precisely the construct boundaries necessary to obtain a complete
definition of full-normalization. These definitions SHOULD include at least
the boundaries between markup and character data as well as entity
boundaries (if the language has any include mechanism) and SHOULD include
any other boundary that may create denormalization when instances of the
language are processed.
]]

The RDF Core WG has previously identified the lexical form of literals as
the relevant construct around which NFC should be required.
We have been aware of transitional issues, since the specs we build
on (XML 1.0 and XSD) do not require NFC, but we do not see those issues as
sufficient reason not to migrate the RDF recommendation.

It is clear that applications working with XML 1.0 and the current version
of XSD datatypes may choose to be more lenient than this part of our
specification. What they should do in that case is also clarified in
charmod, i.e. they must not normalize. Since the RDF documents will be
clear that these are errors, the responsibility for fixing them is clear.
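
A lenient application along these lines might look like the minimal
sketch below. It is an assumption about how such an application could
behave, not something charmod or the RDF documents prescribe; the name
check_lexical_form is hypothetical.

```python
import unicodedata

def check_lexical_form(lexical_form):
    """Hypothetical lenient check: flag non-NFC input, never rewrite it.

    Returns the original string unchanged together with an error flag.
    """
    is_error = unicodedata.normalize("NFC", lexical_form) != lexical_form
    # The string is deliberately returned as received: the application
    # reports the error but does not normalize.
    return lexical_form, is_error

value, is_error = check_lexical_form("e\u0301")
```

The point of the sketch is that the input is passed through untouched, so
fixing it remains the responsibility of whoever produced it.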


2. Clarity of RDF Concepts document

We will make the following changes to Concepts:

In section 5
[[
The lexical space of a datatype is a set of Unicode [UNICODE] strings.
]]
to
[[
The lexical space of a datatype is a set of Unicode [UNICODE] strings in
Normal Form C [NFC].
]]

and in 5.1
[[
The lexical space
is the set of all strings:
]]
to

[[
The lexical space
is the set of all strings:
- in Normal Form C [NFC].
]]



3. Syntax document

The editor will make appropriate changes in due course.

Please respond, copying www-rdf-comments, indicating whether this response 
is satisfactory or not.

thanks again

Jeremy Carroll

Received on Wednesday, 17 September 2003 05:33:48 UTC