- From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Date: Tue, 09 Sep 2003 09:32:57 +0100
- To: w3c-rdfcore-wg@w3.org, w3c-i18n-ig@w3.org
Copying to i18n to request help on correct application of charmod. See the
two paragraphs between ****.
This is a proposed draft. Note that I suggest additional text for concepts,
and still need additional text for syntax -
[[
Dear Peter
thanks for your comments concerning NFC
http://lists.w3.org/Archives/Public/www-rdf-comments/2003JulSep/0283
http://lists.w3.org/Archives/Public/www-rdf-comments/2003JulSep/0225
These comments also apply to XSD datatypes derived from xsd:string and
xsd:anyURI, so we will respond in full generality.
(e.g. The two character string { e, NON SPACING ACUTE } is a legal
xsd:string that can be 'written' in RDF/XML (in XML 1.0) but does not
correspond to a legal RDF graph.)
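As an illustration only, the example above can be checked with a minimal
Python sketch. It assumes the standard unicodedata module and reads
NON SPACING ACUTE as U+0301; the helper name is_nfc is hypothetical and not
taken from any of the specifications.

```python
import unicodedata


def is_nfc(s: str) -> bool:
    """Return True if s is already in Unicode Normal Form C."""
    return unicodedata.normalize("NFC", s) == s


# The two character string { e, NON SPACING ACUTE }, assumed here to be
# 'e' followed by U+0301.
decomposed = "e\u0301"
# The single precomposed character U+00E9, which is the NFC form.
composed = "\u00e9"

print(is_nfc(decomposed))
print(is_nfc(composed))
```

Under these assumptions the first string is a legal xsd:string but is not in
NFC, while the second is in NFC.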
We also agree that there are XML 1.0 fragments that can be written within
an rdf:parseType="Literal" element in an XML 1.0 document that conforms to
the RDF/XML syntax except that this fragment is not in NFC.
However, this would not be an RDF/XML document, since there is no
corresponding RDF graph.
You are correct to point out that this constraint is not made explicitly in
the syntax document, and this is a bug.
Concepts places a similar constraint on the lexical form of all datatypes
e.g. xsd:string, whereas syntax suggests that there is no such constraint e.g.
7.2.16
http://www.w3.org/TR/rdf-syntax-grammar/#literalPropertyElt
[[
If the rdf:datatype attribute d is given then o :=
typed-literal(literal-value := t.string-value, literal-datatype :=
d.string-value) otherwise t.string-value MUST be a Unicode[UNICODE] string
in Normal Form C[NFC], o := literal(literal-value := t.string-value,
literal-language := e.language)
]]
This text needs modifying.
1. NFC constraint in general
You suggest that RDF should drop the NFC constraint completely.
This would clearly solve the problems you raise.
However, the RDF Core WG has endeavoured to follow charmod
(http://www.w3.org/TR/charmod)
as much as possible, as one of the key inputs from the I18N community.
See
4.4 Responsibility for Normalization
http://www.w3.org/TR/charmod/#sec-NormalizationApplication
[[
[S] Specifications of text-based formats and protocols SHOULD, as part of
their syntax definition, require that the text be in normalized form.
]]
[[
[S] Specifications of text-based languages and protocols SHOULD define
precisely the construct boundaries necessary to obtain a complete
definition of full-normalization. These definitions SHOULD include at least
the boundaries between markup and character data as well as entity
boundaries (if the language has any include mechanism) and SHOULD include
any other boundary that may create denormalization when instances of the
language are processed.
]]
****
The RDF Core WG has previously identified the lexical form of literals as
the relevant construct, around which NFC should be required.
While we have been aware of transitional issues, since the specs we build
on (XML 1.0 and XSD) do not require NFC, we do not see those issues as
sufficient reason not to migrate the RDF recommendation.
It is clear that applications working with XML 1.0 and the current version
of XSD datatypes may choose to be more lenient than this part of our
specification. What they should then do is also clarified in charmod,
i.e. they must not normalize. Since the recommendation is clear that these
are errors, the responsibility for fixing them is clear.
****
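As a rough sketch of the behaviour described between the **** markers, and
not a statement of what charmod or the RDF specifications mandate in detail,
a receiving application might check lexical forms as below. The function name
accept_lexical_form and the lenient flag are hypothetical. The point shown is
that a non-NFC string is either rejected or passed through unchanged, never
silently normalized.

```python
import unicodedata
import warnings


def accept_lexical_form(s: str, lenient: bool = False) -> str:
    """Check a literal's lexical form against the NFC constraint.

    Hypothetical helper: a strict caller treats a non-NFC string as an
    error; a lenient caller keeps the string exactly as received.
    """
    if unicodedata.normalize("NFC", s) == s:
        return s
    if lenient:
        # Lenient mode: flag the problem but do not normalize the data.
        warnings.warn("lexical form is not in NFC; passed through unchanged")
        return s
    raise ValueError("lexical form is not in Normal Form C")
```

Fixing the data is left to whoever produced it, in line with the point that
the responsibility for correcting these errors lies with the sender.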
2. Clarity of RDF Concepts document
We have made the following changes to concepts:
In section 5
[[
The lexical space of a datatype is a set of Unicode [UNICODE] strings.
]]
to
[[
The lexical space of a datatype is a set of Unicode [UNICODE] strings in
Normal Form C [NFC].
]]
and in 5.1
[[
The lexical space
is the set of all strings:
]]
to
[[
The lexical space
is the set of all strings:
- in Normal Form C [NFC].
]]
3. syntax document
[TBD]
]]
Received on Tuesday, 9 September 2003 05:53:02 UTC