- From: Peter F. Patel-Schneider <pfps@research.bell-labs.com>
- Date: Wed, 17 Sep 2003 08:33:18 -0400 (EDT)
- To: jjc@hplb.hpl.hp.com
- Cc: www-rdf-comments@w3.org, dave.beckett@bristol.ac.uk
From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Subject: [Fwd: draft response to pfps re nfc]
Date: Wed, 17 Sep 2003 10:22:23 +0100
>
>
> Dear Peter
>
> thanks for your comments concerning NFC
> http://lists.w3.org/Archives/Public/www-rdf-comments/2003JulSep/0283
> http://lists.w3.org/Archives/Public/www-rdf-comments/2003JulSep/0225
>
>
> These comments also apply to XSD datatypes derived from xsd:string and
> xsd:anyURI, so we will respond in full generality.
>
> (e.g. The two character string { e, NON SPACING ACUTE } is a legal
> xsd:string that can be 'written' in RDF/XML (in XML 1.0) but does not
> correspond to a legal RDF graph.)
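
To make the quoted example concrete, here is a minimal Python sketch of the check involved. It assumes that "NON SPACING ACUTE" refers to U+0301 COMBINING ACUTE ACCENT; the helper name is_nfc is hypothetical and appears in none of the RDF documents.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if text is already in Unicode Normal Form C."""
    # A string is in NFC exactly when NFC normalization leaves it unchanged.
    return unicodedata.normalize("NFC", text) == text


# Assumption: "NON SPACING ACUTE" is U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"  # two characters: 'e' followed by the combining accent
composed = "\u00e9"     # one character: LATIN SMALL LETTER E WITH ACUTE

print(len(decomposed), is_nfc(decomposed))
print(len(composed), is_nfc(composed))
```

The two-character string is a well-formed Unicode string, and so a legal xsd:string, but it is not in NFC; its NFC form is the single precomposed character.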
>
> We agree that there are XML 1.0 fragments that can be written within an
> rdf:parseType="Literal" element in an XML 1.0 document that conforms to
> the RDF/XML syntax except that this fragment is not in NFC.
> However, this would not be an RDF/XML document, since there is no
> corresponding RDF graph.
> You are correct to point out that this constraint is not made explicitly in
> the syntax document, and this is a bug.
The bug appears to me to be somewhat different. RDF/XML Syntax has a
grammar for RDF/XML documents that does not provide a correct mapping to
RDF graphs. This appears to introduce problems in the conformance section
of RDF/XML Syntax.
> Concepts places a similar constraint on the lexical form of all datatypes
> e.g. xsd:string, whereas syntax suggests that there is no such constraint e.g.
> 7.2.16
> http://www.w3.org/TR/rdf-syntax-grammar/#literalPropertyElt
> [[
> If the rdf:datatype attribute d is given then o :=
> typed-literal(literal-value := t.string-value, literal-datatype :=
> d.string-value) otherwise t.string-value MUST be a Unicode[UNICODE] string
> in Normal Form C[NFC], o := literal(literal-value := t.string-value,
> literal-language := e.language)
> ]]
>
> This text needs modifying.
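
The asymmetry in the quoted rule can be sketched roughly as follows. This is an illustration of the wording of 7.2.16 as quoted above, not code from any RDF parser; the Literal type and the function name literal_property_object are hypothetical.

```python
import unicodedata
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Hypothetical stand-in for the object o produced by the rule."""
    value: str
    datatype: Optional[str] = None
    language: Optional[str] = None


def literal_property_object(text: str,
                            datatype: Optional[str] = None,
                            language: Optional[str] = None) -> Literal:
    """Mimic the quoted 7.2.16 wording for building the object o."""
    if datatype is not None:
        # Typed literal branch: the quoted text states no NFC requirement.
        return Literal(text, datatype=datatype)
    # Plain literal branch: the quoted text says the string MUST be in NFC.
    if unicodedata.normalize("NFC", text) != text:
        raise ValueError("plain literal text is not in Normal Form C")
    return Literal(text, language=language)


XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
decomposed = "e\u0301"  # not in NFC

# Accepted under the quoted wording, although Concepts would reject it.
typed = literal_property_object(decomposed, datatype=XSD_STRING)

# Rejected: the plain-literal branch enforces NFC.
try:
    literal_property_object(decomposed, language="fr")
    plain_rejected = False
except ValueError:
    plain_rejected = True
```

Under this reading, the same non-NFC string is rejected as a plain literal but passes through unchecked as a typed literal, which is the mismatch with Concepts described above.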
>
>
>
> 1. NFC constraint in general
>
> You suggest that RDF should drop the NFC constraint completely.
> This would clearly solve the problems you raise.
>
> However, the RDF Core WG has endeavoured to follow charmod
> (http://www.w3.org/TR/charmod)
> as much as possible, as one of the key inputs from the I18N community.
>
> See
> 4.4 Responsibility for Normalization
> http://www.w3.org/TR/charmod/#sec-NormalizationApplication
>
> [[
> [S] Specifications of text-based formats and protocols SHOULD, as part of
> their syntax definition, require that the text be in normalized form.
> ]]
> [[
> [S] Specifications of text-based languages and protocols SHOULD define
> precisely the construct boundaries necessary to obtain a complete
> definition of full-normalization. These definitions SHOULD include at least
> the boundaries between markup and character data as well as entity
> boundaries (if the language has any include mechanism) and SHOULD include
> any other boundary that may create denormalization when instances of the
> language are processed.
> ]]
>
> The RDF Core WG has previously identified the lexical form of literals as
> the relevant construct, around which NFC should be required.
> While we have been aware of transitional issues, since the specs we build
> on (XML 1.0 and XSD) do not require NFC, we do not see those issues as
> sufficient to not migrate the RDF recommendation.
>
> It is clear that applications working with XML 1.0 and the current version
> of XSD datatypes may choose to be more lenient than this part of our
> specification; what they should do in that case is also clarified in charmod,
> i.e. they must not normalize. Since the RDF documents will be clear that
> these are errors, the responsibility for fixing them is clear.
>
>
> 2. Clarity of RDF Concepts document
>
> We will make the following changes to concepts:
>
> In section 5
> [[
> The lexical space of a datatype is a set of Unicode [UNICODE] strings.
> ]]
> to
> [[
> The lexical space of a datatype is a set of Unicode [UNICODE] strings in
> Normal Form C [NFC].
> ]]
>
> and in 5.1
> [[
> The lexical space
> is the set of all strings:
> ]]
> to
>
> [[
> The lexical space
> is the set of all strings:
> - in Normal Form C [NFC].
> ]]
>
>
>
> 3. syntax document
>
> The editor will make appropriate changes in due course.
>
> Please respond, copying www-rdf-comments, indicating whether this response
> is satisfactory or not.
As there are no proposed changes to the syntax document, this response is
incomplete.
> thanks again
>
> Jeremy Carroll
Peter F. Patel-Schneider
Received on Wednesday, 17 September 2003 08:33:34 UTC