
Re: draft response to pfps re nfc

From: Brian McBride <bwm@hplb.hpl.hp.com>
Date: Thu, 11 Sep 2003 09:57:48 +0100
Message-ID: <3F60390C.7060503@hplb.hpl.hp.com>
To: Jeremy Carroll <jjc@hplb.hpl.hp.com>, Pat Hayes <phayes@ihmc.us>
Cc: w3c-rdfcore-wg@w3.org, w3c-i18n-ig@w3.org

Are there any knock-on effects in semantics?

Brian


Jeremy Carroll wrote:
> 
> 
> Copying to i18n to request help on correct application of charmod. See 
> the two paragraphs between ****.
> 
> 
> This is a proposed draft. Note that I suggest additional text for concepts, 
> and still need additional text for syntax:
> 
> [[
> 
> Dear Peter
> 
> thanks for your comments concerning NFC
> http://lists.w3.org/Archives/Public/www-rdf-comments/2003JulSep/0283
> http://lists.w3.org/Archives/Public/www-rdf-comments/2003JulSep/0225
> 
> 
> These comments also apply to XSD datatypes derived from xsd:string and 
> xsd:anyURI, so we will respond in full generality.
> 
> (e.g. the two-character string { e, NON SPACING ACUTE } is a legal 
> xsd:string that can be 'written' in RDF/XML (in XML 1.0) but does not 
> correspond to a legal RDF graph.)
> 
> We also agree that there are XML 1.0 fragments that can be written 
> within an rdf:parseType="Literal" element in an XML 1.0 document that 
> conforms to the RDF/XML syntax except that this fragment is not in NFC.
> However, this would not be an RDF/XML document, since there is no 
> corresponding RDF graph.
> 
> You are correct to point out that this constraint is not made explicitly 
> in the syntax document, and this is a bug.
> 
> 
> Concepts places a similar constraint on the lexical form of all 
> datatypes e.g. xsd:string, whereas syntax suggests that there is no such 
> constraint e.g.
> 7.2.16
> http://www.w3.org/TR/rdf-syntax-grammar/#literalPropertyElt
> [[
> If the rdf:datatype attribute d is given then o := 
> typed-literal(literal-value := t.string-value, literal-datatype := 
> d.string-value) otherwise t.string-value MUST be a Unicode[UNICODE] 
> string in Normal Form C[NFC], o := literal(literal-value := 
> t.string-value, literal-language := e.language)
> ]]
> 
> This text needs modifying.
> 
> 
> 
> 1. NFC constraint in general
> 
> You suggest that RDF should drop the NFC constraint completely.
> This would clearly solve the problems you raise.
> 
> However, the RDF Core WG has endeavoured to follow charmod
> (http://www.w3.org/TR/charmod)
> as much as possible, as one of the key inputs from the I18N community.
> 
> See
> 4.4 Responsibility for Normalization
> http://www.w3.org/TR/charmod/#sec-NormalizationApplication
> 
> [[
> [S]  Specifications of text-based formats and protocols SHOULD, as part 
> of their syntax definition, require that the text be in normalized form.
> ]]
> [[
> [S]  Specifications of text-based languages and protocols SHOULD define 
> precisely the construct boundaries necessary to obtain a complete 
> definition of full-normalization. These definitions SHOULD include at 
> least the boundaries between markup and character data as well as entity 
> boundaries (if the language has any include mechanism) and SHOULD 
> include any other boundary that may create denormalization when 
> instances of the language are processed.
> ]]
> 
> ****
> The RDF Core WG has previously identified the lexical form of literals 
> as the relevant construct, around which NFC should be required.
> While we have been aware of transitional issues, since the specs we 
> build on (XML 1.0 and XSD) do not require NFC, we do not see those 
> issues as sufficient reason not to migrate the RDF recommendation.
> 
> It is clear that applications working with XML 1.0 and the current 
> version of XSD datatypes may choose to be more lenient than this part of 
> our specification. What they should then do is also clarified in 
> charmod, i.e. they must not normalize. Since the recommendation is clear 
> that these are errors, the responsibility for fixing them is clear.
> ****
> 
> 2. Clarity of RDF Concepts document
> 
> We have made the following changes to concepts:
> 
> In section 5
> [[
> The lexical space of a datatype is a set of Unicode [UNICODE] strings.
> ]]
> to
> [[
> The lexical space of a datatype is a set of Unicode [UNICODE] strings in 
> Normal Form C [NFC].
> ]]
> 
> and in 5.1
> [[
> The lexical space
> is the set of all strings:
> ]]
> to
> 
> [[
> The lexical space
> is the set of all strings:
> - in Normal Form C [NFC].
> ]]
> 
> 
> 
> 3. Syntax document
> 
> [TBD]
> 
> 
> ]]
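
For illustration only, the NFC constraint on lexical forms discussed in the draft can be sketched in Python with the standard unicodedata module. The helper names is_nfc and check_lexical_form are hypothetical and do not come from any RDF specification or toolkit. The sketch assumes that NON SPACING ACUTE is U+0301 (COMBINING ACUTE ACCENT), and that a checker following the draft reports a non-NFC lexical form as an error rather than normalizing it.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if text is already in Unicode Normal Form C."""
    return unicodedata.normalize("NFC", text) == text


def check_lexical_form(text: str) -> str:
    """Hypothetical helper: accept an NFC lexical form, reject anything else.

    The input is never normalized here; a non-NFC form is reported as an
    error, leaving the fix to whoever produced the data.
    """
    if not is_nfc(text):
        raise ValueError("lexical form is not in Normal Form C")
    return text


# The two-character string from the draft: 'e' followed by a combining acute.
decomposed = "e\u0301"
# The single precomposed character U+00E9, which is the NFC form of the above.
precomposed = "\u00e9"
```

Under these assumptions, check_lexical_form(precomposed) returns its argument unchanged, while check_lexical_form(decomposed) raises ValueError.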
Received on Thursday, 11 September 2003 05:06:09 EDT