- From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Date: Tue, 09 Sep 2003 09:32:57 +0100
- To: w3c-rdfcore-wg@w3.org, w3c-i18n-ig@w3.org
Copying to i18n to request help on the correct application of charmod. See the two paragraphs between ****.

This is a proposed draft. Note that I suggest additional text for concepts, and still need additional text for syntax.

[[
Dear Peter,

thanks for your comments concerning NFC:
http://lists.w3.org/Archives/Public/www-rdf-comments/2003JulSep/0283
http://lists.w3.org/Archives/Public/www-rdf-comments/2003JulSep/0225

These comments also apply to XSD datatypes derived from xsd:string and xsd:anyURI, so we will respond in full generality. (E.g. the two-character string { e, NON SPACING ACUTE } is a legal xsd:string that can be 'written' in RDF/XML (in XML 1.0) but does not correspond to a legal RDF graph.)

We also agree that there are XML 1.0 fragments that can be written within an rdf:parseType="Literal" element in an XML 1.0 document that conforms to the RDF/XML syntax except that this fragment is not in NFC. However, this would not be an RDF/XML document, since there is no corresponding RDF graph.

You are correct to point out that this constraint is not made explicit in the syntax document, and this is a bug. Concepts places a similar constraint on the lexical form of all datatypes, e.g. xsd:string, whereas syntax suggests that there is no such constraint, e.g. 7.2.16
http://www.w3.org/TR/rdf-syntax-grammar/#literalPropertyElt
[[
If the rdf:datatype attribute d is given then
o := typed-literal(literal-value := t.string-value, literal-datatype := d.string-value)
otherwise t.string-value MUST be a Unicode[UNICODE] string in Normal Form C[NFC],
o := literal(literal-value := t.string-value, literal-language := e.language)
]]
This text needs modifying.

1. NFC constraint in general

You suggest that RDF should drop the NFC constraint completely. This would clearly solve the problems you raise. However, the RDF Core WG has endeavoured to follow charmod (http://www.w3.org/TR/charmod) as much as possible, as one of the key inputs from the I18N community.
See 4.4 Responsibility for Normalization
http://www.w3.org/TR/charmod/#sec-NormalizationApplication
[[
[S] Specifications of text-based formats and protocols SHOULD, as part of their syntax definition, require that the text be in normalized form.
]]
[[
[S] Specifications of text-based languages and protocols SHOULD define precisely the construct boundaries necessary to obtain a complete definition of full-normalization. These definitions SHOULD include at least the boundaries between markup and character data as well as entity boundaries (if the language has any include mechanism) and SHOULD include any other boundary that may create denormalization when instances of the language are processed.
]]

****
The RDF Core WG has previously identified the lexical form of literals as the relevant construct around which NFC should be required. While we have been aware of transitional issues, since the specs we build on (XML 1.0 and XSD) do not require NFC, we do not see those issues as sufficient reason not to migrate the RDF recommendation.

Applications working with XML 1.0 and the current version of XSD datatypes may choose to be more lenient than this part of our specification; what they should then do is also clarified in charmod, i.e. they must not normalize. Since the recommendation is clear that these are errors, the responsibility for fixing them is clear.
****

2. Clarity of RDF Concepts document

We have made the following changes to concepts:

In section 5
[[
The lexical space of a datatype is a set of Unicode [UNICODE] strings.
]]
to
[[
The lexical space of a datatype is a set of Unicode [UNICODE] strings in Normal Form C [NFC].
]]
and in 5.1
[[
The lexical space is the set of all strings:
]]
to
[[
The lexical space is the set of all strings:
- in Normal Form C [NFC].
]]

3. syntax document

[TBD]
]]
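
For illustration only (not part of the proposed text): a minimal Python sketch of the NFC check that the constraint above implies for a literal's lexical form. The function name is_nfc_lexical_form is hypothetical, and the sketch assumes the lexical form is already available as a Unicode string. In line with the charmod guidance quoted in the draft, it only reports whether the string is in NFC and never normalizes the input.

```python
import unicodedata


def is_nfc_lexical_form(lexical_form: str) -> bool:
    """Return True if the string is already in Unicode Normal Form C.

    Hypothetical helper: it compares the input with an NFC-normalized copy
    and leaves the input itself untouched (it checks, it does not repair).
    """
    return unicodedata.normalize("NFC", lexical_form) == lexical_form


# The two-character string from the draft: 'e' followed by the combining
# acute accent (U+0301). It is a legal xsd:string but is not in NFC.
decomposed = "e\u0301"

# The precomposed single code point U+00E9 is the NFC form of the same text.
precomposed = "\u00e9"

print(is_nfc_lexical_form(decomposed))
print(is_nfc_lexical_form(precomposed))
```

A lenient application could use such a check to flag a non-NFC literal as an error while passing the original string through unchanged.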
Received on Tuesday, 9 September 2003 05:53:02 UTC