- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 16 Sep 2003 18:37:22 -0400
- To: Jeremy Carroll <jjc@hplb.hpl.hp.com>, w3c-rdfcore-wg@w3.org, w3c-i18n-ig@w3.org
Hello Jeremy,

Sorry to be late with my reply; as I wrote in another mail, I was traveling.

At 09:32 03/09/09 +0100, Jeremy Carroll wrote:
>Copying to i18n to request help on correct application of charmod. See the
>two paragraphs between ****.

Are these two paras supposed to go into a spec, or just serve as the
official answer? For the second, they seem quite appropriate to me.

Regards, Martin.

>This is a proposed draft, note I suggest additional text for concepts, and
>still need additional text for syntax -
>
>[[
>
>Dear Peter
>
>thanks for your comments concerning NFC
>http://lists.w3.org/Archives/Public/www-rdf-comments/2003JulSep/0283
>http://lists.w3.org/Archives/Public/www-rdf-comments/2003JulSep/0225
>
>
>These comments also apply to XSD datatypes derived from xsd:string and
>xsd:anyURI, so we will respond in full generality.
>
>(e.g. The two character string { e, NON SPACING ACUTE } is a legal
>xsd:string that can be 'written' in RDF/XML (in XML 1.0) but does not
>correspond to a legal RDF graph.)
>
>We also agree that there are XML 1.0 fragments that can be written within
>an rdf:parseType="Literal" element in an XML 1.0 document that conforms
>to the RDF/XML syntax except that this fragment is not in NFC.
>However, this would not be an RDF/XML document, since there is no
>corresponding RDF graph.
>
>You are correct to point out that this constraint is not made explicit
>in the syntax document, and this is a bug.
>
>
>Concepts places a similar constraint on the lexical form of all datatypes
>e.g. xsd:string, whereas syntax suggests that there is no such constraint e.g.
>7.2.16
>http://www.w3.org/TR/rdf-syntax-grammar/#literalPropertyElt
>[[
>If the rdf:datatype attribute d is given then o :=
>typed-literal(literal-value := t.string-value, literal-datatype :=
>d.string-value) otherwise t.string-value MUST be a Unicode[UNICODE] string
>in Normal Form C[NFC], o := literal(literal-value := t.string-value,
>literal-language := e.language)
>]]
>
>This text needs modifying.
>
>
>
>1. NFC constraint in general
>
>You suggest that RDF should drop the NFC constraint completely.
>This would clearly solve the problems you raise.
>
>However, the RDF Core WG has endeavoured to follow charmod
>(http://www.w3.org/TR/charmod)
>as much as possible, as one of the key inputs from the I18N community.
>
>See
>4.4 Responsibility for Normalization
>http://www.w3.org/TR/charmod/#sec-NormalizationApplication
>
>[[
>[S] Specifications of text-based formats and protocols SHOULD, as part of
>their syntax definition, require that the text be in normalized form.
>]]
>[[
>[S] Specifications of text-based languages and protocols SHOULD define
>precisely the construct boundaries necessary to obtain a complete
>definition of full-normalization. These definitions SHOULD include at
>least the boundaries between markup and character data as well as entity
>boundaries (if the language has any include mechanism) and SHOULD include
>any other boundary that may create denormalization when instances of the
>language are processed.
>]]
>
>****
>The RDF Core WG has previously identified the lexical form of literals as
>the relevant construct, around which NFC should be required.
>While we have been aware of transitional issues, since the specs we build
>on (XML 1.0 and XSD) do not require NFC, we do not see those issues as
>sufficient reason not to migrate the RDF recommendation.
>
>It is clear that applications working with XML 1.0 and the current version
>of XSD datatypes may choose to be more lenient than this part of our
>specification, and what they should do in that case is also clarified in
>charmod, i.e. they must not normalize. Since the recommendation is clear
>that these are errors, the responsibility for fixing them is clear.
>****
>
>2. Clarity of RDF Concepts document
>
>We have made the following changes to concepts:
>
>In section 5
>[[
>The lexical space of a datatype is a set of Unicode [UNICODE] strings.
>]]
>to
>[[
>The lexical space of a datatype is a set of Unicode [UNICODE] strings in
>Normal Form C [NFC].
>]]
>
>and in 5.1
>[[
>The lexical space
>is the set of all strings:
>]]
>to
>
>[[
>The lexical space
>is the set of all strings:
>- in Normal Form C [NFC].
>]]
>
>
>
>3. syntax document
>
>[TBD]
>
>
>]]
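
For illustration only, the NFC constraint discussed in the draft can be sketched in Python using the standard unicodedata module. The function names is_nfc and check_lexical_form are hypothetical and come from no RDF specification or toolkit. The sketch assumes that NON SPACING ACUTE corresponds to U+0301 COMBINING ACUTE ACCENT, and it reports a non-NFC lexical form as an error instead of silently normalizing it, in line with the "must not normalize" reading above.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if text is already in Unicode Normal Form C."""
    # A string is in NFC exactly when NFC normalization leaves it unchanged.
    return unicodedata.normalize("NFC", text) == text


def check_lexical_form(text: str) -> str:
    """Hypothetical check: accept a lexical form only if it is in NFC.

    The string is never rewritten; a non-NFC form is reported as an error
    so that whoever produced the data is responsible for fixing it.
    """
    if not is_nfc(text):
        raise ValueError(f"lexical form is not in NFC: {text!r}")
    return text


if __name__ == "__main__":
    decomposed = "e\u0301"  # { e, NON SPACING ACUTE }, assumed to be U+0301
    composed = "\u00e9"     # precomposed LATIN SMALL LETTER E WITH ACUTE

    print(is_nfc(decomposed))
    print(is_nfc(composed))
    try:
        check_lexical_form(decomposed)
    except ValueError as err:
        print("rejected:", err)
```

Under these assumptions the two-character string from the draft is a legal Python (and xsd:string) value but fails the check, while its precomposed equivalent passes.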
Received on Tuesday, 16 September 2003 19:13:57 UTC