- From: Peter F. Patel-Schneider <pfps@research.bell-labs.com>
- Date: Wed, 17 Sep 2003 08:33:18 -0400 (EDT)
- To: jjc@hplb.hpl.hp.com
- Cc: www-rdf-comments@w3.org, dave.beckett@bristol.ac.uk
From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Subject: [Fwd: draft response to pfps re nfc]
Date: Wed, 17 Sep 2003 10:22:23 +0100

> Dear Peter
>
> thanks for your comments concerning NFC
> http://lists.w3.org/Archives/Public/www-rdf-comments/2003JulSep/0283
> http://lists.w3.org/Archives/Public/www-rdf-comments/2003JulSep/0225
>
> These comments also apply to XSD datatypes derived from xsd:string and
> xsd:anyURI, so we will respond in full generality.
>
> (e.g. The two character string { e, NON SPACING ACUTE } is a legal
> xsd:string that can be 'written' in RDF/XML (in XML 1.0) but does not
> correspond to a legal RDF graph.)
>
> We agree that there are XML 1.0 fragments that can be written within an
> rdf:parseType="Literal" element in an XML 1.0 document that conforms to
> the RDF/XML syntax except that this fragment is not in NFC.
> However, this would not be an RDF/XML document, since there is no
> corresponding RDF graph.
> You are correct to point out that this constraint is not made explicitly in
> the syntax document, and this is a bug.

The bug appears to me to be somewhat different. RDF/XML Syntax has a
grammar for RDF/XML documents that does not provide a correct mapping to
RDF graphs. This appears to introduce problems in the conformance section
of RDF/XML Syntax.

> Concepts places a similar constraint on the lexical form of all datatypes
> e.g. xsd:string, whereas syntax suggests that there is no such constraint,
> e.g. 7.2.16
> http://www.w3.org/TR/rdf-syntax-grammar/#literalPropertyElt
> [[
> If the rdf:datatype attribute d is given then o :=
> typed-literal(literal-value := t.string-value, literal-datatype :=
> d.string-value) otherwise t.string-value MUST be a Unicode[UNICODE] string
> in Normal Form C[NFC], o := literal(literal-value := t.string-value,
> literal-language := e.language)
> ]]
>
> This text needs modifying.
>
> 1. NFC constraint in general
>
> You suggest that RDF should drop the NFC constraint completely.
> This would clearly solve the problems you raise.
>
> However, the RDF Core WG has endeavoured to follow charmod
> (http://www.w3.org/TR/charmod)
> as much as possible, as one of the key inputs from the I18N community.
>
> See
> 4.4 Responsibility for Normalization
> http://www.w3.org/TR/charmod/#sec-NormalizationApplication
>
> [[
> [S] Specifications of text-based formats and protocols SHOULD, as part of
> their syntax definition, require that the text be in normalized form.
> ]]
> [[
> [S] Specifications of text-based languages and protocols SHOULD define
> precisely the construct boundaries necessary to obtain a complete
> definition of full-normalization. These definitions SHOULD include at least
> the boundaries between markup and character data as well as entity
> boundaries (if the language has any include mechanism) and SHOULD include
> any other boundary that may create denormalization when instances of the
> language are processed.
> ]]
>
> The RDF Core WG has previously identified the lexical form of literals as
> the relevant construct, around which NFC should be required.
> While we have been aware of transitional issues, since the specs we build
> on (XML 1.0 and XSD) do not require NFC, we do not see those issues as
> sufficient to not migrate the RDF recommendation.
>
> It is clear that applications working with XML 1.0 and the current version
> of XSD datatypes may choose to be more lenient than this part of our
> specification, and what they should then do is also clarified in charmod,
> i.e. they must not normalize. Since the RDF documents will be clear that
> these are errors, the responsibility for fixing them is clear.
>
> 2. Clarity of RDF Concepts document
>
> We will make the following changes to concepts:
>
> In section 5
> [[
> The lexical space of a datatype is a set of Unicode [UNICODE] strings.
> ]]
> to
> [[
> The lexical space of a datatype is a set of Unicode [UNICODE] strings in
> Normal Form C [NFC].
> ]]
>
> and in 5.1
> [[
> The lexical space
> is the set of all strings:
> ]]
> to
> [[
> The lexical space
> is the set of all strings:
> - in Normal Form C [NFC].
> ]]
>
> 3. syntax document
>
> The editor will make appropriate changes in due course.
>
> Please respond, copying www-rdf-comments, indicating whether this response
> is satisfactory or not.

As there are no proposed changes to the syntax document, this response is
incomplete.

> thanks again
>
> Jeremy Carroll

Peter F. Patel-Schneider
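A minimal Python sketch of the NFC constraint on lexical forms discussed above, assuming Python 3.8 or later (for `unicodedata.is_normalized`) and taking the example's NON SPACING ACUTE to be U+0301 COMBINING ACUTE ACCENT. The helper `in_nfc_lexical_space` is hypothetical, not part of any RDF toolkit; it only checks a lexical form and deliberately does not normalize it, following the reading of charmod given in the message (lenient applications must not normalize).

```python
import unicodedata

# The two-character string { e, U+0301 COMBINING ACUTE ACCENT } from the
# example: a legal xsd:string, but not in Normal Form C.
decomposed = "e\u0301"

def in_nfc_lexical_space(lexical_form: str) -> bool:
    """Hypothetical check: accept a lexical form only if it is already in NFC.

    The string is reported, not repaired: no normalization is applied here.
    """
    return unicodedata.is_normalized("NFC", lexical_form)

print(in_nfc_lexical_space(decomposed))                      # False
# NFC composes e + U+0301 into the single code point U+00E9.
print(unicodedata.normalize("NFC", decomposed) == "\u00e9")  # True
print(in_nfc_lexical_space("\u00e9"))                        # True
```

Under the proposed Concepts wording, the first string falls outside the lexical space of xsd:string, while its precomposed counterpart U+00E9 is inside it.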
Received on Wednesday, 17 September 2003 08:33:34 UTC