Re: mismatch between test and RDF/XML syntax

On Thu, 07 Aug 2003 07:25:55 -0400 (EDT)
"Peter F. Patel-Schneider" <pfps@research.bell-labs.com> wrote:

> From: Dave Beckett <dave.beckett@bristol.ac.uk>
> Subject: Re: mismatch between test and RDF/XML syntax
> Date: Thu, 7 Aug 2003 11:47:16 +0100
> 
> > On Tue, 05 Aug 2003 14:05:09 -0400 (EDT)
> > "Peter F. Patel-Schneider" <pfps@research.bell-labs.com> wrote:
> >
> > > The RDF/XML Syntax Specification (Revised), draft of 4 August 2003 appears
> > > to allow strings that are not in Normal Form C.  This is counter to test
> > > rdf-charmod-literals/error001.rdf
> > > 
> > > 
> > > The relevant productions for this example are
> > > 
> > > 7.2.14 propertyElt		which parses <eg:Creater eg:named="..."/>
> > > 7.2.21 emptyPropertyElt		which parses <eg:Creater eg:named="..."/>
> > > 7.2.25 propertyAttr		which parses eg:named="..."
> > > 
> > > the last of which allows anyString (defined as ``Any string.'') as the
> > > value of the attribute.
> > 
> > Indeed.  I think this would best be done with a note next to the actions
> > where triples with literal values are added to the graph.  This is
> > when the literal() event is used in nodeElement (now section 7.2.11 in
> > the editor's draft), literalPropertyElt (7.2.16) and emptyPropertyElt
> > (7.2.21).
> 
> I was trying to imagine under what circumstances the empty string would not
> be allowed and could not come up with any.  I think that the caution is
> thus not needed for emptyPropertyElt.

That's correct, and it isn't used there; the note is needed only for the
later case in the action of emptyPropertyElt, where literals are made from
the property attribute values.

> > For each of these triples additions I will add a note of the form
> > 
> >   The string <em>t</em>.string-value MUST be a Unicode [UNICODE] String
> >   in Normal Form C (NFC) [NFC].
> > 
> > before the literal() term is used.  I will also add the two new normative references:
> > 
> > [UNICODE]
> >     The Unicode Standard, Version 3, The Unicode Consortium,
> >     Addison-Wesley, 2000. ISBN 0-201-61633-5, as updated from time to
> >     time by the publication of new versions. (See
> >     http://www.unicode.org/unicode/standard/versions/ for the latest
> >     version and additional information on versions of the standard
> >     and of the Unicode Character Database).
> > 
> >   [NFC] 
> >     Unicode Normalization Forms, Unicode Standard Annex #15, Mark
> >     Davis, Martin Duerst. (See
> >     http://www.unicode.org/unicode/reports/tr15/ for the latest
> >     version).
> > 
> > Dave
> 
> I was actually hoping that it was the Syntax Specification that was
> correct, and am disappointed that this is not the case.   I expect that
> there are very many XML documents that cannot be handled in RDF because of
> this requirement for NFC.  What does this do to the XML-scraping use case
> for RDF?  If this requirement sticks, I expect much confusion, particularly
> as there is no mention of it in the primer.

The use of NFC for literals was agreed with the I18N group as best
practice.  If you are scraping XML you had better know what you have
(i.e. check and validate it), and NFC would be a good idea there too.
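As an illustration of that kind of check, here is a minimal sketch assuming Python 3.8+ (for unicodedata.is_normalized).  The helper name check_literal_nfc is hypothetical and does not come from any RDF toolkit or from the syntax draft:

```python
import unicodedata


def check_literal_nfc(value: str) -> str:
    """Return value unchanged if it is in Unicode Normal Form C (NFC).

    Hypothetical helper: raises ValueError otherwise, mirroring the
    proposed "MUST be in NFC" note on literal string values.
    """
    if not unicodedata.is_normalized("NFC", value):
        raise ValueError("literal is not in Normal Form C: %r" % value)
    return value


# Illustrative strings: "u" followed by COMBINING DIAERESIS (U+0308) is not
# NFC; NFC composes the pair into the single character U+00FC.
decomposed = "Du\u0308rst"
composed = unicodedata.normalize("NFC", decomposed)

check_literal_nfc(composed)        # accepted
try:
    check_literal_nfc(decomposed)  # rejected with ValueError
except ValueError as err:
    print(err)
```

A scraper could normalize with unicodedata.normalize("NFC", ...) before emitting RDF/XML rather than rejecting the input, but whether silently changing the data is acceptable depends on the application.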

All these issues are discussed in extensive detail in the
  Character Model for the World Wide Web 1.0
  http://www.w3.org/TR/charmod/
although I understand there have been some changes since then, such as
no longer recommending NFC for URIs in XML.

It was left out of the syntax document by accident when more of the XML
literal processing was moved there from RDF Concepts.

> It also appears to me that there is an inconsistency with the treatment of
> XML Literals, with the direct use of o.string-value being inconsistent with
> the last sentence of 7.2.17.  Further, Exclusive XML Canonicalization
> results in a sequence of octets, which are probably not allowed as lexical
> forms of RDF literals.

I've already drafted words to fix that - yet to be approved by the WG -
replacing the 7.2.17 content.  The lexical form in the graph is intended
to be a Unicode string, not a sequence of octets, if I am following
correctly.
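To illustrate the octets-versus-string distinction, here is a minimal sketch.  It assumes the canonicalizer returns its output as raw octets in UTF-8, which is the encoding Canonical XML (and so Exclusive XML Canonicalization) specifies; decoding those octets yields the Unicode string that would serve as the lexical form.  The canonicalization step itself is not shown, and the helper name lexical_form_from_c14n is hypothetical:

```python
def lexical_form_from_c14n(octets: bytes) -> str:
    """Turn canonicalized XML octets into a Unicode string lexical form.

    Hypothetical helper; assumes the input is already the UTF-8 output
    of a canonicalizer.
    """
    # Strict decoding: malformed UTF-8 raises UnicodeDecodeError.
    return octets.decode("utf-8")


# Stand-in for canonicalizer output (not produced by a real canonicalizer here).
octets = '<eg:b xmlns:eg="http://example.org/">D\u00fcrst</eg:b>'.encode("utf-8")
form = lexical_form_from_c14n(octets)

# The octet sequence and the string differ in length, because U+00FC
# occupies two octets in UTF-8 but is a single character in the string.
print(len(octets), len(form))
```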

Dave

Received on Thursday, 7 August 2003 07:42:29 UTC