Re: XML literals

  ----- Original Message ----- 
  From: ext pat hayes 
  To: Brian_McBride ; Jeremy Carroll ; Graham Klyne 
  Cc: w3c-rdfcore-wg@w3.org ; Peter F. Patel-Schneider 
  Sent: 01 August, 2003 06:41
  Subject: XML literals


  Gentlemen:


  I am completely sick of all these debates about XML literals. 

Join the club ;-)
  Allow me to suggest a possible solution, along the lines suggested by Peter, which will serve to resolve them to everyone's general satisfaction without making any substantial changes to the current RDF design.  This is a wording change to the Concepts document; I do not believe it amounts to any real change in our current design, and it may be easier to follow.


  1. Concepts section 5.1 modified as follows (change starts at ***)
  .....


  Such content is indicated in an RDF graph using a typed literal whose datatype is a special built-in datatype rdf:XMLLiteral, defined as follows.
  A URI reference for identifying this datatype
  is http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral .
  The lexical space is the set of all strings which:
    - are well-balanced, self-contained XML data [XML];
    - correspond to exclusive Canonical XML (with comments, with empty InclusiveNamespaces PrefixList) [XML-XC14N];
    - when embedded between an arbitrary XML start tag and an end tag, form a document conforming to XML Namespaces [XML-NS].
  ***

Fine so far (i.e. the unchanged bit ;-)
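Part of that unchanged definition can be checked mechanically. Below is a minimal Python sketch, assuming the standard-library ElementTree (expat) parser; the function name and the wrapper tag name are hypothetical. It checks only the well-balanced and XML Namespaces embedding conditions, by wrapping the string in a start/end tag that declares no namespaces and parsing the result. It does not verify exclusive canonicalization; the standard library's `canonicalize` implements C14N 2.0, which is not a substitute.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper name; the definition allows any start/end tag pair.
WRAPPER = "wrapper"


def embeds_as_namespace_wellformed(fragment: str) -> bool:
    """Check only the embedding condition of the rdf:XMLLiteral lexical space.

    The fragment is placed between a start tag and an end tag that declare
    no namespaces, then parsed with expat via ElementTree, which rejects
    unbalanced markup and unbound namespace prefixes (so the fragment must
    be self-contained). Exclusive canonicalization is NOT checked here.
    """
    document = f"<{WRAPPER}>{fragment}</{WRAPPER}>"
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    assert embeds_as_namespace_wellformed("<b>bold</b> text")
    assert not embeds_as_namespace_wellformed("<b>unclosed")
    # Prefix used without a declaration inside the fragment: rejected.
    assert not embeds_as_namespace_wellformed("<ex:b>x</ex:b>")
    assert embeds_as_namespace_wellformed(
        '<ex:b xmlns:ex="http://example.org/">x</ex:b>')
```

Because the wrapper supplies no namespace bindings, a fragment that relies on bindings from its original context is rejected, which is one reading of "self-contained".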

  The value space is some set of entities, called XML values, which is:
  disjoint from the lexical space

OK.
  disjoint from the value space of any XML schema datatype (refer XSD)

Debatable, and we should be careful about making this claim. Better to say nothing
than something that may end up biting us later.

  disjoint from the set of Unicode character strings (refer Unicode)

Maybe.  

  in 1:1 correspondence with the lexical space.
Right.
  The exact nature of XML values is not specified.

No. This bothers me. A lot.

It is our responsibility to define what the values of XML Literals are.
It's *our* datatype, and no one else should have to define it or guess.

Taking this position is irresponsible, at best.

We can either take the present position, whereby XML literals have
lexical forms that are canonical XML fragments encoded as Unicode
strings, and values that are those lexical forms encoded in UTF-8.
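To make the present position concrete, here is a minimal Python sketch; the function names are hypothetical, and the input is assumed to already be in the lexical space (no canonicalization check is made). UTF-8 encoding is injective on Unicode strings, so the lexical-to-value mapping is 1:1, and a Python `bytes` value never compares equal to a `str`, which loosely mirrors the "disjoint from Unicode character strings" clause.

```python
def xml_literal_value(lexical_form: str) -> bytes:
    """Lexical-to-value mapping under the present position (sketch).

    The value is taken to be the lexical form itself, encoded as a UTF-8
    octet sequence. The caller is assumed to pass a string already in the
    rdf:XMLLiteral lexical space (canonical XML); nothing is checked.
    """
    return lexical_form.encode("utf-8")


def lexical_form_of(value: bytes) -> str:
    """Inverse mapping: UTF-8 decoding recovers the unique lexical form."""
    return value.decode("utf-8")


if __name__ == "__main__":
    lex = '<b xmlns="http://www.w3.org/1999/xhtml">caf\u00e9</b>'
    val = xml_literal_value(lex)
    # Round trip succeeds, so distinct lexical forms give distinct values.
    assert lexical_form_of(val) == lex
    # An octet sequence is a different kind of thing from a character string.
    assert val != lex
```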

Or, if that bothers I18N, we could take a position whereby XML literals
have lexical forms that are canonical XML fragments encoded as Unicode
strings, and values that correspond to infosets.

This latter option has always struck me as the correct solution. After
all, what we're really talking about is XML documents, not just
uninterpreted sequences of characters; and isn't the very point of the
Infoset spec to provide a consistent interpretation of sequences of
characters in terms of XML?
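A rough way to see how an infoset-valued mapping interacts with the canonical lexical space: the Python sketch below uses a hypothetical `infoset_like` function that reduces parsed XML to nested tuples. This is an assumption-laden stand-in, NOT the XML Infoset (it drops comments, processing instructions and namespace declarations). Two different spellings of the same element map to the same value; only the canonical spelling `<a x="1" y="2"></a>` is in the lexical space, so restricting lexical forms to canonical XML is one way to keep the lexical-to-value mapping 1:1.

```python
import xml.etree.ElementTree as ET


def infoset_like(fragment: str) -> tuple:
    """Hypothetical stand-in for an infoset-valued mapping (sketch only).

    The fragment is parsed inside a wrapper element and reduced to nested
    tuples of (name, sorted attributes, text, children, tail). This is NOT
    the XML Infoset: ElementTree's default parser drops comments and
    processing instructions, namespace declarations are not kept as
    attributes, and names appear in {namespace}local form.
    """
    root = ET.fromstring(f"<wrapper>{fragment}</wrapper>")

    def node(e: ET.Element) -> tuple:
        return (
            e.tag,
            tuple(sorted(e.attrib.items())),
            e.text or "",
            tuple(node(child) for child in e),
            e.tail or "",
        )

    return (root.text or "", tuple(node(child) for child in root))


if __name__ == "__main__":
    # Attribute order, quote style and empty-element syntax differ,
    # yet the tree-level values coincide.
    assert infoset_like('<a x="1" y="2"/>') == infoset_like("<a y='2' x='1'></a>")
    assert infoset_like("<a></a>") != infoset_like("<b></b>")
```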

I've never understood the opposition to having a value space
consisting of infosets. I wish someone would tell me what significant
problem or issue I'm missing...

--

At the very least, we *must* define what the value space of rdf:XMLLiteral
is. We can't simply cop out.

Patrick

Received on Friday, 1 August 2003 03:07:27 UTC