Re: XML literals from pat hayes on 2003-08-01 (w3c-rdfcore-wg@w3.org from August 2003)

From: pat hayes <phayes@ihmc.us>
Date: Fri, 1 Aug 2003 15:34:30 -0500
To: "Patrick Stickler" <patrick.stickler@nokia.com>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <p06001a03bb50734db262@[10.0.100.23]>
>
>
>----- Original Message -----
>From: ext pat hayes <phayes@ihmc.us>
>To: Brian_McBride <bwm@hplb.hpl.hp.com>; Jeremy Carroll <jjc@hpl.hp.com>;
>Graham Klyne <gk@ninebynine.org>
>Cc: w3c-rdfcore-wg@w3.org; Peter F. Patel-Schneider
><pfps@research.bell-labs.com>
>Sent: 01 August, 2003 06:41
>Subject: XML literals
>
>Gentlemen:
>
>I am completely sick of all these debates about XML literals.
>
>
>Join the club ;-)
>
>Allow me to suggest a possible solution, along the lines suggested 
>by Peter, which will serve to resolve them without making any 
>substantial changes to the current RDF design and to everyone's 
>general satisfaction.  This is a wording change to the Concepts 
>document; I do not believe it amounts to any real change in our 
>current design, and may be easier to follow.
>
>1. Concepts section 5.1 modified as follows (change starts at ***)
>.....
>
>Such content is indicated in an RDF graph using a typed literal 
>whose datatype is a special built-in datatype rdf:XMLLiteral , 
>defined as follows.
>A URI reference for identifying this datatype
>is http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral .
>The lexical space is the set of all strings which:
>are well-balanced, self-contained XML data [ XML ];
>correspond to exclusive Canonical XML (with comments, with empty 
>InclusiveNamespaces PrefixList )[XML-XC14N] ;
>when embedded between an arbitrary XML start tag and an end tag form 
>a document conforming to XML Namespaces [XML-NS]
>***
>
>
>Fine so far (i.e. the unchanged bit ;-)
>
>
>The value space is some set of entities, called XML values, which is:
>disjoint from the lexical space
>
>
>OK.
>
>disjoint from the value space of any XML schema datatype (refer XSD)
>
>
>Debatable,

I'm saying we are *defining* it this way. No debate is possible.

>and we should be careful about making this claim. Better to say nothing
>than something that may end up biting us later.
>
>
>disjoint from the set of Unicode character strings (refer Unicode)
>
>
>Maybe.  

See above

>
>
>in 1:1 correspondence with the lexical space.
>
>Right.
>The exact nature of XML values is not specified.
>
>
>No. This bothers me. A lot.
>
>It is our responsibility to define what the values of XML Literals are.
>It's *our* datatype, and no one else should have to define it, or guess.
>

That's exactly what I am doing. It's OUR set, and we are saying that
it's not the same as any Unicode or XML Schema set. However, it seems
to me that we could say:

..... is not specified, but can be thought of as the XPath nodeset
of the XML literal.
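
To make the intent concrete, here is a minimal sketch in Python of a
value space that is disjoint from the set of strings and in 1:1
correspondence with the lexical space. The names XMLValue,
lexical_to_value and value_to_lexical are hypothetical and appear in
no draft; the wrapper says nothing about what an XML value "really"
is, which is the point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XMLValue:
    """Opaque stand-in for an 'XML value' (hypothetical name).

    It is not a str, so it never compares equal to a Unicode string,
    and two values are equal exactly when their lexical forms are.
    """
    lexical_form: str


def lexical_to_value(lexical_form: str) -> XMLValue:
    # Assumes the argument is already in the lexical space,
    # i.e. already in canonical form; no checking is done here.
    return XMLValue(lexical_form)


def value_to_lexical(value: XMLValue) -> str:
    # Inverse mapping, which is what makes the correspondence 1:1.
    return value.lexical_form
```

With this, lexical_to_value("<br></br>") is never equal to the string
"<br></br>" itself, and distinct lexical forms always give distinct
values.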

>Taking this position is irresponsible, at best.
>
>We can either take the present position, whereby XML literals have
>lexical forms constituting canonical XML fragments encoded as
>Unicode strings and values constituting the lexical form in UTF-8.
>
>Or, if that bothers I18N, we could take a position whereby XML literals
>have  lexical forms constituting canonical XML fragments encoded as
>Unicode strings and values corresponding to infosets.
>

The problem with saying that it IS almost anything defined by someone 
else, is that we get into arcane debates about the exact edges of the 
identity criteria for those things. Peter and Brian and Jeremy and 
Graham can legitimately disagree about issues like whether <br></br> 
and <br /> indicate the same one of these things or not. The only way 
to break out of this tangle is either to re-write the entire XML 
document suite more coherently as though it were written with a 
single voice (not an option - CMSMcQ just quoted Saussure at me, for 
God's sake; when it gets to Derrida I am out of here) or else to just 
say as clearly as possible what assumptions WE are making about these 
things.  And my suggestion for doing the latter is to make them 
distinct from everything else they can possibly be confused with. 
Then later on, other spec writers are free to say that what they are 
talking about is the same as what we are talking about, if they want 
to do that.
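
As an illustration of the <br></br> versus <br /> question, here is a
rough sketch, assuming Python 3.8 or later. It uses the standard
library's xml.etree.ElementTree.canonicalize, which implements C14N
2.0 and not the Exclusive Canonical XML cited in the Concepts draft,
so treat it as an approximation only. The wrapper element name and
the three function names are hypothetical.

```python
from xml.etree.ElementTree import ParseError, canonicalize

WRAPPER = "w"  # hypothetical dummy root element, not part of any spec


def canonical_fragment(fragment: str) -> str:
    """Canonicalize a well-balanced XML fragment via a dummy root."""
    wrapped = f"<{WRAPPER}>{fragment}</{WRAPPER}>"
    canon = canonicalize(wrapped, with_comments=True)
    # Strip the dummy start tag and end tag added above.
    return canon[len(WRAPPER) + 2 : -(len(WRAPPER) + 3)]


def in_lexical_space(fragment: str) -> bool:
    """Approximate test: the string is already its own canonical form."""
    try:
        return canonical_fragment(fragment) == fragment
    except ParseError:
        return False  # not well-balanced XML


def same_canonical_form(a: str, b: str) -> bool:
    """Two strings that canonicalize to the same lexical form."""
    return canonical_fragment(a) == canonical_fragment(b)
```

Under this approximation "<br />" is not itself in the lexical space,
but it canonicalizes to "<br></br>", which is. Whether the two
strings ought to indicate the same value is the question being
argued; the sketch only shows what one canonicalizer does with them.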

>This latter option has always struck me as the correct solution, as
>after all, that's really what we're talking about, XML documents,
>not just uninterpreted sequences of characters, and is not the
>very point of the Infoset spec to provide a consistent interpretation
>of sequences of characters in terms of XML?

I have no idea.

I am surprised that you would object to the above on the grounds that 
we aren't being honest, but you are willing to swallow the idea of an 
infoset. I have read this document more times than I want to admit, 
and I still have absolutely no idea what infosets ARE. The document 
never tells you: it just says they are an abstract set satisfying 
certain rather complicated conditions. My proposal is to do the same, 
but with much simpler conditions.

>
>I've never understood the opposition to having a value space
>consisting of infosets. I wish someone would tell me what significant
>problem or issue I'm missing...

My only worry is, how do I tell when two XML literals denote the same 
infoset? And will Peter, you, me and uncle Tom Cobbley come to the 
same conclusions?

Pat

-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Friday, 1 August 2003 16:34:33 UTC