Re: pfps-04 (why the thread is germane to pfps-04)

From: Martin Duerst <duerst@w3.org>
Subject: Re: pfps-04 (why the thread is germane to pfps-04)
Date: Fri, 25 Jul 2003 15:13:51 -0400

> Hello Peter,
> 
> Many thanks for your very clear and detailed explanations.
> 
> At 07:54 03/07/25 -0400, Peter F. Patel-Schneider wrote:
> 
> >This question is related to pfps-04 because pfps-04 is concerned with
> >equality between XML literals in RDF.
> >
> >
> >The root of this problem is that a complete treatment of XML literals in
> >RDF needs a complete theory of equality for them.  This theory of equality
> >cannot just determine equality between XML literals in RDF but also has to
> >determine equality between XML literals and other objects in the RDF domain
> >of discourse, in particular plain RDF literals and the value space for the
> >XML Schema string datatype.
> 
> [This is a very general concern]
> As far as I understand, RDF does not really mention XML Schema datatypes
> in any normative way, so how would it normatively specify equivalences
> to these datatypes? Also, what about other datatype systems that have
> very similar constructs? A lot of datatype systems will have some kind
> of 'string' type, and a lot of such systems will have some kind of
> numeric types (which you mention below). What about these equivalences?

I don't think that this is quite correct.  RDF has facilities to include
datatypes, and even mentions some of the XML Schema datatypes as suitable
for use in RDF as well as providing names for them.  I didn't think that
this would cause any problems, because the suitable XML Schema datatypes
have a well-defined lexical to value mapping.  

However, this value mapping means that "1.5"^^xsd:decimal and
"15E-1"^^xsd:float are equal, which caused some consternation.  (I think
that the XML Schema people wanted to disallow the use of a decimal when
something of type float was required.)  Taken to the extreme, this would
make the value spaces of all RDF datatypes disjoint (except, perhaps, those
somehow derived from another).
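
As a rough illustration of the two readings (this is only a sketch, not
anything taken from the RDF or XML Schema documents), here is a small
Python model.  The names value_only and typed_value are hypothetical, and
Fraction merely stands in for the abstract numeric value.  xsd:float is a
32-bit type, but 1.5 is exactly representable in binary, so an ordinary
Python float is assumed to be good enough for this one case.

```python
from decimal import Decimal
from fractions import Fraction

XSD_DECIMAL = "xsd:decimal"
XSD_FLOAT = "xsd:float"


def value_only(lexical, datatype):
    """Map a lexical form to a bare number (value spaces may overlap)."""
    if datatype == XSD_DECIMAL:
        return Fraction(Decimal(lexical))
    if datatype == XSD_FLOAT:
        return Fraction(float(lexical))
    raise ValueError("datatype not covered by this sketch: " + datatype)


def typed_value(lexical, datatype):
    """Map a lexical form to a (datatype, number) pair (disjoint value spaces)."""
    return (datatype, value_only(lexical, datatype))


# Reading 1: values are just numbers, so the two literals denote the same thing.
same_number = value_only("1.5", XSD_DECIMAL) == value_only("15E-1", XSD_FLOAT)

# Reading 2: values carry their datatype, so the two literals differ.
same_typed = typed_value("1.5", XSD_DECIMAL) == typed_value("15E-1", XSD_FLOAT)

print(same_number, same_typed)
```

Under the first reading the comparison succeeds; under the second it
fails, which is the "disjoint value spaces" extreme.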

The situation with respect to strings and numbers is not really a problem
here.  Strings and numbers are disjoint (even though the lexical forms of
numbers are indeed strings).

> >Some of these answers can (now) be fairly easily determined from a simple
> >perusal of the RDF documents and the canonicalization documents.
> >
> >Two XML literals are (now) equal in RDF precisely when their Exclusive
> >XML Canonicalizations are the same octet sequence.
> 
> Okay. The equivalences would stay exactly the same if XML literals
> would be represented as character sequences rather than as octet
> sequences.

Yes, but that change would make XML literals the same as certain strings or
literals.
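
A minimal Python sketch of the point, under the assumption that values of
different kinds are never equal.  The helper same_value is hypothetical,
and "<br></br>" is used as the canonical form of the empty element
because canonicalization expands empty elements to start/end tag pairs.

```python
def same_value(a, b):
    """Values are equal only if they are of the same kind and compare equal."""
    return type(a) is type(b) and a == b


canonical_markup = "<br></br>"   # canonical form of the markup <br/>
plain_literal = "<br></br>"      # a plain literal with the same characters

# XML literal modelled as an octet sequence: disjoint from all strings.
as_octets = canonical_markup.encode("utf-8")

# XML literal modelled as a character sequence: collides with the string.
as_chars = canonical_markup

print(same_value(as_octets, plain_literal))
print(same_value(as_chars, plain_literal))
```

The equivalences among XML literals themselves are unchanged either way;
only the relationship to plain strings differs.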

> >However other answers are harder to determine.
> >
> >1/ When is an XML literal equal to a plain RDF literal?  A plain RDF
> >literal is a Unicode string (sequence of Unicode characters), so this
> >question boils down to whether octets and Unicode characters are disjoint.
> >I found it difficult to answer this question, because of hints in the
> >exclusive canonicalization document that they are not.
> 
> Can you point to the places where you saw such hints? If there are
> such hints, then they definitely have to be fixed, and I'll make
> sure that this happens.

The examples in Section 2 of
http://www.w3.org/TR/2002/REC-xml-exc-c14n-20020718/ give canonical XML
documents as if they were sequences of Unicode characters.  This indicates
that octets are Unicode characters.

> Apart from that, it is very important to make sure that the plain
> string "<br/>" (in XML written as "&lt;br/&gt;") is not the
> same as the XML markup "<br/>" (in XML written as "<br/>").
> So it is indeed important to make sure this question can easily
> be answered.
> 
> However, I think it is absolutely inappropriate to solve this
> problem by saying that one of them is characters and the other
> is encoded in octets. If there is no other solution here than
> with some kind of hack, I think it would be preferable to say
> e.g. that characters in plain literals are green, and characters
> representing XML literals are red. (and add a note to clarify
> that green characters and red characters are not the same).

I agree, and would be much happier with a different kind of solution.

> >2/ When is an XML literal equal to an XML Schema string?  This would appear
> >to be the same as the previous question, as the value space for the XML
> >Schema string datatype is Unicode strings, but there have been some
> >comments from those involved in XML Schema that the values for XML Schema
> >datatypes are more than just, for example, numbers.  In particular, there
> >have been messages to the effect that in XML Schema decimal 1.5 is
> >different from float 3E-1,
> 
> I guess this should read 15E-1 ?

Yes, I went too fast here.

> >even though the first is defined as 15x10^(-1),
> >where 15 is the integer 15 and 1 is the integer 1 (and 10 is the integer
> >10), and the second is defined as 3x2^(-1), where 3 is the integer 3 and -1
> >is the integer -1 (and 2 is the integer 2).
> >
> >I could probably dig up references for the above, but it would be
> >considerable work.  If anyone is really interested just ask, and I'll get
> >around to it soon.
> 
> I agree that this is an important question. I don't really need the
> references, I can understand both the advantages and the problems of
> such a position. I'm surprised to see that this still isn't clear;
> I would have assumed that the RDF Core WG and the XML Schema WG
> would have clarified this quite some time ago.

Agreed.  

> Regards,     Martin.

peter

Received on Friday, 25 July 2003 21:35:08 UTC