Re: pfps-04 (why the thread is germane to pfps-04) from Peter F. Patel-Schneider on 2003-07-25 (www-rdf-comments@w3.org from July to September 2003)

From: Peter F. Patel-Schneider <pfps@research.bell-labs.com>
Date: Fri, 25 Jul 2003 07:54:59 -0400 (EDT)
To: duerst@w3.org
Cc: bwm@hplb.hpl.hp.com, jjc@hplb.hpl.hp.com, phayes@ai.uwf.edu, www-rdf-comments@w3.org, w3c-i18n-ig@w3.org
Message-Id: <20030725.075459.08252870.pfps@research.bell-labs.com>
This question is related to pfps-04 because pfps-04 is concerned with
equality between XML literals in RDF.


The root of this problem is that a complete treatment of XML literals in
RDF needs a complete theory of equality for them.  This theory of equality
cannot just determine equality between XML literals in RDF but also has to
determine equality between XML literals and other objects in the RDF domain
of discourse, in particular plain RDF literals and the value space for the
XML Schema string datatype.

Some of these answers can (now) be fairly easily determined from a simple
perusal of the RDF documents and the canonicalization documents.

Two XML literals are (now) equal in RDF precisely when their Exclusive
XML Canonicalizations are the same octet sequence.
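
As a rough illustration of this rule, here is a minimal Python sketch.
It rests on an assumption: Python's standard library provides
canonicalize(), which implements Canonical XML 2.0 rather than
Exclusive XML Canonicalization, so it is only a stand-in for the
canonicalization the RDF documents name.  The helper names
canonical_octets and xml_literals_equal are hypothetical.

```python
from xml.etree.ElementTree import canonicalize


def canonical_octets(xml_text: str) -> bytes:
    """Canonicalize an XML string and return it as UTF-8 octets.

    Assumption: C14N 2.0 (what the standard library offers) stands in
    for Exclusive XML Canonicalization here.
    """
    return canonicalize(xml_text).encode("utf-8")


def xml_literals_equal(a: str, b: str) -> bool:
    """Two XML literals are equal when their canonical octets match."""
    return canonical_octets(a) == canonical_octets(b)


# Attribute order, quoting style and empty-element syntax differ,
# but the canonical forms coincide.
same = xml_literals_equal('<b  x="1" y=\'2\'/>', '<b y="2" x="1"></b>')

# A different attribute value yields different canonical octets.
different = xml_literals_equal('<b x="1"/>', '<b x="2"/>')
```

The comparison is made on bytes, not on character strings, which is
the point at issue in the questions that follow.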

However other answers are harder to determine.

1/ When is an XML literal equal to a plain RDF literal?  A plain RDF
literal is a Unicode string (sequence of Unicode characters), so this
question boils down to whether octets and Unicode characters are disjoint.
I found it difficult to answer this question, because of hints in the
exclusive canonicalization document that they are not.

2/ When is an XML literal equal to an XML Schema string?  This would appear
to be the same as the previous question, as the value space for the XML
Schema string datatype is Unicode strings, but there have been some
comments from those involved in XML Schema that the values for XML Schema
datatypes are more than just, for example, numbers.  In particular, there
have been messages to the effect that in XML Schema decimal 1.5 is
different from float 1.5, even though the first is defined as 15x10^(-1),
where 15 is the integer 15 and -1 is the integer -1 (and 10 is the integer
10), and the second is defined as 3x2^(-1), where 3 is the integer 3 and -1
is the integer -1 (and 2 is the integer 2).
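
Both questions can be made concrete with a small Python sketch.  It
shows only how one programming language models the distinctions, not
what the RDF or XML Schema documents require.  The variable names and
the ("decimal", value) / ("float", value) tagging are illustrative
assumptions, not anything defined by XML Schema.

```python
from fractions import Fraction

# Question 1: characters versus octets.
text = "caf\u00e9"                # four Unicode characters
octets = text.encode("utf-8")     # five octets, since e-acute takes two
chars_vs_octets_equal = (text == octets)  # str and bytes never compare equal

# Question 2: the two definitions quoted above, as exact rationals.
decimal_value = Fraction(15) * Fraction(10) ** -1   # 15 x 10^(-1)
float_value = Fraction(3) * Fraction(2) ** -1       # 3 x 2^(-1)
same_number = (decimal_value == float_value)

# If the values are to differ, something beyond the number itself must
# be part of the value, e.g. a datatype tag (hypothetical modelling).
typed_equal = ("decimal", decimal_value) == ("float", float_value)
```

Here same_number is true while typed_equal is false: as plain numbers
the two values coincide, so any difference has to come from treating
the datatype as part of the value.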

I could probably dig up references for the above, but it would be
considerable work.  If anyone is really interested just ask, and I'll get
around to it soon.

Peter F. Patel-Schneider
Bell Labs Research
Lucent Technologies




From: Martin Duerst <duerst@w3.org>
Subject: Re: pfps-04
Date: Thu, 24 Jul 2003 16:06:09 -0400

> Hello Brian, others,
> 
> At 16:54 03/07/24 +0100, Brian McBride wrote:
> >On Thu, 2003-07-24 at 16:31, Peter F. Patel-Schneider wrote:
> 
> > > So the question boils down to whether octets and Unicode characters are
> > > disjoint.
> >
> >I believe they are.  From
> >
> >   http://www.unicode.org/book/uc20ch1.html
> >
> >[[
> >The character identified by a Unicode code value is an abstract entity,
> >such as "LATIN CAPITAL LETTER A" or "BENGALI DIGIT 5".
> >]]
> >
> >i.e. characters are distinct from their encodings.
> >
> >Martin, Jeremy: confirm?
> 
> 
> I have looked at
> http://www.w3.org/2001/sw/RDFCore/20030123-issues/#pfps-04
> http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0091.html
> 
> and wasn't sure why the question below is relevant for addressing issue pfps-04.
> 
> Based on a conversation with Brian that I had a week or two ago,
> I suspect that it may be related to some technical issue of how
> to distinguish between the values of plain literals, string, and
> XML literals. Looking at
> http://lists.w3.org/Archives/Public/www-rdf-comments/2003JulSep/0064.html
> seems to confirm this suspicion:
> 
>  >>>>>>>>
> Peter:
>  > > > Therefore for the RDF entailment rules to be complete, no XML Literal can
>  > > > have a character string as its denotation.
> 
> Brian:
>  > > Right.  The denotation of an XML Literal is an octet sequence, as
>  > > defined by the xml canonicalization spec, see the note in:
>  > >
>  > >
>  > > http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#section-XMLLiteral
> 
> Peter:
>  > Unfortunately this does not answer the question.  Octet sequence is
>  > undefined in http://www.w3.org/TR/2002/REC-xml-exc-c14n-20020718/.  At
>  > least some places in this document appear to indicate that an octet
>  > sequence is just a sequence of (Unicode?) characters.
>  >>>>>>>>
> 
> (the short and simple summary of the above discussion is:
> "In order to be able to say that there is a difference between
> plain text and XML, can we claim that plain text is sequences
> of characters and XML is sequences of octets?")
> 
> 
> My answer to the question that Brian asked is: Yes, octets and
> Unicode characters are different. The Unicode standard certainly
> explains that, as does the Character Model:
> http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-Storage
> 
> But this is the wrong question to ask. It is totally inappropriate
> to use different layers of an encoding model to make semantic
> distinctions that are not related to this encoding model.
> Although such a statement is not explicitly made in the Character
> Model (because, frankly speaking, we didn't imagine that anybody
> would come up with such an idea), it should be quite clear from
> Section 3.5 Reference Processing Model
> (http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-RefProcModel)
> that this is very inappropriate.
> 
> It seems that the encoding to UTF-8, inherited by Exclusive XML
> Canonicalization from Canonical XML, and very suitable as a
> preparation for digital signing and encryption or for parser
> testing, is confusing. I will request a clarification to that
> specification and will cc the RDF Core WG on that request.
> 
> I am sure that a different and more appropriate way to make the
> distinction can be found.
> 
> 
> Regards,    Martin.
> 
>
Received on Friday, 25 July 2003 07:55:16 UTC