Re: pfps-04 (why the thread is germane to pfps-04)

From: pat hayes <phayes@ihmc.us>
Date: Sun, 27 Jul 2003 17:01:26 -0500
Message-Id: <p06001a27bb49f2d6a524@[]>
To: Martin Duerst <duerst@w3.org>
Cc: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>, www-rdf-comments@w3.org

>Hello Peter,
>Many thanks for your very clear and detailed explanations.
>At 07:54 03/07/25 -0400, Peter F. Patel-Schneider wrote:
>>This question is related to pfps-04 because pfps-04 is concerned with
>>equality between XML literals in RDF.
>>The root of this problem is that a complete treatment of XML literals in
>>RDF needs a complete theory of equality for them.  This theory of equality
>>cannot just determine equality between XML literals in RDF but also has to
>>determine equality between XML literals and other objects in the RDF domain
>>of discourse, in particular plain RDF literals and the value space for the
>>XML Schema string datatype.
>[This is a very general concern]
>As far as I understand, RDF does not really mention XML Schema datatypes
>in any normative way, so how would it normatively specify equivalences
>to these datatypes? Also, what about other datatype systems that have
>very similar constructs? A lot of datatype systems will have some kind
>of 'string' type, and a lot of such systems will have some kind of
>numeric types (which you mention below). What about these equivalences?
>>Some of these answers can (now) be fairly easily determined from a simple
>>perusal of the RDF documents and the canonicalization documents.
>>Two XML literals are (now) equal in RDF precisely when their Exclusive
>>XML Canonicalizations are the same octet sequence.
>Okay. The equivalences would stay exactly the same if XML literals
>would be represented as character sequences rather than as octet

'equal' here means 'denote the same thing', not 'is identical to'. 
Nobody is suggesting interfering with how literal strings are 
represented or encoded. We had to choose some criterion to refer to 
in order to establish questions of identity between referents.
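
To make that criterion concrete, here is a minimal sketch in Python. It is 
an illustration only, not the procedure given in the RDF documents. As an 
assumption, it uses the standard library's 
xml.etree.ElementTree.canonicalize, which implements C14N 2.0 rather than 
Exclusive XML Canonicalization, as a stand-in. The helper names 
xml_literal_value and xml_literals_equal are hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_literal_value(lexical: str) -> bytes:
    """Return the octet sequence that an XML literal is taken to denote.

    Assumption: C14N 2.0 (ElementTree.canonicalize) stands in for
    Exclusive XML Canonicalization, and the canonical text is encoded
    as UTF-8 to obtain octets.
    """
    return ET.canonicalize(lexical).encode("utf-8")


def xml_literals_equal(a: str, b: str) -> bool:
    """Two XML literals are 'equal' when they denote the same octets."""
    return xml_literal_value(a) == xml_literal_value(b)
```

The two literal strings "<br/>" and "<br></br>" are not identical as 
strings, but under this sketch they denote the same octet sequence, so 
xml_literals_equal reports them as equal.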

>>However other answers are harder to determine.
>>1/ When is an XML literal equal to a plain RDF literal?  A plain RDF
>>literal is a Unicode string (sequence of Unicode characters), so this
>>question boils down to whether octets and Unicode characters are disjoint.
>>I found it difficult to answer this question, because of hints in the
>>exclusive canonicalization document that they are not.
>Can you point to the places where you saw such hints? If there are
>such hints, then they definitely have to be fixed, and I'll make
>sure that this happens.
>Apart from that, it is very important to make sure that the plain
>string "<br/>" (in XML written as "&lt;br/&gt;") is not the
>same as the XML markup "<br/>" (in XML written as "<br/>").
>So it is indeed important to make sure this question can easily
>be answered.

If we were to specify that plain literals and XML literals both 
denote Unicode character sequences, then "<br/>" and 
"<br/>"^^rdf:XMLLiteral would be equal and neither of them would bear 
any RDF relationship to a literal whose character string was 
"&lt;br/&gt;". So it sounds like you want to say that XML values and 
Unicode character strings must be distinct, which is the situation we 
currently have.

>However, I think it is absolutely inappropriate to solve this
>problem by saying that one of them is characters and the other
>is encoded in octets.

We aren't saying that XML literals denote things that are encoded in 
octets: we are saying that XML literals denote the octets themselves.

>If there is no other solution here than
>with some kind of hack, I think it would be preferable to say
>e.g. that characters in plain literals are green, and characters
>representing XML literals are red. (and add a note to clarify
>that green characters and red characters are not the same).

The characters *in* the literals are the same.  All literal strings 
in RDF are character sequences in one single uniform sense (sequences 
of Unicode in normal form C). The discussion is about what the 
various kinds of literal *denote*.

The point is, we have a distinction between two kinds of literals. To 
put it crudely, a string (the literal string) can be labelled as 
'plain' in which case it (rather oddly) denotes itself, or as 
'XML-ish', in which case it might denote something else. The question 
is, what? The issue is not to do with how the literal itself is 
encoded or represented.
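
The distinction between a literal string and what it denotes can be 
sketched in Python as follows. This is a toy model under stated 
assumptions, not RDF machinery: the function names are hypothetical, 
Python's str and bytes types stand in for Unicode character sequences and 
octet sequences, and canonicalization of the XML literal is skipped to 
keep the example short.

```python
def denotation(lexical: str, kind: str = "plain"):
    """Map a literal string to the thing it denotes (toy model).

    'plain' literals denote themselves (a Unicode string).
    'xml' literals denote an octet sequence; here simply the UTF-8
    encoding of the string, with canonicalization omitted (assumption).
    """
    if kind == "plain":
        return lexical
    if kind == "xml":
        return lexical.encode("utf-8")
    raise ValueError(f"unknown literal kind: {kind!r}")


def denotation_if_both_unicode(lexical: str, kind: str = "plain") -> str:
    """The alternative design: every literal denotes a Unicode string."""
    return lexical


def same_referent(a, b) -> bool:
    """Referents are equal only if they are the same kind of thing."""
    return type(a) is type(b) and a == b
```

In this model the literal string "<br/>" is the same character sequence 
in both cases, yet same_referent(denotation("<br/>"), 
denotation("<br/>", "xml")) is false, because a character string and an 
octet sequence are different things. Under denotation_if_both_unicode the 
two would be equal, and neither would equal the plain literal whose 
string is "&lt;br/&gt;".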

IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Sunday, 27 July 2003 18:01:27 UTC
