Re: pfps-04 (why the thread is germane to pfps-04) from Martin Duerst on 2003-07-28 (www-rdf-comments@w3.org from July to September 2003)

From: Martin Duerst <duerst@w3.org>
Date: Mon, 28 Jul 2003 14:36:37 -0400
To: pat hayes <phayes@ihmc.us>
Cc: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>, www-rdf-comments@w3.org, w3c-i18n-ig@w3.org, msm@w3.org
Message-Id: <4.2.0.58.J.20030728140032.05761d10@localhost>

Hello Pat,

I have copied one part of your mail from the middle to the top
to discuss it first.

>>However, I think it is absolutely inappropriate to solve this
>>problem by saying that one of them is characters and the other
>>is encoded in octets.
>
>We aren't saying that XML literals denote things that are encoded in 
>octets: we are saying that XML literals denote the octets themselves.

Sorry I wasn't precise enough. I think the reason for this is
that it's just very difficult for me to think that XML fragments
could denote octets. The way this usually works is that the
octets on the wire or on a disk denote characters, and some
of these characters then in turn denote things such as start
tags, element names, attribute names, attribute values, or
character content, and the overall sequence then denotes an
XML document or an XML fragment.

There are some specific cases where characters denote characters
(in particular with escaping), or characters denote octets
(escaping in some special cases such as URIs, and things
such as base64), but they are exceptions.
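
As a minimal sketch of these layers in Python (the variable names are
illustrative only, and UTF-8 is assumed as the wire encoding; nothing
here is taken from the RDF or XML specifications):

```python
import base64
from urllib.parse import unquote_to_bytes
from xml.sax.saxutils import unescape

# Usual case: octets on the wire denote characters (UTF-8 assumed here).
wire_octets = b"<p>caf\xc3\xa9</p>"
characters = wire_octets.decode("utf-8")

# Exception: characters denote characters (XML escaping).
unescaped = unescape("&lt;br/&gt;")

# Exception: characters denote octets (URI escaping and base64).
from_uri = unquote_to_bytes("%C3%A9")
from_base64 = base64.b64decode("w6k=")
```

Only in the last two calls does a character sequence stand for octets;
everywhere else the octets are merely the carrier for the characters.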

This just makes me wonder: If XML fragments denote octets, then
what about the XML Schema base64Binary datatype? From XML Schema,
part 2 (http://www.w3.org/TR/xmlschema-2/#base64Binary):

 >>>>
3.2.16 base64Binary

[Definition:] base64Binary represents Base64-encoded arbitrary binary data.
The ·value space· of base64Binary is the set of finite-length sequences of
binary octets. For base64Binary data the entire binary stream is encoded using
the Base64 Content-Transfer-Encoding defined in Section 6.8 of [RFC 2045].
 >>>>

Are 'binary octets' different from 'octets'?


At 17:01 03/07/27 -0500, pat hayes wrote:

>>At 07:54 03/07/25 -0400, Peter F. Patel-Schneider wrote:

>>>Two XML literals are (now) equal in RDF precisely when their Exclusive
>>>XML Canonicalizations are the same octet sequence.
>>
>>Okay. The equivalences would stay exactly the same if XML literals
>>would be represented as character sequences rather than as octet
>>sequences.
>
>'equal' here means 'denote the same thing', not 'is identical to'. Nobody 
>is suggesting interfering with how literal strings are represented or 
>encoded. We had to choose some criterion to refer to in order to establish 
>questions of identity between referents.

But why not just say that XML Literals are XML Literals to establish
their identity? Or call them XML fragments, or text with markup, or
whatever you think will work best.


>>Apart from that, it is very important to make sure that the plain
>>string "<br/>" (in XML written as "&lt;br/&gt;") is not the
>>same as the XML markup "<br/>" (in XML written as "<br/>").
>>So it is indeed important to make sure this question can easily
>>be answered.
>
>If we were to specify that plain literals and XML literals both denote 
>Unicode character sequences, then "<br/>" and "<br/>"^^rdf:XMLLiteral 
>would be equal and neither of them would bear any RDF relationship to a 
>literal whose character string was "&lt;br/&gt;". So it sounds like you 
>want to say that XML values and Unicode character strings must be 
>distinct; which is the situation we currently have.

Let me again try to explain how I think this should have worked
(because we should have said this during last call, but missed it,
we are explicitly not insisting on this point; I just want to
make sure that we can eliminate misunderstandings):

 >>>>
XML Literals denote text (character content) with markup
(start tags, end tags, empty tags, PIs, comments). XML
Literals that contain only character content denote the
same thing as plain literals with the same character
sequence (and language information).
 >>>>

By this, "<br/>" denotes a sequence of five characters.
"<br/>"^^rdf:XMLLiteral denotes an empty 'br' tag.
"&lt;br/&gt;"^^rdf:XMLLiteral again denotes a sequence
of five characters, the same five characters as in the
"<br/>" plain literal.

Even if you disagree that the latter two are the same,
because you want to preserve the distinction between
plain literals and the 'XML-ness' of text in XML
literals, a slightly tweaked denotation should give
you that distinction.
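
The three cases above, under the reading in which the latter two
coincide, can be sketched in Python. This is only an illustration:
the helper xml_literal_content and the <wrapper> element are
hypothetical, and parsing with ElementTree is an assumption of the
sketch, not the Exclusive XML Canonicalization that RDF refers to.

```python
import xml.etree.ElementTree as ET

def xml_literal_content(lexical):
    """Parse an XML literal's lexical form as element content.

    The literal is wrapped in a hypothetical <wrapper> element so that
    mixed content (text and markup) can be parsed as one document.
    """
    return ET.fromstring("<wrapper>" + lexical + "</wrapper>")

# Plain literal: it denotes its own five characters.
plain = "<br/>"

# XML literal with markup: it denotes an empty 'br' element, no text.
markup = xml_literal_content("<br/>")

# XML literal with escaped text: no markup, only character content.
escaped = xml_literal_content("&lt;br/&gt;")
```

Here markup has one child element and no text, while escaped has no
child elements and its text equals the plain literal's five characters.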


>The point is, we have a distinction between two kinds of literals. To put 
>it crudely, a string (the literal string) can be labelled as 'plain' in 
>which case it (rather oddly) denotes itself, or as 'XML-ish', in which 
>case it might denote something else. The question is, what? The issue is 
>not to do with how the literal itself is encoded or represented.

I was at one point worrying about the actual representation,
and still worry about that a bit, because some implementers
might confuse these things. But I guess such confusion can
never be completely avoided.

Anyway, if XML Literals are labeled as XML-ish, it seems most
natural to let them denote something XML-ish, rather than something
octet-ish.


Regards,     Martin.
Received on Monday, 28 July 2003 17:24:28 UTC