- From: Martin Duerst <duerst@w3.org>
- Date: Mon, 28 Jul 2003 14:36:37 -0400
- To: pat hayes <phayes@ihmc.us>
- Cc: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>, www-rdf-comments@w3.org, w3c-i18n-ig@w3.org, msm@w3.org
Hello Pat,

I have copied one part of your mail from the middle to the top
to discuss it first.

>>However, I think it is absolutely inappropriate to solve this
>>problem by saying that one of them is characters and the other
>>is encoded in octets.
>
>We aren't saying that XML literals denote things that are encoded in
>octets: we are saying that XML literals denote the octets themselves.

Sorry I wasn't precise enough. I think the reason for this is that
it's just very difficult for me to think that XML fragments could
denote octets. The way this usually works is that the octets on
the wire or on a disk denote characters, and some of these characters
then in turn denote things such as start tags, element names,
attribute names, attribute values, or character content, and the
overall sequence then denotes an XML document or an XML fragment.

There are some specific cases where characters denote characters
(in particular with escaping), or characters denote octets
(escaping in some special cases such as URIs, and things such
as base64), but they are exceptions.

This just makes me wonder: If XML fragments denote octets, then what
about the XML Schema base64Binary datatype? From XML Schema, part 2
(http://www.w3.org/TR/xmlschema-2/#base64Binary):

>>>>
3.2.16 base64Binary

[Definition:] base64Binary represents Base64-encoded arbitrary binary
data. The .value space. of base64Binary is the set of finite-length
sequences of binary octets. For base64Binary data the entire binary
stream is encoded using the Base64 Content-Transfer-Encoding defined
in Section 6.8 of [RFC 2045].
>>>>

Are 'binary octets' different from 'octets'?

At 17:01 03/07/27 -0500, pat hayes wrote:
>>At 07:54 03/07/25 -0400, Peter F. Patel-Schneider wrote:
>>
>>>Two XML literals are (now) equal in RDF precisely when their Exclusive
>>>XML Canonicalizations are the same octet sequence.
>>
>>Okay.
>>The equivalences would stay exactly the same if XML literals
>>would be represented as character sequences rather than as octet
>>sequences.
>
>'equal' here means 'denote the same thing', not 'is identical to'. Nobody
>is suggesting interfering with how literal strings are represented or
>encoded. We had to choose some criterion to refer to in order to establish
>questions of identity between referents.

But why not just say that XML Literals are XML Literals to establish
their identity? Or call them XML fragments, or text with markup,
or whatever you think will work best.

>>Apart from that, it is very important to make sure that the plain
>>string "<br/>" (in XML written as "&lt;br/&gt;") is not the
>>same as the XML markup "<br/>" (in XML written as "<br/>").
>>So it is indeed important to make sure this question can easily
>>be answered.
>
>If we were to specify that plain literals and XML literals both denote
>Unicode character sequences, then "<br/>" and "<br/>"^^rdf:XMLLiteral
>would be equal and neither of them would bear any RDF relationship to a
>literal whose character string was "&lt;br/&gt;". So it sounds like you
>want to say that XML values and Unicode character strings must be
>distinct; which is the situation we currently have.

Let me again try to explain how I think this should have worked
[Because we should have said that during last call, but missed it,
we are explicitly not insisting on this point. I just want to make
sure that we can eliminate misunderstandings]:

>>>>
XML Literals denote text (character content) with markup (start tags,
end tags, empty tags, PIs, comments). XML Literals that contain only
character content denote the same thing as plain literals with the
same character sequence (and language information).
>>>>

By this, "<br/>" denotes a sequence of five characters.
"<br/>"^^rdf:XMLLiteral denotes an empty 'br' tag.
"&lt;br/&gt;"^^rdf:XMLLiteral again denotes a sequence of five
characters, the same five characters as in the "<br/>" plain literal.
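
The three cases above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the helper `denotation` and its ("text", ...) / ("markup", ...) result tuples are hypothetical names, language information is ignored, and the standard library parser silently drops comments and PIs, so the sketch only approximates the denotation proposed above.

```python
import xml.etree.ElementTree as ET


def denotation(lexical, xml_literal=False):
    """Hypothetical helper: return a rough stand-in for what a literal denotes.

    Plain literals denote their own character sequence. XML literals are
    parsed as an XML fragment; if the fragment contains only character
    content, it denotes that character sequence, otherwise it denotes markup.
    """
    if not xml_literal:
        return ("text", lexical)
    # Wrap the fragment in a throwaway root element so it can be parsed.
    root = ET.fromstring("<wrapper>" + lexical + "</wrapper>")
    if len(root) == 0:
        # No child elements: only character content (entities already decoded).
        return ("text", root.text or "")
    # Simplification: describe the markup by its top-level element names only.
    return ("markup", [child.tag for child in root])


plain = denotation("<br/>")                              # plain literal
markup = denotation("<br/>", xml_literal=True)           # XML markup
escaped = denotation("&lt;br/&gt;", xml_literal=True)    # escaped text in XML

print(plain, markup, escaped)
print(escaped == plain, markup == plain)
```

Under these assumptions the escaped XML literal and the plain literal compare equal, while the unescaped XML literal does not.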
Even if you disagree that the latter two are the same, because you
want to preserve the distinction between plain literals and the
'XML-ness' of text in XML literals, a slightly tweaked denotation
should give you that distinction.

>The point is, we have a distinction between two kinds of literals. To put
>it crudely, a string (the literal string) can be labelled as 'plain' in
>which case it (rather oddly) denotes itself, or as 'XML-ish', in which
>case it might denote something else. The question is, what? The issue is
>not to do with how the literal itself is encoded or represented.

I was at one point worrying about the actual representation, and
still worry about that a bit, because some implementers might
confuse these things. But I guess such confusion can never be
completely avoided.

Anyway, if XML Literals are labeled as XML-ish, it seems most
natural to let them denote something XML-ish, rather than something
octet-ish.

Regards,    Martin.
Received on Monday, 28 July 2003 17:24:28 UTC