- From: pat hayes <phayes@ihmc.us>
- Date: Tue, 29 Jul 2003 00:46:24 -0500
- To: Martin Duerst <duerst@w3.org>
- Cc: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>, www-rdf-comments@w3.org, w3c-i18n-ig@w3.org, msm@w3.org
>Hello Pat,
>
>I have copied one part of your mail from the middle to the top
>to discuss it first.
>
>>>However, I think it is absolutely inappropriate to solve this
>>>problem by saying that one of them is characters and the other
>>>is encoded in octets.
>>
>>We aren't saying that XML literals denote things that are encoded
>>in octets: we are saying that XML literals denote the octets
>>themselves.
>
>Sorry I wasn't precise enough. I think the reason for this is
>that it's just very difficult for me to think that XML fragments
>could denote octets.

Well, I also have some trouble figuring out what XML is supposed to refer to, I admit.

>The way this usually works is that the
>octets on the wire or on a disk denote characters, and some
>of these characters then in turn denote things such as start
>tags, element names, attribute names, attribute values, or
>character content, and the overall sequence then denotes an
>XML document or an XML fragment.

??? You must be using 'denotes' in a different way than I tend to think of it. You are here referring to a text-assembly/lexical-analysis/parsing process, right? Going up layers of encoding from bytes up to some kind of syntactically defined structure - in this case, XML. I don't usually think of that as what denotation is about. Denotation starts when you have got the syntax worked out, then you ask what it *means*. Now, the cases we are considering here are weird precisely because when you ask what a string-typed literal means, you get right back to the syntax: the whole point of using text to denote strings is that the string in the text pretty much denotes itself. Hence the RDF plain literal semantics.

If we could say that XML literals denoted themselves, I would have just *loved* that idea.
We almost did at one time, in our innocence: at that time XML literals were just like plain literals except they had a kind of XML 'bit' which registered them as being XML instead of just being text: but they *were* text, in every other way: they denoted themselves, they were character strings, etc. (One difference was that if the character string of a plain literal weren't legal XML markup, nothing happens, but if the same is true of an XML literal then the literal itself behaves differently, e.g. it's not in the class rdf:XMLLiteral, things like that.) But that got rejected as being much too fine-grained, since all kinds of character-string differences (like whitespace in markup) would make literals be distinct that XML would consider indistinguishable.

>There are some specific cases where characters denote characters
>(in particular with escaping), or characters denote octets
>(escaping in some special cases such as URIs, and things
>such as base64), but they are exceptions.
>
>This just lets me wonder: If XML fragments denote octets, then
>what about the XML Schema base64Binary datatype? From XML Schema,
>part 2 (http://www.w3.org/TR/xmlschema-2/#base64Binary):
>
>>>>>
>3.2.16 base64Binary
>
>[Definition:] base64Binary represents Base64-encoded arbitrary binary data.
>The ·value space· of base64Binary is the set of finite-length sequences of
>binary octets. For base64Binary data the entire binary stream is encoded using
>the Base64 Content-Transfer-Encoding defined in Section 6.8 of [RFC 2045].
>>>>>
>
>Are 'binary octets' different from 'octets'?

I have absolutely no idea. :-)

>At 17:01 03/07/27 -0500, pat hayes wrote:
>
>>>At 07:54 03/07/25 -0400, Peter F. Patel-Schneider wrote:
>
>>> > Two XML literals are (now) equal in RDF precisely when their Exclusive
>>>>XML Canonicalizations are the same octet sequence.
>>>
>>>Okay.
>>>The equivalences would stay exactly the same if XML literals
>>>would be represented a character sequences rather than as octet
>>>sequences.
>>
>>'equal' here means 'denote the same thing', not 'is identical to' .
>>Nobody is suggesting interfering with how literal strings are
>>represented or encoded. We had to choose some criterion to refer to
>>in order to establish questions of identity between referents.
>
>But why not just say that XML Literals are XML Literals to establish
>their identity? Or call them XML fragments, or text with markup, or
>whatever you think will work best.

What would YOU like them to be, in order to have them work best? Suppose they are text with markup. Now, consider

"<br />"^^rdf:XMLLiteral
"<br/>"^^rdf:XMLLiteral

are these equal or not? If text-with-markup is defined in terms of character sequences then they are not. So how is it defined, so as to make these be equal?

>>>Apart from that, it is very important to make sure that the plain
>>>string "<br/>" (in XML written as "&lt;br/>") is not the
>>>same as the XML markup "<br/>" (in XML written as "<br/>").
>>>So it is indeed important to make sure this question can easily
>>>be answered.
>>
>>If we were to specify that plain literals and XML literals both
>>denote Unicode character sequences, then "<br/>" and
>>"<br/>"^^rdf:XMLLiteral would be equal and neither of them would
>>bear any RDF relationship to a literal whose character string was
>>"&lt;br/>" So it sounds like you want to say that XML values and
>>Unicode character strings must be distinct; which is the situation
>>we currently have.
>
>Let me again try to explain how I think this should have worked
>[Because we should have said that during last call, but missed it,
>we are explicitly not insisting on this point. I just want to
>make sure that we can eliminate misunderstandings]:
>
>>>>>
>XML Literals denote text (character content) with markup
>(start tags, end tags, empty tags, PIs, comments).
>XML Literals that contain only character content denote the
>same thing as plain literals with the same character
>sequence (and language information).

Well, OK, I agree that would be nice. But it seems to me that text with markup *is* text. If you can write it down as a sequence of characters, that's text. XML is text, by that criterion. If that's not the right criterion, then what is? Another way to ask the same question: what does it mean for two pieces of XML to be the same *considered as XML* that differs from them being the same *considered as text*? I would be happy for XML literals to denote themselves, but if that means what I understand it to mean, then your qualification about 'only character content' is beside the point: any XML literal will denote a character string, markup or no markup.

>
>>>>
>
>By this, "<br/>" denotes a sequence of five characters.
>"<br/>"^^rdf:XMLLiteral denotes an empty 'br' tag.

OK, but stop there. What *is* that thing? Does an empty 'br' tag count as a character in a character string? Or is this an entity in some abstract XML structural space? Where is this space defined? What kind of stuff does it have in it, and what sorts of structures do they have? Until we get questions like this straight, we can't begin to write formal semantics.

I guess this is the central question. We all know what XML *is*: it's text plus markup. But what does it *denote*? What *kind* of thing does it denote, even? I don't know how to begin to answer that question.

>"&lt;br/>"^^rdf:XMLLiteral again denotes a sequence
>of five characters, the same five characters as in the
>"<br/>" plain literal.

That works for examples where the XML markup resolves into XML-encoded Unicode text strings, but is that always true? What about attributes on tags with values.....??
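
To make the equality question concrete, here is a rough sketch in Python of comparing two XML literals by their canonical octets instead of by their characters. It is only an approximation under stated assumptions: the standard library's xml.etree.ElementTree.canonicalize implements C14N 2.0, not the Exclusive XML Canonicalization mentioned in this thread, and the helper names canonical_octets and same_xml_literal are made up for illustration.

```python
import xml.etree.ElementTree as ET


def canonical_octets(xml_text: str) -> bytes:
    """Approximate the value of an XML literal as canonical UTF-8 octets.

    Assumption: C14N 2.0 (available in the standard library since
    Python 3.8) stands in for Exclusive XML Canonicalization here.
    """
    return ET.canonicalize(xml_text).encode("utf-8")


def same_xml_literal(a: str, b: str) -> bool:
    """Treat two XML literals as equal when their canonical octets match."""
    return canonical_octets(a) == canonical_octets(b)


# As character strings the two spellings differ.
print("<br />" == "<br/>")  # False

# Compared by canonical octets, whitespace inside the tag no longer matters.
print(same_xml_literal("<br />", "<br/>"))

# Attribute order and quote style are also normalized away.
print(same_xml_literal('<a x="1" y="2"/>', "<a y='2' x='1'/>"))
```

The sketch only handles literals that parse as a single element; a literal made of bare text or of several top-level nodes would need to be wrapped before parsing.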
>Even if you disagree that the later two are the same,
>because you want to preserve the distinction between
>plain literals and the 'XML-ness' of text in XML
>literals, a slightly tweaked denotation should give
>you that distinction.

Maybe, but I would like to see the details.

>>The point is, we have a distinction between two kinds of literals.
>>To put it crudely, a string (the literal string) can be labelled as
>>'plain' in which case it (rather oddly) denotes itself, or as
>>'XML-ish', in which case it might denote something else. The
>>question is, what? The issue is not to do with how the literal
>>itself is encoded or represented.
>
>I was at one point worrying about the actual representation,
>and still worry about that a bit, because some implementers
>might confuse these things. But I guess such confusion can
>never be completely avoided.
>
>Anyway, if XML Literals are labeled as XML-ish, it seems most
>natural to let them denote something XML-ish, rather than something
>octet-ish.

I think the problem we have is that we have to say that they denote *something*. XML text as a character string was rejected as unworkable, as it would make tiny character differences ruin XML identities; so we looked for what the XML docs said was the root of XML syntactic identity: when are two pieces of XML "really" the same? And the best answer we (Jeremy) could find was the one we used. We couldn't find anything more XML-ish than this.

Pat

--
---------------------------------------------------------------------
IHMC                    (850)434 8903 or (650)494 3973   home
40 South Alcaniz St.    (850)202 4416   office
Pensacola               (850)202 4440   fax
FL 32501                (850)291 0667   cell
phayes@ihmc.us          http://www.ihmc.us/users/phayes
Received on Tuesday, 29 July 2003 01:46:28 UTC