- From: Martin Duerst <duerst@w3.org>
- Date: Mon, 28 Jul 2003 15:13:21 -0400
- To: pat hayes <phayes@ihmc.us>
- Cc: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>, bwm@hplb.hpl.hp.com, www-rdf-comments@w3.org, w3c-i18n-ig@w3.org, msm@w3.org
At 17:04 03/07/27 -0500, pat hayes wrote:
>>Hello Peter,
>>
>>At 09:27 03/07/25 -0400, Peter F. Patel-Schneider wrote:
>>>I believe that a complete theory of equality for XML literals resolves this
>>>comment. I suggest that several test cases be added to the RDF test suite.
>>>
>>>The related issue of whether the value spaces of xsd:string and plain
>>>literals are disjoint also appears to be well on the way to resolution.
>>
>>Apart from the issue of language information (plain literals can take
>>language information, xsd:string can't), what is the reason for making
>>these two disjoint? We seem to get into a serious proliferation of
>>string-related datatypes that provide no useful distinction.
>
>True, but I guess my reaction to this is that apparently, this
>proliferation exists, and RDF's job is not to try to put the world to rights,

I agree that it's not your job to solve other people's problems.
But with respect to plain literals, which are a pre-XML-Schema
RDF-internal creation, it doesn't seem inappropriate to ask the
question whether these are the same as anything similar in the
XML Schema type system. To make an analogy, assume that RDF M&S
had defined integers as another kind of literal. When integrating
this with XML Schema, it would seem natural in such a case that
this type was equated with the XML Schema datatype integer.

>but to allow anyone to make any assertions they wish to about any topic
>they wish to, as far as possible. If therefore there are people out there
>who wish to distinguish "Hello World" as character string from "Hello
>World" as octet sequence from "Hello World" as XML, or even "Hello
>World" as red from "Hello World" as green, who are we to say that they
>should not do so?

Obviously you already have said yes to some, and no to some others.
For example, there is currently no way to distinguish between
"Hello World" as XML and "Hello World" as octet sequence because
XML Literals denote octet sequences.
Also, there is no clear way to distinguish "Hello World" as red
from "Hello World" as green. So obviously you make decisions,
and these decisions have consequences for everybody.

>>In RDF, the simple text "Hello World" (without language information)
>>can be a plain literal, an xsd:string, and an XML literal.
>>What is the point of them all being different if there is no
>>observable difference?
>
>I am not sure what you mean by 'observable' in this context, or why that
>is relevant. Identity does not rely on indistinguishability.

Yes, in many cases these are clearly different things. For example,
two copies of the same book can look indistinguishable, yet they
are definitely not identical. On the other hand, we very much tend
to assume that two integers that are indistinguishable are identical.
So one question would be: What are strings closer to, integers or books?

>In another message you insist that
>" it is very important to make sure that the plain
>string "<br/>" (in XML written as "&lt;br/&gt;") is not the
>same as the XML markup "<br/>" (in XML written as "<br/>")."
>which seems like an unobservable difference to me of exactly the same
>kind. How something is written in XML is beside the point:

Are you sure about that? We are talking about XML literals,
so it very well seems relevant.

>the sequence of 5 characters (less-than, lowercase-b, lowercase-r,
>forward-slash, greater-than) is what it is. What you seem to be insisting
>on is that markup is not text; that indeed makes sense as a parsing
>restriction when discussing XML. But (with a passing bow to charmod)
>characters are characters.

I know which section of charmod you refer to. That section is there
to make clear that parsing happens on the character layer, rather
than on the octet layer. With US-ASCII, the difference may not be
very visible, but looking at EBCDIC or JIS (iso-2022-jp) this
difference becomes quite important.
In charmod, you will also find ample discussion of escaping,
explaining how character sequences can represent other characters.

>'<br/>' was a sequence of 5 characters before XML was invented, and it's
>still the same sequence of 5 characters. When I'm editing XHTML, I will
>treat this sequence differently when I see it in the code window than when
>I see it in the design window, but it's the same 5 characters I am looking
>at in each case.

The same argument would apply to integers represented as characters,
I guess, and other things that would use the same characters but
would not be integers.

>>>PS: Although the current situation may be technically satisfactory in this
>>>area, the pain in getting there suggests that a slightly different
>>>description of XML literals might be more useful, perhaps something along
>>>the line of making the value space of XML literals in RDF be some abstract
>>>set with equality defined as per exclusive XML canonicalization and
>>>explicitly determined to be disjoint from the value space of plain RDF
>>>literals and also from the XSD value spaces. This would also probably make
>>>the XML guys much more happy.
>>
>>I have proposed something like this just a day or two ago. It would
>>definitely make I18N quite a bit happier, because it would not be
>>a straightforward violation of the Character Model, and would indeed
>>be much more in line with the XML spec.
>
>I guess we have been working under the tacit assumption that as far as
>possible we *should* specify what our RDF-described domains actually are.

Yes, I think this is preferable to leaving things completely open.

>This abstract set trick does make the semantics easier to state, but all
>it does operationally is to guarantee that identities *cannot* be
>inferred. If something is in an abstract set then it is definitely not an
>XML character sequence or octet sequence or XML markup, for example. Is
>this really what i18n wants?
So the abstract set trick, if I understand correctly, would say
that XML Literals denote elements from an abstract set (let's call
it the XML-Literal-abstract-set), and therefore cannot be identical
to any other things such as character sequences or octet sequences
or whatever?

This is not exactly what we would want (because, as discussed above,
some desirable identities are missed), but is definitely MUCH better
than saying that XML Literals denote sequences of octets.

Regards,    Martin.
Received on Monday, 28 July 2003 17:24:30 UTC