- From: Patrick Stickler <patrick.stickler@nokia.com>
- Date: Fri, 13 Sep 2002 08:00:32 +0300
- To: "ext Jeremy Carroll" <jjc@hpl.hp.com>, <w3c-rdfcore-wg@w3.org>
[Patrick Stickler, Nokia/Finland, (+358 50) 483 9453, patrick.stickler@nokia.com]

----- Original Message -----
From: "ext Jeremy Carroll" <jjc@hpl.hp.com>
To: <w3c-rdfcore-wg@w3.org>
Sent: 13 September, 2002 05:08
Subject: abstract syntax representation of inline literals

> Well it's half past three in the morning, and I can't sleep, and Patrick's
> wrong !
> I blame the macchiato that I drank yesterday afternoon.

So sorry if I drove you to drink (coffee at least ;-)

Maybe my comments below will help you wake up the morning after ;-)

> Contents:
> A: the choice of lexical form of datatype value is a *comment*
> B: We really have a closed set of datatypes
> C: infinite variability in representation should be syntactic
> D: URI case analogy is specious
> E: round-tripping argument is specious
>
> A: the choice of lexical form of datatype value is a *comment*
>
> We all agree that
>
> <rdf:Description>
> <eg:prop>2</eg:prop>
> </rdf:Description>
>
> and
>
> <rdf:Description>
> <!-- I like leading zeros. -->
> <eg:prop>2</eg:prop>
> </rdf:Description>
>
> are the same.

Yes. But they are the same because the comments are not represented in the abstract graph, but live only in the RDF/XML serialization, so one would not expect to see any difference reflected in the graph due to the presence of the comment in the second case.

> There is no, a priori, reason why we should not see the choice of lexical
> representation of a data value as similarly only a superficial irrelevance,

But we have no way to know whether some variation in lexical representation is superficial or not. RDF would have to grok the datatype in its entirety to know that.

> and see graph equality test cases as holding (with say
> <xsd:int>"2" == <xsd:int>"02")

We can say that these two representations are synonymous only by knowing the L2V mapping of xsd:int, and that knowledge is not internal to RDF.
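To make the distinction concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not anything defined by RDF or the MT: the names TypedLiteral, graph_equal, value_equal and the L2V registry are hypothetical, and Python's int() is a rough stand-in for the real L2V mapping of xsd:int.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

XSD_INT = "http://www.w3.org/2001/XMLSchema#int"


@dataclass(frozen=True)
class TypedLiteral:
    datatype: str  # datatype URIref, treated as an opaque name
    lexical: str   # lexical form, kept exactly as written


def graph_equal(a: TypedLiteral, b: TypedLiteral) -> bool:
    """Equality as the abstract graph sees it: same datatype name, same lexical form."""
    return a == b


# Hypothetical application-side registry of L2V mappings.
# Assumption: int() only approximates the xsd:int mapping (no range check).
L2V: Dict[str, Callable[[str], object]] = {XSD_INT: int}


def value_equal(a: TypedLiteral, b: TypedLiteral) -> Optional[bool]:
    """Equality by value, available only to an application that knows both datatypes."""
    if a.datatype not in L2V or b.datatype not in L2V:
        return None  # the mapping is unknown, so the question cannot be answered
    return L2V[a.datatype](a.lexical) == L2V[b.datatype](b.lexical)


two = TypedLiteral(XSD_INT, "2")
padded = TypedLiteral(XSD_INT, "02")
print(graph_equal(two, padded))  # False
print(value_equal(two, padded))  # True
```

The graph-level comparison needs nothing but the two strings, while the value-level comparison depends entirely on what happens to be in the registry, which sits outside the graph.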
Let's try another example that may help separate what is clearly obvious to us, knowing about integers, and what is clear to RDF.

Is <foo:blarg>"xxxyyy" == <foo:blarg>"xxxxyyy"???

After all, they only differ by having one extra 'x' in the second case. How could that really matter? Well, we really don't know, because we don't know what the L2V mapping for foo:blarg is, and whether that extra 'x' simply produces a lexical synonym of the shorter lexical form, or results in a different mapping.

So, because the semantics of particular datatypes are not known to RDF (and I assert strongly should *never* be known, just as URI schemes are not known by RDF), all the abstract graph can capture is the language in which typed literals (datatype values) are expressed. That language employs specific datatype URIrefs and specific lexical forms, and RDF has no knowledge whatsoever to interpret these, only to present them to applications in a consistent and clear manner.

If an application that groks the datatype wishes to employ some mechanism of node equality or node merging based on the identical value interpretation of the two different lexical form representations, fine, but that's the domain of the specific application and not the abstract graph.

> B: We really have a closed set of datatypes
>
> XSD specifies a closed set of 19 base types, all others, including user
> defined types, derive from these. The only values you can have for a user
> defined type is one of the base types (or a list thereof: at most another 19
> new different types). Derived types are subsets of these (at most 38) types.
> If I choose to represent an xsd:decimal "2" as an xsd:int "2" I have only made
> a comment that can be automatically verified. There is nothing that is worth
> preserving.

Hmmmm.
OK, re-reading the XSD spec (section 2) we find

   [Definition:] There exists a conceptual datatype, whose name is anySimpleType, that is the simple version of the ur-type definition from [XML Schema Part 1: Structures]. anySimpleType can be considered as the ·base type· of all ·primitive· types. The ·value space· of anySimpleType can be considered to be the ·union· of the ·value space·s of all ·primitive· datatypes.

It is this restriction, that every user-defined type must also have a value space that is subsumed by this union of the value spaces of all primitive types, that motivates RDF datatyping to be open, and able to work with any conformant datatype, XSD or otherwise. Otherwise, RDF would cease to be extensible and able to address future needs as yet unknown to us, the XML Schema WG, etc.

RDF datatyping is not bound to XML Schema datatypes, but supports any datatypes which exhibit the required characteristics (lexical space, value space, and N:1 mapping from lexical to value space, where N > 1). So even if the set of XML Schema datatypes were closed, the set of possible RDF compatible datatypes is infinite.

Secondly, the WG has not specified that the RDF MT will include the full semantics of all primitive XML Schema datatypes. Thus, even if the set of datatypes is closed, RDF itself knows nothing whatsoever about what xsd:int means and that it is a subtype of xsd:integer, etc. And thus, the RDF MT cannot equate synonymous datatype+lexicalform names by value. The actual values are unknown, and unknowable, to RDF without that full knowledge of the datatypes.

> C: Infinite variability
>
> In XSD there are an infinite number of ways of writing the number 2.
> Some of these consist of leading and trailing zeros. Others consist of
> defining a new type that directly or indirectly derives from xsd:decimal.
> In RDF/XML we already have infinite variability in the choice of how to
> serialize a graph (e.g. whitespace and XML comments).
> However the model theory is finite in style, and is most easily understood by
> adding triples using closure rules.
>
> Patrick's position is that the infinite set of representations of
> <xsd:decimal>"2" all are interpreted as the number 2. This means that any
> graph involving one of these will entail an infinite number of other graphs.
> We would also need a closure rule in the MT of the form:
>
> If
> aaa ppp <ddd>"lll" .
> is in the graph,
> and
> <ddd>"lll" maps to the same value as <DDD>"LLL" under xsd rules then add
> aaa ppp <DDD>"LLL" .
> to the graph.

No. You wouldn't necessarily need to expand the graph to include the potentially infinite set of possible synonymous expressions. And furthermore, you couldn't even write such a closure rule as part of the RDF MT, because the RDF MT -- having no knowledge of which values the datatype+lexicalform names denote -- would not be able to determine such mapping equality.

Thus, any such closure rule would depend on some other explicit assertion about the equality mapping for two typed literals, and indirectly at that since literals can't be subjects:

IF
   aaa ppp <ddd>"lll" .
   aaa ppp <DDD>"LLL" .
   ppp rdf:type daml:uniqueProperty .
   bbb qqq <ddd>"lll" .
THEN
   bbb qqq <DDD>"LLL" .

But I still don't see the utility of such a closure rule. I certainly don't see that it is necessary for the core RDF MT. In practice, users will tend to use fairly canonical lexical forms and fairly common datatypes to denote particular values, and full interpretation of those denotations (which actual value is denoted) will happen outside the RDF MT, not within it.

> RDF closure would be transformed from a fairly easy computation to a merely
> theoretical device. Fine for OWL (where we do have to worry about infinity),
> unnecessary and a mistake for RDF.

I'm not going to say anything more about this issue, being woefully unqualified. Perhaps Pat or others may wish to offer comments about this.
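For what it's worth, the IF/THEN rule based on daml:uniqueProperty can be sketched as a small fixpoint computation in Python. This is only an illustration under assumptions: triples are plain tuples, a typed literal is a (datatype, lexical form) pair, and the function name close_graph is hypothetical. Note that the sketch never interprets a lexical form; it only reacts to the explicit uniqueProperty assertion.

```python
RDF_TYPE = "rdf:type"
UNIQUE_PROPERTY = "daml:uniqueProperty"


def close_graph(graph):
    """Apply the rule until nothing new is added.

    graph is a set of (subject, predicate, object) tuples. Objects are either
    strings (URIrefs) or (datatype, lexical) tuples standing for typed literals.
    """
    result = set(graph)
    while True:
        unique_props = {
            s for (s, p, o) in result if p == RDF_TYPE and o == UNIQUE_PROPERTY
        }
        # Two typed literals are treated as interchangeable only when the same
        # subject has both as values of a property asserted to be unique.
        synonyms = {
            (o1, o2)
            for (s1, p1, o1) in result
            for (s2, p2, o2) in result
            if p1 in unique_props and s1 == s2 and p1 == p2
            and isinstance(o1, tuple) and isinstance(o2, tuple) and o1 != o2
        }
        derived = {
            (s, p, y)
            for (s, p, o) in result
            for (x, y) in synonyms
            if o == x
        }
        if derived <= result:
            return result
        result |= derived


graph = {
    ("aaa", "ppp", ("ddd", "lll")),
    ("aaa", "ppp", ("DDD", "LLL")),
    ("ppp", RDF_TYPE, UNIQUE_PROPERTY),
    ("bbb", "qqq", ("ddd", "lll")),
}
closed = close_graph(graph)
print(("bbb", "qqq", ("DDD", "LLL")) in closed)  # True
```

Without the uniqueProperty triple the function returns the graph unchanged, which matches the point that nothing here comes from knowing the datatypes themselves.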
> D: URI case analogy is specious
> Patrick said:
> > If we are going to do this, then let's be sure that
> > http://foo.com/blarg and http://FOO.COM/blarg are
> > both mapped to the same URIref node too, eh?
> Nowhere in RDF do we suggest any relationship at all between these two URIs.
> (Other than they will retrieve the same document - which is implicit in our
> specs)

And this is my point. RDF *doesn't* state any such relationship.

> However, any account of datatyping does say that the datatypevalue nodes in
> the graph are interpreted as the values from the value space in the model
> theory.

I agree that the MT must state that any datatype+lexicalform labeled node denotes *some* value, according to the L2V mapping of the datatype, BUT the MT does not and cannot state *which* value that node denotes.

I.e. the MT will say that <xsd:integer>"10" denotes the member of the value space of <xsd:integer> to which the lexical form "10" maps according to the L2V mapping defined for <xsd:integer>, but it does NOT say that <xsd:integer>"10" denotes the value ten. It cannot say that, because it does not include the semantics of <xsd:integer>.

> Moreover, we know that we use the XSD rules to work out which values.

In the case of the particular XSD datatypes, yes, but it is not RDF that knows that, it is the application that groks the XSD datatypes. RDF itself doesn't know its *ss from its elbow insofar as particular datatypes are concerned. It simply knows that a typed literal node denotes *some* value of that datatype. Which value it denotes remains a mystery to the RDF MT.

I suspect this perhaps is a key point in our disagreement. You seem to be operating on an incorrect assumption about what a member of rdfs:Datatype is and the degree of knowledge the RDF MT has about it.
You seem to be arguing that the RDF MT has full knowledge about the machinery and semantics of XML Schema simple datatypes and that all members of rdfs:Datatype will also be descendants of xsd:anySimpleType. Both of these assumptions are incorrect, insofar as the new stake-in-the-ground defined by Part 1 is concerned. In fact, Part 1 is rather pedantic about this point.

> Thus, two way entailment between the graphs in the test case is at least
> implicit in Part I.
> Thus <xsd:int>"2" and <xsd:decimal>"02.0" are much more closely related in our
> specs than the two URIs above.

But the RDF MT cannot know that any more than it can know about the relationship between the two URIs differing in case. Both URI schemes and datatypes are fully opaque to RDF, and the latter is very clearly specified in Part 1, and that is what the WG has agreed.

> E: round-tripping argument is specious
> We already choose that a lot of things are irrelevant to round tripping (e.g.
> whitespace, order, xml comments, use of which syntactic rules).
> We are free to define another thing that is not included in round tripping.

Well, yes, we are. But my argument about round-tripping was mostly concerned with having the abstract graph capture the original statements made in the RDF/XML in the original language in which they were expressed. If the abstract graph discarded the original datatyping terms, I would consider that a far more significant loss of information than the exclusion of whitespace, comments, etc. (which, by the way, are excluded from having semantic significance by the XML specs, not by RDF).

Regards,

Patrick

PS: I hope this doesn't add to your insomnia... ;-)
Received on Friday, 13 September 2002 01:01:33 UTC