- From: <Patrick.Stickler@nokia.com>
- Date: Wed, 21 Nov 2001 09:08:53 +0200
- To: phayes@ai.uwf.edu
- Cc: w3c-rdfcore-wg@w3.org
> What the S and DC (and URV) proposals do keep simple is
> the idea that RDF graphs can be tidy on literal nodes as well as on
> uriref nodes, which would indeed allow the RDF graph syntax to be
> stated more concisely since there would be (as Dan C. has noted) no
> need to bother with the distinction between nodes and labels.

Then I point you to my latest recommendation

http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Nov/0579.html

which, in a nutshell, proposes that a combination of the P (not P++), U, and DC (with slightly different vocabulary) proposals be adopted as equivalent representations for asserting the same pairing of data type with lexical form.

If folks need maximal graph compression such that all typed literals participate in tidying operations, then use URVs.

The S proposal, however interesting and whatever the apparent positive qualities, just raises too many questions and issues and may very well break more things than it fixes.

There is also the risk (a big one IMO) that such a radical change at this point in time will undermine much of the recent positive perception and adoption of RDF -- if suddenly folks have to start thinking differently about how they are doing data typing -- and have to go and convert all their data to use e.g. properties rather than anon node idioms or range constraints (both of which are precluded by the S proposal).

> >such that the S treatment is preferred because it is (supposedly)
> >easier to define in the MT but it does not reflect common usage
> >or present definition of the RDF graph model or intuitions about
> >the purpose and semantics of terms such as rdfs:range,
> >rdfs:subClassOf, or rdfs:subPropertyOf.
>
> I think it respects all of this except maybe current usage.

Which is a *huge* issue, no?

> The S proposal doesn't require us to revise ...
> the meanings of any of the rdfs vocabulary.

Sure it does. It says that I can't use rdfs:range to assign type to the value of a "data type property".
Thus the definition of rdfs:range has to be modified to state for which "types" of properties it should not or cannot be used.

Earlier, I offered an example like:

    x ex:age "10" .
    ex:age subPropertyOf xsd:integer .
    ex:age rdfs:range xsd:integer .

and was told it was "wrong" to use rdfs:range because xsd:integer, according to the S proposal, is a property, not a type!

Yet current usage such as the following

    x ex:age [ rdf:value "10" ; rdf:type xsd:integer ] .

or

    x ex:age "10" .
    ex:age rdfs:range xsd:integer .

declares that xsd:integer is a type class and thus a legitimate value for an rdfs:range constraint -- and if the S proposal precludes that, then the S proposal is unacceptable on that point alone.

> >And the present idiom based on anonymous nodes is IMO much
> >clearer and accomplishes the same purpose but does so per
> >the present RDF "tradition" without mucking up type and
> >property distinctions:
> >
> >   xxx --rdf:type---> foo:date .
> >   xxx --rdf:value--> "2001-11-29" .
> >
> >Thus, the anonymous node (bNode) denotes the value, and it
> >has properties for type and lexical form, and thus acts
> >as the identity in the graph for that pairing.
>
> That is the DC idiom.

Essentially, yes, though with slightly different vocabulary. The DC idiom as proposed uses rdf:label rather than rdf:value.

> But that has some severe problems of its own,
> as we have already noted in earlier discussions, and I was under the
> impression that the S proposal was generally considered superior.

Perhaps superior in some respects, yes, but I don't consider it a superior solution taking into account issues such as common perception and usage, or risk of unknown impact or conflict with current RDF mechanisms -- in which case it is one of the least suitable proposals on the table IMO.
(not that my opinion has much value)

> The chief problem with the use of rdf:value to link values to lexical
> forms is that the link between type and form is too weak,

Actually, I see the DC form as providing the stronger link, because even in the context of "careless binding" of values by inference on subPropertyOf relations, all the type information is carried along (as is the case for URVs). One can really view URVs as a URI packaging of the DC idiom.

> and if the
> same value is specified in two ways, then the link can be completely
> lost, eg if
>
>    xxx --rdf:type --> xsd:binary
>    xxx --rdf:value --> "111"
>
> and
>
>    xxx --rdf:type --> xsd:integer
>    xxx --rdf:value --> "7"
>
> Notice BTW that if we use rdfs:subClassOf on datatypes then
> xxx rdf:type xsd:integer .
> will be entailed by
> xxx rdf:type xsd:binary .

Well, actually, there is no such type as xsd:binary, but for the sake of example, let's pretend there is and that it is a subClassOf xsd:integer and has a lexical space based on binary notation of integer values (that's what you meant, right?)

Thus above, we have two different TDLs (Typed Data Literals; see link to my last proposal for the full definition). In a nutshell, a TDL is a pairing of lexical form (literal) with data type (URI), which denotes a single value in the value space of the data type.

So we have the TDLs ("111", xsd:binary) and ("7", xsd:integer) and each anon node that has these pairs of properties denotes *some* value in the respective value space defined for the TDL.

In the first case, the value denoted by the lexical form "111" is in the xsd:binary value space. In the second case, the value denoted by the lexical form "7" is in the xsd:integer value space.

The xsd:binary value space is a *separate* space from the xsd:integer space. Right?
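The TDL distinction can be made concrete with a minimal Python sketch, which models a TDL as a (lexical form, datatype) pair and always runs the lexical-to-value mapping of that pair's own datatype. The `TDL` class and the `LEXICAL_MAPPINGS` table are illustrative names, and reading xsd:binary lexical forms as binary notation for integers is an assumption that follows the pretend type described above.

```python
from dataclasses import dataclass

# Illustrative lexical-to-value mappings, one per datatype.
# "xsd:binary" is the pretend type from the example above; reading its
# lexical forms as binary notation for integers is an assumption.
LEXICAL_MAPPINGS = {
    "xsd:integer": lambda lexical: int(lexical, 10),
    "xsd:binary": lambda lexical: int(lexical, 2),
}


@dataclass(frozen=True)
class TDL:
    """Hypothetical model of a typed data literal: lexical form plus datatype URI."""

    lexical: str
    datatype: str

    def value(self) -> int:
        # The lexical form is always interpreted by the TDL's own datatype.
        return LEXICAL_MAPPINGS[self.datatype](self.lexical)


a = TDL("111", "xsd:binary")
b = TDL("7", "xsd:integer")

print(a == b)                  # False: two distinct TDLs
print(a.value() == b.value())  # True: both denote the value seven
print(TDL("111", "xsd:integer").value())  # 111: same lexical form, other datatype
```

The two pairs compare unequal as TDLs even though their mapped values coincide, which is the sense in which two nodes can be inferred to denote the same value without being the same TDL.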
The relation of xsd:binary subClassOf xsd:integer states that all members of the xsd:binary value space (not lexical space) are also members of the xsd:integer value space -- thus, we can infer from that relation that the two anon nodes denote the same value. *BUT* the two anon nodes do *not* constitute the same TDL! Nor do they denote the *same* value, insofar as the explicitly declared knowledge is concerned.

Just as one may infer, by means of a daml:equivalentTo relation, that two resources are the same, so too may one infer, by means of a subClassOf relation, that two TDLs denote the same value. But that doesn't mean that the nodes should be merged. Eh?

> The fact that xsd:bin
> and the upward-incompatibility problems that you raised concerning
> the P proposals would apply here in just the same way.

I never said that there were upward-incompatibility problems with the P(++) proposals, per se, only that the subClassOf relation between data types applies only to value spaces and not to lexical spaces.

> That objection
> applies to *any* datatyping proposal that uses class reasoning on the
> value spaces of datatypes;

Exactly. That same issue applies to *ALL* of the proposals! Including S.

> the S proposal escapes it precisely by
> treating datatypes as properties rather than as classes.

How does that allow S to escape it?! If xsd:binary is a subPropertyOf xsd:integer, then that doesn't mean that any value of the xsd:binary property is a valid value for the xsd:integer property.

    xx xsd:binary "111" .

does not mean that

    xx xsd:integer "111" .

is valid because xsd:binary subPropertyOf xsd:integer! No?

The same issues of inference binding of literal values to superordinate properties with incompatible lexical spaces apply just as much to the S proposal as to all the other proposals -- except the DC or U proposals!
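The subPropertyOf point can be sketched in Python on a toy set of triples. The sketch applies the ordinary RDFS subPropertyOf rule once to an S-style graph, where xsd:binary is itself the property, and once to a DC-style graph, where the property points at a node carrying rdf:type and rdf:value. Every name here (ex:size, ex:measure, _:v1, the helper functions) is hypothetical, xsd:binary is assumed to use binary notation for integers, and the DC half shows only subPropertyOf binding; it does not address the separate subClassOf objection raised in the quoted text.

```python
# Illustrative lexical-to-value mappings; "xsd:binary" is the pretend type
# from the example above, assumed to use binary notation for integers.
LEXICAL_MAPPINGS = {
    "xsd:integer": lambda lexical: int(lexical, 10),
    "xsd:binary": lambda lexical: int(lexical, 2),
}


def subproperty_closure(triples, superproperties):
    """Apply the RDFS rule (s p o) + (p rdfs:subPropertyOf q) => (s q o).

    `superproperties` maps each property to its direct superproperties;
    the loop repeats until no new triples are produced.
    """
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(closed):
            for q in superproperties.get(p, ()):
                if (s, q, o) not in closed:
                    closed.add((s, q, o))
                    changed = True
    return closed


# S-style: the datatype is the property, so each triple is read with the
# mapping of its own predicate.
s_graph = subproperty_closure(
    {("x", "xsd:binary", "111")},
    {"xsd:binary": ["xsd:integer"]},
)
s_values = {p: LEXICAL_MAPPINGS[p](o) for _, p, o in s_graph}
print(s_values["xsd:binary"], s_values["xsd:integer"])  # 7 111


# DC-style: the property points at a value node that carries its own type.
dc_graph = subproperty_closure(
    {
        ("x", "ex:size", "_:v1"),
        ("_:v1", "rdf:type", "xsd:binary"),
        ("_:v1", "rdf:value", "111"),
    },
    {"ex:size": ["ex:measure"]},
)


def dc_value(graph, node):
    """Read a DC-style value node, using the node's own rdf:type as the datatype."""
    datatype = next(o for s, p, o in graph if s == node and p == "rdf:type")
    lexical = next(o for s, p, o in graph if s == node and p == "rdf:value")
    return LEXICAL_MAPPINGS[datatype](lexical)


node = next(o for s, p, o in dc_graph if s == "x" and p == "ex:measure")
print(dc_value(dc_graph, node))  # 7
```

In the S-style graph the entailed xsd:integer triple reads "111" as one hundred and eleven, while in the DC-style graph the inferred ex:measure value still resolves to seven because the type information travels with the node.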
With the DC and U proposals, the value being bound to the property is either an anon node, "carrying along" with it the needed type information, or a URI in which the type information is encapsulated. So in this regard, the S proposal is just as vulnerable as the P proposals, and only U and DC are "safe".

> I can't see any simple way around this problem, by the way. If
> datatypes are classes and if we expect to be able to use normal class
> reasoning on them - which includes the use of rdf:type - then normal,
> valid, RDFS class reasoning is liable to produce wrong datatype
> answers, in general.

Not if the reasoning is about value spaces only, and it is accepted that even if a given value is deemed to be a member of a particular value space, its lexical representation (literal) may not be a member of the lexical space for that data type.

> This is a very general and robust problem, and
> there is no simple way to wriggle past it.

Then perhaps it should be deferred to a "future working group" rather than making radical changes such as the S proposal in the hopes that maybe it *might* be better overall than the present common usage as reflected by use of rdfs:range and idioms such as DC.

> The only ways I can see to
> get past all involve somehow isolating datatype reasoning from class
> reasoning, either by removing it completely (the URV and S
> proposals); or maybe by providing a special subproperty of
> rdfs:subClassOf to be used on datatypes , something like
> rdfs:subDatatypeOf, with its own special semantic conditions; or
> maybe by declaring that RDF is only guaranteed to give correct
> answers when used on 'upward compatible' datatyping schemes (ie those
> for which
>
>    aaa rdf:type rdfs:datatypeClass .
>    bbb rdf:type rdfs:datatypeClass .
>    aaa rdfs:subClassOf bbb .
>
> together entail
>
>    aaa rdfs:subDatatypeOf bbb . )

Would it not simply be sufficient to state that subClassOf relates values, not lexical forms?
Thus, any member of the value space of type 'aaa' is also a member of the value space of 'bbb' -- though the lexical spaces for these two types may have no intersection whatsoever, and the "execution" of the mapping from lexical form to value must occur within the context of the specific data type to which a literal is bound. This allows one to reason about the relations between types and equivalences of values without requiring that there be perfect subsetting of lexical spaces in an upward compatible manner.

Eh?

Cheers,

Patrick
Received on Wednesday, 21 November 2001 02:10:51 UTC