- From: Pat Hayes <phayes@ai.uwf.edu>
- Date: Sat, 9 Feb 2002 01:28:45 -0600
- To: Patrick Stickler <patrick.stickler@nokia.com>
- Cc: RDF Core <w3c-rdfcore-wg@w3.org>
>I am very grateful to Pat for his execellent summary of the convergence >proposal in terms of his proposed MT treatment and to Jeremy for his >earlier distillation of it into specific issues. I feel that within >these can be found an excellent solution to the datatyping issue. In case anyone wonders what this is about, take a look at http://www.coginst.uwf.edu/users/phayes/DatatypeSummary.html A newer (slightly simpler) version is being produced at http://www.coginst.uwf.edu/users/phayes/DatatypeSummary2.html >The following (hopefully) explains why I feel it is in the users' >best interest to adopt the convergence proposal without the datatype >triple idiom detailed in section 3 of Pat's summary (which actually >was not included in the original convergence proposal specifically >for the reasons outlined below). > >I've tried to be relatively terse, which I think will be appreciated. >If any particular statement is not clear, just ask and I will gladly >expound upon it until it is clear or until you start throwing heavy >objects ;-) OK. Most of the following points I either disagree with, or think are just mistaken. I don't follow the qname point and would like to have that expounded more fully. I think there are some convincing reasons why datatype triples (ie using the datatype name as a property) have genuine utility, and allowing them is not a heavy burden either on users or implementers. (I have to say, the idea that RDF is *complicated* seems ludicrous, in a world with XSD, Java and DAML+OIL in it; Ive never met anyone who has expressed that view. Mostly it is seen as almost childishly oversimplified, in circles I move in.) >As with my convergence proposal, please read the whole thing before >responding/reacting to a single particular statement, as most are >qualified/explained in subsequent paragraphs. > >OK, here goes... > >1. The datatype triple idiom is not needed, and has no actual utility. > >a) The idiom does not do anything that doublet idiom does not do, >except allow multiple types to be explicitly defined for same bNode, Well, same node, Doesnt have to be a bnode. But OK.... >yet: > >b) The ability to define multiple types for same bNode will, I >think, rarely be useful even even more rarely used, I disagree. I think it has several obvious uses, and will be widely used, largely because it one of the simplest idioms and Linux-grade robust. > and: > >c) The utility of such multiple type definition is an illusion >since it doesn't remove any untidyness in value comparisons because >doublet and global idioms still exist, and also there may be separate >datatype triples which denote same value. There MIGHT be, of course; any two RDF nodes *might* co-refer. But the central point is that RDF provides no way to express this co-reference, which makes it impossible to translate some uses of datatyping triples into RDF at present. For example, consider _:x ex:USdate "05-08-02" . _:x ex:UKdate "08-05-02" . which one would have to expand using two bnodes to avoid datatype clashes: _:x1 rdf:value "05-08-02" . _:x1 rdf:dtype ex:USdate . _:x2 rdf:value "08-05-02" . _:x2 rdf:dtype ex:UKdate . Now, *how* do we say that _:x1 and _:x2 co-refer? There is no way to say this in RDF. So the datatype triple style enables us to express some content that cannot be expressed any other way in RDF. >Thus, the datatype triple idiom actually does not offer any real utility. I think that this ability to record information about entities using a variety of datatypes might be extremely useful when merging information from a number of different sources, and is not illusory. The point is not to eliminate value comparisons, but to provide a way to merge information from disparate sources without needing to worry about datatype consistency; without, in fact, needing to even consider it; since if one uses this style consistently in an application, clashing datatypes can be used with impunity since none of the scopes can possibly overlap. This is the only mode of literal use in RDF that can completely avoid all checking for datatype clashes in a completely open environment. Imagine for example an HTML content scraper that records results initially by treating text fragments as literals, and has a variety of techniques for guessing datatype relations between the things it guesses exist and the test that refers to them. If it were obliged to use doublets, it would need to expend considerable work to keep track of possible clashes, and would need to use some techniques external to the RDF triple store in order to keep track of co-references between sets of bnodes. All this is unnecessary if it uses datatype triples. It can even make up its own 'datatypes' as needed and treat them identically in the triples store. >2. The inclusion of the idiom does not meet the "KISS" desire of users. > >a) Since not needed, it merely adds complexity both for users and >implementors. > >The point of a standard such as RDF is to provide a consistent, >explicit point of intersection between knowledge producers, knowledge >consumers, and application implementors which facilitates efficient >implementation and interoperability. Including needless components >for the sake of style or preference which impact portability of >knowledge and interoperability of systems, or needlessly complicate >implementation or use of such knowledge are contrary the very point >of such a standard. What counts as necessary? (Are containers necessary? Is reification necessary?? Is negation necessary?) It is clear that the 'idiom' is found intuitive by many people, even in this working group; it arises naturally from established XML usages, as Sergey has noted. Why not allow people to use it, if they find it natural, it has clear use cases, and it comes virtually for free? (The work needed to recognize a datatyping triple is about identical to that needed to detect a doublet, and apart from having smaller scopes, they mean the same thing.) >b) It is not as symmetrical with the global idiom, therefore harder >for users to understand its relationship with the global idiom than >is the doublet idiom. I have no idea what this means. What sense of 'symmetrical' is being used here? The meaning of a datatype triple is not hard to grasp or difficult to work with. One can think about it simply in terms of a packaged doublet with a limited naming scope, and never make a mistake in usage. Even the MT (which most users will never read) states the truth-conditions in one small equation. Its simpler than most of RDFS. >RDF is already widely percieved as "difficult to understand" and >"difficult to use". The last thing we want to do is make it any >more difficult by making the datatyping solution needlessly >complicated. See above. I should probably not comment on this further, for fear of giving offense. >We have an opportunity to provide a solution based on two clearly >and intuitively related idioms I think that it is better to think about these as all variations on a theme - basically hanging datatyping information into a value triple in one way or another - than as a catalog of 'idioms'. We have been talking that way, but I think it makes things needlessly awkward-seeming, since one can grasp them all as variations on two basic ideas. >which help users understand the >relation between typed literals and the datatype that provides the >context for that typing. The superfluous, more complex It can hardly be called more complex; its about the simplest form one could imagine. >datatype >triple idiom undermines us providing the simpler, fully symmetrical >solution. > >c) It requires schema definitions to use -- and thus it is not a >schema-free local idiom, which was the whole point of providing a >local/explicit idiom. I think this is wrong on two counts. First, it doesnt *require* using a schema definition - that may be a problem with my exposition in the first draft. Second, what is this about local idiom being 'schema-free'? Ive never heard of that idea in our discussion before, and I don't know what it means. Arent we talking about a schema language here? >One must define each datatype as an rdfs:subPropertyOf rdf:value >in order for the MT interpretation to work. Thus, the idiom does >not meet the desiderada of either a local/explicit or global/implicit >idiom, but is a kind of strange hybrid that needs both local >definition and schema definition to work. I think this is just plain wrong. There is a single, clear, notion of scope that handles all three idioms. The scope of a datatype is the 'area' of the graph within which it imposes an interpretation on literals. The scope of a datatype triple is the triple itself, the most local idiom possible. The scope of a doublet is a kind of 'star' of value triples attached to one node: all value triples with the same subject. The scope of a range datyping on P is a set of stars: all value triples in the graph whose subject is the object of a P triple. If any of these deserves to be called a 'hybrid', it would be the doublet case. > >3. The idiom forces the qname issue. > >The XML Schema community strongly dispute RDF qname practice as >valid and an idiom that requires the use of qnames puts us deep in >the middle of that issue -- which likely cannot be resolved within >the boundries of our present charter. > >One has no choice but to use qnames to use the datatype triple >idiom, whereas the other idioms work with full URIs and avoid >this issue entirely. I do not follow this. We refer to 'urirefs' and Ive always assumed that whatever they are, full URIs always count as urirefs. So it would seem to follow that one could use full URIs in this case as well. In fact, it seems to me that any uriref that can be used as a node label in RDF can also be used as an arc label. So why does one have 'no choice' about using qnames here?? >Why exacerbate this issue needlessly? > >4. Its interactions with rdfs:subPropertyOf are not clear. No, they are perfectly clear and unambiguous. > >a) Datatype properties only make sense with a bNode domain. ? Why? I see no problem in having a datatype triple (or a doublet, for that matter) with a uriref as subject. None of the conditions even mention the nature of the subject node, in fact. >We must >either somehow restrict the use of datatype properties to bNode >subjects or explain to users why they should do that, making more >work for us and more for users to understand. If not restricted in >this fashion, then: > >b) What does it mean to combine datatype semantics with "traditional" >property semantics? ?? They *are* traditional property semantics, with some stuff added that only affects literals. All the 'traditional' rules apply just as before, unchanged. >If some property such as ex:age is declared as >a subproperty of xsd:integer, what does that mean? It means that if xxx ex:age yyy then xxx xsd:integer yyy, which in turn means that xxx has to be an integer and yyy has to be a numeral. That's a very odd way to interpret a property called 'age', but that's what it means. >Should it even >be allowed (per (a) above)? I don't think we can pass laws forbidding people from saying dumb things. >Should users be advised against it, >and why/how? Again, why should we mention it particularly? I mean, we might also point out that it is probably a bad idea to say that ex:marriedTo is a subProperty of, say, ex:favoriteDogIs.... >Again, more questions to be answered and addressed by >the MT, the primer, the spec, or elsewhere. Nope. > >c) Cohabitation with doublet and global idioms not yet certain. > >See my comments in my reply to Pat's summary which detials a >potential erroneous inference based on combined use of all three >idioms (sorry, offline, but should be easy to find). And see my reply explaining why it isn't an erroneous inference. >Even if the MT could be re-re-re-vamped to address these issues, The MT doesn't need to be re-vamped; there are no issues to address. >that's unnecessary work, since we have a local idiom (which is a >true local idiom not requiring any schema assertions to be properly >interpreted) and the single percieved benefit of the datatype triple >idiom is, as explained above, an illusion; so what's the point in >continuing to try to make the idiom work (apart from just stylistic >preference)? > >-- > >Thus, the combined weight of all of these issues regarding the >datatype triple idiom is, I feel, overwhelming, and we will be >serving RDF users far better by giving them a solution based only >on the symmetrical doublet and global idioms. > >If we take the original convergence proposal and Pat's MT for it, >sans the datatype triple idiom, then we can be done with datatyping. >I repeat, done. Whoohoo! We can be done with datatyping without trashing that idiom, even bigger whoohoo. >Please, please, please, let's drop this extra idiom and move on. OK? Lets keep it and move on. Patrick doesn't have to use it if he doesn't want to. Sergey and I will use it, and I suspect that many other people will also use it. Pat -- --------------------------------------------------------------------- IHMC (850)434 8903 home 40 South Alcaniz St. (850)202 4416 office Pensacola, FL 32501 (850)202 4440 fax phayes@ai.uwf.edu http://www.coginst.uwf.edu/~phayes
Received on Saturday, 9 February 2002 10:37:06 UTC