- From: Patrick Stickler <patrick.stickler@nokia.com>
- Date: Mon, 11 Feb 2002 12:35:59 +0200
- To: Pat Hayes <phayes@ai.uwf.edu>
- CC: RDF Core <w3c-rdfcore-wg@w3.org>
On 2002-02-09 9:28, "ext Pat Hayes" <phayes@ai.uwf.edu> wrote:

> (I have to say, the idea that RDF is *complicated* seems ludicrous,
> in a world with XSD, Java and DAML+OIL in it; Ive never met anyone
> who has expressed that view. Mostly it is seen as almost childishly
> oversimplified, in circles I move in.)

It has always been evident that we move in very different circles... ;-)

> largely because it one of the simplest idioms

Simplest? How? For whom?

> and Linux-grade
> robust.

I don't see how any of the idioms are any more or less robust
than the others. What is the basis of your ascribing this quality
to one and not the other idioms?

> There MIGHT be, of course; any two RDF nodes *might* co-refer....
>
> _:x1 rdf:value "05-08-02" .
> _:x1 rdf:dtype ex:USdate .
> _:x2 rdf:value "08-05-02" .
> _:x2 rdf:dtype ex:UKdate .
>
> Now, *how* do we say that _:x1 and _:x2 co-refer? There is no way to
> say this in RDF. So the datatype triple style enables us to express
> some content that cannot be expressed any other way in RDF.

You missed my point entirely. How do you say that _:x3 and _:x4 in
the following co-refer?

   _:x3 ex:USdate "05-08-02" .
   _:x4 ex:UKdate "08-05-02" .

The whole point was that (a) in order to consistently address equality
of values you must have an application that supports all of the
datatypes in the graph and (b) if you have such an application, why
muck about with the idioms anyway, just use the values!

Thus, the utility that you and others ascribe to the idiom is,
I assert, mostly an illusion, insofar as knowledge interchange
is concerned. The only utility comes within the context of a
datatype aware application -- and in that context there are far
better representations than the datatype triple.

>> Thus, the datatype triple idiom actually does not offer any real utility.
>
> I think that this ability to record information about entities using
> a variety of datatypes might be extremely useful when merging
> information from a number of different sources, and is not illusory.
> The point is not to eliminate value comparisons, but to provide a way
> to merge information from disparate sources without needing to worry
> about datatype consistency; without, in fact, needing to even
> consider it;

And how, pray tell, can you achieve that merging without value
comparisons?! In order to merge, you must compare values.

> since if one uses this style consistently in an
> application, clashing datatypes can be used with impunity since none
> of the scopes can possibly overlap. This is the only mode of literal
> use in RDF that can completely avoid all checking for datatype
> clashes in a completely open environment.

Again, you are mixing implementation space and model space. You are
arguing that we should keep the datatype triple idiom because it
makes internal graph representation more economical or captures
that two lexical representations denote the same value.

Could you provide some examples, e.g. queries or similar, where
the datatype triple idiom affects either the expression of the
query or its accuracy if we are concerned with values, and
not the idioms themselves? I doubt it.

If we are going to argue idioms for the sake of implementational
benefit, then I would say that the URV idiom beats all of them hands
down -- so forget both the doublet and datatype triple idioms and
use URVs, which achieve maximal tidiness in the graph and also
completely avoid all datatype clashes, etc. etc.

> Imagine for example an HTML content scraper that records results
> initially by treating text fragments as literals, and has a variety
> of techniques for guessing datatype relations between the things it
> guesses exist and the test that refers to them.
> If it were obliged to
> use doublets, it would need to expend considerable work to keep track
> of possible clashes,

How so? I can think of several ways to do this easily. The most
obvious is a typed node with a membership property (e.g. a type
of container).

> and would need to use some techniques external
> to the RDF triple store in order to keep track of co-references
> between sets of bnodes.

Not at all. I wouldn't use the datatyping idioms for the
process-specific knowledge at all. Rather, I'd define an ontology
for the process that keeps track of the literals and the possible
interpretations that are suggested by various input content, and
then, once done with the scraping, analyze the various possibilities
and express the results in terms of the datatyping idioms.

> All this is unnecessary if it uses datatype
> triples. It can even make up its own 'datatypes' as needed and treat
> them identically in the triples store.

Well, I don't really see much more utility or graph compression in

   _:x ex:datatype1 "foo" .
   _:x ex:datatype2 "foo" .
   _:x ex:datatype3 "foo" .
   _:x ex:datatype4 "foo" .

than in

   _:x rdf:value "foo" .
   _:x rdf:dtype ex:datatype1 .
   _:x rdf:dtype ex:datatype2 .
   _:x rdf:dtype ex:datatype3 .
   _:x rdf:dtype ex:datatype4 .

and in fact, I consider the latter to be more intuitive. The
restriction of one rdf:value can be either a constraint of the idiom
(an addition to the present definition) or a constraint of the
scraper application -- and the multiple types may conflict until
the application decides which is/are correct for the literal in
question.

> What counts as necessary? (Are containers necessary?

Yes (even if the present treatment is not optimal)

> Is reification
> necessary??

Yes

> Is negation necessary?

Some think so ;-)

> ) It is clear that the 'idiom' is
> found intuitive by many people, even in this working group; it arises
> naturally from established XML usages, as Sergey has noted.
Actually, I don't think that is an accurate statement, as I've
explained in earlier responses to Sergey on this point. It no more
reflects XML usage than any of the other idioms. I could (just as
wrongly) make the same claim for each of the other idioms.

> Why not
> allow people to use it, if they find it natural, it has clear use
> cases, and it comes virtually for free? (The work needed to recognize
> a datatyping triple is about identical to that needed to detect a
> doublet, and apart from having smaller scopes, they mean the same
> thing.)

Again, the issue is that if the doublet idiom does the job, we do
not need two idioms doing the same thing. That needlessly increases
the burden on both users and implementors.

>> b) It is not as symmetrical with the global idiom, therefore harder
>> for users to understand its relationship with the global idiom than
>> is the doublet idiom.
>
> I have no idea what this means. What sense of 'symmetrical' is being
> used here?

The fact that the doublet and global idioms are identical except for
the presence or absence of the rdf:dtype property. I.e. they look
similar both in the graph and in the XML serialization. Their
relationship is "visually" reinforced for the user.

> The meaning of a datatype triple is not hard to grasp or
> difficult to work with. One can think about it simply in terms of a
> packaged doublet with a limited naming scope, and never make a
> mistake in usage. Even the MT (which most users will never read)
> states the truth-conditions in one small equation. Its simpler than
> most of RDFS.

This may come as a surprise to you, Pat, but most users of RDF will
neither care to nor be able to read the MT. To say that something
is "easy to grasp" because of the MT, while perhaps true in the
circles you move in, has little to no weight in the circles I move in.
That's not to say the MT is not important (it is), but with all due
respect, what you perceive as easy, intuitive, or optimal is not
necessarily what the typical RDF user will find easy, intuitive, or
optimal. This is no insult to the "typical" RDF user, but rather a
compliment to you.

>> RDF is already widely percieved as "difficult to understand" and
>> "difficult to use". The last thing we want to do is make it any
>> more difficult by making the datatyping solution needlessly
>> complicated.
>
> See above. I should probably not comment on this further, for fear of
> giving offense.

Likewise ;-)

>> We have an opportunity to provide a solution based on two clearly
>> and intuitively related idioms
>
> I think that it is better to think about these as all variations on a
> theme - basically hanging datatyping information into a value triple
> in one way or another - than as a catalog of 'idioms'. We have been
> talking that way, but I think it makes things needlessly
> awkward-seeming, since one can grasp them all as variations on two
> basic ideas.

I agree, in that the idioms are just expressions of the same
underlying concept -- which has been expressed in the U and TDL
proposals for a long long time: that a literal within a datatype
context is a lexical form that denotes a value. That's crystal
clear. Changing "typed data literal pairing" to "value triple" does
not change the underlying idea.

And the separation of idioms from that core model was a fundamental
goal of the TDL proposal. But even though the idioms are at a
separate layer from the core model, that does not mean we want
a lot of them.

And I have argued consistently that the idioms are secondary to
the underlying model, and that we should have the absolute minimal
number of idioms.

>> which help users understand the
>> relation between typed literals and the datatype that provides the
>> context for that typing.
>> The superfluous, more complex
>
> It can hardly be called more complex; its about the simplest form one
> could imagine.

I am not speaking about form. I am speaking about the whole
enchilada: understanding how the form relates to the datatyping
model and results in an interpretation providing a value, and how
an application (or user) puts it all together and understands the
sum total.

>> datatype
>> triple idiom undermines us providing the simpler, fully symmetrical
>> solution.
>>
>> c) It requires schema definitions to use -- and thus it is not a
>> schema-free local idiom, which was the whole point of providing a
>> local/explicit idiom.
>
> I think this is wrong on two counts. First, it doesnt *require* using
> a schema definition - that may be a problem with my exposition in the
> first draft. Second, what is this about local idiom being
> 'schema-free'? Ive never heard of that idea in our discussion before,
> and I don't know what it means. Arent we talking about a schema
> language here?

You should have a look at the desiderata, then. It has been
repeatedly stated that we must have an idiom that captures the
datatyping explicitly and which can be interpreted by an application
without any additional schema knowledge. Thus the fact that there
must be an rdfs:subPropertyOf relation defined for *every* datatype
"property" means that it is not a local/explicit idiom. An
application cannot differentiate between datatype properties and
non-datatype properties without it.

The datatype triple idiom is not safely recognizable by an
application without that extra schema knowledge.

>> One must define each datatype as an rdfs:subPropertyOf rdf:value
>> in order for the MT interpretation to work. Thus, the idiom does
>> not meet the desiderada of either a local/explicit or global/implicit
>> idiom, but is a kind of strange hybrid that needs both local
>> definition and schema definition to work.
>
> I think this is just plain wrong.

Which?
That the idiom does not work without the rdfs:subPropertyOf
statements or that it is not a true local idiom? Both of those
assertions, however, are correct.

> There is a single, clear, notion of
> scope that handles all three idioms. The scope of a datatype is the
> 'area' of the graph within which it imposes an interpretation on
> literals.

No disagreement there. That's simply an expression of the TDL concept.

> The scope of a datatype triple is the triple itself, the
> most local idiom possible.

Wrong. Without an rdfs:subPropertyOf rdf:value statement, you cannot
know that it is a datatype triple. Only the doublet idiom has such
a constrained scope. Without knowledge that the datatype property in
question is a subproperty of rdf:value, it is just another property
and not a datatyping property.

> If any of these deserves to be called a 'hybrid', it would be the doublet
> case.

I very much disagree. I.e.

                     Global   Doublet   Datatype Triple
   ----------------------------------------------------
   Local Typing                  +             +
   Schema Required     +                       +

Now, which one is the hybrid?

>>
>> 3. The idiom forces the qname issue.
>>
>> The XML Schema community strongly dispute RDF qname practice as
>> valid and an idiom that requires the use of qnames puts us deep in
>> the middle of that issue -- which likely cannot be resolved within
>> the boundries of our present charter.
>>
>> One has no choice but to use qnames to use the datatype triple
>> idiom, whereas the other idioms work with full URIs and avoid
>> this issue entirely.
>
> I do not follow this. We refer to 'urirefs' and Ive always assumed
> that whatever they are, full URIs always count as urirefs. So it
> would seem to follow that one could use full URIs in this case as
> well. In fact, it seems to me that any uriref that can be used as a
> node label in RDF can also be used as an arc label. So why does one
> have 'no choice' about using qnames here??
Because, with the datatype triple idiom, one must use qnames in the
RDF/XML serialization, whereas with the global and doublet idioms one
can use complete URIrefs exclusively in the RDF/XML serialization.

The scope of the datatyping solution extends beyond just the MT or
the graph. It must be an optimal solution for the entire scope of
RDF, which includes XML serialization and other usability issues.

>> Why exacerbate this issue needlessly?
>>
>> 4. Its interactions with rdfs:subPropertyOf are not clear.
>
> No, they are perfectly clear and unambiguous.

That is still not evident to me. And if not evident to me, likely
also not evident to a great many RDF users. (I know that probably
comes off sounding arrogant or conceited, but I don't know how else
to say it...)

> Again, why should we mention it particularly? I mean, we might also
> point out that it is probably a bad idea to say that ex:marriedTo is
> a subProperty of, say, ex:favoriteDogIs....

No. You are trivializing the issue. It appears logical, efficient,
and clever to get double duty out of a property by subclassing it
as a datatype property rather than defining a range -- whereas your
example above is just plain stupid.

>> Again, more questions to be answered and addressed by
>> the MT, the primer, the spec, or elsewhere.
>
> Nope.

I disagree. Maybe not in the MT, if you are correct and there really
are no longer any issues there, but certainly *somewhere*. I don't
expect we will leave the users in the dark about such pitfalls.

>> See my comments in my reply to Pat's summary which detials a
>> potential erroneous inference based on combined use of all three
>> idioms (sorry, offline, but should be easy to find).
>
> And see my reply explaining why it isn't an erroneous inference.

Fair enough, though it is unexpected. I.e. the explicit knowledge
appears correct, and the reason why it is not correct is hidden in
the machinery in a way that it is not hidden for the doublet idiom.
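To make the recognizability argument in this message concrete, here is a minimal Python sketch, not anything taken from the MT or the WG drafts. It assumes triples are plain (subject, predicate, object) tuples, treats only direct rdfs:subPropertyOf rdf:value statements as schema knowledge (no transitive closure), and uses a hypothetical ex:title property alongside the ex:USdate example from this thread. It shows the claim that a doublet can be recognized from the node's own triples, while a datatype triple is indistinguishable from an ordinary property until the schema statement is available.

```python
# Triples are plain (subject, predicate, object) tuples. All ex:* names are
# hypothetical and only mirror the examples used in this thread.
RDF_VALUE = "rdf:value"
RDF_DTYPE = "rdf:dtype"
SUBPROPERTY_OF = "rdfs:subPropertyOf"


def doublet_datatypes(graph, node):
    """Datatypes attached to `node` via the doublet idiom.

    Only triples whose subject is `node` are inspected, so no schema
    knowledge is needed: rdf:value plus rdf:dtype is self-describing.
    """
    has_value = any(s == node and p == RDF_VALUE for s, p, o in graph)
    if not has_value:
        return set()
    return {o for s, p, o in graph if s == node and p == RDF_DTYPE}


def datatype_triple_datatypes(graph, node, schema):
    """Datatypes attached to `node` via the datatype triple idiom.

    A property is treated as a datatype property only if `schema` states
    that it is rdfs:subPropertyOf rdf:value (direct statements only, an
    assumption of this sketch). Without that, it is just another property.
    """
    datatype_props = {
        s for s, p, o in schema if p == SUBPROPERTY_OF and o == RDF_VALUE
    }
    return {p for s, p, o in graph if s == node and p in datatype_props}


doublet_graph = {
    ("_:x1", RDF_VALUE, "05-08-02"),
    ("_:x1", RDF_DTYPE, "ex:USdate"),
}
triple_graph = {
    ("_:x3", "ex:USdate", "05-08-02"),
    ("_:x3", "ex:title", "05-08-02"),  # hypothetical non-datatype property
}
schema = {("ex:USdate", SUBPROPERTY_OF, RDF_VALUE)}

local_only = datatype_triple_datatypes(triple_graph, "_:x3", set())
with_schema = datatype_triple_datatypes(triple_graph, "_:x3", schema)
from_doublet = doublet_datatypes(doublet_graph, "_:x1")
```

With an empty schema, `local_only` is empty because ex:USdate and ex:title look alike; only `with_schema` and `from_doublet` identify ex:USdate as the datatype. Whether a conforming application is actually obliged to work this way is exactly what is disputed in this thread.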
>> Please, please, please, let's drop this extra idiom and move on. OK?
>
> Lets keep it and move on. Patrick doesn't have to use it if he
> doesn't want to. Sergey and I will use it, and I suspect that many
> other people will also use it.

Gee, and I thought we were thinking about the entire RDF community...
not just what we individual members would personally like to use.

Patrick

--
Patrick Stickler                      Phone:  +358 50 483 9453
Senior Research Scientist             Fax:    +358 7180 35409
Nokia Research Center                 Email:  patrick.stickler@nokia.com
Received on Monday, 11 February 2002 06:54:04 UTC