- From: Patrick Stickler <patrick.stickler@nokia.com>
- Date: Wed, 13 Feb 2002 12:54:11 +0200
- To: <www-archive@w3.org>
--
Patrick Stickler                 Phone:  +358 50 483 9453
Senior Research Scientist        Fax:    +358 7180 35409
Nokia Research Center            Email:  patrick.stickler@nokia.com

------ Forwarded Message
From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Wed, 13 Feb 2002 12:53:22 +0200
To: Pat Hayes <phayes@ai.uwf.edu>
Cc: "McBride, Brian" <bwm@hplb.hpl.hp.com>
Subject: Re: One final step to datatyping convergence and closure?

On 2002-02-12 23:42, "ext Pat Hayes" <phayes@ai.uwf.edu> wrote:

>> Thus, those applications which can achieve that merging won't need it.
>
> I fail to follow this. It might *want* to keep track of the literal
> forms, was my point. For example, it might be important to know that
> this value (of a bnode) has been found somewhere in the corpus
> written in this way (a literal which can be string-matched) using
> these conventions (reference to datatype). If all one wants to do is
> to get at the values, your point might be well-taken; but that isn't
> all that everyone needs to do.

But you can get at both by (internally) simply storing the values
on/in/as the bNodes -- just as you illustrate in your latest summary,
where 35 is on the bNode itself. This is not RDF, but an
implementational representation -- implied by RDF -- and it is only in
a specific implementation that such a representation is possible,
since actual value representations are implementation specific.

Insofar as RDF is concerned, that actual value is not on/in/eq the
bNode, but an application can put the value there to make queries more
efficient -- and queries based on value comparison would be binding
variables to the values, not to the nodes, so why should it matter if
the nodes are shared/merged/tidy?

>> You seem to be hoping that the datatype triple idiom will get
>> you "values in the graph" but it can't do that reliably and
>> consistently except in a context where you actually *can*
>> achieve values in the implementation-specific graph (e.g. in
>> the internal RDF engine, attach the actual value to the bNode
>> on import,
>
> Wait, wait. What do you mean, 'attach the actual value'? There is no
> way to attach *actual values* (other than strings) to nodes in an RDF
> graph. We don't have any RDF syntax for actual values.

We have been talking about the implementation space the whole time. I
expected that that has been clear. The whole point of your arguments
has been regarding implementational utility. So of course if I say
'attach a value to the bNode' I'm talking about the implementation
space.

I'm not stupid. I've been working with RDF long enough to know that
you can't attach a value to a node in the RDF graph proper, and I
would have expected you to give me that benefit of the doubt and
consider that you had perhaps misunderstood me. Thank you very much.
And the fact that I said "in the *internal* RDF engine" should have
been a strong hint about the application specific context.

Before you conclude that I've said something clearly stupid, I would
very much appreciate it if you would please consider first that I
haven't, but that you have misunderstood me, and read it again, OK?
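To make the "implementation space" point concrete, here is a rough
sketch of what 'attaching the value to the bNode' inside a datatype
aware engine might look like -- Python-ish and purely illustrative,
with made-up names (nothing here is from the MT, any WD, or any real
API):

    # Purely illustrative: a hypothetical internal node in an "RDF engine".
    # The graph proper only ever holds the lexical form and the datatype
    # URIref; the parsed value lives only in the implementation.

    class InternalNode:
        def __init__(self, node_id, lexical=None, datatype=None):
            self.node_id = node_id      # bNode identifier in the graph
            self.lexical = lexical      # e.g. "35"
            self.datatype = datatype    # e.g. "xsd:integer"
            self.value = None           # actual value, implementation only

        def attach_value(self, parsers):
            # Map the lexical form to the value it denotes, if we know how.
            parse = parsers.get(self.datatype)
            if parse is not None and self.lexical is not None:
                self.value = parse(self.lexical)

    # A datatype aware engine supplies its own lexical-to-value mappings.
    parsers = {"xsd:integer": int}

    age = InternalNode("_:1", lexical="35", datatype="xsd:integer")
    age.attach_value(parsers)

    # Queries by value bind to age.value (35), not to the node itself,
    # so it does not matter whether bNodes were merged/tidy in the graph.
    assert age.value == 35

The interchanged graph never carries the parsed value; only the
lexical form and the datatype URIref do, and the engine recomputes the
value on import.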
> You seem (??) to be assuming a use mode in which RDF is used for
> information interchange between systems that immediately translate it
> into some 'internal' non-RDF form, and that datatyping is primarily
> relevant at the moment of such translation.

Yes and no. Yes insofar as datatype values are concerned, because RDF
can only denote them, but cannot represent them. No, for pretty much
everything else (though I'm sure there are other exceptions).

> That isn't what I see as
> the typical use of RDF at all: I see it as more like a
> general-purpose way of *recording* information (eg on a website)
> where it can be accessed and used by a variety of processes, which
> will typically themselves produce more RDF, often by manipulating RDF
> content from a number of sources, which will itself get published in
> RDF for use by other agents.

Right. Like HTML and XML record information on a website and are
interpreted in-the-raw rather than transformed into the DOM, or a SAX
event stream, or an XML infoset, or a browser display pane... right,
just like that ;-)

And my arguments speak precisely to such a scenario: the RDF graph
*records* knowledge but is not necessarily the actual literal data
structure used by any "RDF engine" to store such knowledge. Rather,
that "RDF engine" provides an abstract interface emulating the logical
RDF graph structure on top of a (presumably) more efficient internal
representation -- and in the case of knowledge which is implicit
(albeit unambiguously implied) in the graph, such as the actual value
that some TDL pairing denotes, that "RDF engine" may provide higher
level access which is value based rather than directly idiom/graph
based. The very same higher level "view" can be provided for all kinds
of equalities and equivalences, such as those defined by
rdfs:subPropertyOf, rdfs:subClassOf, daml:equivalentTo, etc. etc.

>> (a) queries have to be three-part rather than just two-part
>
> ?? WHY? If one of the idioms entails the other, then you only have to
> query the one that is 'downstream'.

You are presuming an "RDF engine" that has built-in RDFS support. The
point of a 'local' idiom is to be able to express and query typed data
literals without additional RDFS expressed knowledge (with the
exception being any knowledge mandated by the MT, such as produced by
closure rules, etc., which any given implementation must and can take
into account). That is explicitly in the desiderata, and our
proposals/solutions should address the desiderata.

> If the other idiom is used, the
> query will be satisfied in either case. The doublet form entails the
> triple, but not the reverse. So if both idioms are used, one only
> needs to check the triple form in the query.

The datatype triple idiom is entailed by a closure rule that depends
on explicit RDFS knowledge about the datatyping property in the graph.
The idiom itself, sans this extra, explicit knowledge not provided by
the MT, is not sufficient to recognize datatyping properties as such.

>> (b) queries have to specify schema statements for the datatype
>> triple idiom (again, it's not local)
>
> Again, see previous messages for this. First, not really true;

See my replies... it is true ;-) Without the rdfs:subClassOf
rdfs:Datatype and rdfs:subPropertyOf rdf:value statements, the
datatype property cannot be generically recognized as a datatype
property and therefore it is not a local idiom.
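Put another way, a consumer deciding whether a given statement is a
datatyping statement has to do something like the following, and only
the doublet form can be decided from the statement itself. (A toy
Python sketch, purely illustrative -- the triple encoding and function
names are made up; the triple-idiom test simply encodes the
subPropertyOf/subClassOf schema statements discussed above, and the
doublet test reflects my reading that the MT mandates the range of
rdf:dtype.)

    # Toy triples as (subject, predicate, object) strings; illustrative only.

    RDF_DTYPE = "rdf:dtype"

    def is_doublet_datatyping(triple):
        # Local test: on the reading argued here, the MT itself mandates
        # that rdf:dtype pairs a node with a datatype, so one statement
        # is enough to recognize the idiom.
        s, p, o = triple
        return p == RDF_DTYPE

    def is_triple_idiom_datatyping(triple, schema):
        # Non-local test: abc:wombat "34971918374" only counts as a
        # datatyping statement if the *schema* says abc:wombat is a
        # datatype property -- extra knowledge the MT does not provide.
        s, p, o = triple
        return ((p, "rdfs:subPropertyOf", "rdf:value") in schema and
                (p, "rdfs:subClassOf", "rdfs:Datatype") in schema)

    print(is_doublet_datatyping(("_:1", "rdf:dtype", "xsd:integer")))   # True
    print(is_triple_idiom_datatyping(("_:1", "abc:wombat", "34971918374"),
                                     schema=set()))  # False without the schema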
>> But not offering the utility you attribute to it. Consider a context
>> of mass syndication of knowledge from many many sources, in real-time.
>> In order for the idiom to offer some representation of equivalent
>> values, merging must be done on the introduction of every new statement,
>> and if such merging is being done, the application is likely going
>> to just insert the values and be done with it.
>
> In the sense you are using the term here, the bnode IS the 'value'.
> Suppose for example we know that
>
>    _:t34276 rdf:value "the phone number of the man in the red hat" .
>
> and later we figure out, and add the graph:
>
>    _:t34276 xsd:number "8504348903"

Firstly, it is hard to really consider your example, since you're
using fictitious, possibly fanciful datatypes; but presuming that
xsd:number is analogous or equivalent to xsd:integer, the above case
would be in error, since "the...red hat" is not a valid lexical form
for xsd:integer.

>> Thus, there are very very few conceivable contexts where (a) the application
>> cannot perform the merging itself and (b) it is sure that all possible
>> mergings have occurred. Thus, again, the actual real-world utility of
>> the merged bNodes denoting value equality is an illusion. It just doesn't
>> offer what you think it does. Not in practice.
>
> I disagree. If I were writing apps, I would be using this all the
> time. In the RKF project if it gets merged with DAML, we will use
> this kind of technique centrally in our reasoning engines (which make
> several slightly naughty closed-world assumptions behind the scenes
> in order to get answers shipped in a reasonable time.)

The key phrase here is "in our reasoning engines". What happens
within/inside a given application is one thing. What is part of the
standard that *everyone* has to use is another.

I have repeatedly said, and I'll say it again, that your motivations
for keeping the datatype triple idiom seem based solely on utility it
*might* offer particular implementations with regards to *internal*
representations relevant to *application specific* processes -- and
that such arguments have little to no basis for including some
functionality in a global standard which is intended to provide an
economical, portable, consistent, and optimal representation of
knowledge.

>> My point was that, just because two datatype triple bNodes
>> are not the same bNode does not mean they don't denote the
>> same value -- unless the RDF engine fully keeps up with
>> all mergings of all equal values,
>
> Well, all the ones it knows about, sure. But that would be easy to do.

Agreed, but with no need of merged bNodes.

>> But the need for such coreference in the actual graph is limited
>> at best. If an RDF application supports the datatypes in question
>> then it does not care about coreference since it can determine
>> that itself.
>
> But there may be many other aspects to the entity apart from its
> value. In the phone-number example,

The phone number example is broken. Can you provide another, please?

>>> That is exactly what I am doing. The RDF *is* the internal knowledge
>>> base, and the nodes are the canonical internal reps..
>>
>> Fine. Then use the datatype triple idiom as an application specific
>> representation if you like (though there are better ways). But insofar
>> as knowledge interchange is concerned, it has no utility and therefore
>> no place in the standard.
>
> But the standard isn't ONLY to be used for knowledge interchange
> between applications. It also is intended to be used for recording
> and publishing content; I would say primarily to be used for that,
> seems to me.

If you really need to capture that explicitly in the graph, then just
merge the doublet bNodes into a bag. Then an application knows
if/which doublet values are not accounted for and which are considered
equal.
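And determining which doublet values are equal is exactly the sort of
thing any datatype aware application can do for itself, with no merged
bNodes anywhere -- roughly along these lines (again a purely
illustrative Python sketch with made-up names):

    # Two doublets from different sources: same value, different lexical
    # forms, different (unmerged) bNodes.
    doublet_a = {"node": "_:1", "lexical": "30",  "dtype": "xsd:integer"}
    doublet_b = {"node": "_:2", "lexical": "030", "dtype": "xsd:integer"}

    # Lexical-to-value mappings the application happens to support.
    LEX_TO_VALUE = {"xsd:integer": int}

    def denoted_value(doublet):
        # Map the lexical form into the datatype's value space.
        return LEX_TO_VALUE[doublet["dtype"]](doublet["lexical"])

    # The bNodes were never merged, yet the application can still see
    # that both doublets denote the same value.
    same_value = denoted_value(doublet_a) == denoted_value(doublet_b)
    print(same_value)  # True, even though "_:1" != "_:2"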
>> Given my comments above and elsewhere, since I consider the utility
>> offered by the datatype triple to not apply in contexts of either
>> mass syndication or datatype aware applications -- which I consider
>> to cover nearly all RDF contexts
>
> Well, maybe that is the problem here. There are many potential uses
> of the SW that do not fall into those two categories. The B2B
> examples which Tim talks about do not, for example.

Huh?! Of course they do. Please explain how they do not.

It's B2B, eCommerce, Web Services, whatever you want to call it, that
has been screaming the most for strict datatyping in RDF -- and it's
such industries that must deal continually with masses of knowledge
coming from disparate sources -- partners, clients, suppliers,
regulatory agencies, etc. etc. B2B is probably the *best* example of a
context with a high degree of multi-source syndication and acute need
of datatype aware processing.

>> Yes, but inferencing will be based on values, not idioms, and
>> thus the idioms used to *interchange* the knowledge will become
>> transparent or even discarded in the application
>
> No, no, not at all!! Very important point!! RDF is to be used to
> support inference DIRECTLY. One does inference *in* RDF. And
> inference is based on syntactic forms, which include what we have
> been calling 'idioms'. They will not become transparent or discarded;
> they are the very medium in which inference takes place, the
> syntactic substrate of inferences. RDF(S) is the 'logic', not
> something that gets converted or translated into some other logic.

Then query by value will never succeed, since literals are not
required to be canonical lexical forms. Since the RDF graph can
*never* contain values as syntactic components, the graph itself will
*never* provide all that is required for determining equivalence of
values. I understand that you want the shared bNode of the datatype
triple idiom to serve that purpose, but it never can do so reliably,
consistently, practically without the help of datatype aware
applications -- and in such a context, it is not needed.

The lexical form is just a means to an end. Applications must use
lexical forms and datatyping idioms because RDF has no native
datatypes and values are not part of the graph grammar. But any
application that cares about typed data literals does not care about
the lexical form, but about the value itself.

Granted, some may find the historical aspects of which lexical forms
were used interesting -- and when it comes to re-express some value in
an RDF graph for further interchange, some lexical form must be chosen
(along with an actual datatype) -- but these are secondary issues,
just like round tripping of qnames. They are not central to the whole
point of typed data literals. Really.

What you want to accomplish with the datatype triple can be
accomplished in many other (better) ways within a given application,
and the coreference between values need not be explicit in the RDF
graph.

>> Again, I'll repeat, since I seem to be failing to communicate
>> this one point:
>>
>> 1. We must have a fully local idiom.
>> 2. The datatype triple idiom is not a fully local idiom.
>> 3. The doublet idiom is the only fully local idiom.
>> 4. If we must choose one, we must choose the doublet idiom.
>> 5. There is no real utility offered by the datatype triple idiom.
>> 6. There is no sufficiently motivating reason to include the datatype
>>    triple idiom.
>
> None of the idioms are fully local in your sense, and there is
> genuine utility offered by the triple form.

We really do seem to be at loggerheads about these two points.

>> 2. My preference has always been for untidy literals and literals
>>    as subjects (a'la P++) in conjunction with rdfs:range for
>>    global typing.
>
> Well, I've given up on literals as subjects.

Fair enough.

>> 3. If you can make that work, great.
>>
>>       Bob ex:age _:1:"30" .               It's just an (untidy) literal.
>>
>>       Bob ex:age _:1:"30" .               It's an integer.
>>       _:1:"30" rdf:dtype xsd:integer .
>>
>>       Bob ex:age _:1:"30" .               It's an integer.
>>       ex:age rdfs:range xsd:integer .
>
> That could work, sure, and I like that also since the datatyping is
> 'about' the literal, which seems intuitively correct. But I thought
> that tidy-literals was now kind of a done deal because of Dan C's
> legacy use arguments.

Well, lots of discussion, both on the WG list and the Interest Group
list, and plain logic, clarified that there really is no conflict --
that it was up to the query user or engine to decide whether
datatyping was to be taken into account or not, and that there was
even no problem with having tidy literals in that context, since the
datatype interpretation was not based solely on the literal. Dan never
conceded to that evidence, even though everyone else did. The bNode
global idiom was just a way to move forward around the issue -- a real
political compromise.

However, I think that in retrospect the mandatory bNode has one very
nice (unexpected at the time) feature: the bNode consistently denotes
the actual value, and in a specific datatype aware implementation it
can be replaced or augmented with the actual value as a means of
optimization/enhancement. Just as the example pics in your summary
suggest, where the value is depicted on the bNode.

>> The datatype triple idiom cannot be distinguished (generically)
>> from any other triple without the schema knowledge.
>
> NONE of the idioms can be.

I've already addressed this above. The doublet idiom can.

> They are all perfectly well-formed and
> meaningful RDF when used with a non-datatype uriref. One might make
> an informed guess that the use of rdf:dtype is a strong hint that its
> object is intended to be a datatype name, but it's only a guess.

It's far from a guess. The range of rdf:dtype is rdfs:Datatype, and
that is mandated by the MT, which an application has a right to
presume, even if it is not explicit in the graph.

> In
> the case of the d.t. triple, you can make an analogous informed
> guess: if the property arc of a triple with a literal object is
> anything other than rdf:value, then it's probably intended to be a
> datatype name.

Uhhhh, like

   xxx foo:widget _:1 .
   _:1 abc:wombat "34971918374" .

where supposedly 'abc:wombat' is a datatype and "34971918374" is a
lexical form of that datatype...?

Nope. Wrong. _:1 is simply some kind of qualified value with some
property 'abc:wombat' (whatever that is) with a literal value
"34971918374". It's *not* evident that it is a typed data literal,
insofar as can be determined from the actual idiom/subgraph.

Now, if we had either

   xxx foo:widget _:1 .
   _:1 rdf:value "34971918374" .
   _:1 rdf:dtype abc:wombat .

or

   xxx foo:widget _:1 .
   _:1 abc:wombat "34971918374" .

*plus* somewhere else

   abc:wombat rdfs:subPropertyOf rdf:value .
   abc:wombat rdfs:subClassOf rdfs:Datatype .

then it *would* be clear that abc:wombat is a datatype property and
"34971918374" is a lexical form for that datatype.
Now, which of the above two idioms is independent of statements using
vocabulary from RDFS? Which of the two idioms is truly 'local'?

The doublet idiom.

>> No. For the doublet idiom, the presence of the rdf:dtype property
>> tells us it is a datatype.
>
> No, it does not. Really. The semantics only says that rdf:dtype is a
> subproperty of rdf:type, and that IF its object is a datatype THEN
> some special conditions apply. It does not require that it is, in
> fact, a datatype; if it isn't, then those special conditions do not
> apply, is all.

My understanding of the proposal was that the object of rdf:dtype is
an rdfs:Datatype. I.e. that

   rdf:dtype rdfs:range rdfs:Datatype .

would be a closure rule (or otherwise specified) in the MT and should
be presumed by all applications whether explicit in the graph or not.
If this is not the case, then it should be. Otherwise we have no truly
local idiom (per my definition of 'local', which is what I think the
desiderata means by 'local').

I also still think that we need something like rdfs:drange to
differentiate between rdf:type and rdf:dtype assertions. It may be
that I wish to only assert a range constraint on the value space of a
given property, but don't want to create a whole new non-lexical type
to do so. I.e. I may want to say that all values of ex:age are integer
values, but I don't care about the actual datatype used, and thus
would say

   ex:age rdfs:range xsd:integer .

which simply says that I expect all values to be integers even if
locally typed differently, and that would thus entail rdf:type and not
rdf:dtype. If I further wanted to constrain property values to lexical
representations of xsd:integer, then I'd say

   ex:age rdfs:drange xsd:integer .

which would entail rdf:dtype for property values. In the MT we would
then specify that the following are 'given':

   rdfs:drange rdfs:subPropertyOf rdfs:range .
   rdfs:drange rdfs:range rdfs:Datatype .

Thus, the combination of the two statements above, along with

   rdf:dtype rdfs:range rdfs:Datatype .

allows applications to reliably recognize the doublet idiom as a
datatyping idiom and the URIref value of rdf:dtype as a datatype,
irrespective of anything else in the graph.

> but then we
> could do the same kind of thing for the triple idiom as well.

How? Since we cannot reference specific datatypes in the MT.

Patrick

--
Patrick Stickler                 Phone:  +358 50 483 9453
Senior Research Scientist        Fax:    +358 7180 35409
Nokia Research Center            Email:  patrick.stickler@nokia.com

------ End of Forwarded Message
Received on Wednesday, 13 February 2002 05:52:49 UTC