- From: Patrick Stickler <patrick.stickler@nokia.com>
- Date: Thu, 14 Feb 2002 09:57:46 +0200
- To: Pat Hayes <phayes@ai.uwf.edu>
- CC: "McBride, Brian" <bwm@hplb.hpl.hp.com>
On 2002-02-13 21:05, "ext Pat Hayes" <phayes@ai.uwf.edu> wrote:

>>> Suppose for example we know that
>>>
>>> _:t34276 rdf:value "the phone number of the man in the red hat" .
>>>
>>> and later we figure out, and add the graph:
>>>
>>> _:t34276 xsd:number "8504348903"
>>
>> Firstly, it is hard to really consider your example since
>> you're using fictitious, possibly fanciful datatypes, but
>> presuming that xsd:number is analogous or equivalent to
>> xsd:integer, the above case would be in error, since
>> "the...red hat" is not a valid lexical form for xsd:integer.
>
> Read it as xsd:integer (sorry, I meant to use that.)
>
> It is not an error. In the triple form, the datatype only applies to
> the literal *in the same triple*. If we used the doublet form then
> this would be an error: that is precisely my point.

Well, I thought that

   xxx ddd "lll" .

entails

   xxx rdf:dtype ddd .
   xxx rdf:value "lll" .

where

   ddd rdf:type rdfs:Datatype .

???

And what about if there is global typing as well:

   ppp rdfs:range xsd:integer .

which implies

   _:t34276 rdf:dtype xsd:integer .

which means that

   _:t34276 rdf:value "the...hat" .

is an error.

???

>> Huh?! Of course they do. Please explain how they do not.
>
> Well, consider the scenario in which a bank machine 'agent' checks
> out the credentials of a proposed request to hand out a large sum of
> cash, by checking bank accounts, security records, credit records and
> so forth. None of that is concerned with mass syndication - it will
> have no interaction with web sites in the conventional sense at all.
> I guess it will use information stored in databases, but I would
> expect that all to be done *through* RDF (or its successors, eg the
> hypothetical OWL).

This presumes a closed system with a homogenous ontology.
While this may be the case in some/many cases, I don't think
it is a given.

>>> No, no, not at all!! Very important point !! RDF is to be used to
>>> support inference DIRECTLY. One does inference *in* RDF.
>>> And inference is based on syntactic forms, which include what we have been
>>> calling 'idioms' . They will not become transparent or discarded;
>>> they are the very medium in which inference takes place, the
>>> syntactic substrate of inferences. RDF(S) is the 'logic', not
>>> something that gets converted or translated into some other logic.
>>
>> Then query by value will never succeed since literals are not
>> required to be canonical lexical forms.
>
> True, but query by value is not a guaranteed safe option in the RDF
> world in any case. It cannot be in any open-world setting , for just
> the reason you mention.

Huh? Only if you are stuck with non-canonical representations
and have to base your value comparisons on lexical form string
comparisons. In that case, no.

But comparison by *value* is comparison by *value*. The integer
5 is the integer 5 is the integer 5 regardless of whether it
has a thousand possible lexical representations in a thousand
different datatypes.

If I can't trust those value comparisons for known/supported
datatypes that my application can map to actual *values*, then
to hell with RDF. I can't ever do what I need (or what Nokia or
most companies need).

Of course, that's not the case. Query by value does work. But
it means that queries operate at a virtual layer above the
literal RDF graph, where datatyping idioms are distilled into
actual values, and those that can't be distilled are unusable
for such value-based queries (comparisons will always fail).

>> Since the RDF graph can *never* contain values as syntactic
>> components, the graph itself will *never* provide all that is
>> required for determining equivalence of values.
>
> Not at all. It only needs to contain enough information to enable a
> (properly savvy) engine to unambiguously reconstruct that value
> representation if and when it needs it. Which is what datatyping in
> RDF is for, right?
> Storing the values on nodes is just a caching
> device for improved performance: not at all a bad idea, of course,
> when it can be done, but it doesnt change the basic information
> encoded in the graph.

You seem to be a proponent of both sides of the argument ;-)

Or I just don't get what you were trying to say about
unreliability of value-based queries...

>> But any application that cares about typed data literals does
>> not care about the lexical form, but about the value itself.
>
> I strongly disagree. Again, you are assuming that the only purpose of
> datatyping information is to facilitate the translation of RDF into
> something else.

No. Read again what I said. I said that *applications* that
care about the values won't care about the lexical forms. I never
said there wouldn't be applications that care about the
lexical form itself. Though I consider direct comparisons of
lexical forms, which are non-canonical, to be just plain dumb
in most cases since there could be an infinite number of variants
and why would *any* application want to muck with that?!

The only point of a *typed* literal is to get to a value. If you
have an untyped literal -- that within a given context has consistent
and unique meaning, that's something entirely different -- and
a typed literal is not such a literal.

>> Dan never conceded to that evidence, even though everyone else did.
>
> I didn't either. The MT trouble with that approach is that one node
> cannot denote several things at once. Thats why I can't accommodate
> the simple in-line usage which Peter wants, where
>
> aaa ex:age "10" .
> ex:age rdfs:range xsd:integer .
>
> implies that aaa is ten.

But it does. I.e. the literal node "10" does not denote the
value 'ten'. The interpretation of the *combination* of "10"
and xsd:integer provides the value 'ten'. Thus, whether literal
nodes are tidy or not is irrelevant since it is not the literal
node that is bearing the denotation of 'ten' by itself.
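
To make the idea of value-based comparison concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the names `value_of`, `same_value` and `LEXICAL_TO_VALUE` are hypothetical, only xsd:integer is supported, and the lexical check is a simplified reading of the xsd:integer lexical space. It shows a (lexical form, datatype) pair being mapped to a value, with comparison failing whenever no value can be obtained.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

def parse_integer(lexical):
    # Simplified xsd:integer lexical space: optional sign, then decimal digits.
    if re.fullmatch(r"[+-]?[0-9]+", lexical) is None:
        raise ValueError("not a valid xsd:integer lexical form: %r" % lexical)
    return int(lexical)

# Hypothetical registry of the datatypes this application knows how to map.
LEXICAL_TO_VALUE = {XSD + "integer": parse_integer}

def value_of(lexical, datatype):
    """Return the value for a (lexical form, datatype) pair, or None."""
    parser = LEXICAL_TO_VALUE.get(datatype)
    if parser is None:
        return None  # unknown datatype: no value can be obtained
    try:
        return parser(lexical)
    except ValueError:
        return None  # lexical form is not in the datatype's lexical space

def same_value(a, b):
    """Compare two (lexical, datatype) pairs by value, never by string."""
    va, vb = value_of(*a), value_of(*b)
    return va is not None and vb is not None and va == vb
```

With this sketch, `same_value(("10", XSD + "integer"), ("+010", XSD + "integer"))` holds even though the lexical forms differ, while any pair whose datatype is unknown or whose lexical form is invalid never compares equal.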
> Like I said, I didnt intend the diagram to indicate a syntactic
> labelling. Maybe I'll re-draw the diagrams :-)

I didn't say it did. I just said that it illustrates a possible
implementation-specific optimization.

Please don't change the diagrams. They are very clear. Though
you may wish to make a comment that the value is just illustrative
of what is implied by the actual graph syntax.

>> It's far from a guess. The range of rdf:dtype is rdf:Datatype, and
>> that is mandated by the MT,
>
> Sorry, wrong. It could be, but (1) that would make the RDF break if
> datatyping were removed;

I don't see how, if those automatic statements are only valid
when datatyping is present in an application. If "datatyping is
removed" then so are those automatic statements about datatyping
as they are a part of the datatyping that is removed.

> and (2) we could make an analogous semantic
> mandate which would make the triples usage not a guess, also. It
> would be the rule:
>
> aaa subPropertyOf rdf:value --> aaa rdf:type rdfs:Datatype .

But this still requires the presence of the actual statement

   aaa subPropertyOf rdf:value

so it's still not local as such a statement cannot be generically
and globally mandated by the spec.

>> The doublet idiom.
>
> No, it isn't, because you also need (somewhere else)
>
> abc:wombat rdf:type rdfs:Datatype .

You'll get that from the automatic statement

   rdf:dtype rdfs:range rdfs:Datatype .

which for any ddd given

   xxx rdf:dtype ddd .

entails

   ddd rdf:type rdfs:Datatype .

> And I'd be happy to add the condition
>
> aaa rdfs:subPropertyOf rdf:value .
>
> as a generally assumed built-in condition for the truth of any triple
> xxx aaa "uuu" .
>
> which would eliminate the first of the two triple conditions.

But again, it cannot provide a local solution. It still depends
on some other subPropertyOf statement specific to the datatype
identity itself. It is not generic.
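
The entailment from the automatic range statement can be sketched in a few lines of Python. This is an illustrative sketch only: triples are modelled as plain tuples of prefixed names, `apply_range_rule` is a hypothetical helper, and rdf:dtype is the property proposed in this thread rather than an established part of RDF. The rule applied is the ordinary rdfs:range one: if `p rdfs:range C` and `s p o`, then `o rdf:type C`.

```python
# The automatic statement proposed for the datatyping spec.
AUTOMATIC = ("rdf:dtype", "rdfs:range", "rdfs:Datatype")

def apply_range_rule(triples):
    """Return the rdf:type triples newly entailed by rdfs:range statements."""
    triples = set(triples)
    # Collect every (property, class) pair asserted via rdfs:range.
    ranges = {(s, o) for (s, p, o) in triples if p == "rdfs:range"}
    inferred = set()
    for (s, p, o) in triples:
        for (prop, cls) in ranges:
            if p == prop:
                # The object of a ranged property is an instance of the range.
                inferred.add((o, "rdf:type", cls))
    return inferred - triples

graph = {AUTOMATIC, ("_:x", "rdf:dtype", "abc:wombat")}
entailed = apply_range_rule(graph)
```

Here `entailed` contains `("abc:wombat", "rdf:type", "rdfs:Datatype")` without the graph ever naming abc:wombat as a datatype explicitly, which is the sense in which the doublet idiom is local.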
>> I also still think that we need something like rdfs:drange
>> to differentiate between rdf:type and rdf:dtype assertions.
>
> Graham also suggested this, but I fail to see what utility it has.
> Basically the point of rdf:dtype is to block unwanted inheritance
> inferences; but there is no need to block them twice.

It's not for blocking, it's for restraining.

>> It may be that I wish to only assert a range constraint
>> on the value space of a given property, but don't want to
>> create a whole new non-lexical type to do so. I.e. I
>> may want to say that all values of ex:age are integer
>> values, but I don't care about the actual datatype used,
>> and thus would say
>>
>> ex:age rdfs:range xsd:integer .
>>
>> which simply says that I expect all values to be integers
>> even if locally typed differently, and that would thus entail
>> rdf:type and not rdf:dtype.
>
> Well, its not hard to do this already, just range it to a
> (sub+super)class of the datatype class. That is a bit awkward,
> admittedly; but is it worth adding another item to the rdf vocabulary
> just to make things more convenient in one use case?

If that's the only way to generically achieve a fully local
idiom, yes.

Any solution based on closure rules or any other mechanism
that is dependent on some other explicit statement in the graph
that names the datatype explicitly won't work.

The automatic statements that I suggested in my latest proposal
achieve a fully generic and local interpretation of the doublet
idiom.

> This only arises
> in the case where we want to explicitly refer to a datatype class as
> a property range but also do not want to invoke datatyping on that
> property, which seems to me to be an unusual combination.

I just gave a very good example of that. It may be the less
common combination, but it's quite reasonable, and I expect
that folks will want to use existing value-space only interpretation
of rdfs:range without the extra datatyping implications.
> (Why use
> the datatype name to refer to the class if you don't intend to imply
> datatyping?

Why then worry about blocking lexical space constraints for
subPropertyOf relations? Because instances of rdfs:Datatype
are both Classes having value spaces as well as having those
extra lexical spaces and mappings. We still should be able to
treat them as Classes with value spaces.

> Its not as though 'xsd:integer' is the only possible name
> for the set of integers. )

But it is a standardized, established, understood name for
integers (along with a defined lexical representation for them).

I admit that this is more convenience than necessity, so I'm not
demanding it, per se, but I think it should be very seriously
considered.

>> How? since we cannot
>> reference specific datatypes in the MT.
>
> We can use rdfs:Datatype in the same way, or just pass off the
> datatyping recognition to an external mechanism.

Again, you seem to be missing the point of generic processing
of datatyping idioms. It is important to be able to identify
which URIrefs are datatypes and which subgraphs are datatyping
idioms without application-specific knowledge about which
URIrefs are datatypes. And the global and datatype triple idioms
require idiom-external statements to do that (which is fine)
whereas, with my proposed automatic statements in the DT spec
the doublet idiom does not need idiom-external statements (other
than those automatic ones, which can then be acceptably hard-coded
in the app since they are part of the actual standard).

Patrick

--
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com
Received on Thursday, 14 February 2002 03:18:03 UTC