Re: One final step to datatyping convergence and closure?

On 2002-02-13 21:05, "ext Pat Hayes" <phayes@ai.uwf.edu> wrote:


>>>  Suppose for example we know that
>>> 
>>>  _:t34276 rdf:value "the phone number of the man in the red hat" .
>>> 
>>>  and later we figure out, and add the graph:
>>> 
>>>  _:t34276 xsd:number "8504348903"
>> 
>> Firstly, it is hard to really consider your example since
>> you're using fictitious, possibly fanciful datatypes, but
>> presuming that xsd:number is analogous or equivalent to
>> xsd:integer, the above case would  be in error, since
>> "the...red hat" is not a valid lexical form for xsd:integer.
> 
> Read it as xsd:integer (sorry, I meant to use that.)
> 
> It is not an error. In the triple form, the datatype only applies to
> the literal *in the same triple*. If we used the doublet form then
> this would be an error: that is precisely my point.

Well, I thought that

   xxx ddd "lll" .

entails

   xxx rdf:dtype ddd .
   xxx rdf:value "lll" .

where

   ddd rdf:type rdfs:Datatype .

???

And what if there is global typing as well:

   ppp rdfs:range xsd:integer .

which (assuming _:t34276 occurs somewhere as the value of ppp)
implies

   _:t34276 rdf:dtype xsd:integer .

which means that 

   _:t34276 rdf:value "the...hat" .

is an error.

???


>> Huh?! Of course they do. Please explain how they do not.
> 
> Well, consider the scenario in which a bank machine 'agent' checks
> out the credentials of a proposed request to hand out a large sum of
> cash, by checking bank accounts, security records, credit records and
> so forth. None of that is concerned with mass syndication - it will
> have no interaction with web sites in the conventional sense at all.
> I guess it will use information stored in databases, but I would
> expect that all to be done *through* RDF (or its successors, eg the
> hypothetical OWL).

This presumes a closed system with a homogeneous ontology. While
that may hold in some or even many cases, I don't think it is a
given.


>>>  No, no, not at all!! Very important point !! RDF is to be used to
>>>  support inference DIRECTLY. One does inference *in* RDF. And
>>>  inference is based on syntactic forms, which include what we have been
>>> calling 'idioms'. They will not become transparent or discarded;
>>>  they are the very medium in which inference takes place, the
>>>  syntactic substrate of inferences. RDF(S) is the 'logic', not
>>>  something that gets converted or translated into some other logic.
>> 
>> Then query by value will never succeed since literals are not
>> required to be canonical lexical forms.
> 
> True, but query by value is not a guaranteed safe option in the RDF
> world in any case. It cannot be in any open-world setting, for just
> the reason you mention.

Huh? Only if you are stuck with non-canonical representations
and have to base your value comparisons on lexical form
string comparisons. In that case, no.

But comparison by *value* is comparison by *value*. The integer
5 is the integer 5 is the integer 5 regardless of whether it
has a thousand possible lexical representations in a thousand
different datatypes.

If I can't trust that, for known/supported datatypes which my
application can map to actual *values*, those value comparisons
are reliable, then to hell with RDF. I can't ever do what I need
(or what Nokia or most companies need).

Of course, that's not the case. Query by value does work. But
it means that queries operate at a virtual layer above the literal
RDF graph, where datatyping idioms are distilled into actual
values; idioms that can't be so distilled are unusable for such
value-based queries (their comparisons will always fail).
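
For example (purely illustrative, using the doublet idiom and two
made-up bnodes):

   _:a rdf:value "10" .
   _:a rdf:dtype xsd:integer .

   _:b rdf:value "010" .
   _:b rdf:dtype xsd:integer .

A string comparison of "10" and "010" fails, but once both doublets
are distilled to their values, a query for the integer 'ten'
matches both.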

>> Since the RDF graph can *never* contain values as syntactic
>> components, the graph itself will *never* provide all that is
>> required for determining equivalence of values.
> 
> Not at all. It only needs to contain enough information to enable a
> (properly savvy) engine to unambiguously reconstruct that value
> representation if and when it needs it. Which is what datatyping in
> RDF is for, right? Storing the values on nodes is just a caching
> device for improved performance: not at all a bad idea, of course,
> when it can be done, but it doesn't change the basic information
> encoded in the graph.

You seem to be a proponent of both sides of the argument ;-)

Or I just don't get what you were trying to say about the
unreliability of value-based queries...

>> But any application that cares about typed data literals does
>> not care about the lexical form, but about the value itself.
> 
> I strongly disagree. Again, you are assuming that the only purpose of
> datatyping information is to facilitate the translation of RDF into
> something else. 

No. Read again what I said. I said that *applications* that care
about the values won't care about the lexical forms.

I never said there wouldn't be applications that do care about
the lexical form itself. Though I consider direct comparison of
lexical forms, which are non-canonical, to be just plain dumb in
most cases, since there could be an infinite number of variants;
why would *any* application want to muck with that?!
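
E.g., "10", "+10", "010", and "0000010" are all legal xsd:integer
lexical forms for the very same value 'ten', and there is nothing
to stop arbitrarily many more; an application comparing lexical
forms directly would treat every one of them as distinct.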

The only point of a *typed* literal is to get to a value. An
untyped literal that within a given context has a consistent and
unique meaning is something entirely different -- and a typed
literal is not such a literal.


>> Dan never conceded to that evidence, even though everyone else did.
> 
> I didn't either. The MT trouble with that approach is that one node
> cannot denote several things at once. That's why I can't accommodate
> the simple in-line usage which Peter wants, where
> 
> aaa ex:age "10" .
> ex:age rdfs:range xsd:integer .
> 
> implies that aaa is ten.

But it does. It's just that the literal node "10" does not
by itself denote the value 'ten'.

The interpretation of the *combination* of "10" and
xsd:integer provides the value 'ten'. Thus, whether
literal nodes are tidy or not is irrelevant, since it
is not the literal node that bears the denotation
of 'ten' by itself.
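
To illustrate (ex:octal is a made-up datatype whose lexical forms
are octal numerals):

   _:a rdf:value "10" .
   _:a rdf:dtype xsd:integer .

   _:b rdf:value "10" .
   _:b rdf:dtype ex:octal .

The very same literal "10" appears in both doublets, yet the first
pairing denotes ten and the second denotes eight. The denotation
comes from the (literal, datatype) combination, not from the
literal node alone -- so the tidiness of literal nodes changes
nothing here.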


> Like I said, I didn't intend the diagram to indicate a syntactic
> labelling. Maybe I'll re-draw the diagrams :-)

I didn't say it did. I just said that it illustrates a possible
implementation-specific optimization.

Please don't change the diagrams. They are very clear. Though you
may wish to make a comment that the value is just illustrative of
what is implied by the actual graph syntax.


>> It's far from a guess. The range of rdf:dtype is rdfs:Datatype, and
>> that is mandated by the MT,
> 
> Sorry, wrong. It could be, but (1) that would make the RDF break if
> datatyping were removed;

I don't see how, if those automatic statements are only valid when
datatyping is present in an application. If "datatyping is removed",
then so are those automatic statements about datatyping, as they
are part of the datatyping that is removed.

> and (2) we could make an analogous semantic
> mandate which would make the triples usage not a guess, also. It
> would be the rule:
> 
> aaa subPropertyOf rdf:value --> aaa rdf:type rdfs:Datatype .

But this still requires the presence of the actual statement

   aaa subPropertyOf rdf:value .

so it's still not local, since such a statement cannot be generically
and globally mandated by the spec.


>> The doublet idiom.
> 
> No, it isn't, because you also need (somewhere else)
> 
> abc:wombat rdf:type rdfs:Datatype .

You'll get that from the automatic statement

   rdf:dtype rdfs:range rdfs:Datatype .

which, for any ddd, given

   xxx rdf:dtype ddd .

entails

   ddd rdf:type rdfs:Datatype .

> And I'd be happy to add the condition
> 
> aaa rdfs:subPropertyOf rdf:value .
> 
> as a generally assumed built-in condition for the truth of any triple
> xxx aaa "uuu" .
> 
> which would eliminate the first of the two triple conditions.

But again, it cannot provide a local solution. It still depends
on some other subPropertyOf statement specific to the datatype
identity itself. It is not generic.
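
E.g., take a hypothetical datatype ex:ddd used in the triple idiom:

   xxx ex:ddd "lll" .

An application cannot recognize this as a datatyping idiom until it
also finds, somewhere else in the graph or in some schema,

   ex:ddd rdfs:subPropertyOf rdf:value .

and that extra, datatype-specific statement is exactly what makes
the interpretation non-local.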


>> I also still think that we need something like rdfs:drange
>> to differentiate between rdf:type and rdf:dtype assertions.
> 
> Graham also suggested this, but I fail to see what utility it has.
> Basically the point of rdf:dtype is to block unwanted inheritance
> inferences; but there is no need to block them twice.

It's not for blocking; it's for restraining.

>> It may be that I wish to only assert a range constraint
>> on the value space of a given property, but don't want to
>> create a whole new non-lexical type to do so. I.e. I
>> may want to say that all values of ex:age are integer
>> values, but I don't care about the actual datatype used,
>> and thus would say
>> 
>>   ex:age rdfs:range xsd:integer .
>> 
>> which simply says that I expect all values to be integers
>> even if locally typed differently, and that would thus entail
>> rdf:type and not rdf:dtype.
> 
> Well, it's not hard to do this already, just range it to a
> (sub+super)class of the datatype class. That is a bit awkward,
> admittedly; but is it worth adding another item to the rdf vocabulary
> just to make things more convenient in one use case?

If that's the only way to generically achieve a fully local idiom, yes.

Any solution based on closure rules, or on any other mechanism
that depends on some other explicit statement in the graph naming
the datatype, won't work.

The automatic statements that I suggested in my latest proposal achieve
a fully generic and local interpretation of the doublet idiom.
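
For example, a graph containing nothing but the doublet

   _:x rdf:value "10" .
   _:x rdf:dtype xsd:integer .

is self-contained: together with the automatic statement

   rdf:dtype rdfs:range rdfs:Datatype .

(which is part of the spec itself, and so known to every conformant
application) it entails

   xsd:integer rdf:type rdfs:Datatype .

without any further statement naming xsd:integer anywhere in the
graph.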

> This only arises 
> in the case where we want to explicitly refer to a datatype class as
> a property range but also do not want to invoke datatyping on that
> property, which seems to me to be an unusual combination.

I just gave a very good example of that. It may be the less common
combination, but it's quite reasonable, and I expect that folks
will want to use the existing value-space-only interpretation of
rdfs:range without the extra datatyping implications.


> (Why use 
> the datatype name to refer to the class if you don't intend to imply
> datatyping? 

Why then worry about blocking lexical space constraints for subPropertyOf
relations?

Because instances of rdfs:Datatype are Classes having value spaces
as well as entities having those extra lexical spaces and mappings.
We should still be able to treat them as Classes with value spaces.

> It's not as though 'xsd:integer' is the only possible name
> for the set of integers. )

But it is a standardized, established, understood name for integers
(along with a defined lexical representation for them).

I admit that this is more convenience than necessity, so I'm
not demanding it, per se, but I think it should be very seriously
considered.


>> How? since we cannot
>> reference specific datatypes in the MT.
> 
> We can use rdfs:Datatype in the same way, or just pass off the
> datatyping recognition to an external mechanism.

Again, you seem to be missing the point of generic processing
of datatyping idioms. It is important to be able to identify
which URIrefs are datatypes and which subgraphs are datatyping
idioms without application-specific knowledge of particular
datatypes.

And the global and datatype-triple idioms require idiom-external
statements to do that (which is fine), whereas with my proposed
automatic statements in the DT spec the doublet idiom needs no
idiom-external statements (other than those automatic ones, which
can then acceptably be hard-coded in the app since they are part
of the actual standard).
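
I.e., to recognize the global idiom (sketching with the ex:age
example from above)

   xxx ex:age "10" .
   ex:age rdfs:range xsd:integer .

an application needs the idiom-external rdfs:range statement,
whereas the doublet

   _:v rdf:value "10" .
   _:v rdf:dtype xsd:integer .

is recognizable as a datatyping idiom from the pair itself plus
only the automatic statements defined in the DT spec.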

Patrick

--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com

Received on Thursday, 14 February 2002 03:18:03 UTC