Re: A basis for convergence and closure? from Pat Hayes on 2002-02-07 (w3c-rdfcore-wg@w3.org from February 2002)

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Thu, 7 Feb 2002 10:50:55 -0600
To: Sergey Melnik <melnik@db.stanford.edu>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <p05101458b88857adc874@[65.212.118.208]>
>Pat Hayes wrote:
>>
>>  >
>>  >
>>  >It seems that, from the past couple of weeks' discussions,
>>  >there are the following characteristics on folks' wish lists
>>  >(this is a partial recap of some of the desiderata):
>>  >
>>  >1. A working MT (duh ;-)
>>  >2. Tidy literals
>>  >3. A global/implicit idiom
>>  >4. A local/explicit idiom
>>  >5. Same vocabulary valid for both local and global idioms
>>  >6. Free combination of local and global idioms without conversion
>>  >7. The ability to conduct queries by value
>>  >8. The ability to conduct queries by literal
>>  >9. Datatype URIs denote the entire datatype, as defined by
>>  >    the datatype "owner", not only one of its components
>
>Pat, I think your summary is sharp enough so that I would suggest to use
>it as a starting point for a convergence write-up.

I'm working on a better writeup and will send it to you and Patrick later today.

>We could produce it
>in a quick and streamlined fashion by simply referring to the relevant
>sections of the TDL, S, and  Pat's documents where appropriate, without
>replicating existing content. I think the main focus should be on
>capturing two things: a list of items we managed to agree on, and a list
>of options where we still have to make a final choice. What do you think,
>guys?
>
>>  I think I can summarize a way to have all the above with a minimal
>>  imposition of particular idioms. Consider the following story, which
>>  I think is very similar to  Patrick's, but expressed slightly
>>  differently. This is a summary of the 'simple' version of the 'Oh my
>>  GOD' proposal. To hell with the subtle version.
>>
>>  I. Literal nodes are tidy, and literals denote themselves (ie we can
>>  treat literals as syntactic labels).
>
>I think (I) is for the "agreed" list.
>
>>  II. rdf:value means that its subject can be represented textually by
>>  its object (somehow), so
>>  _:s rdf:value "10" .
>>  means that '10' is one possible way to 'write' whatever it is that
>>  _:s denotes.
>
>(II) is stated flexibly enough so I'd also put it on the "agreed" list.
>
>>  Call that a 'value triple'. It is the basic (weakest) way of linking
>>  a value to a literal's lexical form.
>>
>>  III. There are two basic ways to say more about the relationship
>>  between the subject of a value triple and the lexical form of the
>>  literal.
>>
>>  IV.  One is to use a 'tighter' property, ie a subPropertyOf
>>  rdf:value. So for example (Graham's F case) one could assert
>>  ex:ISO8601 rdfs:subPropertyOf rdf:value .
>>  and then
>>  _:s ex:ISO8601 "10" .
>>  would say that the subject node denotes something that could be
>>  written as '10' *using the conventions associated with that URI*.
>>
>>  This allows for 'multiple' such assertions, where the meaning would
>>  be that the literal/value pair had to conform to all the named
>>  constraints, ie a conjunctive reading, as usual. It also allows for
>>  alternative lexical forms for the same value, as in
>>  _:s xsd:realnumber "10.3" .
>>  _:s ex:germannumber "10,3" .
>
>(IV) would be on the "options" list, right?
>
>>  V. The other is to assert a special datatyping triple along with the
>>  value triple, using rdf:dtype (or rdfd:type, whatever), giving the
>>  doublet case:
>>
>>  _:s rdf:value "10" .
>>  _:s rdf:dtype xsd:integer .
>>
>>  This has exactly the same meaning as
>>  _:s xsd:integer "10" .
>>  when xsd:integer is known to be a datatype, and the two forms can be
>>  used interchangeably or together.
>>
>>  rdf:dtype is always a subPropertyOf rdf:type.
>
>(V) is another "option"
>
>>  VI. Each of these two cases IV and V requires a special semantic
>>  condition, and those conditions refer to externally defined datatype
>>  mappings. In order for an engine to make use of these mappings,  the
>>  relevant uris must be declared to be datatypes  by an assertion like
>>  xsd:integer rdf:type rdf:Datatype .
>>  or
>>  xsd:integer rdfs:subPropertyOf rdf:value .
>>
>>  It is acceptable for the graph to entail these assertions, but I
>>  suggest that we recommend that they be made explicit.
>
>(VI) is also an option, because I believe that there are further
>solutions besides (IV) and (V), see below.

OK, but part of my point is that we don't have to *choose between* 
these options; they are all compatible, and we can let people use any 
of them or all of them if they want.
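
To make the compatibility concrete, here is a minimal Python sketch, assuming triples are plain (subject, predicate, object) tuples of strings and that the set of declared datatypes is already known. The function name typed_literals is hypothetical, and rdf:dtype is the working name used above, not a settled term. It reads the property form (IV) and the doublet form (V) into the same (subject, datatype, lexical form) facts:

```python
RDF_VALUE = "rdf:value"
RDF_DTYPE = "rdf:dtype"  # working name from this thread, not a settled term


def typed_literals(triples, datatypes):
    """Return {(subject, datatype, lexical_form)} asserted by either idiom."""
    triples = set(triples)
    found = set()
    for s, p, o in triples:
        if p in datatypes:
            # Property form (IV):  _:s xsd:integer "10" .
            found.add((s, p, o))
        elif p == RDF_DTYPE and o in datatypes:
            # Doublet form (V):  _:s rdf:value "10" .  _:s rdf:dtype xsd:integer .
            for s2, p2, lex in triples:
                if s2 == s and p2 == RDF_VALUE:
                    found.add((s, o, lex))
    return found


datatypes = {"xsd:integer"}
doublet = [("_:s", "rdf:value", "10"), ("_:s", "rdf:dtype", "xsd:integer")]
prop_form = [("_:s", "xsd:integer", "10")]

# Both idioms, used separately or together, yield the same typed fact.
same = typed_literals(doublet, datatypes) == typed_literals(prop_form, datatypes)
```

Mixing the two forms in one graph adds nothing new, which is the sense in which they can be used interchangeably or together.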

>
>>  VII. Any graph that entails one of the forms above has the same
>>  meaning, as far as datatyping is concerned.  There are several ways
>>  to take advantage of this.
>>
>>  One is to make <type> a subPropertyOf <dtype>, which makes them
>>  equivalent, so that the datatyping effect of the doublet works for
>>  all type assertions applied to a datatype. That is the dangerous way
>>  to do it, since it will break if there are two datatypes with
>>  overlapping value spaces but incompatible lexical spaces. (The
>>  symptom of 'breaking' will arise when the datatyping machinery is
>>  invoked; exactly what happens in the bad case is not fully
>>  determined. One possibility would be that the datatyping constraint
>>  machinery will barf. Another is that it will work, but produce
>>  inconsistent or misleading answers.) Nevertheless, this is a user
>>  option that may be useful in situations where it is known that no
>>  such datatype clashes will arise, or to handle legacy RDF code
>>  written using rdf:type. We note that it is always safe to do this in
>>  the untyped case, ie where no datatypes have been declared.
>>
>>  Another way is to add a further constraint on rdf:dtype, which allows
>>  particular kinds of assertion to entail rdf:dtype triples. To do this
>>  with full generality would require the ability to write RDF rules
>>  (implications), but the only case that seems to be of widespread
>>  interest can be captured by one simple 'ranging' rule that could be
>>  incorporated into the general semantic conditions, viz:
>>
>>  ddd rdf:type rdf:Datatype .
>>  aaa rdfs:range ddd .
>>  bbb aaa ccc .
>>  --->
>>  ccc rdf:dtype ddd .
>
>Alternatively, why not just pretend there is a 'rule'
>
>   xxx rdf:type ddd .
>   xxx rdf:value vvv .
>   --->
>   xxx ddd vvv .

Well, use rdf:dtype rather than rdf:type, but yes; in fact this 
inference is valid in the proposed MT extension in any case. But this 
doesn't do range-datatyping by itself, as there is no RDFS inference 
path from rdfs:range to rdf:dtype. I think it's too dangerous to make 
people use rdf:type, in general. It would be very tricky to be sure 
that some piece of RDFS closure reasoning didn't accidentally make an 
unintended or inconsistent datatype class association. The rdfs 
closure rules can have very 'remote' consequences, involving things 
like subclasses of ranges of subproperties.
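
The interplay of the two rules can be sketched as a tiny forward-chaining loop. This is only an illustration under stated assumptions: triples are tuples of strings, the range property is written rdfs:range, the conclusion uses rdf:dtype, both rules fire only for URIs declared as rdf:Datatype, and the names closure, ex:Jenny and ex:age are made up for the example.

```python
def closure(triples):
    """Apply the ranging rule and the doublet rule until nothing new appears."""
    graph = set(triples)
    while True:
        datatypes = {s for s, p, o in graph
                     if p == "rdf:type" and o == "rdf:Datatype"}
        new = set()
        for a, p, d in graph:
            # Ranging rule:
            #   ddd rdf:type rdf:Datatype . aaa rdfs:range ddd . bbb aaa ccc .
            #   ---> ccc rdf:dtype ddd .
            if p == "rdfs:range" and d in datatypes:
                for b, q, c in graph:
                    if q == a:
                        new.add((c, "rdf:dtype", d))
        for x, p, d in graph:
            # Doublet rule:
            #   xxx rdf:dtype ddd . xxx rdf:value vvv .  ---> xxx ddd vvv .
            if p == "rdf:dtype" and d in datatypes:
                for x2, q, v in graph:
                    if x2 == x and q == "rdf:value":
                        new.add((x, d, v))
        if new <= graph:
            return graph
        graph |= new


graph = [
    ("xsd:integer", "rdf:type", "rdf:Datatype"),
    ("ex:age", "rdfs:range", "xsd:integer"),
    ("ex:Jenny", "ex:age", "_:a"),
    ("_:a", "rdf:value", "10"),
]
result = closure(graph)
```

In this sketch the range declaration yields `_:a rdf:dtype xsd:integer`, which then yields `_:a xsd:integer "10"`. A plain `_:a rdf:type xsd:integer` triple triggers neither rule, which is the separation argued for above.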

>? (or additionally with   ddd rdf:type rdf:Datatype)
>
>As a further alternative to using rules, I could imagine that we
>consistently use pairs for typed values throughout the idioms. Thus, in
>
>   _:1 prop _:2
>   _:2 rdf:type xsd:int
>   _:2 rdf:value "10"
>
>_:2 would denote a pair, just as in TDL and S-P, but then in
>
>   _:1 prop _:2
>   _:2 xsd:int "10"
>
>_:2 could also denote the same pair (instead of element of a value space
>as it is defined now in S-A).
>
>The above works just fine without introducing rdf:dtype. Of course, this
>suggestion would make datatypes tightly coupled with their lexical
>representations (so Patrick would/should like it ;) My guess is that the
>question of semantic adequacy raised by Pat might not be very critical;
>if the current use assumes that typed values go together with their
>lexical representations (XSD also suggests that), then we could just
>call such pairs "typed values" and that's it.
>
>To summarize, I'm suggesting two additional "options". First, a
>different variant of the rule-based approach suggested by Pat; second, a
>consistent pair-oriented view throughout the idioms.

Well, that means that Jenny's age is <"10",10>, which seems silly to 
me. If we say that, we will have people wanting to use selector 
properties to choose the number from the pair, or something. And if 
we omit the typing information, what is her age? The whole point of 
my proposal was to keep the values of the bnodes being simple values 
rather than pairs. I can see no utility to the 'pairs' idea, since 
while it allows you to use rdf:type, it also has the consequence that 
in practice all the inferences you would want to use it for are 
blocked, eg none of the datatypes are subClasses of any of the 
others, on this proposal. So the 'inference-blocking' effect is 
exactly what you get from using rdf:dtype, but it's kind of hidden 
from view (and therefore more likely to produce errors and problems.) 
I think it's better to have the datatyping business out in the open 
where one can see that it is being used.
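
A small Python sketch of the difference between the two readings, assuming Python's int() as a stand-in for the xsd:integer lexical-to-value mapping; the function names are hypothetical and chosen only for this illustration:

```python
def lexical_to_value(lexical):
    # Stand-in for the xsd:integer lexical-to-value mapping (an assumption).
    return int(lexical)


def value_reading(lexical):
    # The bnode denotes a simple value: Jenny's age is 10.
    return lexical_to_value(lexical)


def pair_reading(lexical):
    # The bnode denotes a pair: Jenny's age is <"10", 10>.
    return (lexical, lexical_to_value(lexical))


def select_value(pair):
    # The kind of selector people would want under the pair reading.
    return pair[1]


# Two lexical forms of the same integer.
same_value = value_reading("10") == value_reading("010")
same_pair = pair_reading("10") == pair_reading("010")
recovered = select_value(pair_reading("010"))
```

Under the value reading the two forms denote one age; under the pair reading they denote two different things, and the number has to be picked back out with a selector.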

Pat
-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Thursday, 7 February 2002 11:50:27 UTC