RE: ACTION 2001-11-02#02: Datatyping use-cases from CC/PP from Pat Hayes on 2001-11-15 (w3c-rdfcore-wg@w3.org from November 2001)

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Thu, 15 Nov 2001 13:42:58 -0600
To: Patrick.Stickler@nokia.com
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <p05101068b819c44f6e0d@[65.212.118.147]>
>>  >The question, then, arises about *when* such assertions are made.
>>
>>  Ah, that is a question about an RDFS inference engine. One kind of
>>  engine might maintain all the RDFS consequences of everything it
>>  knows, for example, like the closures in the (newer version of the)
>>  MT. Uses a lot of memory, I guess, but it will quickly find any
>>  problems and give you all the type information you could possibly
>>  want or need. Other strategies are also possible.
>
>But it may be a good idea to clearly document that in cases
>where no local type is specified and all knowledge about
>type is determined from rdfs:range properties, that if
>literal objects are separated from the context of their original
>statement by inference based on subPropertyOf relations, that
>the lexical form of the literal may not be a member of the
>lexical space of data type defined by the rdfs:range of
>the inference bound superordinate property.
>
>I.e., there is a potential for "BOOM" and implementors of
>RDFS based inference engines should be made aware of it.
>
>Eh?

Well, this raises a whole lot of tricky issues. I see what you mean, 
but you are here using a whole lot of assumptions about the 
'preferred' meanings of different 'kinds' of RDFS assertions - that 
local ones are somehow more definitive than less local ones when it 
comes to typing, that assertions that are 'separated' need to be 
treated differently from those that are not, that the 'original 
statement' has some kind of priority - none of which really have any 
RDFS meaning as such. These are all facts about HOW the information 
is encoded in RDFS, and in some cases about the HISTORY of that 
encoding; and I don't think that this kind of information can really 
be expressed in RDFS itself, or indeed that RDFS should be trying to 
represent such things. I would rather just try to give RDFS a 
coherent semantics which allows RDFS inference to be performed as 
'freely' as possible, explicitly without any need to keep track of 
the computational or inferential state of the graph or the reasoner, 
and be sure that any conclusions that are generated are valid. Any 
imposition of more subtle layers of meaning, where it is not what 
the RDFS graph says, but the way that it says it, that matters, seems 
to me to be going in a direction we explicitly do not want to be 
going in.

This is not to say, of course, that some specialized external 
reasoners might not want to check such properties of RDFS graphs and 
draw their own conclusions based on localization of information, for 
example; but if they do so, they are taking a risk, since RDFS 
provides no warranty that the information they extract has any 
particular meaning. They are rather in the position towards the RDF 
graph that a page-scraper is towards HTML text; it is free to squeeze 
as much meaning out as it can find, but the HTML provides no 
guarantees that what it gets will be correct.

Certainly we should not mislead implementors of inference engines, 
but I think that what we should tell them is that as far as RDFS 
inference is concerned, there simply are no notions of 'locality' or 
'primacy'. So if they set out to rely on any such notions, particularly 
to resolve apparent conflicts, they are taking a risk of things going 
BOOM. But I don't think we can do better than to put the warnings on 
the container, as it were. (BTW, the RDFS inference itself isn't 
going to go boom; some hypothetical external datatype checker is 
going to explode, is the risk involved.)
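
To make the parenthetical concrete, here is a minimal sketch of the kind 
of external datatype checker I have in mind. It is not part of RDFS or of 
any proposal on the table; the datatype names, the PARSERS table and 
check_literal are all assumptions made up for illustration. The point is 
only that the failure happens out here, when a lexical form is mapped to 
a value, and not inside the RDFS inference.

```python
from datetime import date

# Hypothetical lexical-to-value mappings; the datatype names are assumptions.
PARSERS = {
    "xsd:integer": int,
    "xsd:date": date.fromisoformat,
}


def check_literal(lexical, datatypes):
    """Try to map one lexical form under every datatype asserted for it."""
    results = {}
    for dt in sorted(datatypes):
        try:
            results[dt] = PARSERS[dt](lexical)
        except ValueError:
            # The lexical form is outside this datatype's lexical space.
            results[dt] = None
    return results


report = check_literal("2001-11-15", {"xsd:integer", "xsd:date"})
```

Here the checker treats every asserted datatype alike and simply reports 
which mappings fail; it has no notion of which assertion was 'local'.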

>>  >There are no differences for RDFS insofar as "validity" is concerned
>>  >(catching contradictions, etc.) between this prescriptive (non assertive)
>>  >and descriptive (assertive) interpretation.
>>
>>  Ah, that is a relief. :-)
>
>;-)
>
>>  >The problem is in the case of non-locally typed literals when
>>  >it comes time to map them to values in some computer system's value
>>  >space. If, by inference based on subPropertyOf the literal gets
>>  >bound to a superordinate property with a superordinate data
>>  >type different from that of the original statement, the mapping
>>  >could be erroneous or even fail (with a parse error), if the
>>  >binding of literal to data type is based on the superordinate
>>  >property's range definition rather than the original statement
>>  >property's range definition.
>>  >
>>  >So, if a range definition asserts a binding of data type to literal,
>>  >when does that occur, and how can that binding be preserved throughout
>>  >common query and inference processes?
>>  >
>>  >That is my concern. I hope it is clear (given my batting average,
>>  >I won't be at all surprised if it isn't ;-)
>>
>>  Ah!! [Light goes on above head...]
>>
>>  Maybe I see what is bothering you. The purely descriptive model we
>>  currently have can deliver all inferable typing information to an
>>  external type-checker, and if it finds any contradictions then it can
>>  report those, or deliver enough information so that the checker can
>>  notice them. But that alone doesn't say which one of these
>>  type-assertions is the local one and the other is the global, or
>>  distant, one.  So if a type-checker wants to use the *local*
>>  information to make its decision - if it wants to treat THAT
>>  information as prescriptive rather than merely descriptive, if I have this
>>  distinction right now - then it will be stymied, because the purely
>>  assertional nature of the RDFS reasoning process treats all sources
>>  of information as having equal claims on the truth, as it were. It
>>  doesn't enable the external type-checking engine to use a
>>  local-preferring default procedure to make its decision in the face
>>  of a conflict between local and global information about types,
>>  because it doesn't even have the distinction available. It can report
>>  a conflict, but it only does so neutrally; it doesn't take sides;
>>  whereas in this case you *want* it to take sides, in order that the
>>  external process can more rationally decide what to do.
>>
>>  Is that more or less right?
>
>Yes.

OK. That is one issue (above). This (next paragraph) is another 
issue. I would like us to explicitly refuse to deal with the above 
issue; and the next one is only an issue for the P(++) and maybe X(?) 
datatyping schemes; the U/S/DC schemes avoid it entirely. I agree it 
is a very important one, however.

>AND ;-) in addition, if there are no local types defined, and
>all type information is defined in terms of rdfs:range, and
>a literal which is a member of the lexical space of the data
>type of the range of the property of the original statement
>(yikes, what a clause! ;-) which is subsequently bound by inference
>to a superordinate property which has a different range and
>the data type of that different range has a different lexical
>space, then the literal may not be properly interpretable as
>it does not denote a lexical form that is a member of the
>lexical space of the non-original data type.

OK to the above, but not to this:

>I.e., the pairing of literal to data type is per the original
>statement and cannot change. If that pairing is defined by
>rdfs:range, then the assertion of data type for that literal
>must be made *before* any inference process can separate
>the literal from the data type defined for the property of
>the original statement.

All statements are original, and nothing changes. There is no notion 
of before and after. There is no RDFS state to provide such an 
ordering; inferences can be made in any order; they simply 
accumulate. The problem is better expressed by saying that the 
superordinate problem you describe will entail a contradictory 
conclusion about the datatype of the literal; it would be like saying

aaa rdf:type ThisDataType .
aaa rdf:type ThatDataType .

when This and That have incompatible datatype mappings. Deriving a 
contradiction is definitely a Bad Thing. But don't talk about which 
came first, or which is more local, or which is more prescriptive; 
none of that makes sense in RDFS.  We just need to make sure that 
this cannot ever happen, is all.
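
As a rough illustration of how the two type assertions arise, and of why 
the order of inference does not matter, here is a small sketch. It 
assumes only two rules (subPropertyOf propagation and range typing) and 
is not a full RDFS closure; the ex: names and the helper functions are 
invented for the example.

```python
# Triples are (subject, predicate, object) tuples; all ex: names are illustrative.
SUBPROP = "rdfs:subPropertyOf"
RANGE = "rdfs:range"
TYPE = "rdf:type"


def rdfs_closure(triples):
    """Apply the subPropertyOf and range rules until nothing new is derived."""
    closed = set(triples)
    while True:
        new = set()
        for (s, p, o) in closed:
            for (p1, rel, p2) in closed:
                if p1 != p:
                    continue
                if rel == SUBPROP:
                    new.add((s, p2, o))   # s p o, p subPropertyOf p2 => s p2 o
                elif rel == RANGE:
                    new.add((o, TYPE, p2))  # s p o, p range C => o type C
        if new <= closed:
            return closed
        closed |= new


def datatypes_of(node, closed):
    """Collect every rdf:type asserted or inferred for a node."""
    return {o for (s, p, o) in closed if s == node and p == TYPE}


graph = [
    ("ex:shoeSize", SUBPROP, "ex:size"),
    ("ex:shoeSize", RANGE, "ex:ThisDataType"),
    ("ex:size", RANGE, "ex:ThatDataType"),
    ("ex:item", "ex:shoeSize", "aaa"),
]

closed = rdfs_closure(graph)
types_of_aaa = datatypes_of("aaa", closed)
```

Both ex:ThisDataType and ex:ThatDataType end up in types_of_aaa, and 
feeding the same triples in reverse order yields the same closure: the 
conclusions simply accumulate, with nothing recording which came first.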

Pat
-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Thursday, 15 November 2001 14:42:43 UTC