Integrity Checking vs. Typing [Re: RDFS bug "A property can have at most one range property"] from Wolfram Conen on 2001-11-17 (www-rdf-interest@w3.org from November 2001)

From: Wolfram Conen <conen@gmx.de>
Date: Sat, 17 Nov 2001 17:14:42 +0100
To: "Sean B. Palmer" <sean@mysterylights.com>, www-rdf-interest@w3.org
CC: tarod@softhome.net
Message-ID: <3BF68CF2.E23E9BC6@gmx.de>
[The context below is RDFS. I briefly discuss two interpretations of
domain/range (the old one suggested by the RDFS CR and the newer,
DAMLish interpretation) and claim that the choice between these
interpretations affects how the disjunctive/conjunctive question
should be decided. I also discuss the usefulness of both
interpretations with respect to certain application contexts and show
that the newer interpretation is less expressive. Sorry for returning to
this issue but, as I briefly discuss below, the discussion has mixed
different issues from the beginning and has seldom discussed
(application) requirements beyond compatibility with DAML/OIL.]

"Sean B. Palmer" wrote:
>
> Clearly, people take domains and ranges to be conjunctive;
> it is useful to do so, and the content that you originally
> cited gives a good reason (from TimBL) as to why they should
> be taken conjunctively, with agreement by whoever was responding.

Hm, I remember a nice email from Lee Jonas [1] who pointed out pretty
clearly that a disjunctive interpretation can be very helpful for some
important purposes.

The arguments Tim gave in [2] were very convincing as far as he argued
that a property should be allowed to have more than one range
constraint. They are not as convincing when it comes to the question of
disjunctive vs. conjunctive interpretation. Let me try to point out some
related things briefly:

The first question to answer is whether one wants to interpret
domain/range constraints as "integrity constraints" on the usage of
properties (1) or as "generative constraints (deductive rules)" on the
type of the subject/object that is used with a constrained property (2).

Example:
	[s:SomeProperty rdfs:domain s:someClass]
	[SomeResource s:SomeProperty SomeValue]

Interpretation (1) will give you a violation of the integrity
constraint. To make the above a valid collection  of triples, the type
of SomeResource has to be known as being s:someClass.

Interpretation (2) will never give you any violation; it simply
determines the type of SomeResource to be s:someClass.
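
To make the difference concrete, here is a rough sketch of both
readings applied to the two example triples above. It is only an
illustration: triples are plain Python tuples, the function names are
made up for this email, and subClassOf/subPropertyOf reasoning is
deliberately left out.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"

# The two example triples, written as (subject, predicate, object).
triples = {
    ("s:SomeProperty", RDFS_DOMAIN, "s:someClass"),
    ("SomeResource", "s:SomeProperty", "SomeValue"),
}


def domain_violations(triples):
    """Interpretation (1): the subject must already be known to have the domain type."""
    domains = {(s, o) for s, p, o in triples if p == RDFS_DOMAIN}
    typed = {(s, o) for s, p, o in triples if p == RDF_TYPE}
    return {
        (s, cls)
        for s, p, _ in triples
        for prop, cls in domains
        if p == prop and (s, cls) not in typed
    }


def infer_domain_types(triples):
    """Interpretation (2): using the property makes the subject a member of the domain."""
    domains = {(s, o) for s, p, o in triples if p == RDFS_DOMAIN}
    return {
        (s, RDF_TYPE, cls)
        for s, p, _ in triples
        for prop, cls in domains
        if p == prop
    }


print(domain_violations(triples))   # reading (1): one violation reported
print(infer_domain_types(triples))  # reading (2): one rdf:type triple derived
```

Once the triples derived under reading (2) are added to the set, the
check of reading (1) can no longer fail, which is one way to see that
the second reading never reports a violation.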

Note that this initial decision influences the subsequent decision. The
problem with the discussion that started with TimBL's email [2] was that
a number of people who were arguing implicitly from the point of view of
the second interpretation were contributing to it (for example, all
friends of OIL naturally prefer interpretation 2), so that it now seems
as if the decision to allow more than one range constraint also implied
that multiple range constraints are to be interpreted conjunctively.

The problem with this is that, with interpretation (1), this is not
really convincing because adding more range constraints will make the
constraint tighter, leaving you with the risk that all your old validity
decisions turn out to have been faulty. In a disjunctive interpretation,
adding additional range constraints relaxes your integrity constraint
(which is quite plausible if new range constraints come from schemata
that also introduce new types that can be used with an already known
property -- yes, it is clear that checking such constraints is only
meaningful relative to your knowledge, see below). In interpretation
(2), the conjunctive interpretation is much more convincing (as has been
argued a lot from the point of view of a DAML/OIL designer) than the
disjunctive.
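
A small sketch of the tightening/relaxing effect under interpretation
(1), assuming the known types of an object and the range classes of a
property are given as plain sets. The function name, the mode strings
and the class names s:Person and t:Organization are invented for the
illustration.

```python
def range_ok(value_types, range_classes, mode):
    """Check an object's known types against a property's range classes."""
    if not range_classes:
        return True  # no range constraint at all
    if mode == "conjunctive":
        # the object must belong to every range class
        return range_classes <= value_types
    if mode == "disjunctive":
        # belonging to one range class is enough
        return bool(range_classes & value_types)
    raise ValueError(f"unknown mode: {mode}")


old_ranges = {"s:Person"}
# A second schema adds another range class for the same property.
new_ranges = old_ranges | {"t:Organization"}

person = {"s:Person"}
organization = {"t:Organization"}

# Conjunctive: a use that was valid before becomes invalid.
print(range_ok(person, old_ranges, "conjunctive"), range_ok(person, new_ranges, "conjunctive"))
# Disjunctive: old decisions stay valid, and new kinds of objects become usable.
print(range_ok(person, old_ranges, "disjunctive"), range_ok(person, new_ranges, "disjunctive"))
print(range_ok(organization, old_ranges, "disjunctive"), range_ok(organization, new_ranges, "disjunctive"))
```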

Now, reading the "old" RDFS candidate recommendation left us (and
others, see Lee's email cited above) with the strong impression that the
"integrity constraint" interpretation (1) was the intended
interpretation. This would imply that a conjunctive interpretation is
not really helpful (there are some more aspects to consider, see below
if you are interested).

Nowadays, given the decisions of the RDF core working group, it has
been made clear that RDFS does not carry any integrity constraint
expressivity anymore. The interpretation of domain/range has been
adapted to fit their DAML/OIL counterparts, i.e. they are now a tool to
determine the type of subjects/objects. So, yes, accepting this as
given, there seems to be no real point in discussing whether multiple
constraints should be disjunctive or conjunctive (conjunctive seems to
be clearly preferable).

Note however that this adaptation of RDFS may make your old schemata
behave very differently in the usage context you had in mind -- there is
no notion of validity anymore, no constraint checking can be done
(because domain/range determine the type and that's it), etc. The
differences between the old and the recent flavor of RDFS have been pointed
out with the help of first order logic in [3] (for the reader interested
in some formal detail and a set of roughly 20 rules capturing the MT).

Now, does this have any relevance? Lee's email made an interesting point:
in applications of RDFS vocabularies, integrity checking can easily
turn out to be THE key concept -- it allows you to actually MAKE USE of
all the type information that you have collected in your RDFS documents
(and inferred with type, subClassOf or subproperties of the two). This
potential has been lost now. Note that from an extensional point of
view (which resources belong to which classes) the old and the new RDFS
do not differ: all type information that can be determined by
domain/range properties in the new interpretation can also be attached
to the subjects/objects in the old interpretation (with type/subClassOf
or subproperties of both). However, the interpretation of domain/range
as integrity constraints CANNOT be emulated with the new interpretation
(see the RDFS entailment lemma in the MT: applying the schema closure
rules allows one to reduce RDFS entailment to simple entailment, or, in
other words: nothing formulatable with RDFS can do any harm). Some
expressivity has been lost.

Is this a significant loss? Hm, that really depends on what you are
planning to do with your RDF schemata. In two of the RDFS applications
that we develop(ed), constraint checking is the most important aspect
(in a schema for role-based access control, for example). It seems to
be very easy to construct application scenarios en masse where typing
information is used to decide whether certain functions can be used on
the typed data (resources), i.e.: can we feed this resource into this
transformation schema? Can it be used with this function? Can it go
into this database table? etc. (It seems not to be so easy to construct
application scenarios that only use typing information as an end in
itself: wow, so this resource has all these nice types! One probably
wants to make use of this information too.) Now, clearly, what you then
need in the scenarios mentioned above is the ability to make use of the
type information by checking the type against allowed types for
functions, transformations, relations (->properties) -- which is exactly
what you could do in the old RDFS interpretation WITHIN your model --
and now have to do extrinsically. We consider this a remarkable loss.
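
As a rough idea of what "extrinsically" means in practice, the
following sketch keeps the allowed types of an operation in application
code, outside the RDF model, and checks a resource's known types
(closed under subClassOf) against them. All names here (may_apply,
grant_access, the ex: and s: identifiers) are hypothetical and not
taken from any of our applications.

```python
def superclasses(cls, sub_class_of):
    """Return cls plus every class reachable from it via subClassOf."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(sub_class_of.get(current, ()))
    return seen


def may_apply(resource, operation, types, sub_class_of, accepts):
    """Decide whether an operation accepts a resource, given its known types."""
    known = set()
    for t in types.get(resource, ()):
        known |= superclasses(t, sub_class_of)
    return bool(known & accepts[operation])


# Type information as it might be collected from RDFS documents.
sub_class_of = {"s:Manager": {"s:Employee"}, "s:Employee": {"s:Person"}}
types = {"ex:alice": {"s:Manager"}, "ex:report7": {"s:Document"}}

# The "constraint" now lives in the application, not in the schema.
accepts = {"grant_access": {"s:Employee"}}

print(may_apply("ex:alice", "grant_access", types, sub_class_of, accepts))
print(may_apply("ex:report7", "grant_access", types, sub_class_of, accepts))
```

A resource without any known type is rejected here, which is exactly
the kind of decision "in the absence of information" that is discussed
further below.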

It has been argued, however, that the integrity constraint checking
capability depends on the "completeness" of the knowledge -- assume that
you have a set of RDFS triples

1:	[s:SomeProperty rdfs:domain s:someClass]
2:	[SomeResource s:SomeProperty SomeValue]
---
3:	[SomeResource rdf:type s:someClass]

Now, further assume that you draw your conclusion that the domain
constraint is violated already after having received only the first two
triples. Then, receiving the third triple renders your decision faulty.
This smells like non-monotonicity. However, it is not necessarily so:
if you draw your conclusions relative to the knowledge base you have at
hand, use versioning, keep track of your inferences etc., this can all
be handled without non-monotonicity (from a meta perspective). But that
is not the point here -- the point is that the same problem appears in
the "typing interpretation" of domain once we accept that the type
information (explicit or inferred) is actually USED in one of the (not
completely artificial) application scenarios mentioned above -- when we
start to make decisions (like: if the resource x has the correct type C,
we can delete y) depending on the presence and/or absence of typing
information, similar arguments as above can be made (in the "absence
case" we can later encounter information that allows us to infer the
type we have been looking for, in the "presence case" the information we
used may turn out to have been wrong etc.). 
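
One way to picture "conclusions relative to the knowledge base at
hand" is to record each verdict together with the version of the
knowledge base it was computed from. The sketch below feeds the three
triples from above in one at a time; the versioning scheme and the
names check_domain and verdicts are assumptions made for the
illustration, and subclass reasoning is again ignored.

```python
def check_domain(kb):
    """Domain check under the integrity reading, relative to the given triples."""
    domains = {(s, o) for s, p, o in kb if p == "rdfs:domain"}
    typed = {(s, o) for s, p, o in kb if p == "rdf:type"}
    return {
        (s, cls)
        for s, p, _ in kb
        for prop, cls in domains
        if p == prop and (s, cls) not in typed
    }


# The three triples in the order in which they are received.
stream = [
    ("s:SomeProperty", "rdfs:domain", "s:someClass"),
    ("SomeResource", "s:SomeProperty", "SomeValue"),
    ("SomeResource", "rdf:type", "s:someClass"),
]

kb = set()
verdicts = {}  # knowledge base version -> violations found at that version
for version, triple in enumerate(stream, start=1):
    kb.add(triple)
    verdicts[version] = check_domain(kb)

for version, violations in verdicts.items():
    print(version, violations)
```

The verdict for version 2 is not retracted when the third triple
arrives; it simply remains a statement about version 2, while version 3
gets its own verdict.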

To conclude: I can see no real difference here, except that we now have
to model integrity constraints for type checking outside of RDFS (which
keeps RDFS simpler, but introduces a somewhat deceptive sense of
security because, when we start to use it in the above scenarios, the
questions related to type checking and the subsequent problems of
updates/incompleteness of knowledge etc. still have to be solved). I
would prefer to treat this question already in the "basic layer",
because this is where it starts to become interesting (from certain
application points of view) and seems to be unavoidable to discuss (I
like the old interpretation... ;)


Thanks for reading - let me know what you think about it if you like!
	Wolfram
	conen@gmx.de

PS1: The above is largely based on discussions with Reinhold. I also
profited from an exchange of thoughts with Graham and Jan some time ago
(though they do not necessarily endorse the above).

PS2: (Marc, maybe you found what you have been looking for in this or
the mentioned documents.)

References:

[1] Lee Jonas:
http://lists.w3.org/Archives/Public/www-rdf-interest/2000Sep/0148.html
and following

[2] TimBL:
http://lists.w3.org/Archives/Public/www-rdf-interest/2000Sep/0107.html,
referring to version 1.0 of [4]

[3] Conen/Klapsing: "Logical Interpretations of RDFS - A Compatibility
Guide", http://nestroy.wi-inf.uni-essen.de/rdf/new_interpretation/

[4] Conen/Klapsing: A Logical Interpretation of RDF,
http://nestroy.wi-inf.uni-essen.de/rdf/logical_interpretation/
Received on Saturday, 17 November 2001 10:11:12 UTC