Inference for error checking [was Re: How to avoid that collections "break" relationships]

On 03/30/2014 03:13 AM, Pat Hayes wrote:
> [ . . . ]
> What follows from knowing that
> ppp schema:domainIncludes ccc . ?
> Suppose you know this and you also know that
> x ppp y .
> Can you infer x rdf:type ccc? I presume not, since the domain might
> include other stuff outside ccc. So, what *can* be inferred about the
> relationship between x and ccc ? As far as I can see, nothing can be
> inferred. If I am wrong, please enlighten me. But if I am right, what
> possible utility is there in even making a schema:domainIncludes
> assertion?
> If "inference" is too strong, let me weaken my question: what
> possible utility **in any way whatsoever** is provided by knowing
> that schema:domainIncludes holds between ppp and ccc? What software
> can do what with this, that it could not do as well without this?

I think I can answer this question quite easily, as I have seen it come 
up before in discussions of logic.

Entailment produces statements that are known to be true, given a set of 
facts and entailment rules.  And indeed, adding the fact that

   ppp schema:domainIncludes ccc .

to a set of facts produces no new entailments in that sense.  But it 
*does* enable another kind of machine-processable inference that is 
very useful in error checking, which I'll describe.

In error checking, it is sometimes useful to classify a set of 
statements into three categories: Passed, Failed or Indeterminate. 
Passed means that the statements are fine (within the checkable limits 
anyway): sufficient information has been provided, and it is internally 
consistent.  Failed means that there is something malformed about them 
(according to the application's purpose).  Indeterminate means that the 
system does not have enough information to know whether the statements 
are okay or not: further work might need to be performed, such as manual 
examination or adding more information (facts) to the system.  Hence, it 
is *useful* to be able to quickly and automatically establish that the 
statements fall into the Passed or Failed category.

Note that this categorization typically relies on making a closed world 
assumption (CWA), which is common for an application to make for a 
particular purpose -- especially error checking.

In this example, let us suppose that to pass, the subject of every 
statement must be in the "Known Domain" of that statement's predicate, 
where the Known Domain is the union of all declared 
schema:domainIncludes classes for that predicate.  (Note the CWA here.)

Given this error checking objective, if a system is given the facts:

   x ppp y .
   x a ccc .

then without also knowing that "ppp schema:domainIncludes ccc", the 
system may not be able to determine whether these statements should be 
considered Passed or Failed: the result may be Indeterminate.  But if 
the system is also told that

   ppp schema:domainIncludes ccc .

then it can safely categorize these statements as Passed (within the 
limits of this error checking).
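
To make the three-way categorization concrete, here is a minimal sketch 
of such a check in Python.  It assumes triples are plain tuples of 
strings and that the check is applied to the subject of each statement 
(the usual reading of a domain).  The names classify, RDF_TYPE and 
DOMAIN_INCLUDES are hypothetical, chosen only for illustration; this is 
not taken from any existing tool.

```python
RDF_TYPE = "rdf:type"
DOMAIN_INCLUDES = "schema:domainIncludes"


def classify(triples):
    """Return 'Passed', 'Failed' or 'Indeterminate' for a set of triples."""
    triples = set(triples)

    # Known Domain of each predicate: the union of its declared
    # schema:domainIncludes classes.
    known_domain = {}
    # Declared classes of each node.
    types = {}
    for s, p, o in triples:
        if p == DOMAIN_INCLUDES:
            known_domain.setdefault(s, set()).add(o)
        elif p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    result = "Passed"
    for s, p, o in triples:
        if p in (RDF_TYPE, DOMAIN_INCLUDES):
            continue  # only ordinary data statements are checked
        if p not in known_domain:
            # Nothing declared for this predicate: cannot judge.
            result = "Indeterminate"
        elif not (types.get(s, set()) & known_domain[p]):
            # CWA: no declared class of the subject is in the Known Domain.
            return "Failed"
    return result


data = [("x", "ppp", "y"), ("x", RDF_TYPE, "ccc")]
schema = [("ppp", DOMAIN_INCLUDES, "ccc")]

print(classify(data))           # Indeterminate
print(classify(data + schema))  # Passed
print(classify([("x", "ppp", "y"), ("x", RDF_TYPE, "ddd")] + schema))  # Failed
```

In this sketch a single Failed statement outweighs any Indeterminate 
ones, and treating a missing class declaration as a failure is exactly 
where the closed world assumption enters.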

Thus, although schema:domainIncludes does not enable any new entailments 
under the open world assumption (OWA), it *does* enable some useful 
error checking inference under the closed world assumption (CWA), by 
enabling a shift from Indeterminate to Passed or Failed.

If anyone is concerned that this use of the CWA violates the spirit of 
RDF, which indeed is based on the OWA (for *very* good reason), please 
bear in mind that almost every application makes the CWA at some point, 
to do its job.


Received on Monday, 31 March 2014 15:31:57 UTC