Re: Inference for error checking [was Re: How to avoid that collections "break" relationships]

From: David Booth <david@dbooth.org>
Date: Sun, 06 Apr 2014 22:24:50 -0400
Message-ID: <53420C72.8010806@dbooth.org>
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, Pat Hayes <phayes@ihmc.us>
CC: Markus Lanthaler <markus.lanthaler@gmx.net>, public-hydra@w3.org, "'public-lod@w3.org' (public-lod@w3.org)" <public-lod@w3.org>, W3C Web Schemas Task Force <public-vocabs@w3.org>, Dan Brickley <danbri@danbri.org>
On 04/06/2014 09:07 PM, Peter F. Patel-Schneider wrote:
> Well, certainly, one could do this if one wanted to.  However, is this a
> useful thing to do, in general, particularly in the absence of
> constructs that actually sanction the inference and particularly if the
> checking is done in a context where there is no way of actually getting
> the author to fix whatever problems are encountered?

I'll let others judge that.  My goal in the example was simply to 
demonstrate how it *could* be useful.

>
> My feelings are that if you really want to do this, then the place to do
> it is during data entry or data importation.

Sure, it's certainly best to do error checking as early as possible, but 
often there is still some value in doing it later as well.  Maybe the 
data users can contact the data publishers and alert them to a potential 
problem?  But like I say, I'll let others judge its usefulness.  I don't 
have a strong opinion on that.

David

>
>
> peter
>
> On 04/03/2014 03:12 PM, David Booth wrote:
>> First of all, my sincere apologies to Pat, Peter and the rest of the
>> readership for totally botching my last example, writing "domain" when
>> I meant "range" *and* explaining it wrong.  Sorry for all the
>> confusion it caused!
>>
>> I was simply trying to demonstrate how a schema:domainIncludes
>> assertion could be useful for error checking even if it had no
>> formal entailments, by making selective use of the closed world
>> assumption (CWA).  I'll
>> try again.
>>
>> Suppose we are given these RDF statements, in which the author
>> *may* have made a typo, writing ddd instead of ccc as the rdf:type
>> of x:
>>
>>   x ppp y .                       # Triple A
>>   x rdf:type ddd .                # Triple B
>>   ppp schema:domainIncludes ccc . # Triple C
>>
>> As given, these statements are consistent, so a reasoner
>> will not detect a problem.  Indeed, they may or may
>> not be what the author intended.  If the author later
>> added the statement:
>>
>>   ccc owl:equivalentClass ddd .   # Triple E
>>
>> then ddd probably was what the author intended
>> in triple B.  OTOH if the author later added:
>>
>>   ccc owl:disjointWith ddd .      # Triple F
>>
>> then ddd probably was not what the author intended
>> in triple B.
>>
>> However, thus far we are only given triples {A,B,C}
>> above, and an error checker wishes
>> to check for *potential* typos by applying the rule:
>>
>>   For all subgraphs of the form
>>
>>     { x ppp y .
>>       ppp schema:domainIncludes ccc . }
>>
>>   check whether
>>
>>      { x rdf:type ccc . }
>>
>>   is *provably* true.  If not, then fail the
>>   error check.  If all such subgraphs pass, then
>>   the error check as a whole passes.
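The rule above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: triples are plain (subject, predicate, object) string tuples, "provably true" is approximated by following rdf:type, rdfs:subClassOf and owl:equivalentClass (a real checker would call an actual reasoner), and all function names are hypothetical. The CWA enters in error_check, where anything not provable is reported as a failure.

```python
# Sketch only: triples are string tuples and all names are hypothetical.
# "Provable" here means derivable via rdf:type + rdfs:subClassOf +
# owl:equivalentClass, a deliberately tiny stand-in for a real reasoner.
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
EQUIVALENT = "owl:equivalentClass"
DOMAIN_INCLUDES = "schema:domainIncludes"


def superclasses(graph: Set[Triple], cls: str) -> Set[str]:
    """Classes that provably contain every member of cls (reflexive, transitive)."""
    seen = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if s == current and p in (SUBCLASS_OF, EQUIVALENT):
                nxt = o
            elif o == current and p == EQUIVALENT:
                nxt = s  # owl:equivalentClass is symmetric
            else:
                continue
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def provably_typed(graph: Set[Triple], x: str, cls: str) -> bool:
    """True if 'x rdf:type cls' follows from the graph under this tiny rule set."""
    return any(
        cls in superclasses(graph, t)
        for s, p, t in graph
        if s == x and p == RDF_TYPE
    )


def error_check(graph: Set[Triple]) -> List[Triple]:
    """For every subgraph {x ppp y . ppp schema:domainIncludes ccc .}
    require that 'x rdf:type ccc' is provable. Not provable counts as
    a failure: this is where the CWA is applied. Returns the failing
    (x, ppp, ccc) combinations; an empty list means the check passes."""
    failures = set()
    for x, p, _ in graph:
        for q, dp, c in graph:
            if dp == DOMAIN_INCLUDES and q == p and not provably_typed(graph, x, c):
                failures.add((x, p, c))
    return sorted(failures)


A = ("x", "ppp", "y")
B = ("x", "rdf:type", "ddd")
C = ("ppp", "schema:domainIncludes", "ccc")
E = ("ccc", "owl:equivalentClass", "ddd")
print(error_check({A, B, C}))     # [('x', 'ppp', 'ccc')]
print(error_check({A, B, C, E}))  # []
```

Collecting failures instead of stopping at the first one lets a checker report every suspicious triple to the data publisher at once.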
>>
>> Under the open world assumption (OWA), the requirement:
>>
>>      { x rdf:type ccc . }
>>
>> is neither provably true nor provably false given
>> graph {A,B,C}.  But under the CWA it is
>> considered false, because it is not provably true.
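The OWA/CWA difference can be made concrete with a three-valued evaluation. In this sketch (Python, hypothetical names, with reasoning limited to owl:equivalentClass and owl:disjointWith as an assumption) the query "x rdf:type ccc" is UNKNOWN for {A,B,C}, becomes TRUE once triple E is added, and FALSE once triple F is added. The closed-world reading simply maps UNKNOWN to false.

```python
# Sketch only: triples are string tuples, names are hypothetical, and the
# only reasoning done is over owl:equivalentClass and owl:disjointWith.
from enum import Enum
from typing import Set, Tuple

Triple = Tuple[str, str, str]


class Truth(Enum):
    TRUE = "provably true"
    FALSE = "provably false"
    UNKNOWN = "neither provably true nor provably false"


def equivalents(graph: Set[Triple], cls: str) -> Set[str]:
    """cls plus every class linked to it by owl:equivalentClass (either direction)."""
    seen, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if p != "owl:equivalentClass":
                continue
            for a, b in ((s, o), (o, s)):
                if a == current and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return seen


def evaluate_owa(graph: Set[Triple], x: str, cls: str) -> Truth:
    """Open-world status of 'x rdf:type cls'."""
    target = equivalents(graph, cls)
    asserted: Set[str] = set()
    for s, p, o in graph:
        if s == x and p == "rdf:type":
            asserted |= equivalents(graph, o)
    if target & asserted:
        return Truth.TRUE
    for s, p, o in graph:
        if p == "owl:disjointWith" and (
            (s in target and o in asserted) or (o in target and s in asserted)
        ):
            return Truth.FALSE  # x is in a class disjoint from cls
    return Truth.UNKNOWN


def holds_cwa(graph: Set[Triple], x: str, cls: str) -> bool:
    """Closed-world reading: whatever is not provably true is taken as false."""
    return evaluate_owa(graph, x, cls) is Truth.TRUE


A = ("x", "ppp", "y")
B = ("x", "rdf:type", "ddd")
C = ("ppp", "schema:domainIncludes", "ccc")
E = ("ccc", "owl:equivalentClass", "ddd")
F = ("ccc", "owl:disjointWith", "ddd")
print(evaluate_owa({A, B, C}, "x", "ccc"))     # Truth.UNKNOWN
print(holds_cwa({A, B, C}, "x", "ccc"))        # False
print(evaluate_owa({A, B, C, E}, "x", "ccc"))  # Truth.TRUE
print(evaluate_owa({A, B, C, F}, "x", "ccc"))  # Truth.FALSE
```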
>>
>> This is how a schema:domainIncludes assertion can be
>> useful for error checking even if it has no formal
>> entailments: it tells the error checker which
>> cases to check.
>>
>> I hope that now makes more sense.   Again, sorry to
>> have screwed up my example so badly last time, and
>> I hope I've got it right this time.  :)
>>
>> David
>>
>>
>> On 04/02/2014 11:42 PM, Pat Hayes wrote:
>>>
>>> On Mar 31, 2014, at 10:31 AM, David Booth <david@dbooth.org> wrote:
>>>
>>>> On 03/30/2014 03:13 AM, Pat Hayes wrote:
>>>>> [ , . . ]
>>>>> What follows from knowing that
>>>>>
>>>>> ppp schema:domainIncludes ccc . ?
>>>>>
>>>>> Suppose you know this and you also know that
>>>>>
>>>>> x ppp y .
>>>>>
>>>>> Can you infer x rdf:type ccc? I presume not, since the domain might
>>>>> include other stuff outside ccc. So, what *can* be inferred about the
>>>>> relationship between x and ccc ? As far as I can see, nothing can be
>>>>> inferred. If I am wrong, please enlighten me. But if I am right, what
>>>>> possible utility is there in even making a schema:domainIncludes
>>>>> assertion?
>>>>>
>>>>> If "inference" is too strong, let me weaken my question: what
>>>>> possible utility **in any way whatsoever** is provided by knowing
>>>>> that schema:domainIncludes holds between ppp and ccc? What software
>>>>> can do what with this, that it could not do as well without this?
>>>>
>>>> I think I can answer this question quite easily, as I have seen it
>>>> come up before in discussions of logic.
>>>>
>>>> ...
>>>
>>>> Note that this categorization typically relies on making a closed
>>>> world assumption (CWA), which is common for an application to make
>>>> for a particular purpose -- especially error checking.
>>>
>>> Yes, of course. If you make the CWA with the information you have, then
>>>
>>> ppp schema:domainIncludes ccc .
>>>
>>> has exactly the same entailments as
>>>
>>> ppp rdfs:domain ccc .
>>>
>>> has in RDFS without the CWA. But that, of course, begs the question.
>>> If you are going to rely on the CWA, then (a) you are violating the
>>> basic assumptions of all Web notations and (b) you are using a
>>> fundamentally different semantics. And see below.
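The equivalence claimed here can be illustrated with a small sketch (Python, hypothetical function names, triples as string tuples). The first function applies the RDFS entailment rule rdfs2: from "p rdfs:domain c" and "x p y", infer "x rdf:type c". The second closes the world over the schema:domainIncludes declarations actually present, i.e. it treats "the information you have" as complete: with exactly one declared class this forces the same rdf:type conclusion as rdfs2. With several declared classes only a disjunction follows, so this sketch emits nothing in that case; that handling is an assumption of the sketch. Under the OWA, schema:domainIncludes licenses no rdf:type triples at all.

```python
# Sketch only: triples are string tuples and function names are hypothetical.
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def rdfs_domain_entailments(graph: Set[Triple]) -> Set[Triple]:
    """RDFS rule rdfs2: from 'p rdfs:domain c' and 'x p y', infer 'x rdf:type c'."""
    return {
        (x, "rdf:type", c)
        for p, dp, c in graph
        if dp == "rdfs:domain"
        for x, q, _ in graph
        if q == p
    }


def domain_includes_cwa_entailments(graph: Set[Triple]) -> Set[Triple]:
    """Treat the schema:domainIncludes classes present in the graph as the
    complete domain of each predicate (a closed-world reading).

    With exactly one declared class, x must be in it. With several, only
    'x is in one of them' follows, which cannot be written as a single
    rdf:type triple, so nothing is emitted for that predicate."""
    declared: Dict[str, Set[str]] = {}
    for p, dp, c in graph:
        if dp == "schema:domainIncludes":
            declared.setdefault(p, set()).add(c)
    inferred: Set[Triple] = set()
    for x, q, _ in graph:
        classes = declared.get(q, set())
        if len(classes) == 1:
            (only,) = classes
            inferred.add((x, "rdf:type", only))
    return inferred


g_domain = {("x", "ppp", "y"), ("ppp", "rdfs:domain", "ccc")}
g_includes = {("x", "ppp", "y"), ("ppp", "schema:domainIncludes", "ccc")}
print(rdfs_domain_entailments(g_domain))            # {('x', 'rdf:type', 'ccc')}
print(domain_includes_cwa_entailments(g_includes))  # {('x', 'rdf:type', 'ccc')}
```

The second function only produces something because it assumes the declaration list is complete, which is precisely the assumption being disputed in this exchange.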
>>>
>>> None of this has anything to do with a distinction between entailment
>>> and error checking, by the way. Your hypothetical three-way
>>> classification task uses the same meanings of the RDF as any other
>>> entailment task would.
>>>
>>>>
>>>> In this example, let us suppose that to pass, the object of every
>>>> predicate must be in the "Known Domain" of that predicate, where the
>>>> Known Domain is the union of all declared schema:domainIncludes
>>>> classes for that predicate.   (Note the CWA here.)
>>>>
>>>> Given this error checking objective, if a system is given the facts:
>>>>
>>>>   x ppp y .
>>>>   y a ccc .
>>>>
>>>> then without also knowing that "ppp schema:domainIncludes ccc", the
>>>> system may not be able to determine whether these statements should be
>>>> considered Passed or Failed: the result may be Indeterminate.  But
>>>> if the system is also told that
>>>>
>>>>   ppp schema:domainIncludes ccc .
>>>>
>>>> then it can safely categorize these statements as Passed (within the
>>>> limits of this error checking).
>>>
>>> Why? [ y a ccc . ] does not follow from this assertion and the x ppp
>>> y, so this looks like an Indeterminate to me. Even with the CWA
>>> applied to ppp, your check here is extremely risky. In fact, I could
>>> invoke Gricean reasoning to conclude that the domain of ppp **almost
>>> certainly must** include something outside ccc; because if not, why
>>> did whoever wrote this use the more cautious schema:domainIncludes
>>> rather than the simpler and more direct rdfs:domain? Indeed, isn't the
>>> ubiquity of the OWA in Web reasoning the only justification for
>>> having a construct like schema:domainIncludes at all? Why else was it
>>> invented, if not to allow for further information to make the domain
>>> larger?
>>>
>>>> Thus, although schema:domainIncludes does not enable any new
>>>> entailments under the open world assumption (OWA), it *does* enable
>>>> some useful error checking inference under the closed world
>>>> assumption (CWA), by enabling a shift from Indeterminate to Passed
>>>> or Failed.
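The three-way Passed / Failed / Indeterminate classification can be sketched in Python as below, assuming triples are string tuples and using hypothetical names throughout. The Known Domain of a predicate is the union of its declared schema:domainIncludes classes. The sketch checks the subject of each triple, as in the corrected rule of the 3 April message, since this earlier message mixes up domain and range, as its author later acknowledges. Only asserted rdf:type and owl:disjointWith triples are consulted. The closed_world flag shows the claimed shift: the same unresolved case is Indeterminate under the OWA and Failed under the CWA.

```python
# Sketch only: triples are string tuples, names are hypothetical, and only
# asserted rdf:type / owl:disjointWith triples are consulted (no reasoner).
from enum import Enum
from typing import Set, Tuple

Triple = Tuple[str, str, str]


class Verdict(Enum):
    PASSED = "Passed"
    FAILED = "Failed"
    INDETERMINATE = "Indeterminate"


def classify(graph: Set[Triple], triple: Triple, closed_world: bool) -> Verdict:
    """Classify the subject of 'triple' against the Known Domain of its
    predicate: the union of all schema:domainIncludes classes for it."""
    x, p, _ = triple
    known_domain = {
        c for q, dp, c in graph if q == p and dp == "schema:domainIncludes"
    }
    if not known_domain:
        return Verdict.INDETERMINATE  # nothing declared: no basis for a verdict
    types = {o for s, q, o in graph if s == x and q == "rdf:type"}
    if types & known_domain:
        return Verdict.PASSED
    disjoint = {(s, o) for s, q, o in graph if q == "owl:disjointWith"}

    def excluded(c: str) -> bool:
        # x is provably outside c if one of its asserted types is disjoint with c
        return any((c, t) in disjoint or (t, c) in disjoint for t in types)

    if all(excluded(c) for c in known_domain):
        return Verdict.FAILED  # provably outside every declared class
    # Open world: not provable either way. Closed world: not provable means false.
    return Verdict.FAILED if closed_world else Verdict.INDETERMINATE


A = ("x", "ppp", "y")
B = ("x", "rdf:type", "ddd")
C = ("ppp", "schema:domainIncludes", "ccc")
print(classify({A, B}, A, closed_world=True))      # Verdict.INDETERMINATE
print(classify({A, B, C}, A, closed_world=False))  # Verdict.INDETERMINATE
print(classify({A, B, C}, A, closed_world=True))   # Verdict.FAILED
```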
>>>
>>> I would not want any important decision to rest on such an extremely
>>> flaky foundation as this.
>>>
>>>>
>>>> If anyone is concerned that this use of the CWA violates the spirit
>>>> of RDF, which indeed is based on the OWA (for *very* good reason),
>>>> please bear in mind that almost every application makes the CWA at
>>>> some point, to do its job.
>>>
>>> Um, bullshit. But in any case, even if it were true, the important
>>> thing is to know when to invoke the CWA. Assuming that you know all
>>> the domain, when you have been told explicitly that you probably have
>>> not been told all of it, is a very bad heuristic for invoking the CWA.
>>>
>>> Pat
>>>
>>>>
>>>> David
>>>>
>>>>
>>>
>>> ------------------------------------------------------------
>>> IHMC                                     (850)434 8903 home
>>> 40 South Alcaniz St.            (850)202 4416   office
>>> Pensacola                            (850)202 4440   fax
>>> FL 32502                              (850)291 0667   mobile (preferred)
>>> phayes@ihmc.us       http://www.ihmc.us/users/phayes
>>>
Received on Monday, 7 April 2014 02:25:20 UTC
