Re: Inference for error checking [was Re: How to avoid that collections "break" relationships]

Hi Peter,

Data Sets all age at the same rate (1460 Days + 1 Leap Day per 16 Calendar Quarters), or at any scalar multiple of that single frequency.  The frequency is "man-made".  Certainly error checking is good, but cross-domain data transfers are only a transportation service via a dumb pipe.  I am wary of claims of value added in transit.  Such claims are a delusion that some may find in watches but that are nowhere to be found in calendars.

http://www.rustprivacy.org/2014/balance/CulturalHeritageVision.jpg

--Gannon
--------------------------------------------
On Sun, 4/6/14, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:

 Subject: Re: Inference for error checking [was Re: How to avoid that collections "break" relationships]
 To: "David Booth" <david@dbooth.org>, "Pat Hayes" <phayes@ihmc.us>
 Cc: "Markus Lanthaler" <markus.lanthaler@gmx.net>, public-hydra@w3.org, "'public-lod@w3.org' (public-lod@w3.org)" <public-lod@w3.org>, "W3C Web Schemas Task Force" <public-vocabs@w3.org>, "Dan Brickley" <danbri@danbri.org>
 Date: Sunday, April 6, 2014, 8:07 PM
 
 Well, certainly, one could do this if one wanted to.  However, is this a
 useful thing to do, in general, particularly in the absence of constructs
 that actually sanction the inference and particularly if the checking is
 done in a context where there is no way of actually getting the author to
 fix whatever problems are encountered?
 
 My feelings are that if you really want to do this, then the
 place to do it is during data entry or data importation.
 
 
 peter
 
 On 04/03/2014 03:12 PM, David Booth wrote:
 > First of all, my sincere apologies to Pat, Peter and the rest of the
 > readership for totally botching my last example, writing "domain" when
 > I meant "range" *and* explaining it wrong.  Sorry for all the confusion
 > it caused!
 > 
 > I was simply trying to demonstrate how a schema:domainIncludes
 > assertion could be useful for error checking even if it had no
 > formal entailments, by making selective use of the CWA.  I'll
 > try again.
 > 
 > Suppose we are given these RDF statements, in which the author
 > *may* have made a typo, writing ddd instead of ccc as the rdf:type
 > of x:
 > 
 >   x ppp y .                        # Triple A
 >   x rdf:type ddd .                 # Triple B
 >   ppp schema:domainIncludes ccc .  # Triple C
 > 
 > As given, these statements are consistent, so a reasoner
 > will not detect a problem.  Indeed, they may or may
 > not be what the author intended.  If the author later
 > added the statement:
 > 
 >   ccc owl:equivalentClass ddd .    # Triple E
 > 
 > then ddd probably was what the author intended
 > in triple B.  OTOH if the author later added:
 > 
 >   ccc owl:disjointWith ddd .       # Triple F
 > 
 > then ddd probably was not what the author intended
 > in triple B.
 > 
 > However, thus far we are only given triples {A,B,C}
 > above, and an error checker wishes
 > to check for *potential* typos by applying the rule:
 > 
 >   For all subgraphs of the form
 > 
 >     { x ppp y .
 >       ppp schema:domainIncludes ccc . }
 > 
 >   check whether
 > 
 >      { x rdf:type ccc . }
 > 
 >   is *provably* true.  If not, then fail the
 >   error check.  If all such subgraphs pass, then
 >   the error check as a whole passes.
 > 
 > Under the OWA, the requirement:
 > 
 >      { x rdf:type ccc . }
 > 
 > is neither provably true nor provably false given
 > graph {A,B,C}.  But under the CWA it is
 > considered false, because it is not provably true.
 > 
 > This is how the schema:domainIncludes can be
 > useful for error checking even if it has no formal
 > entailments: it tells the error checker which
 > cases to check.
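
The rule quoted above can be sketched in a few lines of Python.  This is
only an illustration under stated assumptions: triples are plain string
tuples, "provably true" is read as "explicitly present in the graph" (no
RDFS or OWL reasoning is run first), and the function name is hypothetical
rather than part of any schema.org tooling.  The rule is applied literally,
so a property with several schema:domainIncludes declarations is checked
against each of them.

```python
# Hypothetical helper: triples are (subject, predicate, object) string tuples.
DOMAIN_INCLUDES = "schema:domainIncludes"
RDF_TYPE = "rdf:type"


def domain_includes_failures(graph):
    """Return each ((x, ppp, y), ccc) pairing for which the graph declares
    (ppp schema:domainIncludes ccc) but does not assert (x rdf:type ccc).
    An empty list means the error check as a whole passes."""
    triples = set(graph)
    # Collect the domainIncludes declarations as (property, class) pairs.
    hints = sorted((s, o) for (s, p, o) in triples if p == DOMAIN_INCLUDES)
    failures = []
    for (x, ppp, y) in sorted(triples):
        for (prop, ccc) in hints:
            # Closed-world reading: "not asserted" is treated as "false".
            if ppp == prop and (x, RDF_TYPE, ccc) not in triples:
                failures.append(((x, ppp, y), ccc))
    return failures


graph_abc = [
    ("x", "ppp", "y"),                        # Triple A
    ("x", "rdf:type", "ddd"),                 # Triple B
    ("ppp", "schema:domainIncludes", "ccc"),  # Triple C
]

# Graph {A,B,C} fails the check because (x rdf:type ccc) is not asserted.
print(domain_includes_failures(graph_abc))
```

Adding ("x", "rdf:type", "ccc") to the graph makes the returned list empty,
which corresponds to the check passing.
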
 > 
 > I hope that now makes more sense.  Again, sorry to
 > have screwed up my example so badly last time, and
 > I hope I've got it right this time.  :)
 > 
 > David
 > 
 > 
 > On 04/02/2014 11:42 PM, Pat Hayes wrote:
 >> 
 >> On Mar 31, 2014, at 10:31 AM, David Booth <david@dbooth.org> wrote:
 >> 
 >>> On 03/30/2014 03:13 AM, Pat Hayes wrote:
 >>>> [ . . . ]
 >>>> What follows from knowing that
 >>>> 
 >>>> ppp schema:domainIncludes ccc . ?
 >>>> 
 >>>> Suppose you know this and you also know that
 >>>> 
 >>>> x ppp y .
 >>>> 
 >>>> Can you infer x rdf:type ccc? I presume not, since the domain might
 >>>> include other stuff outside ccc. So, what *can* be inferred about the
 >>>> relationship between x and ccc ? As far as I can see, nothing can be
 >>>> inferred. If I am wrong, please enlighten me. But if I am right, what
 >>>> possible utility is there in even making a schema:domainIncludes
 >>>> assertion?
 >>>> 
 >>>> If "inference" is too strong, let me weaken my question: what
 >>>> possible utility **in any way whatsoever** is provided by knowing
 >>>> that schema:domainIncludes holds between ppp and ccc? What software
 >>>> can do what with this, that it could not do as well without this?
 >>> 
 >>> I think I can answer this question quite easily, as I have seen it
 >>> come up before in discussions of logic.
 >>> 
 >>> ...
 >> 
 >>> Note that this categorization typically relies on making a closed
 >>> world assumption (CWA), which is common for an application to make
 >>> for a particular purpose -- especially error checking.
 >> 
 >> Yes, of course. If you make the CWA with the information you have, then
 >> 
 >> ppp schema:domainIncludes ccc .
 >> 
 >> has exactly the same entailments as
 >> 
 >> ppp rdfs:domain ccc .
 >> 
 >> has in RDFS without the CWA. But that, of course, begs the question.
 >> If you are going to rely on the CWA, then (a) you are violating the
 >> basic assumptions of all Web notations and (b) you are using a
 >> fundamentally different semantics. And see below.
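
For readers who want the open-world side of this contrast made concrete,
here is a small sketch.  It assumes triples are plain Python string tuples
and implements only the standard RDFS domain rule (rdfs2: from
"p rdfs:domain c" and "s p o", infer "s rdf:type c").  No corresponding rule
is defined for schema:domainIncludes, so the same pass derives nothing from
it.  The function name is hypothetical.

```python
def rdfs_domain_entailments(graph):
    """Apply RDFS rule rdfs2 once and return the newly inferred triples."""
    triples = set(graph)
    # (property, class) pairs taken from rdfs:domain declarations only.
    domains = [(s, o) for (s, p, o) in triples if p == "rdfs:domain"]
    inferred = set()
    for (s, p, o) in triples:
        for (prop, cls) in domains:
            if p == prop:
                inferred.add((s, "rdf:type", cls))
    return inferred - triples


with_rdfs_domain = [("x", "ppp", "y"), ("ppp", "rdfs:domain", "ccc")]
with_domain_includes = [("x", "ppp", "y"),
                        ("ppp", "schema:domainIncludes", "ccc")]

print(rdfs_domain_entailments(with_rdfs_domain))      # {('x', 'rdf:type', 'ccc')}
print(rdfs_domain_entailments(with_domain_includes))  # set()
```
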
 >> 
 >> None of this has anything to do with a distinction between entailment
 >> and error checking, by the way. Your hypothetical three-way
 >> classification task uses the same meanings of the RDF as any other
 >> entailment task would.
 >> 
 >>> 
 >>> In this example, let us suppose that to pass, the object of every
 >>> predicate must be in the "Known Domain" of that predicate, where the
 >>> Known Domain is the union of all declared schema:domainIncludes
 >>> classes for that predicate.  (Note the CWA here.)
 >>> 
 >>> Given this error checking objective, if a system is given the facts:
 >>> 
 >>>   x ppp y .
 >>>   y a ccc .
 >>> 
 >>> then without also knowing that "ppp schema:domainIncludes ccc", the
 >>> system may not be able to determine that these statements should be
 >>> considered Passed or Failed: the result may be Indeterminate.  But if
 >>> the system is also told that
 >>> 
 >>>   ppp schema:domainIncludes ccc .
 >>> 
 >>> then it can safely categorize these statements as Passed (within the
 >>> limits of this error checking).
 >> 
 >> Why? [ y a ccc . ] does not follow from this assertion and the x ppp y,
 >> so this looks like an Indeterminate to me. Even with the CWA applied to
 >> ppp, your check here is extremely risky. In fact, I could invoke Gricean
 >> reasoning to conclude that the domain of ppp **almost certainly must**
 >> include something outside ccc; because if not, why did whoever wrote
 >> this use the more cautious schema:domainIncludes rather than the simpler
 >> and more direct rdfs:domain? Indeed, isn't the ubiquity of the OWA in
 >> Web reasoning the only justification for having a construct like
 >> schema:domainIncludes at all? Why else was it invented, if not to allow
 >> for further information to make the domain larger?
 >> 
 >>> Thus, although schema:domainIncludes does not enable any new
 >>> entailments under the open world assumption (OWA), it *does* enable
 >>> some useful error checking inference under the closed world assumption
 >>> (CWA), by enabling a shift from Indeterminate to Passed or Failed.
 >> 
 >> I would not want any important decision to rest on such an extremely
 >> flaky foundation as this.
 >> 
 >>> 
 >>> If anyone is concerned that this use of the CWA violates the spirit of
 >>> RDF, which indeed is based on the OWA (for *very* good reason), please
 >>> bear in mind that almost every application makes the CWA at some point,
 >>> to do its job.
 >> 
 >> Um, bullshit. But in any case, even if it were true, the important
 >> thing is to know when to invoke the CWA. Assuming that you know all the
 >> domain, when you have been told explicitly that you probably have not
 >> been told all of it, is a very bad heuristic for invoking the CWA.
 >> 
 >> Pat
 >> 
 >>> 
 >>> David
 >>> 
 >>> 
 >> 
 >> ------------------------------------------------------------
 >> IHMC                                     (850)434 8903 home
 >> 40 South Alcaniz St.                     (850)202 4416   office
 >> Pensacola                                (850)202 4440   fax
 >> FL 32502                                 (850)291 0667   mobile (preferred)
 >> phayes@ihmc.us       http://www.ihmc.us/users/phayes

Received on Monday, 7 April 2014 18:17:34 UTC