Re: Datasets, blackboxes and frames from Michael Kifer on 2007-06-25 (public-rif-wg@w3.org from June 2007)

From: Michael Kifer <kifer@cs.sunysb.edu>
Date: Mon, 25 Jun 2007 07:17:24 -0400
To: Chris Welty <cawelty@gmail.com>
Cc: Dave Reynolds <der@hplb.hpl.hp.com>, RIF <public-rif-wg@w3.org>
Message-ID: <11268.1182770244@cs.sunysb.edu>
> 
> 
> Michael Kifer wrote:
> > Dave Reynolds <der@hplb.hpl.hp.com> wrote:
> >> In particular, I would find it useful to be able to map SPARQL-style 
> >> named graph expressions into RIF - e.g. in order to represent CWM rules 
> >> and because that is something we need for our own use cases (which may 
> >> affect how JenaRules evolves).
> > 
> > SPARQL's named graphs are a hack, 
> 
> Sparql is a candidate recommendation from the W3C.  If you find 
> something wrong with it, there are open channels (not academic 
> publications) in which you can state any objection.

SPARQL is a hack because it does not have a real model theory. What they
have was relegated to an appendix, and it does not exactly match the
graph-matching algorithm that they use. The algorithm currently used is a
hack, and so is their named graph idea. But all this can be approximated
with a traditional model theory, which is what we should do in RIF.

I do not need to state my objections because this is well known, and a
number of people in the SPARQL group have already raised it
(unsuccessfully). In fact, we discussed this with you and Enrico when you
were in Bolzano.

> If you think there are problems with it that are relevant to RIF, 
> please state them.  Stating that it is "a hack" doesn't help at all.

Fortunately, we are not dependent on SPARQL. All we need is to provide some
kind of interface. Since they refused to give a normal model theory to
their language, it makes our (RIF) job easy: it is just a built-in with a
black-box semantics.
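
To make "a built-in with a black-box semantics" concrete, here is a minimal
sketch in Python. It is only an illustration, not a proposal for RIF syntax
or for any existing engine's API: make_sparql_builtin, fake_evaluator and the
urn:example:* dataset names are all hypothetical, and the canned evaluator
stands in for a real SPARQL engine. The point is that the rule side sees
nothing but tuples of bindings; whatever the evaluator does internally is
outside the rule language's semantics.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# An evaluator is any opaque function: (dataset ids, query text) -> solutions.
# The rule language gives it no model theory; it only consumes the results.
Evaluator = Callable[[Sequence[str], str], Iterable[Dict[str, str]]]


def make_sparql_builtin(evaluate: Evaluator):
    """Wrap an external query evaluator as a built-in (hypothetical API)."""

    def sparql(dataset_ids: Sequence[str], query: str,
               *variables: str) -> List[Tuple[str, ...]]:
        rows = []
        for binding in evaluate(dataset_ids, query):
            # Project each solution onto the requested variables, in order.
            rows.append(tuple(binding[v] for v in variables))
        return rows

    return sparql


def fake_evaluator(dataset_ids, query):
    """Stand-in for a real SPARQL engine: ignores the query text and
    returns canned solutions for each known dataset id."""
    canned = {
        "urn:example:g1": [{"s": "alice", "o": "bob"}],
        "urn:example:g2": [{"s": "carol", "o": "dave"}],
    }
    for ds in dataset_ids:
        yield from canned.get(ds, [])


# Corresponds to SPARQL(dataset-id-list, query-string, var1, ... varn).
sparql = make_sparql_builtin(fake_evaluator)
result = sparql(["urn:example:g1", "urn:example:g2"],
                "SELECT ?s ?o WHERE { ?s ?p ?o }", "s", "o")
```

Swapping fake_evaluator for a real engine would not change anything on the
rule side, which is exactly what makes the interface a black box.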

> > which has a clean logical counterpart. It is
> > called scoped inference. It was described in several places, such as
> > http://www.springerlink.com/content/f511460n0v3hl61n/
> > http://www.springerlink.com/content/1kcf7e0eu32kycxr/
> 
> If you wish people in the group to read it, please provide a link to a 
> version we can access.  These links require paying a fee.

OK, I did not realize these were not free. One can simply copy and paste the
titles to find the links, though. Anyway:
http://www.inf.unibz.it/~jdebruijn/publications/msa-ruleml05.pdf
ftp://ftp.cs.sunysb.edu/pub/TechReports/kifer/flora-lpnmr2005.pdf
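
For those who do not want to open the papers, the fact-level part of the
idea (a query to the left of @, evaluated against the knowledge base named
to the right of @) can be sketched roughly as follows. This is a simplified
illustration in Python and not how Flora-2 or any other system implements
it: ScopedFacts, its methods and the urn:example:hr name are hypothetical,
and it only matches patterns against partitioned facts, whereas real scoped
inference evaluates the query against a whole theory, rules included.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# A pattern is a triple in which None acts as a wildcard.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

DEFAULT = None  # key of the anonymous default dataset


class ScopedFacts:
    """Facts partitioned into one default dataset and any number of
    named ones. A name is just a label; nothing is dereferenced."""

    def __init__(self) -> None:
        self._datasets: Dict[Optional[str], Set[Triple]] = {DEFAULT: set()}

    def add(self, triple: Triple,
            datasource: Optional[str] = DEFAULT) -> None:
        self._datasets.setdefault(datasource, set()).add(triple)

    def query(self, pattern: Pattern,
              datasource: Optional[str] = DEFAULT) -> List[Triple]:
        # Only the facts of the requested dataset are visible to the query.
        facts = self._datasets.get(datasource, set())
        return sorted(
            t for t in facts
            if all(p is None or p == v for p, v in zip(pattern, t))
        )


kb = ScopedFacts()
kb.add(("john", "age", "33"))                    # default dataset
kb.add(("john", "age", "34"), "urn:example:hr")  # named dataset

in_default = kb.query(("john", "age", None))
in_hr = kb.query(("john", "age", None), "urn:example:hr")
```

Here in_default plays the role of john[age->?X] and in_hr the role of
john[age->?X]@urn:example:hr; the two answers differ because each query
sees only its own partition.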



	--michael  

> Anyway, I think we agree *something like this* would be useful.
> 
> -Chris
> 
> > 
> > It has also been implemented in several systems, such as Flora-2, Triple,
> > Ontobroker.
> > 
> > This is all we need to have scoped negation, which is mentioned in the
> > Charter for phase 2. So, having this in the core will pave the way for scoped
> > negation in phase 2.
> > 
> > 
> >> This could be achieved by having some builtin in the library that can 
> >> query a dataset, such as the SPARQL blackbox we have talked about before:
> >>
> >>     SPARQL(dataset-id-list, query-string, var1, ... varn)
> >>
> >> However, I wonder whether it would be possible/reasonable to have the 
> >> frame terms include an optional datasource identifier:
> >>
> >>     oid{datasource}[p->v, ... p'->v']
> >>
> >> N.B. I don't care about the human readable syntax, this is just to give 
> >> a way to discuss it.
> > 
> > A conceptually better syntax is
> > 
> >      oid[p->v, ... p'->v']@datasource
> >      pred(....)@datasource
> > 
> > The important point here is not the exact syntax, but an emphasis on the
> > fact that we are asking queries (the part left of @) against a knowledge base
> > (a logical theory), which is to the right of @.
> > 
> > Note that this is not (and should not be) specific to RDF. Scoped inference
> > is a generally useful facility for distributed (and even non-distributed)
> > knowledge bases.
> > 
> >> Thus the facts would be partitioned into a set of fact datasets, one 
> >> default anonymous one and a set of named ones identified by URIs.
> > 
> > Does not need to be identified by a URI. This facility is also very useful
> > for modularization of a KB. It is the same issue as global/local Ids for
> > predicates.
> > 
> >> A pattern with no explicit datasource ID is matched against the default 
> >> set, one with an explicit datasource ID is matched against the 
> >> corresponding dataset of facts.
> > 
> > Yes, this is exactly how it is implemented in FLORA-2.
> > 
> > 
> >> There need be no formal link between the dataset URI and the web. There 
> >> would be no enforced processing model requiring you to dereference the 
> >> URI to fetch the data. The URI is simply a name for a data partition.
> > 
> > Right.
> > 
> >> (1) Is this a reasonable approach at all?
> > 
> > Yes.
> > 
> >> (2) What other rule languages might need such dataset-specific 
> >> conditions and would this mechanism be useful for them?
> > 
> > I think every language needs it, but some do not realize it :-)
> > 
> >> (3) Assuming some derivative of this can be made useful, should it go in 
> >> the Core?
> > 
> > I believe that this is necessary even just to be able to properly interface
> > with RDF in the core. The problem is that without such a facility there is
> > no way to represent RDF/S theories properly. If we just include RDFS axioms
> > then there is no barrier to people adding other axioms that affect the
> > inference in imported RDF/S data. Worse, the interaction between the
> > imported theories and other rules may be (and is likely to be) unintentional.
> > 
> > 
> > 	cheers
> > 	  --michael  
> > 
> > 
> 
> -- 
> Dr. Christopher A. Welty                    IBM Watson Research Center
> +1.914.784.7055                             19 Skyline Dr.
> cawelty@gmail.com                           Hawthorne, NY 10532
> http://www.research.ibm.com/people/w/welty
>
Received on Monday, 25 June 2007 11:17:39 UTC