Re: Datasets, blackboxes and frames

Dave Reynolds <der@hplb.hpl.hp.com> wrote:
> 
> The architecture document section on datasets [*] focusses on how you 
> identify and describe datasets.

Perfect timing! I was planning to raise this issue myself and spoke about
it with Jos and Harold a week ago, but did not have an opportunity to do so
because of travel.

> 
> There is a second, related, dataset issue I'd like to get clearer on.
> 
> Will we support dataset-specific queries in the core?

This is absolutely essential for making RIF useful for data integration on
the Web.

> In particular, I would find it useful to be able to map SPARQL-style 
> named graph expressions into RIF - e.g. in order to represent CWM rules 
> and because that's something we need for our own use cases (which may 
> affect how JenaRules evolves).

SPARQL's named graphs are a hack that has a clean logical counterpart,
called scoped inference. It has been described in several places, such as
http://www.springerlink.com/content/f511460n0v3hl61n/
http://www.springerlink.com/content/1kcf7e0eu32kycxr/

It has also been implemented in several systems, such as Flora-2, Triple,
Ontobroker.

This is all we need to have scoped negation, which is mentioned in the
Charter for phase 2. So, having this in the core will pave the way for
scoped negation in phase 2.
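
To give a flavor of scoped negation (the predicate and scope names below
are purely illustrative, "naf" just stands in for whatever negation symbol
phase 2 adopts, and the @-notation is the scoping syntax proposed further
down):

     discontinued(?Item) :- listed(?Item), naf inStock(?Item)@warehouse

The negated subgoal is evaluated only against the "warehouse" theory, so
negation-as-failure stays well-defined even when the rules run over the
open Web.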


> This could be achieved by having some builtin in the library that can 
> query a dataset, such as the SPARQL blackbox we have talked about before:
> 
>     SPARQL(dataset-id-list, query-string, var1, ... varn)
> 
> However, I wonder whether it would be possible/reasonable to have the 
> frame terms include an optional datasource identifier:
> 
>     oid{datasource}[p->v, ... p'->v']
> 
> N.B. I don't care about the human readable syntax, this is just to give 
> a way to discuss it.

A conceptually better syntax is

     oid[p->v, ... p'->v']@datasource
     pred(....)@datasource

The important point here is not the exact syntax but the emphasis on the
fact that we are asking queries (the part to the left of the @) against a
knowledge base, i.e., a logical theory (the part to the right of the @).
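
For instance (all identifiers below are made up for illustration):

     ?E[employer -> ?C] :- employment(?E, ?C)@hrData

The scoped predicate query in the body is answered by the hrData theory,
while the derived frame fact belongs to the rule set that contains the
rule.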

Note that this is not (and should not be) specific to RDF. Scoped inference
is a generally useful facility for distributed (and even non-distributed)
knowledge bases.

> Thus the facts would be partitioned into a set of fact datasets, one 
> default anonymous one and a set of named ones identified by URIs.

The datasets do not need to be identified by URIs. This facility is also
very useful for modularizing a KB. It is the same issue as global vs.
local ids for predicates.

> A pattern with no explicit datasource ID is matched against the default 
> set, one with an explicit datasource ID is matched against the 
> corresponding dataset of facts.

Yes, this is exactly how it is implemented in FLORA-2.
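
Roughly, in FLORA-2 the scopes are modules; the module and predicate names
below are invented for illustration:

     p(?X) :- q(?X).        // q is matched in the default module
     p(?X) :- q(?X)@mod1.   // q is matched only in module mod1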


> There need be no formal link between the dataset URI and the web. There 
> would be no enforced processing model requiring you to dereference the 
> URI to fetch the data. The URI is simply a name for a data partition.

Right.

> (1) Is this a reasonable approach at all?

Yes.

> (2) What other rule languages might need such dataset-specific 
> conditions and would this mechanism be useful for them?

I think every language needs it, but some do not realize it :-)

> (3) Assuming some derivative of this can be made useful, should it go in 
> the Core?

I believe that this is necessary even just to be able to interface
properly with RDF in the core. The problem is that without such a facility
there is no way to represent RDF/S theories properly. If we just include
the RDFS axioms, then there is no barrier to people adding other axioms
that affect inference over the imported RDF/S data. Worse, the interaction
between the imported theories and the other rules may be (and is likely to
be) unintentional.
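
As a concrete illustration (the scope name rdfsData is invented), the RDFS
subclass-propagation rule could be confined to the imported data like this:

     ?I[rdf:type -> ?C]@rdfsData :-
         ?I[rdf:type -> ?D]@rdfsData,
         ?D[rdfs:subClassOf -> ?C]@rdfsData

With the scope in place, rules in the default theory cannot silently
change what the imported RDF/S theory entails.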


	cheers
	  --michael  

Received on Saturday, 23 June 2007 14:57:15 UTC