RE: Test cases: source of a triple from Seaborne, Andy on 2004-08-26 (public-rdf-dawg@w3.org from July to September 2004)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Thu, 26 Aug 2004 18:37:58 +0100
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <E864E95CB35C1C46B72FEA0626A2E80803E3C079@0-mail-br1.hpl.hp.com>
-------- Original Message --------
> From: Rob Shearer <mailto:Rob.Shearer@networkinference.com>
> Date: 26 August 2004 17:20
> 
> > == Test case 2: inference
> > 
> > Data:
> >   a1.rdf:
> >   :x rdf:type :C1 .
> >   :C1 rdfs:subClassOf :C2 .
> > 
> > Query:
> >     SELECT * WHERE { ?x rdf:type :C2 }
> > 
> > ?x = :x
> > ?src = <a1.rdf> maybe.
> > 
> > Now suppose:
> >   a2.rdf:
> >   :x rdf:type :C1 .
> >   :x rdf:type :C2 .
> >   :C1 rdfs:subClassOf :C2 .
> > 
> > Now ?src = <a2.rdf>
> > 
> > but a1.rdf and a2.rdf have the RDFS-same information.
> > Should the answer be the same whether :x rdf:type :C2 . is explicit or
> > inferred? (Forward-rule systems, where rules are run at ingestion
> > time, would not be able to differentiate.)
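To make the ingestion-time point concrete, here is a minimal sketch, assuming triples are modelled as Python tuples of prefixed-name strings and that only two RDFS rules are applied (rdfs9 for type propagation and rdfs11 for subclass transitivity, not the full RDFS rule set). The function name rdfs_closure is hypothetical. Once forward chaining has run, a1.rdf and a2.rdf produce identical graphs, so nothing downstream can tell which type triple was explicit.

```python
# Triples are (subject, predicate, object) tuples of prefixed-name strings.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def rdfs_closure(graph):
    """Hypothetical helper: forward-chain two RDFS rules to a fixpoint.
    rdfs9:  (x type C1), (C1 subClassOf C2)  => (x type C2)
    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    """
    closed = set(graph)
    while True:
        new = set()
        for (s, p, o) in closed:
            for (s2, p2, o2) in closed:
                # Join the object of one triple with the subject of a subClassOf triple.
                if p2 == SUBCLASS and o == s2:
                    if p == TYPE:
                        new.add((s, TYPE, o2))
                    elif p == SUBCLASS:
                        new.add((s, SUBCLASS, o2))
        if new <= closed:  # nothing new was derived: fixpoint reached
            return closed
        closed |= new

a1 = {(":x", TYPE, ":C1"), (":C1", SUBCLASS, ":C2")}
a2 = {(":x", TYPE, ":C1"), (":x", TYPE, ":C2"), (":C1", SUBCLASS, ":C2")}

print(a1 == a2)                              # False: different explicit triples
print(rdfs_closure(a1) == rdfs_closure(a2))  # True: same RDFS entailments
```

A store that keeps only the closed graph has lost the distinction that a SOURCE-per-triple answer would need.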

> As others have pointed out, I think you're sidestepping the real issues
> by avoiding the actual syntax of a query.

I'm not avoiding syntax - the same query is used throughout.  Feel free to
rewrite the test cases to be clearer.

(Dave asked for the old syntax.  Steve corrected my mistake.)

> If you were to query either
> one of these documents, then you'd be querying RDF, not a "completed"
> graph with extra inferences. I certainly haven't seen anything to
> suggest that a query implementation should be able to perform
> inferencing, and I certainly don't see anything in the BRQL spec to try
> to get this to happen.

Firstly - I'm not arguing for one way or another.  You are assuming I am
advocating a position.  I undertook to write some test cases for points I
thought needed answering.  

I agree that the query system accesses a graph without regard to inference.
I think this should be a headline principle of our work.

It happens to lead to ?src = <a1/a2.rdf> in each case which seems like the
natural answer.  So far so good.  

My next example (3) then highlights an interaction between SOURCE and
inference if we attempt to use the natural result from case 2.  Others
advocate that SOURCE reflect the origin graph in the aggregation.  What if an
inferred triple can only arise across graphs in the aggregation?  Are we
saying that inference *can't* be done in this case?
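The cross-aggregation problem can be sketched as follows, assuming an aggregation is modelled as a list of quads (s, p, o, source) and that the two premises come from two hypothetical documents, <b1.rdf> and <b2.rdf> (these names are illustrative, not taken from example 3). The helper infer_types_with_sources is also hypothetical: it applies the rdfs9 rule once and records which sources each inferred triple depended on.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Hypothetical ad hoc aggregation: each quad is (s, p, o, source).
aggregation = [
    (":x", TYPE, ":C1", "<b1.rdf>"),
    (":C1", SUBCLASS, ":C2", "<b2.rdf>"),
]

def infer_types_with_sources(quads):
    """Apply rdfs9 once; for each inferred triple, collect the
    sources of the premises it was derived from."""
    inferred = {}
    for (s, p, o, src1) in quads:
        if p != TYPE:
            continue
        for (s2, p2, o2, src2) in quads:
            if p2 == SUBCLASS and s2 == o:
                inferred.setdefault((s, TYPE, o2), set()).update({src1, src2})
    return inferred

for triple, sources in infer_types_with_sources(aggregation).items():
    print(triple, sorted(sources))
# (':x', 'rdf:type', ':C2') ['<b1.rdf>', '<b2.rdf>']
```

The inferred triple has no single origin graph, so a SOURCE binding would have to name both documents, name the aggregation, or leave the triple out.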

If we only had named unions/merges then the problem would not materialise.
It's only ad hoc aggregations that can illustrate this.  Your position leads
to two solutions of the query (see example 1 for an aside on
statements/statings) if SOURCE is asked for.  Fine - that seems reasonable.
We just need to be explicit about what we are deciding.  Hence the test
cases - see the text at the beginning of my message.

As a WG we can decide to:
1/ Remain strictly within RDF
2/ Decide that it is important for query systems to support aggregation
   but not cover all provenance issues.
3/ Decide that it is important to address the whole provenance issue.
4/ Something else.

> It makes sense to be able to target a query at a completed graph (which
> is either fully realized by an outside system or transparently
> virtualized by an extended query processor, and most likely identified
> by some magical URI), but in this case from the query processor's point
> of view there is only ONE source.
> It also makes a lot of sense that whatever information sits in the
> fourth "quad" might need to be different even for triples within the
> same document/accessed via the same URI. 

Yes.  This is one of my worries about quads - that they are an incomplete
solution without chaining.  There is a reasonable position that only the
last step in the chain matters, because publishing a statement is
undertaking that it is true.  Hence only quads are needed.  I'd just like to
be sure of this if the WG decides that way.

> If such were possible then an
> inferencing system could annotate its inferences however it sees fit,
> and aggregators could annotate their aggregations. Otherwise we get back
> to the "information representation by document management" notion which
> I think is anathema to RDF.
> My conclusion is that "SOURCE" doesn't make a whole lot of sense, but I
> believe I've pointed that out before...

	Andy
Received on Thursday, 26 August 2004 17:38:20 UTC