Re: 3store+rdf store compatibility tests added from Seaborne, Andy on 2004-10-19 (public-rdf-dawg@w3.org from October to December 2004)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Tue, 19 Oct 2004 20:44:35 +0100
To: Steve Harris <S.W.Harris@ecs.soton.ac.uk>
Cc: DAWG public list <public-rdf-dawg@w3.org>
Message-ID: <41756EA3.5000708@hp.com>

Steve Harris wrote:
> I've just added the tests that Alberto and I agreed on, which show
> compatibility between 3store and RDFStore. These are not exactly the tests
> we ran (those were RDQL), as I've reformatted them as SPARQL, so Alberto
> should check them to make sure they have the same semantics as the ones he
> tested.
> 
> These tests are source-query-*.
> http://www.w3.org/2001/sw/DataAccess/tests/#source-query-001 and on.
> 
> The significant difference between these tests and the
> dawg-source-simple-* tests is that these do not distinguish between
> stores that treat multiple RDF graphs as one big graph, and stores that
> treat them as a bag of graphs (quads).

We need to be aware that "quads" covers several things.  For example,
allocating a triple id and storing it in the 4th slot is also "quads" (like
storing statings), but it is a different approach again.
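To make the distinction concrete, here is a minimal sketch of the two layouts that both get called "quads". It is purely illustrative (it is not how 3store, RDFStore or any other system stores data), and the graph names and terms are made up. In the first layout the 4th slot names the graph a triple came from, so every triple of one graph shares it; in the second the 4th slot is a fresh id for each individual stating, so two statings of the same triple get different ids.

```python
from itertools import count

# Illustrative types only: a triple is (subject, predicate, object).
Triple = tuple[str, str, str]


def graph_quads(graphs: dict[str, list[Triple]]) -> set[tuple[str, str, str, str]]:
    """Layout 1: the 4th slot is the name of the graph the triple belongs to."""
    return {(s, p, o, g) for g, triples in graphs.items() for (s, p, o) in triples}


def stating_quads(triples: list[Triple]) -> list[tuple[str, str, str, str]]:
    """Layout 2: the 4th slot is a fresh id allocated per stating of a triple."""
    ids = count(1)
    return [(s, p, o, f"_:t{next(ids)}") for (s, p, o) in triples]
```

With layout 1, asking "which graph did this triple come from?" is a lookup on the 4th slot; with layout 2, the id identifies a single statement, and grouping statements into graphs would need further triples about those ids.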

> Alberto and I do the latter, and
> speaking for myself I have no reason to change, and a strong
> disinclination to. The differences between quad stores and "big graph"
> stores are glossed over with the DISTINCT keyword.
> 
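A small sketch of why DISTINCT glosses over the difference, assuming a naive quad store that simply ignores the graph slot when a pattern has no SOURCE: if the same triple occurs in two graphs, the quad store returns one row per graph, while a merged "big graph" holds the triple once. The data, the graph names `g1`/`g2`, and the helper functions are hypothetical, chosen only to show the effect.

```python
# Hypothetical data: the same triple loaded from two different graphs.
big_graph = {("ex:a", "ex:p", "ex:b")}  # merged: duplicates collapse on load
quad_store = {("g1", "ex:a", "ex:p", "ex:b"), ("g2", "ex:a", "ex:p", "ex:b")}


def match_big(store, p):
    """Bindings (?s, ?o) for the pattern (?s p ?o) over one merged graph."""
    return [(s, o) for (s, pp, o) in store if pp == p]


def match_quads(store, p):
    """The same pattern over quads, ignoring the graph slot: one row per graph."""
    return [(s, o) for (_g, s, pp, o) in store if pp == p]


def distinct(rows):
    """SELECT DISTINCT-style duplicate removal, keeping first-seen order."""
    return list(dict.fromkeys(rows))
```

Here `match_quads(quad_store, "ex:p")` yields the row `("ex:a", "ex:b")` twice while `match_big` yields it once; after `distinct`, both stores return the same single row, which is why tests written with DISTINCT pass on either kind of store.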
> These tests also do not require that the SOURCE URI of a statement is the
> URI by which it was resolved, which allows things such as two versions of
> a graph retrieved at different times from the same location.

There is no requirement that the URI of a subgraph (named container) is
the one by which it was resolved.  The only requirement is that the set of
triples has some URI.  A system is free to use the URI from which it
obtained the graph or to allocate its own URI for that unit.

Having a URI means it can be returned as a result via a serialization,
whether XML or RDF.  Both sets of SOURCE examples have a 1-1 relationship
between the variable in SOURCE and the set of triples being identified.

If the processor allocates a new name to the same document each time it is
retrieved, the resulting aggregation still has distinguishable sets of
triples.
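The following is a minimal sketch, not any particular implementation, of a store that allocates its own graph URI on each load instead of reusing the retrieval location. The choice of `urn:uuid:` names and the `QuadStore`, `load` and `source_match` names are assumptions for illustration. Because each retrieval gets a fresh name, two retrievals of the same document remain distinguishable, and a pattern matched under SOURCE with a variable binds that variable once per loaded graph, keeping the 1-1 relationship between the variable and a set of triples.

```python
import uuid


class QuadStore:
    """Hypothetical in-memory store of (graph, s, p, o) quads."""

    def __init__(self):
        self.quads = set()

    def load(self, triples):
        """Store one retrieved set of triples under a freshly allocated URI.

        The name comes from the store (a urn:uuid here), not from the
        retrieval location, so reloading the same document gives a new name.
        """
        name = uuid.uuid4().urn
        for s, p, o in triples:
            self.quads.add((name, s, p, o))
        return name

    def source_match(self, s, p, o):
        """Graph names a SOURCE variable could bind to for the triple (s p o)."""
        return {g for (g, ss, pp, oo) in self.quads if (ss, pp, oo) == (s, p, o)}
```

Loading the same document twice with `first = store.load(doc)` and `second = store.load(doc)` gives two different names, and `store.source_match(...)` for one of its triples returns `{first, second}`. A store that instead reused the retrieval URI as the graph name would merge the two versions into one set.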

An implementation can use the allocated URI in any further triples it
wishes to use to record provenance information, such as the date/time the
graph was retrieved, the retrieval URI, a quality measure, the software
versions used to process the data, or the reason why the graph was
retrieved: whatever it wishes to record about the ingestion step.
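As a rough illustration, the allocated graph URI can serve as the subject of ordinary triples describing the ingestion step. In this sketch only `dc:source` is a real Dublin Core term (its use for the retrieval URI mirrors the 3store/RDFStore convention Steve describes); the `ex:` predicates, the function name and the argument values are hypothetical placeholders an implementation would choose for itself, and literals are shown as plain strings for simplicity.

```python
from datetime import datetime, timezone

DC_SOURCE = "http://purl.org/dc/elements/1.1/source"  # Dublin Core dc:source
EX = "http://example.org/ingest#"  # hypothetical vocabulary for this sketch


def provenance_triples(graph_name, retrieval_uri, software, reason):
    """Describe one ingestion step, using the allocated graph URI as subject."""
    retrieved_at = datetime.now(timezone.utc).isoformat()
    return [
        (graph_name, DC_SOURCE, retrieval_uri),
        (graph_name, EX + "retrievedAt", retrieved_at),
        (graph_name, EX + "software", software),
        (graph_name, EX + "reason", reason),
    ]
```

One possible design choice, again an assumption rather than a requirement, is to keep such triples in a separate metadata graph so that they do not mix with the ingested data itself.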

> 
> The controversial aspect is that some require a dc:source assertion from
> the SOURCE URI to the URI that was dereferenced to retrieve the graph. I
> wouldn't necessarily want this in the final test suite, but it's something
> that Alberto and I both do, so it went in the compatibility tests.

An implementation can choose to do that as I illustrated in the original 
description.  I think it is important not to force one particular view of 
provenance into SPARQL without a wider approach to provenance (and that 
isn't going to happen just yet).

	Andy

> 
> I think this now completes my action from the last telecon re. updating
> the tests with source tests, as Andy kindly uploaded his tests.
> 
> - Steve
>

Received on Tuesday, 19 October 2004 19:45:05 UTC