SOURCE issue - querying and retrieval of source

Intended to address my action -
  DaveB: propose a way to address the SOURCE issue. CONTINUES.


As a minimal requirement for 'source' I see two aspects -
  1) using source information in query
  2) retrieving source in results

But first, it has to be clear what 'source information' actually is,
at least for this discussion.

The most useful meaning, I think, is the URI of the resource from which
a representation was retrieved that provided a set of RDF triples,
also called an RDF graph.  I'll call this a Source URI.

This could be the URI of some RDF/XML say at
  http://www.w3.org/2000/08/w3c-synd/home.rss
which right now returns 55 RDF triples which could
be stored in a graph along with some other triples, maybe
from other RSS 1.0 feeds.
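
To make the definition concrete, here is a minimal sketch in Python of a
store that tags each triple with the Source URI it was retrieved from.
The Quad type, the load() helper, the second feed URI and the sample
triples are all assumptions made for illustration; they are not the
real feed contents or any particular implementation.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: Optional[str]  # Source URI, or None when not known


W3C_FEED = "http://www.w3.org/2000/08/w3c-synd/home.rss"
OTHER_FEED = "http://example.org/other.rss"  # hypothetical second feed


def load(triples, source):
    """Tag every triple parsed from one retrieval with its Source URI."""
    return [Quad(s, p, o, source) for (s, p, o) in triples]


# Illustrative triples only; not what the feeds actually contain.
store = (
    load([("http://example.org/a", "rdf:type", "rss:item")], W3C_FEED)
    + load([("http://example.org/b", "rdf:type", "rss:item")], OTHER_FEED)
)
```

The aggregated graph is still just a set of triples; the Source URI is
extra bookkeeping carried alongside each one.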


1) Source URI in Querying

In the query language, the first case is querying some RDF graph, and
using the information that some triples may be associated with the
URI http://www.w3.org/2000/08/w3c-synd/home.rss

BRQL Example:
Find all triples in a graph of aggregated RSS 1.0 feeds which were
retrieved from the W3C's feed.
  SELECT ?x,?y,?z WHERE
    SOURCE <http://www.w3.org/2000/08/w3c-synd/home.rss> (?x ?y ?z)

Now it may be that there are inferred triples (or others) that have
no Source URI by this definition, or that an RDF application is not able
to provide this feature of associating a Source URI with triples.

In the former case, the query would fail to match.  In the latter
case, the query would fail depending on the implementation - which is
bad for conformance.  Requiring all implementations to provide this
association is possible but might be significant overhead.
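
The two failure cases can be sketched in Python, assuming a store of
(subject, predicate, object, source) tuples where source is None when
no Source URI is recorded.  The match_source() function and the sample
data are hypothetical, and this is only one way an implementation
might behave.

```python
W3C_FEED = "http://www.w3.org/2000/08/w3c-synd/home.rss"

# Quads of (subject, predicate, object, source). Data is illustrative only.
store = [
    ("http://example.org/a", "rdf:type", "rss:item", W3C_FEED),
    ("http://example.org/b", "rdf:type", "rss:item", "http://example.org/other.rss"),
    ("http://example.org/a", "rdf:type", "rdfs:Resource", None),  # inferred
]


def match_source(quads, source_uri):
    """Triples (s, p, o) whose recorded source equals source_uri."""
    return [(s, p, o) for (s, p, o, src) in quads if src == source_uri]


# Former case: the inferred triple has no Source URI, so it never matches.
from_w3c = match_source(store, W3C_FEED)

# Latter case: a store that records no sources at all matches nothing,
# so the same query gives different answers on different implementations.
no_source_store = [(s, p, o, None) for (s, p, o, _) in store]
from_w3c_unsupported = match_source(no_source_store, W3C_FEED)
```

The second result is empty even though the underlying triples are the
same, which is the conformance problem described above.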


2) Source URI in results

A query may return a Source URI, separate from binding a Source URI
in a matching triple or constraint.

BRQL Example:
The graph contains aggregated RSS feeds and the query wants to
return all items indicating where they were originally retrieved
from, even with duplicates:

  SELECT ?s WHERE
    SOURCE ?s (?x rdf:type rss:item)

The following could be forbidden, while still allowing the above:
  SELECT ?x,?y,?z WHERE
    SOURCE <http://www.w3.org/2000/08/w3c-synd/home.rss> (?x ?y ?z)


An implementation that did not provide this feature could always
return no binding for ?s which could be used to indicate there
was no such information available.  The cost of making this
feature optional is lower here than for its use in the query, as in 1).
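
A rough Python sketch of returning the Source URI as a result binding,
again assuming (subject, predicate, object, source) tuples.  The
select_sources() function and the sample data are hypothetical; None
stands in for "no binding for ?s".

```python
W3C_FEED = "http://www.w3.org/2000/08/w3c-synd/home.rss"
OTHER_FEED = "http://example.org/other.rss"  # hypothetical second feed

# Illustrative data only.
store = [
    ("http://example.org/a", "rdf:type", "rss:item", W3C_FEED),
    ("http://example.org/b", "rdf:type", "rss:item", W3C_FEED),
    ("http://example.org/c", "rdf:type", "rss:item", OTHER_FEED),
]


def select_sources(quads, predicate, obj):
    """One ?s binding per matching triple; duplicates are kept."""
    return [src for (_, p, o, src) in quads if p == predicate and o == obj]


bindings = select_sources(store, "rdf:type", "rss:item")

# An implementation without source tracking still answers the query,
# but every ?s comes back unbound (None here).
no_source_store = [(s, p, o, None) for (s, p, o, _) in store]
unknown = select_sources(no_source_store, "rdf:type", "rss:item")
```

Unlike the fixed-source query, both stores return the same number of
rows; only the ?s values differ.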


My Personal Choice

Allow 2), with null bindings indicating not-known.  Do not do 1).


Other definitions of provenance

The Source URI could be extended to allow non-URIs, which would move
it more into what has been called context in various ways.  The only
sensible other non-URI to allow would be a bnode.  This definition
would more easily allow graph-scoped sets of triples, such as
inferred triples, with no implication of a retrieval operation.
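
As a sketch of that extension, the label attached to each triple could
be either a URI or a bnode.  The URI and BNode classes, fresh_bnode()
and triples_in() below are hypothetical names, and the data is
illustrative only.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


_counter = itertools.count()


def fresh_bnode():
    """A new bnode label, distinct from all earlier ones."""
    return BNode(f"b{next(_counter)}")


retrieved = URI("http://www.w3.org/2000/08/w3c-synd/home.rss")
inferred = fresh_bnode()  # scopes a set of triples; no retrieval implied

store = [
    ("http://example.org/a", "rdf:type", "rss:item", retrieved),
    ("http://example.org/a", "rdf:type", "rdfs:Resource", inferred),
]


def triples_in(quads, label):
    """Triples grouped under one label, whether URI or bnode."""
    return [(s, p, o) for (s, p, o, g) in quads if g == label]
```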


Dave

Received on Tuesday, 24 August 2004 14:04:22 UTC