RDF Graphs and Stores Summary from Nathan on 2010-11-04 (public-rdfa-wg@w3.org from November 2010)

From: Nathan <nathan@webr3.org>
Date: Thu, 04 Nov 2010 16:50:57 +0000
To: RDFA Working Group <public-rdfa-wg@w3.org>
CC: Ivan Herman <ivan@w3.org>, Manu Sporny <msporny@digitalbazaar.com>, Mark Birbeck <mark.birbeck@webbackplane.com>
Message-ID: <4CD2E471.3030402@webr3.org>
Hi All,

Today in the weekly telecon we discussed ISSUE-52, and specifically a 
general conceptual issue which needs to be addressed before we can 
proceed with the open issues on the RDFa API.

The conceptual issue is whether to make a distinction between "RDF 
Graph" and "Data Store", and then determine which are required by the 
RDFa API.

I think we can safely consider this issue a blocker wrt the RDFa API, so 
the sooner we can agree the better.

To perhaps bring clarity to the issue, I'll assert that there are in 
fact three distinct concepts:

  - RDF Graph
    An interface representing a set of RDF Triples: an RDF Graph as 
defined in the RDF Specifications.

  - RDF Graph Store
    An interface which allows the storage and retrieval of several 
distinct RDF Graphs.

  - RDF Triple Store
    An interface which allows the storage and retrieval of RDF Triples, 
where the notion of distinct graphs has been discarded (and generally 
where provenance information has been removed).


Any RDF Graph interface will be somewhat akin to an Array or Sequence of 
RDF Triples, and can roughly be specified as:

    interface RDFGraph = sequence<RDFTriple> (or RDFTriple[])


Any RDF Graph Store interface will be similar to a Dictionary or simple 
Key/Value store which stores "RDF Graphs" against a certain key. This is 
similar to QuadStores and the notion of Named Graphs. Roughly, this 
would be specified as:

    interface GraphStore {
      RDFGraph   get (in DOMString key);
      void       set (in DOMString key, in RDFGraph value);
    }
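
As a rough illustration of the two interfaces above, here is a minimal 
TypeScript sketch. The RDFTriple shape (three strings), the object-backed 
storage and the choice to return undefined for an unknown key are all 
assumptions made for the example; none of them comes from the RDFa API 
drafts.

```typescript
// Hypothetical shapes for illustration only.
interface RDFTriple {
  subject: string;
  predicate: string;
  object: string;
}

// An RDF Graph treated as a plain sequence of triples.
type RDFGraph = RDFTriple[];

// A Graph Store keeps whole graphs distinct from one another, keyed by name.
class GraphStore {
  private graphs: { [key: string]: RDFGraph } = {};

  // Assumption: an unknown key yields undefined rather than an empty graph.
  get(key: string): RDFGraph | undefined {
    return Object.prototype.hasOwnProperty.call(this.graphs, key)
      ? this.graphs[key]
      : undefined;
  }

  set(key: string, value: RDFGraph): void {
    this.graphs[key] = value;
  }
}

const store = new GraphStore();
store.set("http://example.org/g1", [
  { subject: "ex:a", predicate: "ex:knows", object: "ex:b" },
]);
store.set("http://example.org/g2", []);
```

Each key plays the role of a graph name, so the triples of g1 and g2 
never mix, which is the property that distinguishes a Graph Store from 
a Triple Store.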


However, and I believe this is where any "grey" conceptual areas have 
arisen in the existing RDFa API, an "RDF Triple Store" is very like an 
"RDF Graph", in fact it's almost identical other than the fact it's 
persistent in some way.

    interface TripleStore ~= RDFGraph


So to begin addressing this issue, I'll first assert that the concepts 
of "RDF Graph" and "RDF Graph Store" are clearly distinct from each 
other, and that any RDFa or RDF API *requires* the concept of an "RDF 
Graph". Additionally, RDFa Core *requires* the RDFa API to /at least/ 
support two instances of RDF Graph, the "default graph" and the 
"processor graph", and to provide clear access to both.

However, the RDFa API does not require the concept of an "RDF Graph 
Store", although clearly such a thing exists. It is required if one is 
to store multiple RDF Graphs while keeping them distinct from one 
another, and when using such features as the "FROM" clause in SPARQL, 
so we may be wise to mention or define it in some way.

Similarly, I think we can quickly assert that an "RDF Graph Store" is 
distinct from an "RDF Triple Store": one handles several distinct sets 
of triples (graphs), the other handles a single set of triples (a single 
graph).

Thus, from this point on I'll remove "RDF Graph Store" from this discussion.

Remaining we have the slightly more complex "RDF Graph" vs "RDF Triple 
Store" distinction to make, where the grey area currently exists in the API.

Whilst both an RDF Graph and an RDF Triple Store share many common 
features, both "contain" sets of RDF Triples, and both need to provide 
access to the triples, I believe that there are key distinctions we can 
make between the two.

First, an RDF Triple Store is an interface to a Store. The store may 
hold triples in memory and in the same environment, and there may be 
multiple stores, but critically the stores may also be located in a 
different environment, on a different tier, or on an entirely different 
machine altogether. An RDF Graph, by contrast, is an interface which 
simply represents a set of RDF Triples. It's the same distinction we 
make between an array and a database; they are quite different. However, 
we could also assert that any RDF Triple Store we specify would be 
constrained to be in the same environment, in memory, and then these 
distinguishing features would essentially be lost, so we have to look 
deeper.

Next, an RDF Graph is in many ways (almost) immutable. That is to say, 
an RDF Graph is a set of triples to which you can add more triples but 
from which you cannot remove triples. The key methods on an RDF Graph 
are also non-mutating: a filter() will return a new RDF Graph (which can 
be considered a subgraph of the first), and a "merge" method will be 
more like a concatenation of two (or more) graphs, returning a third new 
graph. An RDF Triple Store, by contrast, has no immutable 
characteristics: it requires methods to both add and remove triples, a 
"filter" is more like a "select", and a "merge" method is more of an 
"import". The important thing to note here is that an RDF Triple Store 
requires that any merge/import method add new triples to the store, 
whereas any similar method on a graph could be defined either way.
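
To make the contrast concrete, here is a minimal TypeScript sketch of 
the two behaviours just described. The class names, method names 
(importFrom, select, toArray, size) and the three-string Triple shape 
are hypothetical, chosen only for this example. The sketch also picks 
the "returns a new graph" reading of merge, which is one of the two 
options open to a graph.

```typescript
// Hypothetical types and names, for illustration only.
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

const sameTriple = (a: Triple, b: Triple): boolean =>
  a.subject === b.subject && a.predicate === b.predicate && a.object === b.object;

// Graph: triples can be added but never removed.
class Graph {
  private triples: Triple[] = [];

  constructor(initial: Triple[] = []) {
    initial.forEach((t) => this.add(t));
  }

  // Set semantics: adding a triple that is already present does nothing.
  add(t: Triple): void {
    if (!this.triples.some((x) => sameTriple(x, t))) this.triples.push(t);
  }

  size(): number {
    return this.triples.length;
  }

  toArray(): Triple[] {
    return this.triples.slice(); // a copy, so callers cannot remove triples
  }

  // filter() leaves this graph untouched and returns a new subgraph.
  filter(test: (t: Triple) => boolean): Graph {
    return new Graph(this.triples.filter(test));
  }

  // merge() builds a third graph; neither input changes.
  merge(other: Graph): Graph {
    return new Graph(this.triples.concat(other.triples));
  }
}

// Triple Store: fully mutable, and importing changes the store itself.
class TripleStore {
  private triples: Triple[] = [];

  add(t: Triple): void {
    if (!this.triples.some((x) => sameTriple(x, t))) this.triples.push(t);
  }

  remove(t: Triple): void {
    this.triples = this.triples.filter((x) => !sameTriple(x, t));
  }

  // The store's counterpart of filter(): a query, not a new store.
  select(test: (t: Triple) => boolean): Triple[] {
    return this.triples.filter(test);
  }

  // The store's counterpart of merge(): triples are added in place.
  importFrom(graph: Graph): void {
    graph.toArray().forEach((t) => this.add(t));
  }

  size(): number {
    return this.triples.length;
  }
}

const a = new Graph([{ subject: "ex:a", predicate: "ex:knows", object: "ex:b" }]);
const b = new Graph([{ subject: "ex:b", predicate: "ex:knows", object: "ex:c" }]);
const merged = a.merge(b); // a and b are unchanged

const tripleStore = new TripleStore();
tripleStore.importFrom(merged); // the store now holds both triples
tripleStore.remove({ subject: "ex:a", predicate: "ex:knows", object: "ex:b" });
```

Note that Graph deliberately has no remove method, while TripleStore 
cannot avoid having one.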

So, the key distinctions we have are that conceptually a store is 
persistent and potentially may contain triples from many graphs, whereas 
a graph just is a graph/set of triples - and, a store has functionality 
to remove triples, whereas a graph does not.

The other thing to note is that if we ignore the concept of an RDF Graph 
and instead use a Triple Store (or as we termed it, Data Store), will 
somebody else have to define the interface for an RDF Graph at a later 
date? And will our definition of a Store (possibly with no remove 
methods!) suit common usage of Stores in the wild, or will somebody have 
to define a better-suited interface for a Store?


To summarise, we need to agree on the answers to the following questions:

- Are an RDF Graph and an RDF Triple Store distinct?
- Can we use an RDF Triple Store instead of an RDF Graph in the API?
- Should we use an RDF Triple Store instead of an RDF Graph in the API?
- Which of the three interfaces should we define as part of the RDFa 
API: [ RDF Graph, RDF Triple Store, RDF Graph Store ]?
- Which of the three interfaces might we define as part of a note?


And after that rather lengthy email, here's my personal opinion:

  - Define an "RDF Graph" interface (aligned with Array in JavaScript)
  - Expose a property or method on the DataParser interface which gives 
access to the "processor graph" as required by RDFa Core.
  - Assert that we have two as-yet-undefined interfaces, "RDF Triple 
Store" and "RDF Graph Store"
  - Clear the issues and get the next Editor's Draft of the RDFa API done.
  - If we have time, define one or both of the Triple Store and Graph 
Store interfaces, in a note.
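
As a rough sketch of what the first two points could look like in 
practice, here is a toy TypeScript example. Everything in it is an 
assumption made for illustration: the DataParser members shown, the 
ToyParser class, the "subject predicate object" line format and the 
ex:problem predicate are hypothetical and are not taken from the RDFa 
API draft or from RDFa Core.

```typescript
// All names below are hypothetical.
interface RDFTriple {
  subject: string;
  predicate: string;
  object: string;
}

// "Aligned with Array": here a graph simply is an array of triples.
type RDFGraph = RDFTriple[];

interface DataParser {
  parse(lines: string[]): RDFGraph; // returns the default graph
  processorGraph: RDFGraph;         // problems reported while parsing
}

// Toy parser: each input line is expected to be "subject predicate object".
class ToyParser implements DataParser {
  processorGraph: RDFGraph = [];

  parse(lines: string[]): RDFGraph {
    const defaultGraph: RDFGraph = [];
    lines.forEach((line, index) => {
      const parts = line.trim().split(/\s+/);
      if (parts.length === 3) {
        defaultGraph.push({ subject: parts[0], predicate: parts[1], object: parts[2] });
      } else {
        // Malformed input is reported in the processor graph,
        // never in the default graph.
        this.processorGraph.push({
          subject: "_:line" + index,
          predicate: "ex:problem",
          object: "expected three terms",
        });
      }
    });
    return defaultGraph;
  }
}

const parser = new ToyParser();
const defaultGraph = parser.parse(["ex:a ex:knows ex:b", "not a valid triple line"]);

// Because the graph is an array, the usual Array methods apply directly.
const subjects = defaultGraph.map((t) => t.subject);
```

The point of the sketch is only that the two graphs stay separate and 
that both are reachable from the parser.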


However, we really need to all agree and move forwards on this matter,

Best,

Nathan
Received on Thursday, 4 November 2010 16:52:06 UTC