Re: RDF Graphs and Stores Summary from Ivan Herman on 2010-11-05 (public-rdfa-wg@w3.org from November 2010)

From: Ivan Herman <ivan@w3.org>
Date: Fri, 5 Nov 2010 11:28:37 +0100
To: nathan@webr3.org
Cc: RDFA Working Group <public-rdfa-wg@w3.org>, Manu Sporny <msporny@digitalbazaar.com>, Mark Birbeck <mark.birbeck@webbackplane.com>
Message-Id: <36544ADD-EA03-4D1D-8144-86B704DCABAE@w3.org>
Nathan,

first of all, thank you for writing down the issues...

My problem with what you write is that you seem to make assumptions that I would not make. You write, below:

[[[
an RDF Graph is in many ways (almost-) immutable, that is to say that an RDF Graph is a set of triples to which you can add more triples, but you cannot remove triples from
]]]

and I am not sure why that would be true. A Graph is a set of triples; as such, it should be possible both to add triples to and to remove triples from a Graph. Whether the filter operation returns an array of triples or a separate Graph is a different issue. Also, whether merge returns a new graph or merges one graph into another is, again, a different issue; maybe both operations should be available.

(I am a little bit afraid that you are influenced by the analogy of an array. Well, the array might be a way to implement a Graph in JavaScript but, conceptually, it is a set. As such, removal and addition shouldn't be a problem.)
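To make the set-versus-array point concrete, here is a small Python illustration. Triples are plain (s, p, o) string tuples and the `ex:` names are hypothetical, both assumptions for brevity. A sequence keeps duplicates and is order-sensitive, whereas a set of triples, as the RDF specifications define a graph, collapses duplicates, ignores order, and supports removal as naturally as addition:

```python
from typing import List, Set, Tuple

# Assumption for this sketch: a triple is a tuple of three strings.
Triple = Tuple[str, str, str]

a: Triple = ("ex:a", "ex:p", "ex:b")
b: Triple = ("ex:b", "ex:p", "ex:c")

# Sequence (Array-like) model: duplicates are kept and order matters.
seq_1: List[Triple] = [a, b, a]
seq_2: List[Triple] = [b, a]

# Set model, as in the RDF specifications: duplicates collapse, order is irrelevant.
set_1: Set[Triple] = set(seq_1)
set_2: Set[Triple] = set(seq_2)

print(len(seq_1), seq_1 == seq_2)  # 3 False
print(len(set_1), set_1 == set_2)  # 2 True

# Removing a triple is as natural as adding one.
set_1.discard(a)
print(set_1 == {b})  # True
```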

Bottom line: I do not see the value of separating a Graph from a Triple Store (but yes, we need the RDF Graph Store abstraction, I agree with you!)

B.t.w., I looked up the (Python) RDFLib Graph object interface for comparison. Rather than clutter this mail, I put it at the very end.

With that, my answers on your questions are below (at the end of your mail:-)

On Nov 4, 2010, at 17:50 , Nathan wrote:

> Hi All,
> 
> Today in the weekly telecon we discussed ISSUE-52, and specifically a general conceptual issue which needs to be addressed before we can proceed with the open issues on the RDFa API.
> 
> The conceptual issue is whether to make a distinction between "RDF Graph" and "Data Store", and then determine which are required by the RDFa API.
> 
> I think we can safely consider this issue a blocker wrt the RDFa API, so the sooner we can agree the better.
> 
> To perhaps bring clarity to the issue, I'll assert that there are in fact three distinct concepts:
> 
> - RDF Graph
>   An interface representing a "set" of RDF Triples, i.e., an RDF Graph as defined in the RDF Specifications
> 
> - RDF Graph Store
>   An interface which allows the storage and retrieval of several distinct RDF Graphs.
> 
> - RDF Triple Store
>   An interface which allows the storage and retrieval of RDF Triples, where the notion of distinct graphs has been discarded (and generally where provenance information has been removed).
> 
> 
> Any RDF Graph interface will be somewhat akin to an Array or Sequence of RDF Triples, and can roughly be specified as:
> 
>   interface RDFGraph = sequence<RDFTriple> (or RDFTriple[])
> 
> 
> Any RDF Graph Store interface will be similar to a Dictionary or simple Key/Value store which stores "RDF Graphs" against a certain key. This is similar to QuadStores and the notion of Named Graphs; roughly, it would be specified as:
> 
>   interface GraphStore {
>     RDFGraph   get (in DOMString key);
>     void       set (in DOMString key, in RDFGraph value);
>   };
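A minimal Python sketch of the dictionary-like Graph Store above, assuming graphs are immutable sets of (s, p, o) string tuples. The `quads()` helper is hypothetical and only illustrates how named graphs relate to quad stores:

```python
from typing import Dict, FrozenSet, Iterator, Optional, Tuple

# Assumptions for this sketch: a triple is an (s, p, o) tuple of strings,
# and a graph is an immutable set of triples.
Triple = Tuple[str, str, str]
RDFGraph = FrozenSet[Triple]


class GraphStore:
    """Dictionary-like store of named graphs, mirroring get()/set() above."""

    def __init__(self) -> None:
        self._graphs: Dict[str, RDFGraph] = {}

    def get(self, key: str) -> Optional[RDFGraph]:
        # Returns None when no graph is stored under this key.
        return self._graphs.get(key)

    def set(self, key: str, value: RDFGraph) -> None:
        # Storing under an existing key replaces the previous graph.
        self._graphs[key] = frozenset(value)

    def quads(self) -> Iterator[Tuple[str, str, str, str]]:
        # Hypothetical helper: each stored triple paired with its graph name,
        # which is the shape of the data in a quad store.
        for name, graph in self._graphs.items():
            for s, p, o in graph:
                yield (s, p, o, name)
```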
> 
> 
> However, and I believe this is where any "grey" conceptual areas have arisen in the existing RDFa API, an "RDF Triple Store" is very like an "RDF Graph"; in fact, it is almost identical, other than the fact that it is persistent in some way.
> 
>   interface TripleStore ~= RDFGraph
> 
> 
> So to begin addressing this issue, I'll first assert that the concepts of "RDF Graph" and "RDF Graph Store" are clearly distinct from each other, and that any RDFa or RDF API *requires* the concept of an "RDF Graph". Additionally, RDFa Core *requires* the RDFa API to /at least/ support two instances of RDF Graph, the "default graph" and the "processor graph", and to provide clear access to both.
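A hypothetical Python sketch of a parser object that gives separate, clear access to both graphs. The class, its method names, and the vocabulary terms used for processor messages (`rdfa:Warning`, `dc:description`) are illustrative assumptions, not the RDFa API:

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

# Assumption for this sketch: a triple is an (s, p, o) tuple of strings.
Triple = Tuple[str, str, str]


@dataclass
class DataParser:
    """Hypothetical parser exposing a default graph and a processor graph."""
    default_graph: Set[Triple] = field(default_factory=set)
    processor_graph: Set[Triple] = field(default_factory=set)

    def emit(self, triple: Triple) -> None:
        # Triples extracted from the document go into the default graph.
        self.default_graph.add(triple)

    def report(self, message_id: str, kind: str, description: str) -> None:
        # Processor messages are themselves expressed as triples,
        # but kept apart from the document's data in the processor graph.
        self.processor_graph.add((message_id, "rdf:type", kind))
        self.processor_graph.add((message_id, "dc:description", description))
```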
> 
> However, the RDFa API does not require the concept of an "RDF Graph Store". Such a thing clearly exists: it is required if one is to store multiple RDF graphs while keeping them distinct from one another, and when using features such as the "FROM" clause in SPARQL. Thus we may be wise to mention or define it in some way.
> 
> Similarly, I think we can quickly assert that an "RDF Graph Store" is distinct from an "RDF Triple Store": the former handles distinct sets of triples (graphs), whereas the latter handles a single set of triples (a single graph).
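The difference can be sketched in Python: collapsing a Graph Store (name to graph) into a single Triple Store is a plain union, after which the graph each triple came from can no longer be recovered. The `flatten` helper and the string-tuple triples are hypothetical:

```python
from typing import Dict, FrozenSet, Set, Tuple

# Assumption for this sketch: a triple is an (s, p, o) tuple of strings.
Triple = Tuple[str, str, str]


def flatten(graph_store: Dict[str, FrozenSet[Triple]]) -> Set[Triple]:
    """Collapse a graph store (name -> graph) into one set of triples.

    The result is the union of all graphs; which graph a triple came
    from (its provenance) is discarded.
    """
    triples: Set[Triple] = set()
    for graph in graph_store.values():
        triples |= graph
    return triples


t = ("ex:a", "ex:p", "ex:b")
u = ("ex:c", "ex:p", "ex:d")
store = {"g1": frozenset({t}), "g2": frozenset({t, u})}
print(flatten(store) == {t, u})  # True
```

The triple `t` is present in both named graphs but appears once in the flattened result, with no record of either name.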
> 
> Thus, from this point on I'll remove "RDF Graph Store" from this discussion.
> 
> What remains is the slightly more complex "RDF Graph" vs "RDF Triple Store" distinction, which is where the grey area currently exists in the API.
> 
> Whilst an RDF Graph and an RDF Triple Store share many common features (both "contain" sets of RDF Triples, and both need to provide access to those triples), I believe that there are key distinctions we can make between the two.
> 
> First, an RDF Triple Store is an interface to a Store. The store may hold triples in memory and in the same environment, and there may be multiple stores, but critically the stores may also be located in a different environment, on a different tier or on an entirely different machine altogether. An RDF Graph, by contrast, is an interface which simply represents a set of RDF Triples; it's the same distinction we make between an array and a database, which are quite different. However, we could also assert that any RDF Triple Store we specify would be constrained to be in the same environment, in memory, and thus these distinguishing features would essentially be lost, so we have to look deeper.
> 
> Next, an RDF Graph is in many ways (almost-) immutable; that is to say, an RDF Graph is a set of triples to which you can add more triples, but from which you cannot remove triples. Also, the key methods on an RDF Graph do not modify it: a filter() will return a new RDF Graph (which can be considered a subgraph of the first), and a "merge" method will be more like a concatenation of two (or more) graphs, returning a third, new graph. An RDF Triple Store, by contrast, has no immutable characteristics: it requires methods to both add and remove triples, a "filter" is more like a "select", and a "merge" method is more of an "import". The important thing to note here is that an RDF Triple Store requires that any merge/import method add new triples to the store, whereas any similar method on a graph could be defined either way.
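The two behaviours described above can be contrasted in a short Python sketch. Both classes and all method names are hypothetical, triples are (s, p, o) string tuples, and blank nodes are ignored, so "merge" here is plain set union:

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple

# Assumption for this sketch: a triple is an (s, p, o) tuple of strings.
Triple = Tuple[str, str, str]


class AppendOnlyGraph:
    """Graph as described above: triples can be added but not removed,
    and filter()/merge() return new graphs instead of modifying this one."""

    def __init__(self, triples: Iterable[Triple] = ()) -> None:
        self._triples: Set[Triple] = set(triples)

    def add(self, t: Triple) -> None:
        self._triples.add(t)  # adding is allowed; there is deliberately no remove()

    def filter(self, keep: Callable[[Triple], bool]) -> "AppendOnlyGraph":
        # A new graph: a subgraph of this one.
        return AppendOnlyGraph(t for t in self._triples if keep(t))

    def merge(self, *others: "AppendOnlyGraph") -> "AppendOnlyGraph":
        # Like a concatenation: returns a third graph, operands unchanged.
        result = set(self._triples)
        for g in others:
            result |= g._triples
        return AppendOnlyGraph(result)

    def triples(self) -> FrozenSet[Triple]:
        return frozenset(self._triples)


class TripleStore:
    """Store as described above: fully mutable, with select() and import."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, t: Triple) -> None:
        self._triples.add(t)

    def remove(self, t: Triple) -> None:
        self._triples.discard(t)

    def select(self, keep: Callable[[Triple], bool]) -> Set[Triple]:
        # Returns matching triples; the store itself is unchanged.
        return {t for t in self._triples if keep(t)}

    def import_graph(self, g: AppendOnlyGraph) -> None:
        # "Import": the graph's triples are added to the store itself.
        self._triples |= g.triples()

    def __len__(self) -> int:
        return len(self._triples)
```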
> 
> So, the key distinctions are that, conceptually, a store is persistent and may contain triples from many graphs, whereas a graph is just a graph (a set of triples); and a store has functionality to remove triples, whereas a graph does not.
> 
> The other thing to note is that if we ignore the concept of an RDF Graph and instead use a Triple Store (or, as we termed it, Data Store), will somebody else have to define the interface for an RDF Graph at a later date? And will our definition of a Store (possibly with no remove methods!) suit common usage of Stores in the wild, or will somebody have to define a better-suited interface for a Store?
> 
> 
> To summarise, we need to agree on the answers to the following questions:
> 
> - Are an RDF Graph and an RDF Triple Store distinct?

No.

> - Can we use an RDF Triple Store instead of an RDF Graph in the API?

No. At that level of abstraction these are the same, and 'Graph' seems to be a logical name.

> - Should we use an RDF Triple Store instead of an RDF Graph in the API?

See above.

> - Which of the three interfaces should we define as part of the RDFa API, [ RDF Graph, RDF Triple Store, RDF Graph Store ]

RDF Graph, with the provision that two instances per document should be available, i.e., the processor graph and the default graph.

> - Which of the three interfaces might we define as part of a note?
> 

RDF Graph Store

> 
> And after that rather lengthy email, here's my personal opinion:
> 
> - Define an "RDF Graph" interface (aligned with Array in javascript)

I am not sure whether the 'aligned with Array' is necessary. Conceptually, this is not an array, nor should it be; it is a set. That being said, if JavaScript is too poor to do it otherwise, I yield to you guys' wisdom on that!

> - Expose a property or method on the DataParser interface which gives access to the "processor graph" as required by RDFa Core.

... and the default graph, yes.

> - Assert that we have two as-yet-undefined interfaces, "RDF Triple Store" and "RDF Graph Store"

Only the Graph Store

> - Clear the issues and get the next editors draft of the RDFa API done.

of course:-)

> - If we have time, define one or both of the Triple Store and Graph Store interfaces, in a note.
> 

... and maybe some additional methods and interfaces that the RDFa API does not really need but a more general RDF API does. I think we should really plan for that note, which can be written once the RDFa API is, say, in Last Call or Candidate Recommendation.

Ivan

P.S. Here is the rough RDFLib Graph interface translation, for comparison

class Graph :
	add()		# add a triple
	remove()	# remove a triple
	triples()	# an 'iterator', i.e., it returns
			# an array of (s,p,o) tuples
			# the request can use a simple filter
			# by specifying a specific s, p, or o value
			# or None for 'anything'
	query()		# essentially an entry point of a SPARQL query on
			# that graph, returning an array of (s,p,o)
	serialize()	# returns a string serializing the graph 
			# in turtle, xml, etc
	parse()		# parse a source in a specific format into the graph
	__add__()	# a Python idiom to allow for a g = g1 + g2 
			# operation for a merge
	__iadd__()	# a Python idiom to allow for a g += g1 operation
			# for a merge

There are some other methods, like shorthands, but these are the essential operations for each graph. This is not really different from what we have, actually...
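For comparison, a toy pure-Python graph with the same shape as the operations listed above: `triples()` treating `None` as a wildcard, and `+` / `+=` for merging into a new graph or in place. This is an illustrative sketch, not RDFLib itself; RDFLib uses RDF term objects (URIRef, Literal, BNode) rather than the plain strings assumed here:

```python
from typing import Iterable, Iterator, Optional, Set, Tuple

# Assumption for this sketch: a triple is an (s, p, o) tuple of strings.
Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class Graph:
    """Toy set-based graph imitating the listed operations (not RDFLib)."""

    def __init__(self, triples: Iterable[Triple] = ()) -> None:
        self._triples: Set[Triple] = set(triples)

    def add(self, t: Triple) -> None:
        self._triples.add(t)

    def remove(self, t: Triple) -> None:
        self._triples.discard(t)

    def triples(self, pattern: Pattern) -> Iterator[Triple]:
        # None in any position of the pattern matches anything.
        s, p, o = pattern
        for t in self._triples:
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
                yield t

    def __add__(self, other: "Graph") -> "Graph":
        # g = g1 + g2: a new graph holding the triples of both operands.
        return Graph(self._triples | other._triples)

    def __iadd__(self, other: "Graph") -> "Graph":
        # g += g1: add the other graph's triples to this graph in place.
        self._triples |= other._triples
        return self

    def __len__(self) -> int:
        return len(self._triples)
```

Providing both `+` and `+=` is one way to offer the two merge variants discussed at the top of this mail.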



----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Friday, 5 November 2010 10:27:45 UTC