Re: [Graphs] Proposal: RDF Datasets

Date: Sat, 27 Aug 2011 07:39:20 +0200
On Aug 26, 2011, at 18:39, Antoine Zimmermann wrote:

> Pierre-Antoine,
> I am in total agreement with what Richard says below. However, I sympathise to some extent with your idea. I would be interested to see some people define a datatype for serialised graphs, say in Turtle. Then they should brainstorm a few use cases, implement some tools around this proposal, see how things go, gather experience, and come back in a few years with a report and possibly a proposal for standardisation.
> Start by defining a datatype for Turtle graph literals:
> - lexical space is the set of valid Turtle documents;
> - value space is the set of RDF graphs;
> - L2V is the mapping from Turtle to RDF graph, as defined in the Turtle spec.
> Of course, you can do the same for other syntaxes, but I think Turtle best fits.
> Then you may need to introduce a set of terms like rdf:Graph, rdf:serialisation, etc. This set of terms should be shaped by the experience the group gathers while working through its use cases.
> BUT, this is certainly not something that should be done within this working group.
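The datatype outlined above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not a Turtle implementation: to stay self-contained it swaps in a tiny, restricted subset of Turtle (one `<s> <p> <o> .` triple per line, IRIs only, no prefixes, literals or blank nodes), and the names `l2v` and `same_value` are made up for the example. It shows the three ingredients: strings outside the lexical space are rejected, L2V maps a lexical form to a set of triples, and equality of graph literals is decided in the value space rather than by comparing strings.

```python
import re

# Hypothetical stand-in for the Turtle grammar: ONE triple per line, IRIs only,
# written as  <s> <p> <o> .  Prefixes, literals, blank nodes, ';' and ',' are
# not supported. Blank lines and lines starting with '#' are skipped.
IRI = r"<([^<>\s]*)>"
TRIPLE_RE = re.compile(rf"^{IRI}\s+{IRI}\s+{IRI}\s*\.$")

Triple = tuple[str, str, str]


def l2v(lexical: str) -> frozenset[Triple]:
    """Lexical-to-value mapping: return the RDF graph (a set of triples)
    denoted by `lexical`, or raise ValueError if `lexical` lies outside
    the lexical space of this restricted datatype."""
    triples = set()
    for lineno, line in enumerate(lexical.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        match = TRIPLE_RE.match(stripped)
        if match is None:
            raise ValueError(f"line {lineno} is not in the lexical space: {line!r}")
        triples.add(match.groups())
    return frozenset(triples)


def same_value(a: str, b: str) -> bool:
    """Two graph literals are equal iff they map to the same graph.
    Without blank nodes this is plain set equality of triples."""
    return l2v(a) == l2v(b)


# Different lexical forms (order, spacing, blank lines), same value.
g1 = ("<http://ex.org/a> <http://ex.org/p> <http://ex.org/b> .\n"
      "<http://ex.org/b> <http://ex.org/p> <http://ex.org/c> .")
g2 = ("<http://ex.org/b>   <http://ex.org/p> <http://ex.org/c> .\n\n"
      "<http://ex.org/a> <http://ex.org/p> <http://ex.org/b> .")
print(same_value(g1, g2))  # True
```

Leaving out blank nodes is a deliberate simplification: with them, comparing two values would require a graph-isomorphism check rather than plain set equality.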

A few years ago I made an attempt at something like that


but then, somehow, I never _really_ finished it, and I am sure it is full of rubbish, too, mainly on the semantics side. But my basic approach to serializing this stuff was much more restrictive than Pierre-Antoine's: within a given serialization, a graph literal can only be encoded in that same serialization. I really do not see why one would allow, say, RDF/XML to encode a graph literal inside a Turtle document...

Although the document is there, I am _not_ sure this is something this WG really has to take up. This is still open in my mind.


> AZ.
> On 22/08/2011 18:54, Richard Cyganiak wrote:
>> Pierre-Antoine,
>> Thanks for picking this up again.
>> There are several things I don't like about [2].
>> 1. It is not an abstract syntax. It is a mix of concrete and abstract
>> syntax. Thus it negates the benefits of having an abstract syntax in
>> the first place. For example, one cannot really describe any
>> operations over such a multigraph representation without appealing to
>> the use of various syntax parsers. And one has to explain what
>> happens if the serialized graph isn't valid in the respective syntax,
>> etc.
>> 2. It doesn't achieve the goal of standardisation. Different existing
>> multigraph approaches (TriG, SPARQL, etc.) would all look different
>> when expressed according to this proposal. Thus, it doesn't promote
>> interoperability and doesn't actually make working with multiple
>> graphs any easier.
>> 3. I feel that it is actually more complex than the RDF Dataset
>> proposal [1] because it requires the definition of one predicate for
>> every RDF graph serialization, as well as additional vocabulary for
>> every multigraph representation.
>> 4. It is clear that actually storing or serializing anything in that
>> way would be a bad idea. Instead, one wants to use optimized syntaxes
>> that can serialize the graph literals without “double serialization”,
>> and optimized storage schemes that can actually store and index the
>> parsed form of the graph literals. But if that is the case, then why
>> not define an abstract syntax that actually reflects these concrete
>> syntaxes and storage schemes?
>> 5. From a pure RDF modeling and semantics point of view, this
>> proposal should use typed literals and not plain/xsd:string
>> literals.
>> Best, Richard
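Points 4 and 5 can be made concrete with a small Python sketch of what a graph literal looks like when serialized naively in N-Triples. Everything in it is an assumption for illustration: the datatype IRI `http://example.org/ns#turtle` and the predicate `http://example.org/ns#graph` are placeholders, not standard vocabulary, and only the escapes needed here (`\\`, `\"`, `\n`, `\r`) are handled. The embedded document has to be escaped into a string literal, and a consumer has to unescape it and then parse it a second time, which is the "double serialization" of point 4.

```python
import re

# Hypothetical datatype IRI for Turtle graph literals (point 5: use a typed
# literal rather than a plain/xsd:string one). Nothing like it is standardised.
TURTLE_DT = "http://example.org/ns#turtle"

# Escape character -> the character it stands for (subset of N-Triples ECHAR).
_UNESCAPES = {'"': '"', "\\": "\\", "n": "\n", "r": "\r"}


def escape_string(s: str) -> str:
    """Escape s for use inside a double-quoted N-Triples string literal.
    Backslashes are escaped first so later escapes are not doubled."""
    return (s.replace("\\", "\\\\")
             .replace('"', '\\"')
             .replace("\n", "\\n")
             .replace("\r", "\\r"))


def unescape_string(s: str) -> str:
    """Inverse of escape_string: the first of the two parsing passes a
    consumer needs before the inner document can be parsed as Turtle."""
    return re.sub(r'\\(["\\nr])', lambda m: _UNESCAPES[m.group(1)], s)


def graph_literal_triple(subject: str, predicate: str, turtle_doc: str) -> str:
    """Serialise one triple whose object is a serialised graph."""
    return f'<{subject}> <{predicate}> "{escape_string(turtle_doc)}"^^<{TURTLE_DT}> .'


inner = '<http://ex.org/a> <http://ex.org/name> "Alice" .\n'
line = graph_literal_triple("http://ex.org/g1", "http://example.org/ns#graph", inner)
print(line)
```

The typed literal at least tells a processor that the string is meant to denote a graph; with a plain or xsd:string literal, that information is lost.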
>> On 22 Aug 2011, at 16:12, Pierre-Antoine Champin wrote:
>>> As I promised to Richard during the last TC, I'm reactivating the
>>> thread on his proposal to "lift" the definition of RDF datasets
>>> from SPARQL into RDF concepts [1].
>>> My main concern with this proposal is that it defines a somewhat
>>> complex structure (the dataset) as a primitive concept in RDF. My
>>> gut feeling is that we could instead define more basic concepts, on
>>> top of which SPARQL datasets, SPARQL graph stores, and possibly
>>> other structures, could be defined. In my understanding, this is
>>> what the g-* terminology was aiming at.
>>> In this perspective, back in June, I made an alternate proposal [2]
>>> for which I got almost no feedback. In a nutshell, it provides a
>>> minimal vocabulary for reifying RDF graphs into standard RDF, and
>>> sketches the semantics of such a reification. From there, it
>>> illustrates how multi-graph syntaxes (such as TriG) and models
>>> (such as SPARQL datasets) can be defined on top of it.
>>> I know that Richard was concerned that several multi-graph models
>>> had slight differences (e.g. whether a BNode can be used as a graph name),
>>> and his solution was to endorse one of them and wait for the others
>>> to converge. My proposal is rather to provide the building blocks
>>> for everyone to describe their model in RDF itself, and leave it
>>> open for different models to coexist, which is ok as long as they
>>> can all be expressed in plain RDF.
>>> pa
>>> [1] http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal
>>> [2] http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Quadless-Proposal
