Re: [Graphs] Proposal: RDF Datasets from Pierre-Antoine Champin on 2011-08-22 (public-rdf-wg@w3.org from August 2011)

From: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
Date: Mon, 22 Aug 2011 21:43:38 +0200
To: Richard Cyganiak <richard@cyganiak.de>
CC: RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <4E52B16A.6050302@liris.cnrs.fr>
(sorry Richard for the duplicates, I keep sending those mails from the
wrong email address, so I get rejected by the list)

On 08/22/2011 06:54 PM, Richard Cyganiak wrote:
> Pierre-Antoine,
> 
> Thanks for picking this up again.
> 
> There are several things I don't like about [2].
> 
> 1. It is not an abstract syntax. It is a mix of concrete and abstract syntax. Thus it negates the benefits of having an abstract syntax in the first place. For example, one cannot really describe any operations over such a multigraph representation without appealing to the use of various syntax parsers. And one has to explain what happens if the serialized graph isn't valid in the respective syntax. Etc

I can see your point, but I think you are being a bit harsh on that: the
proposal is completely independent of any concrete syntax. It only
requires that such a concrete syntax exists, which is not a strong
requirement...

> 2. It doesn't achieve the goal of standardisation. Different existing multigraph approaches (TriG, SPARQL, etc) would all look different when expressed according to this proposal. Thus, it doesn't promote interoperability and doesn't actually make working with multiple graphs any easier.

Well, in RDF people can use multiple vocabularies to represent the same
domain... While this is not ideal for interoperability, this is key to
scalability. How is the domain of multi-graphs different from other
domains?

> 3. I feel that it is actually more complex than the RDF Dataset proposal [1] because it requires the definition of one predicate for every RDF graph serialization, 

it only *requires* one such predicate; it would probably lead to the
definition of as many predicates as there will be recommended concrete
syntaxes, which is still tractable.

> as well as additional vocabulary for every multigraph representation.

This is indeed more complex in a way, as the goal is not to provide a
single built-in multigraph representation, but the building blocks to
describe such representations.

> 4. It is clear that actually storing or serializing anything in that way would be a bad idea. 

You are probably right. However, this is not a problem created by my
proposal: I'm sure there are several naive ways to implement RDF 2004
which would prove to be bad ideas.

> Instead, one wants to use optimized syntaxes that can serialize the graph literals without “double serialization”, and optimized storage schemes that can actually store and index the parsed form of the graph literals. But if that is the case, then why not define an abstract syntax that actually reflects these concrete syntaxes and storage schemes?

Why not indeed. Again, my goal in this proposal was to represent the g-*
terminology with *minimal* change to the RDF syntax and model...

> 5. From a pure RDF modeling and semantics point of view, this proposal should use typed literals and not plain/xsd:string literals.

Do you mean: with the content-type (RDF/XML, Turtle...) as their
datatype? Why not, though I don't see how this is compellingly superior
to using specialized properties...
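
To make the two modelling options concrete, here is a minimal Python
sketch using plain tuples rather than an RDF library. All IRIs in it
(the example.org vocabulary, the "asTurtle" and "serialization"
properties, the "turtle" datatype) are hypothetical names chosen for
illustration; they are not taken from either proposal.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    lexical: str
    datatype: Optional[str] = None  # None models a plain literal


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: Literal


VOCAB = "http://example.org/vocab#"  # hypothetical namespace
TURTLE_TEXT = '<http://example.org/a> <http://example.org/p> "v" .'

# Option A: one specialized property per concrete syntax, plain literal.
option_a = Triple("http://example.org/g1", VOCAB + "asTurtle",
                  Literal(TURTLE_TEXT))

# Option B: one generic property, concrete syntax carried by the datatype.
option_b = Triple("http://example.org/g1", VOCAB + "serialization",
                  Literal(TURTLE_TEXT, VOCAB + "turtle"))


def syntax_of(triple: Triple) -> str:
    """Return the concrete syntax name under either encoding."""
    if triple.obj.datatype is not None:
        # Option B: read the syntax from the literal's datatype.
        return triple.obj.datatype.rsplit("#", 1)[-1]
    # Option A: read the syntax from the property name ("asTurtle").
    name = triple.predicate.rsplit("#", 1)[-1]
    return name[2:].lower() if name.startswith("as") else name
```

Either way a consumer recovers the same information; the difference is
only whether it has to inspect the predicate or the datatype.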


To sum it up: I agree that relying on concrete syntaxes is not elegant
from a theoretical point of view, nor practical for implementation. And
if one had to "interpret" it in order to implement it correctly, one
would probably end up with something that looks like a SPARQL dataset :)
Not sure they would end up with "URIs only as graph names" nor with a
"default graph", though...
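
For reference, the two constraints mentioned above can be sketched as
a small Python data structure: a SPARQL-style dataset has one default
graph plus named graphs keyed by IRI. The class and method names are
hypothetical, and treating a "_:" prefix as the marker of a blank node
is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

TripleT = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class Dataset:
    # Exactly one default graph, possibly empty.
    default_graph: Set[TripleT] = field(default_factory=set)
    # Zero or more named graphs, each identified by an IRI.
    named_graphs: Dict[str, Set[TripleT]] = field(default_factory=dict)

    def add(self, triple: TripleT, graph_name: Optional[str] = None) -> None:
        if graph_name is None:
            self.default_graph.add(triple)
            return
        if graph_name.startswith("_:"):
            # "URIs only as graph names": blank nodes are rejected here.
            raise ValueError("graph names must be IRIs in this model")
        self.named_graphs.setdefault(graph_name, set()).add(triple)
```

A model without a default graph, or one that accepts blank nodes as
graph names, would drop the first field or the check in add().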

I'll think about it.

 thanks for your feedback

  pa

> Best,
> Richard
> 
> 
> On 22 Aug 2011, at 16:12, Pierre-Antoine Champin wrote:
> 
>> As I promised to Richard during the last TC, I'm reactivating the
>> thread on his proposal to "lift" the definition of RDF datasets
>> from SPARQL into RDF concepts [1]
>>
>> My main concern with this proposal is that it defines a somewhat complex
>> structure (the dataset) as a primitive concept in RDF. My gut feeling is
>> that we could instead define more basic concepts, on top of which SPARQL
>> datasets, SPARQL graph stores, and possibly other structures, could be
>> defined. In my understanding, this is what the g-* terminology was
>> aiming at.
>>
>> In this perspective, back in June, I made an alternate proposal [2] for
>> which I got almost no feedback. In a nutshell, it provides a minimal
>> vocabulary for reifying RDF graphs into standard RDF, and sketches the
>> semantics of such a reification. From there, it illustrates how
>> multi-graph syntaxes (such as TriG) and models (such as SPARQL
>> datasets) can be defined on top of it.
>>
>> I know that Richard was concerned that several multi-graph models had
>> slight differences (e.g. can a BNode be used as a graph name), and his
>> solution was to endorse one of them and wait for the others to converge.
>> My proposal is rather to provide the building blocks for everyone to
>> describe their model in RDF itself, and leave it open for different
>> models to coexist, which is ok as long as they can all be expressed in
>> plain RDF.
>>
>>  pa
>>
>>
>> [1] http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal
>> [2] http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Quadless-Proposal
>
Received on Monday, 22 August 2011 19:44:26 UTC