Re: [Graphs] Proposal: RDF Datasets from Pierre-Antoine Champin on 2011-09-01 (public-rdf-wg@w3.org from September 2011)

From: Pierre-Antoine Champin <pierre-antoine@champin.net>
Date: Thu, 01 Sep 2011 15:45:07 +0000
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Ivan Herman <ivan@w3.org>, antoine.zimmermann@insa-lyon.fr, public-rdf-wg@w3.org
Message-Id: <4E5FA852.4070101@champin.net>
On 08/27/2011 06:39 PM, Richard Cyganiak wrote:
> On 27 Aug 2011, at 06:39, Ivan Herman wrote:
>> http://www.w3.org/2009/07/NamedGraph.html
> 
> You could have told us about that earlier Ivan!

Indeed! :)

> /me prints a copy

same here...

A quick look at this document helped me realize something (that I should
have realized earlier, when Richard suggested that I use datatyped
literals):

using datatyped literals, rather than "plain" strings, to describe graph
literals has a very nice feature: it puts the concrete syntax (used to
describe the graph) back where it belongs: outside of the abstract syntax.

Take the following example:

1  <#pa> :believes """
2    @prefix : <some-uri#> .
3    :graph-literals :are :easy.
4  """^^rdfl:graphLiteral .

The abstract syntax of the Turtle above knows nothing of lines 2-3. In
fact, I could have written:

1  <#pa> :believes """
2    @prefix foo: <some-uri#> .
3    @prefix bar: <some-uri#> .
4    foo:graph-literals
5       <some-uri#are>
6      bar:easy
7        .
8  """^^rdfl:graphLiteral .

and that would be (from the abstract syntax POV) *exactly* the same
graph, just like

 :a :b "00000001"^^xsd:integer .

and

 :a :b 1 .

are exactly the same triple.
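
To make the analogy concrete, here is a rough sketch in Python of a
datatype seen as a lexical-to-value mapping (L2V). It is only an
illustration under loud assumptions: graph_l2v, resolve, value and
DATATYPES are hypothetical names, the "parser" handles nothing beyond
@prefix directives and triples made of <IRI> or prefix:local terms, and
relative IRIs are left unresolved. The sameness it checks is at the
level of the values the literals denote.

```python
import re

# Tokens of a *tiny* Turtle subset: '@prefix', <IRI>, prefixed names, and '.'.
# Anything else (whitespace in particular) is silently skipped.
TOKEN = re.compile(r"@prefix|<[^>]*>|(?:[A-Za-z][\w-]*)?:[\w-]*|\.")


def resolve(token, prefixes):
    """Turn an <IRI> or prefix:local token into a plain IRI string."""
    if token.startswith("<"):
        return token[1:-1]
    prefix, local = token.split(":", 1)
    return prefixes[prefix] + local


def graph_l2v(lexical):
    """Toy L2V for graph literals: map a Turtle string to a set of triples."""
    tokens = TOKEN.findall(lexical)
    prefixes, triples, terms = {}, set(), []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "@prefix":
            # Expected shape: @prefix name: <iri> .
            name, iri, _dot = tokens[i + 1:i + 4]
            prefixes[name[:-1]] = iri[1:-1]
            i += 4
        elif tok == ".":
            if len(terms) != 3:
                raise ValueError("expected exactly three terms before '.'")
            triples.add(tuple(terms))
            terms = []
            i += 1
        else:
            terms.append(resolve(tok, prefixes))
            i += 1
    return frozenset(triples)


# Each datatype is represented only by its lexical-to-value mapping.
DATATYPES = {
    "xsd:integer": int,
    "rdfl:graphLiteral": graph_l2v,
}


def value(lexical, datatype):
    """Return the value denoted by a typed literal."""
    return DATATYPES[datatype](lexical)


FIRST = """
  @prefix : <some-uri#> .
  :graph-literals :are :easy.
"""

SECOND = """
  @prefix foo: <some-uri#> .
  @prefix bar: <some-uri#> .
  foo:graph-literals
     <some-uri#are>
    bar:easy
      .
"""

same_graph = value(FIRST, "rdfl:graphLiteral") == value(SECOND, "rdfl:graphLiteral")
same_integer = value("00000001", "xsd:integer") == value("1", "xsd:integer")
```

Both same_graph and same_integer come out true, although the lexical
forms differ in each pair: the prefixes and the layout live in the
lexical space only, and never reach the value.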

That being said, I hear Antoine's arguments that this is out of the
scope of the group, and should rather be explored as research work.
I'm still open to both options.

 pa

> 
> Best,
> Richard
> 
> 
> 
>> 
>> On Aug 26, 2011, at 18:39 , Antoine Zimmermann wrote:
>> 
>>> Pierre-Antoine,
>>> 
>>> 
>>> I am in total agreement with what Richard says below. However, I sympathise to some extent with your idea. I would be interested to see some people define a datatype for serialised graphs, say in Turtle. Then, they should brainstorm a few use cases and implement some tools around this proposal and see how things are going, gather experiences and come back in a few years with a report and possibly a proposal for standardisation.
>>> 
>>> Start by defining a datatype for Turtle graph literals:
>>> - lexical space is the set of valid Turtle documents;
>>> - value space is the set of RDF graphs;
>>> - L2V is the mapping from Turtle to RDF graph, as defined in the Turtle spec.
>>> 
>>> Of course, you can do the same for other syntaxes, but I think Turtle best fits.
>>> 
>>> Then you may need to introduce a set of terms like rdf:Graph, rdf:serialisation, etc... This set of terms should be crafted according to the experience that the group gathers by trying to deal with their use cases.
>>> 
>>> BUT, this is certainly not something that should be done within this working group.
>>> 
>> 
>> A few years ago I had an attempt to do something like that
>> 
>> http://www.w3.org/2009/07/NamedGraph.html
>> 
>> but then, somehow, I did not _really_ finish it and I am sure it is full of rubbish, too, mainly on the semantics side. But my basic approach in terms of the serialization of this stuff was much more restrictive than Pierre-Antoine's, namely that within a specific serialization one can use only the same serialization for a graph. I indeed do not see why one would allow using, say, RDF/XML to encode a graph literal when one is in Turtle...
>> 
>> Although the document is there, I am _not_ sure this is something this WG has to really take up. This is still open in my mind.
>> 
>> Ivan
>> 
>>> 
>>> AZ.
>>> 
>>> Le 22/08/2011 18:54, Richard Cyganiak a écrit :
>>>> Pierre-Antoine,
>>>> 
>>>> Thanks for picking this up again.
>>>> 
>>>> There are several things I don't like about [2].
>>>> 
>>>> 1. It is not an abstract syntax. It is a mix of concrete and abstract
>>>> syntax. Thus it negates the benefits of having an abstract syntax in
>>>> the first place. For example, one cannot really describe any
>>>> operations over such a multigraph representation without appealing to
>>>> the use of various syntax parsers. And one has to explain what
>>>> happens if the serialized graph isn't valid in the respective syntax.
>>>> Etc
>>>> 
>>>> 2. It doesn't achieve the goal of standardisation. Different existing
>>>> multigraph approaches (TriG, SPARQL, etc) would all look different
>>>> when expressed according to this proposal. Thus, it doesn't promote
>>>> interoperability and doesn't actually make working with multiple
>>>> graphs any easier.
>>>> 
>>>> 3. I feel that it is actually more complex than the RDF Dataset
>>>> proposal [1] because it requires the definition of one predicate for
>>>> every RDF graph serialization, as well as additional vocabulary for
>>>> every multigraph representation.
>>>> 
>>>> 4. It is clear that actually storing or serializing anything in that
>>>> way would be a bad idea. Instead, one wants to use optimized syntaxes
>>>> that can serialize the graph literals without “double serialization”,
>>>> and optimized storage schemes that can actually store and index the
>>>> parsed form of the graph literals. But if that is the case, then why
>>>> not define an abstract syntax that actually reflects these concrete
>>>> syntaxes and storage schemes?
>>>> 
>>>> 5. From a pure RDF modeling and semantics point of view, this
>>>> proposal should use typed literals and not plain/xsd:string
>>>> literals.
>>>> 
>>>> Best, Richard
>>>> 
>>>> 
>>>> On 22 Aug 2011, at 16:12, Pierre-Antoine Champin wrote:
>>>> 
>>>>> As I promised to Richard during the last TC, I'm reactivating the
>>>>> thread on his proposal to "lift" the definition of RDF datasets
>>>>> from SPARQL into RDF Concepts [1].
>>>>> 
>>>>> My main concern with this proposal is that it defines a somewhat
>>>>> complex structure (the dataset) as a primitive concept in RDF. My
>>>>> gut feeling is that we could instead define more basic concepts, on
>>>>> top of which SPARQL datasets, SPARQL graph stores, and possibly
>>>>> other structures, could be defined. In my understanding, this is
>>>>> what the g-* terminology was aiming at.
>>>>> 
>>>>> In this perspective, back in June, I made an alternate proposal [2]
>>>>> for which I got almost no feedback. In a nutshell, it provides a
>>>>> minimal vocabulary for reifying RDF graphs into standard RDF, and
>>>>> sketches the semantics of such a reification. From there, it
>>>>> illustrates how multi-graph syntaxes (such as TriG) and models
>>>>> (such as SPARQL datasets) can be defined on top of it.
>>>>> 
>>>>> I know that Richard was concerned that several multi-graph models
>>>>> had slight differences (e.g. can a BNode be used as a graph name),
>>>>> and his solution was to endorse one of them and wait for the others
>>>>> to converge. My proposal is rather to provide the building blocks
>>>>> for everyone to describe their model in RDF itself, and leave it
>>>>> open for different models to coexist, which is ok as long as they
>>>>> can all be expressed in plain RDF.
>>>>> 
>>>>> pa
>>>>> 
>>>>> 
>>>>> [1]
>>>>> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal
>>>>> [2]
>>>>> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Quadless-Proposal
>>>> 
>>>> 
>>> 
>>> 
>>> -- 
>>> Antoine Zimmermann
>>> Researcher at:
>>> Laboratoire d'InfoRmatique en Image et Systèmes d'information
>>> Database Group
>>> 7 Avenue Jean Capelle
>>> 69621 Villeurbanne Cedex
>>> France
>>> Tel: +33(0)4 72 43 61 74 - Fax: +33(0)4 72 43 87 13
>>> Lecturer at:
>>> Institut National des Sciences Appliquées de Lyon
>>> 20 Avenue Albert Einstein
>>> 69621 Villeurbanne Cedex
>>> France
>>> antoine.zimmermann@insa-lyon.fr
>>> http://zimmer.aprilfoolsreview.com/
>>> 
>> 
>> 
>> ----
>> Ivan Herman, W3C Semantic Web Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>> FOAF: http://www.ivan-herman.net/foaf.rdf
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
>
Received on Wednesday, 7 September 2011 08:46:40 UTC