Re: [Graphs] Proposal: RDF Datasets from Pierre-Antoine Champin on 2011-08-23 (public-rdf-wg@w3.org from August 2011)

From: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
Date: Tue, 23 Aug 2011 17:06:21 +0200
To: Richard Cyganiak <richard@cyganiak.de>
CC: RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <4E53C1ED.6080305@liris.cnrs.fr>

Richard,

On 08/23/2011 01:43 PM, Richard Cyganiak wrote:
> Hi Pierre-Antoine,
> 
> Thought experiment. Let's assume there was some way to ensure that
> 
> 1. no two SPARQL stores share any graph names,
> 2. every SPARQL store has an IRI that allows talking about its contents.
> 
> (Don't ask me how -- it's a thought experiment.)

:-)

> I believe this would mean that collections of “quad-sets” can be “flattened” into just one “quad-set”, so quintuples are unnecessary to keep track of provenance of quad-sets.
> 
> Would you agree?

completely;
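
To make the thought experiment concrete, here is a minimal Python sketch of
that flattening. Everything in it is assumed for illustration only: the
store IRIs, the graph names and the `flatten` helper are hypothetical, and
quads are modelled as plain (graph, subject, predicate, object) tuples.

```python
def flatten(stores):
    """Merge several quad-sets into one.

    `stores` maps a (hypothetical) store IRI to a set of quads
    (graph, subject, predicate, object). The merge is only allowed
    when no graph name is used by two different stores.
    """
    merged = set()
    graph_owner = {}  # graph name -> IRI of the store it came from
    for store_iri, quads in stores.items():
        for graph, s, p, o in quads:
            owner = graph_owner.setdefault(graph, store_iri)
            if owner != store_iri:
                raise ValueError(
                    f"graph name {graph} is used by {owner} and {store_iri}"
                )
            merged.add((graph, s, p, o))
    return merged, graph_owner


# Two stores with disjoint graph names (all names are made up).
stores = {
    "ex:storeA": {("ex:g1", "ex:s", "ex:p", "ex:o1")},
    "ex:storeB": {("ex:g2", "ex:s", "ex:p", "ex:o2")},
}
merged, graph_owner = flatten(stores)
```

Under the two assumptions, the graph name alone is enough to find the
originating store (here through `graph_owner`), so no fifth component is
needed. If two stores reused a graph name, `flatten` would raise instead of
silently mixing their contents.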

So, trying to sum it up:
we seem to agree that, in order to address the multi-graph use cases, we
need an abstract syntax allowing several datasets to be "embedded" (or
"projected") into a single dataset.

I proposed a triple-based abstract syntax (from RDF 2004), and achieved
the "embedding" by using an ugly mix of abstract and concrete syntax.

You proposed a quad-based abstract syntax (from SPARQL), and aim to
achieve the "embedding" by ensuring unique graph URIs.

All in all, you convinced me that finding a satisfying solution to the
graph-name-unicity problem is more likely than finding a satisfying
solution to the merging-abstract-and-concrete problem ;)

I'm therefore ready to move forward and discuss how this
graph-name-unicity problem can be solved.

> More inline.

and some answers from me, just for the sake of the argumentation ;)

> 
> On 23 Aug 2011, at 09:51, Pierre-Antoine Champin wrote:
>> The only way out of the vicious circle is, IMHO, to specify a way to
>> "project" (N+1)-uples into N-uples, so that arbitrary level of embedding
>> can still be represented in the abstract syntax of RDF.
> 
> I'm sympathetic to that.
> 
>> My proposal was to work with N=3, as this is what we already have, and
>> attempted to define a way to perform that projection.
>>
>> You would prefer, it seems, to extend the abstract syntax to N=4, but
>> didn't make any proposal yet to project data with N>4 into that abstract
>> syntax.
> 
> I think this projection is trivial *if* all “quad publishers” synchronize their graph names to avoid overlaps.
> 
> That's a big “if” of course, 

indeed!

> and there is nothing in the RDF Datasets proposal that demands or even encourages such synchronization.

and if there was, that would *not* be a simple copy-paste of the SPARQL
proposal... This amounts to asking people to use their SPARQL stores
*in a certain way*.

So your argument of ensuring interoperability by reusing an existing and
working solution seems to backfire a little bit...

> I'd be interested to learn about use cases that require N>4.

well, none of course, by virtue of the flattening allowed by your
strong hypothesis above :)

But the kind of thing that I had in mind was like:

  <pa-says> = {
    <nostradamus-profecy> = {
      <end-of-world> <in-year> 2012 .
    }
  }

  <richard-says> = {
    <nostradamus-profecy> = {
      <end-of-world> <in-year> 2468 .
    }
  }
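
Written out as quintuples, the example above looks like the sketch below
(Python, with tuples standing in for the abstract syntax; the `drop_outer`
helper is hypothetical and is not part of either proposal). Because both
outer graphs reuse the inner name <nostradamus-profecy>, simply dropping the
outer component leaves two contradictory claims in one named graph.

```python
# (outer graph, inner graph, subject, predicate, object)
quintuples = {
    ("pa-says", "nostradamus-profecy", "end-of-world", "in-year", 2012),
    ("richard-says", "nostradamus-profecy", "end-of-world", "in-year", 2468),
}


def drop_outer(quints):
    """Naive projection to quads: forget the outermost graph name."""
    return {q[1:] for q in quints}


quads = drop_outer(quintuples)

# Both years now sit in the same named graph, with no record of who said what.
years = sorted(o for (g, s, p, o) in quads if g == "nostradamus-profecy")
```

Here `years` contains both 2012 and 2468 for the same graph name, which is
exactly the provenance loss that unique graph names would have to prevent.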


>> Note that this notion of "projecting" (N+1)-uples to N-uples does not
>> mean that we will never work with (N+1)-uples anymore... In my proposal,
>> the fact that Trig and SPARQL datasets *can* be expressed as plain
>> triples does not mean that people *must* do it that way: they would
>> probably continue to serialize in Trig or store their datasets in
>> quad-store... But the "projection" ensures a level of interoperability
>> by allowing anyone working with the plain concrete syntax to retrieve,
>> store and handle more complex data, even if non-optimally.
> 
> “Beware of the Turing tar-pit in which everything is possible but nothing of interest is easy.” [1]
> 
> Your proposal is a triple tar-pit. We know that *everything* can be represented in triples, but that doesn't mean that doing so is useful. The representation that you propose isn't. Let's not standardize triple tar-pits.

Thanks for that reference, I didn't know it, and think it is useful to
keep it in mind.

But I also think that someone's tarpit is someone else's swimming pool.
Many people around me keep complaining about RDF being a triple-tarpit
already (without extending it to support multi-graphs, that is).

  pa

> 
> Best,
> Richard
> 
> [1] http://en.wikipedia.org/wiki/Turing_tarpit
Received on Tuesday, 23 August 2011 15:07:20 UTC