Re: giving up on datasets/trig as more than a web cache from Sandro Hawke on 2012-09-28 (public-rdf-wg@w3.org from September 2012)

From: Sandro Hawke <sandro@w3.org>
Date: Fri, 28 Sep 2012 08:25:52 -0400
To: public-rdf-wg@w3.org
Message-ID: <50659750.70204@w3.org>
On 09/28/2012 08:04 AM, Yves Raimond wrote:
> +1 to this as well. "Dataset metadata" can perfectly be put in its own named graph.

I stand by my point that if you want trig to be useful for expressing 
general knowledge -- to be an extension of Turtle, etc -- then it has to 
have some kind of distinguished graph, some triples that carry that 
knowledge.

My point in this thread is that I no longer see any great need for trig 
to be used that way. My use cases for datasets can just be addressed 
with the Web.

So the only use cases I see left for trig are (1) SPARQL dumps and (2) 
snapshots of RDF on the Web.  This turns the design space from being 
painfully over-constrained to being one that just requires a few 
painless coin-flips.
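
A minimal sketch of use case (2), a snapshot of RDF on the Web, assuming 
the graphs have already been fetched and are held as strings. The function 
name snapshot_to_trig, the example.org URLs, and the sample triples are all 
hypothetical; this only illustrates the "pairing of URLs and graphs" idea 
and is not an implementation from this thread:

```python
from typing import Dict, List


def snapshot_to_trig(cache: Dict[str, List[str]]) -> str:
    """Serialize a URL -> triples mapping as TriG-style named graph blocks.

    Each key is the URL a graph was fetched from. Each value is a list of
    triples already written in Turtle/N-Triples syntax (an assumption of
    this sketch: nothing is parsed or validated here).
    No default graph is emitted, so the output carries no distinguished
    metadata, only (URL, graph) pairs.
    """
    blocks = []
    for url in sorted(cache):
        lines = [f"<{url}> {{"]
        lines.extend(f"    {triple}" for triple in cache[url])
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


# Hypothetical sample data: two source documents and their contents.
cache = {
    "http://example.org/dept-a": [
        '<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .',
    ],
    "http://example.org/dept-b": [
        '<http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .',
    ],
}

print(snapshot_to_trig(cache))
```

Under this reading, a consumer that trusts the snapshot can use each block 
in place of dereferencing the corresponding URL.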

        -- Sandro

> Best,
> y
>
> On Thu, Sep 27, 2012 at 02:47:38PM +0200, Antoine Zimmermann wrote:
>> +1 to this.
>>
>> Note that, even if we do not recommend putting metadata in the
>> default graph, it's still possible to do it. Metadata in RDF is also
>> data in RDF, so you can put it in a named graph or in a default
>> graph.
>>
>>
>> On 27/09/2012 14:09, Sandro Hawke wrote:
>>> Recently, I've tried to argue that trig (or whatever it's called) needs
>>> to be able to carry distinguished metadata. This morning I've decided it
>>> doesn't, really, at least for the use cases I think about. My
>>> replacement idea is to think about trig as *just* being a Web Cache, as
>>> just a convenient shorthand for pairing a bunch of URLs and their RDF
>>> contents, so you can publish or fetch them all at once. I had been
>>> thinking about it as something else, as more of a first-class KR, but
>>> that doesn't seem to be flying. (I guess this is yet another hold-over
>>> from my years of working with N3.)
>>>
>>> Let's see if I can explain, for anyone else who might think a dataset
>>> could/should mean something more, and maybe myself, tomorrow.
>>>
>>> The use cases I think about are nearly all about data federation, the
>>> stuff I wrote about and implemented as a federated phonebook [1].
>>> They're all about data being gathered from original sources and
>>> processing systems and passed on toward data consumers, as a package, as
>>> a new combined-source. This seems to me like an incredibly important use
>>> case that requires standardization and something that could really benefit
>>> from the idea of datasets and a dataset syntax.
>>>
>>> I envisioned it as a converging pipeline, starting with turtle files
>>> (rdf graphs) as the leaves, but then having trig files (rdf datasets) as
>>> the major trunks. The clients would always be getting a trig file (or
>>> using a sparql endpoint with the same dataset). For example, in 2.4 we
>>> get the situation where a division is gathering the data from its
>>> departments, and then passing them up to headquarters in one combined feed.
>>>
>>> But if the feed is trig, and one is going to be able to figure out what
>>> really came from where/when so that bugs and incorrect data can be
>>> addressed, then trig has to have distinguished metadata. And I hear a
>>> lot of people opposed to that, or at least opposed to any convenient way
>>> of supporting it, because SPARQL doesn't really have it. So, instead,
>>> how about we just make the main feed be turtle, and it only contains the
>>> metadata. All the data I was putting in named graphs stays out on the
>>> web, to be dereferenced by clients if they want.
>>>
>>> And then, for performance, if desired, the feed can also link to a trig
>>> file, saying "here, I've done all the fetching for you; if you're going
>>> to be dereferencing all this stuff anyway, you might as well take this
>>> instead". It can do the same with providing a SPARQL end-point,
>>> providing it for convenience/performance.
>>>
>>> *shrug* It should work fine. Maybe it's even better architecture. It
>>> certainly means the name should not be "SuperTurtle", since now trig
>>> remains a fairly obscure/internal/dump format, and (unlike Turtle) cannot
>>> actually be used to express data, other than simple pairings of URLs
>>> and graphs.
>>>
>>> -- Sandro
>>>
>>> [1]
>>> http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-spaces/index.html#use-cases
>>>
>>>
>> -- 
>> Antoine Zimmermann
>> ISCOD / LSTI - Institut Henri Fayol
>> École Nationale Supérieure des Mines de Saint-Étienne
>> 158 cours Fauriel
>> 42023 Saint-Étienne Cedex 2
>> France
>> Tél:+33(0)4 77 42 66 03
>> Fax:+33(0)4 77 42 66 66
>> http://zimmer.aprilfoolsreview.com/
>>
>
Received on Friday, 28 September 2012 12:26:03 UTC