Re: giving up on datasets/trig as more than a web cache from Antoine Zimmermann on 2012-09-27 (public-rdf-wg@w3.org from September 2012)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Thu, 27 Sep 2012 14:47:38 +0200
To: public-rdf-wg@w3.org
Message-ID: <50644AEA.8000900@emse.fr>
+1 to this.

Note that, even if we do not recommend putting metadata in the default 
graph, it's still possible to do it. Metadata in RDF is also data in 
RDF, so you can put it in a named graph or in the default graph.
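
As a minimal illustration, here is a standard-library Python sketch. It models a dataset as a mapping from graph name to a set of triples, with `None` standing for the default graph. The helpers `add_metadata` and `to_trig` and all example.org URLs are hypothetical; the serializer is a simplified TriG-like writer (IRIs only, no prefixes or escaping). The same provenance triples can go into the default graph or into a named graph of their own, and nothing in the data model distinguishes them from other triples:

```python
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]               # (subject, predicate, object) IRIs
Dataset = Dict[Optional[str], Set[Triple]]  # graph name -> triples; None = default graph

DCT_SOURCE = "http://purl.org/dc/terms/source"
FOAF_PHONE = "http://xmlns.com/foaf/0.1/phone"

def add_metadata(ds: Dataset, triples: Set[Triple], target: Optional[str] = None) -> None:
    """Put metadata triples in graph `target` (None means the default graph).

    Hypothetical helper: the triples are stored exactly like any other data.
    """
    ds.setdefault(target, set()).update(triples)

def to_trig(ds: Dataset) -> str:
    """Write simplified TriG: IRIs only, no prefixes, no literal escaping."""
    blocks = []
    # Default graph first, then named graphs sorted by name.
    for name in sorted(ds, key=lambda n: (n is not None, n or "")):
        lines = [f"  <{s}> <{p}> <{o}> ." for s, p, o in sorted(ds[name])]
        label = "" if name is None else f"<{name}> "
        blocks.append(label + "{\n" + "\n".join(lines) + "\n}")
    return "\n\n".join(blocks)

# Hypothetical example: one department's phonebook graph plus its provenance.
PHONEBOOK = "http://example.org/dept1/phonebook"
data = {("http://example.org/alice", FOAF_PHONE, "tel:+1-555-0100")}
meta = {(PHONEBOOK, DCT_SOURCE, "http://example.org/dept1/")}

# Option 1: metadata in the default graph.
ds1: Dataset = {PHONEBOOK: set(data)}
add_metadata(ds1, meta)

# Option 2: metadata in a named graph of its own.
ds2: Dataset = {PHONEBOOK: set(data)}
add_metadata(ds2, meta, target="http://example.org/meta")

print(to_trig(ds1))
```

With option 2 the default graph simply stays empty; neither placement needs any special syntax.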


On 27/09/2012 14:09, Sandro Hawke wrote:
> Recently, I've tried to argue that trig (or whatever it's called) needs
> to be able to carry distinguished metadata. This morning I've decided it
> doesn't, really, at least for the use cases I think about. My
> replacement idea is to think about trig as *just* being a Web Cache, as
> just a convenient shorthand for pairing a bunch of URLs and their RDF
> contents, so you can publish or fetch them all at once. I had been
> thinking about it as something else, as more of a first-class KR, but
> that doesn't seem to be flying. (I guess this is yet another hold-over
> from my years of working with N3.)
>
> Let's see if I can explain, for anyone else who might think a dataset
> could/should mean something more, and maybe myself, tomorrow.
>
> The use cases I think about are nearly all about data federation, the
> stuff I wrote about and implemented as a federated phonebook [1].
> They're all about data being gathered from original sources and
> processing systems and passed on toward data consumers, as a package, as
> a new combined-source. This seems to me like an incredibly important use
> case that requires standardization, and something that could really benefit
> from the idea of datasets and a dataset syntax.
>
> I envisioned it as a converging pipeline, starting with turtle files
> (rdf graphs) as the leaves, but then having trig files (rdf datasets) as
> the major trunks. The clients would always be getting a trig file (or
> using a sparql endpoint with the same dataset). For example, in section 2.4 of [1] we
> get the situation where a division is gathering the data from its
> departments, and then passing them up to headquarters in one combined feed.
>
> But if the feed is trig, and one is going to be able to figure out what
> really came from where/when so that bugs and incorrect data can be
> addressed, then trig has to have distinguished metadata. And I hear a
> lot of people opposed to that, or at least opposed to any convenient way
> of supporting it, because SPARQL doesn't really have it. So, instead,
> how about we just make the main feed be turtle, and it only contains the
> metadata. All the data I was putting in named graphs stays out on the
> web, to be dereferenced by clients if they want.
>
> And then, for performance, if desired, the feed can also link to a trig
> file, saying "here, I've done all the fetching for you; if you're going
> to be dereferencing all this stuff anyway, you might as well take this
> instead". It can do the same with providing a SPARQL end-point,
> providing it for convenience/performance.
>
> *shrug* It should work fine. Maybe it's even better architecture. It
> certainly means the name should not be "SuperTurtle", since now trig
> remains a fairly obscure/internal/dump format, and (unlike Turtle) cannot
> actually be used to express data, other than simple pairings of URLs
> and graphs.
>
> -- Sandro
>
> [1]
> http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-spaces/index.html#use-cases
>
>
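A minimal Python sketch of the web-cache reading of the proposal above, using the standard library only. A dataset is just each graph URL paired with the graph you would get by dereferencing it. The feed carries only metadata (the list of URLs) and may point at a pre-fetched dump. Everything here is hypothetical: `Feed`, `load`, the `fetch_graph`/`fetch_dump` callbacks (standing in for an HTTP GET plus a Turtle or TriG parser) and the example.org URLs are assumptions, not part of any spec or library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
Dataset = Dict[str, Graph]  # graph URL -> contents fetched from that URL

@dataclass
class Feed:
    """What the (Turtle) feed carries: metadata only, no data graphs."""
    graph_urls: List[str]
    dump_url: Optional[str] = None  # optional link to a TriG dump of the same graphs

def load(feed: Feed,
         fetch_graph: Callable[[str], Graph],
         fetch_dump: Callable[[str], Dataset]) -> Dataset:
    """Return the dataset the feed describes (hypothetical client logic)."""
    if feed.dump_url is None:
        # No dump: dereference every graph URL on the web.
        return {url: fetch_graph(url) for url in feed.graph_urls}
    dump = fetch_dump(feed.dump_url)
    # Treat the dump purely as a cache: a graph missing from it is fetched directly.
    return {url: (dump[url] if url in dump else fetch_graph(url))
            for url in feed.graph_urls}

# A stand-in "web" with two department graphs (hypothetical URLs).
FOAF_PHONE = "http://xmlns.com/foaf/0.1/phone"
WEB: Dict[str, Graph] = {
    "http://example.org/dept1/phonebook":
        frozenset({("http://example.org/alice", FOAF_PHONE, "tel:+1-555-0100")}),
    "http://example.org/dept2/phonebook":
        frozenset({("http://example.org/bob", FOAF_PHONE, "tel:+1-555-0101")}),
}
DUMP_URL = "http://example.org/division/all.trig"
DUMPS: Dict[str, Dataset] = {DUMP_URL: dict(WEB)}

urls = sorted(WEB)
direct = load(Feed(urls), WEB.__getitem__, DUMPS.__getitem__)
cached = load(Feed(urls, DUMP_URL), WEB.__getitem__, DUMPS.__getitem__)
print(direct == cached)  # True
```

Because the dump is only an optimisation, a client that ignores `dump_url` ends up with the same graphs; the dump never carries anything the web does not.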

-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tel: +33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Thursday, 27 September 2012 12:48:08 UTC