Re: giving up on datasets/trig as more than a web cache from Steve Harris on 2012-09-27 (public-rdf-wg@w3.org from September 2012)

From: Steve Harris <steve.harris@garlik.com>
Date: Thu, 27 Sep 2012 14:12:02 +0100
To: Sandro Hawke <sandro@w3.org>
Cc: W3C RDF WG <public-rdf-wg@w3.org>
Message-Id: <A4480583-0D00-4D85-A353-D9E9E2E0DD2E@garlik.com>

Good, I believe that reflects how people actually use TriG in the real world.

- Steve

On 2012-09-27, at 13:09, Sandro Hawke wrote:

> Recently, I've tried to argue that trig (or whatever it's called) needs to be able to carry distinguished metadata.  This morning I've decided it doesn't, really, at least for the use cases I think about.   My replacement idea is to think about trig as *just* being a Web Cache, as just a convenient shorthand for pairing a bunch of URLs and their RDF contents, so you can publish or fetch them all at once.   I had been thinking about it as something else, as more of a first-class KR, but that doesn't seem to be flying.  (I guess this is yet another hold-over from my years of working with N3.)
> 
> Let's see if I can explain, for anyone else who might think a dataset could/should mean something more, and maybe myself, tomorrow.
> 
> The use cases I think about are nearly all about data federation, the stuff I wrote about and implemented as a federated phonebook [1].   They're all about data being gathered from original sources and processing systems and passed on toward data consumers, as a package, as a new combined-source.   This seems to me like an incredibly important use case that requires standardization and something that could really benefit from the idea of datasets and a dataset syntax.
> 
> I envisioned it as a converging pipeline, starting with turtle files (rdf graphs) as the leaves, but then having trig files (rdf datasets) as the major trunks.   The clients would always be getting a trig file (or using a sparql endpoint with the same dataset). For example, in 2.4 we get the situation where a division is gathering the data from its departments, and then passing them up to headquarters in one combined feed.
> 
> But if the feed is trig, and one is going to be able to figure out what really came from where/when so that bugs and incorrect data can be addressed, then trig has to have distinguished metadata.    And I hear a lot of people opposed to that, or at least opposed to any convenient way of supporting it, because SPARQL doesn't really have it.   So, instead, how about we just make the main feed be turtle, and it only contains the metadata.  All the data I was putting in named graphs stays out on the web, to be dereferenced by clients if they want.
> 
> And then, for performance, if desired, the feed can also link to a trig file, saying "here, I've done all the fetching for you; if you're going to be dereferencing all this stuff anyway, you might as well take this instead".    It can do the same with providing a SPARQL end-point, providing it for convenience/performance.
> 
> *shrug*    It should work fine.    Maybe it's even better architecture.    It certainly means the name should not be "SuperTurtle", since now trig remains a fairly obscure/internal/dump format, and (unlike Turtle) cannot actually be used to express data, other than simple pairings of URLs and graphs.
> 
>    -- Sandro
> 
> [1] http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-spaces/index.html#use-cases
> 
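
As a rough illustration of the "web cache" reading in the quoted message, where trig is just a pairing of URLs with their RDF contents, here is a minimal Python sketch. It is not taken from either message: the helper name to_trig, the example.org URLs, and the sample triples are all hypothetical, and terms are assumed to be already written in N-Triples syntax.

```python
from typing import Dict, List, Tuple

# One triple, with each term already serialized in N-Triples syntax
# (for example "<http://example.org/alice>" or '"a literal"').
Triple = Tuple[str, str, str]


def to_trig(cache: Dict[str, List[Triple]]) -> str:
    """Serialize a URL -> triples mapping as one TriG graph block per URL.

    Hypothetical helper: each key is treated as the URL a graph was
    fetched from, and that URL is used as the graph name.
    """
    blocks = []
    for url, triples in cache.items():
        body = "\n".join(f"  {s} {p} {o} ." for s, p, o in triples)
        blocks.append(f"<{url}> {{\n{body}\n}}")
    return "\n\n".join(blocks) + "\n"


# Hypothetical cache: two department documents and the triples found in them.
cache = {
    "http://example.org/dept-a.ttl": [
        ("<http://example.org/alice>",
         "<http://xmlns.com/foaf/0.1/name>",
         '"Alice"'),
    ],
    "http://example.org/dept-b.ttl": [
        ("<http://example.org/bob>",
         "<http://xmlns.com/foaf/0.1/name>",
         '"Bob"'),
    ],
}

print(to_trig(cache))
```

In this sketch the trig output carries nothing beyond the URL/graph pairs; any statements about where or when the data was fetched would live in the separate turtle feed the quoted message describes.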

-- 
Steve Harris, CTO
Garlik, a part of Experian
+44 7854 417 874  http://www.garlik.com/
Registered in England and Wales 653331 VAT # 887 1335 93
80 Victoria Street, London, SW1E 5JL

Received on Thursday, 27 September 2012 13:12:37 UTC