Re: ORG browser from Richard Cyganiak on 2013-12-06 (public-gld-wg@w3.org from December 2013)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Fri, 6 Dec 2013 20:55:50 +0000
To: Sarven Capadisli <sarven.capadisli@deri.org>
Cc: Benedikt Kaempgen <kaempgen@fzi.de>, Dave Reynolds <dave.e.reynolds@gmail.com>, "GLD Working Group (Government Linked Data)" <public-gld-wg@w3.org>
Message-Id: <00E22031-2B7D-4684-AC4E-A399A3F1C516@cyganiak.de>

Hi Sarven,

I know it’s taken as an axiom that we should publish our RDF graphs with dereferenceable URIs and with links contained in the representations that allow us to navigate along the graph’s edges. But let me question the dogma.

On 6 Dec 2013, at 12:10, Sarven Capadisli <sarven.capadisli@deri.org> wrote:
> It is an interface problem. Whether there is an HTML or an RDF representation, how do we reasonably let the consumer reach to each observation?

What kind of consumer wants to reach each observation from the dataset, and why?

Seriously. What useful activity can be achieved that way?

To be honest, I can’t think of anything, except for really small datasets that have only a handful of observations, or for general debugging and checking-that-all-the-URIs-link-up.

Useful RDF-consuming clients don’t work by “crawling” the RDF graph. It’s way too slow because it requires way too many HTTP requests. RDF-consuming clients are usually more or less custom-built apps that interact with the data by extracting parts of it for further processing, and they almost always do that via SPARQL. So the interfaces they need are SPARQL endpoints, or dumps that you can load into your own SPARQL endpoint.
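
To make the "extract a part via SPARQL" pattern concrete, here is a minimal Python sketch of how such a client might prepare one request for the observations of a dataset. It only builds the request and does not send it. The example.org URIs and the helper names build_observation_query and build_sparql_request are made up for illustration; the only things taken from the specs are the qb:dataSet property and the SPARQL protocol's "query" parameter.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_observation_query(dataset_uri, limit=100):
    """Build a SPARQL query for the properties of observations in one qb:DataSet."""
    return (
        "PREFIX qb: <http://purl.org/linked-data/cube#>\n"
        "SELECT ?obs ?p ?o WHERE {\n"
        f"  ?obs qb:dataSet <{dataset_uri}> ;\n"
        "       ?p ?o .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_sparql_request(endpoint, query):
    """Wrap the query in a SPARQL protocol GET request (not sent here)."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Hypothetical dataset and endpoint, for illustration only.
query = build_observation_query("http://example.org/dataset/pop", limit=10)
request = build_sparql_request("http://example.org/sparql", query)
print(request.full_url)
```

One request like this returns a whole chunk of the cube, where crawling would need one HTTP round trip per observation.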

Useful HTML interfaces to a data cube usually allow interaction with the entire cube, or large chunks of it, at a time, either as a table or as a chart of some sort. They may provide a way of accessing individual observations, but you access them by clicking on a cell in a table or on a slice in a pie chart. You don’t access them by interacting with a list (paginated or not) of all resources.

Tell me if I’m wrong, but I’m not aware of any use of QB that relies on the kind of dataset-to-(slice-to-)observation navigation that you describe.

I agree that a discovery mechanism that allows us to automatically assemble an entire dataset from its web-published parts is really important. But I think instead of focusing on direct dereferenceable URIs, we’d be much better off focusing on VoID as the glue that points from the qb:DataSet instance to the mechanisms where the entire dataset can be accessed. A good approach would be to look for void:sparqlEndpoint or void:dataDump triples attached to the dataset URI. Possibly looking for void:subset too.
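
As a rough illustration of that lookup, here is a minimal Python sketch. It assumes the description of the dataset has already been fetched as N-Triples. The example.org URIs, the function name find_access_points and the tiny regex "parser" are all assumptions made for the example; a real client would use a proper RDF library.

```python
import re

VOID = "http://rdfs.org/ns/void#"
# VoID properties that say where the whole dataset can be obtained.
ACCESS_PROPERTIES = ("sparqlEndpoint", "dataDump", "subset")

# Matches only simple "<s> <p> <o> ." lines with IRI objects.
# This is a shortcut for the sketch, not a full N-Triples parser.
TRIPLE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def find_access_points(ntriples, dataset_uri):
    """Return {property local name: [object URIs]} for the given dataset URI."""
    found = {name: [] for name in ACCESS_PROPERTIES}
    for line in ntriples.splitlines():
        match = TRIPLE.match(line)
        if not match:
            continue
        subject, predicate, obj = match.groups()
        if subject != dataset_uri or not predicate.startswith(VOID):
            continue
        name = predicate[len(VOID):]
        if name in found:
            found[name].append(obj)
    return found


# Hypothetical description of a qb:DataSet, for illustration only.
EXAMPLE = """
<http://example.org/dataset/pop> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/linked-data/cube#DataSet> .
<http://example.org/dataset/pop> <http://rdfs.org/ns/void#sparqlEndpoint> <http://example.org/sparql> .
<http://example.org/dataset/pop> <http://rdfs.org/ns/void#dataDump> <http://example.org/dumps/pop.nt.gz> .
<http://example.org/other> <http://rdfs.org/ns/void#dataDump> <http://example.org/dumps/other.nt.gz> .
"""

access = find_access_points(EXAMPLE, "http://example.org/dataset/pop")
print(access)
```

A client could then prefer the dump when it wants the whole cube, and fall back to the endpoint when it only needs a part.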

> Under normal circumstances, the consumer should probably get to the bulk of the dataset through discovering qb:DataSet or void:Dataset. When this is not possible for non-trivial datasets, applications end up resorting to SPARQL - which is not necessarily a "bad" thing, but still can be a PITA in comparison to dereferencing.

Bulk downloads cause the least PITA, compared to SPARQL *or* dereferencing, IMHO.

Personally, I’m a believer in good web-friendly machine-readable metadata that points to bulk downloads. Hence my involvement with VoID, DCAT, and Tarql, and cheerleading for stuff like SPARQL-SD, GraphPusher and HDT. Dumps of up to a few Mtriples can be downloaded, indexed and queried in near real time over a home broadband connection on a laptop these days.

Best,
Richard

Received on Friday, 6 December 2013 20:56:11 UTC