- From: Sarven Capadisli <sarven.capadisli@deri.org>
- Date: Sat, 07 Dec 2013 12:32:26 +0100
- To: Richard Cyganiak <richard@cyganiak.de>
- CC: Benedikt Kaempgen <kaempgen@fzi.de>, Dave Reynolds <dave.e.reynolds@gmail.com>, "GLD Working Group (Government Linked Data)" <public-gld-wg@w3.org>
On 12/06/2013 09:55 PM, Richard Cyganiak wrote:
> Hi Sarven,
>
> I know it’s taken as an axiom that we should publish our RDF graphs
> with dereferenceable URIs and with links contained in the
> representations that allow us to navigate along the graph’s edges.
> But let me question the dogma.

By all means, this is great. Actually, this is practically a continuation of our chat off-list which you might recall. One main difference was that, IIRC, our positions on the discoverability were reversed - in fact, you pointed me to the LDP issue when I stated my interface engineering issues with the arcs from slices to observations. Perhaps we've convinced the other? :)

> On 6 Dec 2013, at 12:10, Sarven Capadisli <sarven.capadisli@deri.org>
> wrote:
>> It is an interface problem. Whether there is an HTML or an RDF
>> representation, how do we reasonably let the consumer reach to each
>> observation?
>
> What kind of consumer wants to reach each observation from the
> dataset, and why?
>
> Seriously. What useful activity can be achieved that way?

Right now, I can only think of crawling, without breaking my head over the possibilities. And, one group that carries out that action is the search services.

Aside: I'm quite skeptical on how the crawlers currently arrive at my observation URIs i.e., something tells me that the search services first get a hold of the data dumps and then poke each URI within.

In any case, the question you raise is important. I would love to know more from the community or application developers sharing their experiences.

So for all: How do you get a hold of QB observations in your application?

> To be honest, I can’t think of anything, except for really small
> datasets that have only a handful of observations, or for general
> debugging and checking-that-all-the-URIs-link-up.
>
> Useful RDF-consuming clients don’t work by “crawling” the RDF graph.
> It’s way too slow because it requires way too many HTTP requests.
> RDF-consuming clients are usually more or less custom-built apps that
> interact with the data by extracting parts of it for further
> processing, and they almost always do that via SPARQL. So the
> interfaces they need are SPARQL endpoints, or dumps that you can load
> into your own SPARQL endpoint.

I am also assuming that those are the most common approaches.

> Useful HTML interfaces to a data cube usually allow interaction with
> the entire cube, or large chunks of it, at a time, either as a table
> or as a chart of some sort. They may provide a way of accessing
> individual observations, but you access them by clicking on a cell in
> a table or on a slice in a pie chart. You don’t access them by
> interacting with a list (paginated or not) of all resources.
>
> Tell me if I’m wrong, but I’m not aware of any use of QB that relies
> on the kind of dataset-to-(slice-to-)observation navigation that you
> describe.

Point well taken. I don't disagree on what a useful HTML interface for showing large chunks of the data may look like. Let me pose this: how do you imagine a table of cells showing for a dataset? Isn't that simply one particular view, and the possibility to interact (navigating to an observation)? I see no essential difference between showing a list or a table of observations, whether they are "clickable" or not.

> I agree that a discovery mechanism that allows us to automatically
> assemble an entire dataset from its web-published parts is really
> important. But I think instead of focusing on direct dereferenceable
> URIs, we’d be much better off focusing on VoID as the glue that
> points from the qb:DataSet instance to the mechanisms where the
> entire dataset can be accessed. A good approach would be to look for
> void:sparqlEndpoint or void:dataDump triples attached to the dataset
> URI. Possibly looking for void:subsets too.

I'm with you on that with the exception that the focus should remain broader (or at least should not look the other way when it comes to basic functionality).

Aside: it is another side-benefit of having qb:DataSet and void:Dataset on the same resource. In fact, I even go another step and consider that qb:DataSet, void:Dataset, prov:Entity, and sd:Graph can be attached to the same resource.

>> Under normal circumstances, the consumer should probably get to the
>> bulk of the dataset through discovering qb:DataSet or void:Dataset.
>> When this is not possible for non-trivial datasets, applications
>> end up resorting to SPARQL - which is not necessarily a "bad"
>> thing, but still can be a PITA in comparison to dereferencing.
>
> Bulk downloads cause the least PITA, compared to SPARQL *or*
> dereferencing, IMHO.

Agreed. I was only considering and comparing single call methods (without additional processing) to get to the data.

> Personally, I’m a believer in good web-friendly machine-readable
> metadata that points to bulk downloads. Hence my involvement with
> VoID, DCAT, and Tarql, and cheerleading for stuff like SPARQL-SD,
> GraphPusher and HDT. Bulk downloads of up to a few Mtriples can be
> downloaded, indexed and queried in near real time on a home broadband
> connection and laptop machine, these days.
>
> Best, Richard
>

GraphPusher! Ha, someone should get back on that ;)

I think the core of the discussion raises the following and it would be good to have some real-world scenarios documented:

What kind of clients out there would like to be able to navigate from a dataset to its observations, and for what purpose? (Are there any surveys on this already or anyone planning on it? Hint hint.)

Not meant to be a serious proposal, but is there any real harm (I don't mean costs) in having a property from a dataset to its observations e.g., qb:obs (too bad qb:observation is taken :P ), rdfs:seeAlso, or maybe already possible with qb:observationGroup?
- I'm a bit unclear on this one.

-Sarven
http://csarven.ca/#i
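
A minimal sketch of the VoID-based discovery Richard describes above (look for void:sparqlEndpoint or void:dataDump triples on the dataset URI, then follow its subsets). This is an illustration only and not something from the thread: the example.org URIs and the access_points helper are hypothetical, the parser is a stand-in that only understands IRI-only N-Triples lines, and the subsets are followed via the VoID property void:subset.

```python
import re

VOID = "http://rdfs.org/ns/void#"

# Assumption: only the simplest N-Triples form "<s> <p> <o> ." is handled
# (IRIs only, no literals, blank nodes or comments). A real client would
# use a proper RDF parser instead.
TRIPLE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def access_points(ntriples, dataset_uri):
    """Collect SPARQL endpoints and data dumps for a dataset and its subsets."""
    by_subject = {}
    for line in ntriples.splitlines():
        match = TRIPLE.match(line)
        if match:
            s, p, o = match.groups()
            by_subject.setdefault(s, []).append((p, o))

    found = {"sparqlEndpoint": [], "dataDump": []}
    seen, todo = set(), [dataset_uri]
    while todo:
        subject = todo.pop()
        if subject in seen:  # guard against cyclic void:subset links
            continue
        seen.add(subject)
        for p, o in by_subject.get(subject, []):
            if p == VOID + "sparqlEndpoint":
                found["sparqlEndpoint"].append(o)
            elif p == VOID + "dataDump":
                found["dataDump"].append(o)
            elif p == VOID + "subset":
                todo.append(o)  # descend into the subset description
    return found


# Hypothetical description of a dataset that is both qb:DataSet and void:Dataset.
EXAMPLE = """
<http://example.org/dataset/gdp> <http://rdfs.org/ns/void#sparqlEndpoint> <http://example.org/sparql> .
<http://example.org/dataset/gdp> <http://rdfs.org/ns/void#dataDump> <http://example.org/dumps/gdp.nt.gz> .
<http://example.org/dataset/gdp> <http://rdfs.org/ns/void#subset> <http://example.org/dataset/gdp/2012> .
<http://example.org/dataset/gdp/2012> <http://rdfs.org/ns/void#dataDump> <http://example.org/dumps/gdp-2012.nt.gz> .
"""

if __name__ == "__main__":
    print(access_points(EXAMPLE, "http://example.org/dataset/gdp"))
```

With a description like this, a client needs one request for the dataset URI and can then choose between querying the endpoint or fetching the dumps, rather than dereferencing each observation.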
Received on Saturday, 7 December 2013 11:32:55 UTC