Re: Discovering QB observations (Was: Re: ORG browser)

Hello,

Thanks for your thoughts.

> Tell me if I’m wrong, but I’m not aware of any use of QB that relies on the kind of dataset-to-(slice-to-)observation navigation that you describe.

Since the Linked Data Cubes Explorer [1] relies on resolvable URIs, I am looking for recommendations on how to fetch all relevant content for a qb:DataSet.

From the discussions, I see the following possibilities that the application should try:

* Resolving the dataset URI also returns blank nodes of observations including their outgoing links such as to the dataset or dimension values.
* Resolving the dataset URI returns incoming links from observation URIs that can be resolved, separately, to retrieve their outgoing links.
* Resolving the dataset URI returns outgoing links to slices, that in turn link to either blank nodes of observations with their outgoing links or to observation URIs that can be resolved for their outgoing links.
* Resolving the dataset URI returns outgoing links void:sparqlEndpoint or void:dataDump that lead to RDF representations of all observations.
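
To make the four cases concrete, here is a rough sketch of how a client might classify what it got back after resolving the dataset URI. It is only an illustration under assumptions: the triples are plain (subject, predicate, object) string tuples that some RDF parser has already produced, blank nodes are written with a "_:" prefix, and the function name and result keys are made up for this example.

```python
QB = "http://purl.org/linked-data/cube#"
VOID = "http://rdfs.org/ns/void#"


def discovery_options(dataset, triples):
    """Sort the triples returned for a dataset URI into the four routes.

    `triples` is an iterable of (subject, predicate, object) strings;
    blank nodes are assumed to be written as "_:label".
    """
    options = {
        "blank_observations": [],  # case 1: observations inlined as blank nodes
        "observation_uris": [],    # case 2: incoming links from resolvable URIs
        "slices": [],              # case 3: outgoing links to slices
        "bulk_access": [],         # case 4: SPARQL endpoint or data dump
    }
    for s, p, o in triples:
        if p == QB + "dataSet" and o == dataset:
            key = "blank_observations" if s.startswith("_:") else "observation_uris"
            options[key].append(s)
        elif s == dataset and p == QB + "slice":
            options["slices"].append(o)
        elif s == dataset and p in (VOID + "sparqlEndpoint", VOID + "dataDump"):
            options["bulk_access"].append(o)
    return options
```

A client could then try the non-empty routes in whatever order it prefers, for example bulk access first and per-observation lookups last.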

> If you mean there is no link from a qb:DataSet to each of its own
> qb:Observations, that's true. But, there are slices in some 270a.info
> datasets. Can you clarify? Is there a particular dataset you are looking
> at or are you only referring to the Linked SDMX implementation? Would
> you mind creating an issue at https://github.com/csarven/linked-sdmx if
> there is a fundamental problem with the transformation? Much appreciated!

Thanks, Sarven. If your datasets provide one of the above possibilities to reach the observations via lookups, that is great.

This is not about observations but about metadata: if I only want to crawl the metadata of your datasets, I see a problem with blank nodes.

For instance, the dataset http://oecd.270a.info/dataset/HEALTH_STAT links to its structure http://oecd.270a.info/structure/HEALTH_STAT. Resolving the structure, however, only returns links to blank nodes for its components and does not return the outgoing links of those blank nodes. Can you tell me how I would find such outgoing links?

Best,

Benedikt

[1] <http://www.ldcx.linked-data-cubes.org:8000/ldcx-trunk/ldcx/ld-cubes-explorer.html>

________________________________________
From: Sarven Capadisli [sarven.capadisli@deri.org]
Sent: Saturday, 7 December 2013 12:32
To: Richard Cyganiak
Cc: Benedikt Kaempgen; Dave Reynolds; GLD Working Group (Government Linked Data)
Subject: Discovering QB observations (Was: Re: ORG browser)

On 12/06/2013 09:55 PM, Richard Cyganiak wrote:
> Hi Sarven,
>
> I know it’s taken as an axiom that we should publish our RDF graphs
> with dereferenceable URIs and with links contained in the
> representations that allow us to navigate along the graph’s edges.
> But let me question the dogma.

By all means, this is great. Actually, this is practically a
continuation of our chat off-list which you might recall. One main
difference was that, IIRC, our positions on discoverability were
reversed - in fact, you pointed me to the LDP issue when I stated my
interface engineering issues with the arcs from slices to observations.
Perhaps we've convinced each other? :)

> On 6 Dec 2013, at 12:10, Sarven Capadisli <sarven.capadisli@deri.org>
> wrote:
>> It is an interface problem. Whether there is an HTML or an RDF
>> representation, how do we reasonably let the consumer reach to each
>> observation?
>
> What kind of consumer wants to reach each observation from the
> dataset, and why?
>
> Seriously. What useful activity can be achieved that way?

Right now, I can only think of crawling, without breaking my head over
the possibilities. And one group that carries out that action is the
search services.

Aside: I'm quite skeptical about how the crawlers currently arrive at my
observation URIs, i.e., something tells me that the search services first
get a hold of the data dumps and then poke each URI within.

In any case, the question you raise is important. I would love to know
more from the community or application developers sharing their
experiences. So for all:

How do you get a hold of QB observations in your application?

> To be honest, I can’t think of anything, except for really small
> datasets that have only a handful of observations, or for general
> debugging and checking-that-all-the-URIs-link-up.
>
> Useful RDF-consuming clients don’t work by “crawling” the RDF graph.
> It’s way too slow because it requires way too many HTTP requests.
> RDF-consuming clients are usually more or less custom-built apps that
> interact with the data by extracting parts of it for further
> processing, and they almost always do that via SPARQL. So the
> interfaces they need are SPARQL endpoints, or dumps that you can load
> into your own SPARQL endpoint.

I am also assuming that those are the most common approaches.
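
For reference, the SPARQL route usually boils down to something like the sketch below. It only builds the query text; which endpoint it is sent to, and how, is left open. The function name and the paging defaults are assumptions made for the example, not anything a particular application is known to use.

```python
def observations_query(dataset_uri, limit=1000, offset=0):
    """Build a paged SPARQL query for the observations of one qb:DataSet.

    Returns every outgoing triple of each observation that points at the
    dataset via qb:dataSet. The ORDER BY keeps LIMIT/OFFSET paging stable.
    """
    return (
        "PREFIX qb: <http://purl.org/linked-data/cube#>\n"
        "SELECT ?obs ?p ?o WHERE {\n"
        f"  ?obs qb:dataSet <{dataset_uri}> ;\n"
        "       ?p ?o .\n"
        "}\n"
        "ORDER BY ?obs ?p ?o\n"
        f"LIMIT {limit} OFFSET {offset}"
    )
```

Since paging is by result row, the triples of a single observation may be split across two pages, so a client has to merge pages before treating an observation as complete.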

> Useful HTML interfaces to a data cube usually allow interaction with
> the entire cube, or large chunks of it, at a time, either as a table
> or as a chart of some sort. They may provide a way of accessing
> individual observations, but you access them by clicking on a cell in
> a table or on a slice in a pie chart. You don’t access them by
> interacting with a list (paginated or not) of all resources.
>
> Tell me if I’m wrong, but I’m not aware of any use of QB that relies
> on the kind of dataset-to-(slice-to-)observation navigation that you
> describe.

Point well taken. I don't disagree on what a useful HTML interface for
showing large chunks of the data may look like. Let me pose this: how do
you imagine a table of cells being shown for a dataset? Isn't that simply
one particular view, with the possibility to interact (navigating to an
observation)? I see no essential difference between showing a list or a
table of observations, whether they are "clickable" or not.

> I agree that a discovery mechanism that allows us to automatically
> assemble an entire dataset from its web-published parts is really
> important. But I think instead of focusing on direct dereferenceable
> URIs, we’d be much better off focusing on VoID as the glue that
> points from the qb:DataSet instance to the mechanisms where the
> entire dataset can be accessed. A good approach would be to look for
> void:sparqlEndpoint or void:dataDump triples attached to the dataset
> URI. Possibly looking for void:subsets too.

I'm with you on that with the exception that the focus should remain
broader (or at least should not look the other way when it comes to
basic functionality).

Aside: it is another side-benefit of having qb:DataSet and void:Dataset
on the same resource. In fact, I even go another step and consider that
qb:DataSet, void:Dataset, prov:Entity, and sd:Graph can be attached to
the same resource.

>> Under normal circumstances, the consumer should probably get to the
>> bulk of the dataset through discovering qb:DataSet or void:Dataset.
>> When this is not possible for non-trivial datasets, applications
>> end up resorting to SPARQL - which is not necessarily a "bad"
>> thing, but still can be a PITA in comparison to dereferencing.
>
> Bulk downloads cause the least PITA, compared to SPARQL *or*
> dereferencing, IMHO.

Agreed. I was only considering and comparing single call methods
(without additional processing) to get to the data.

> Personally, I’m a believer in good web-friendly machine-readable
> metadata that points to bulk downloads. Hence my involvement with
> VoID, DCAT, and Tarql, and cheerleading for stuff like SPARQL-SD,
> GraphPusher and HDT. Bulk downloads of up to a few Mtriples can be
> downloaded, indexed and queried in near real time on a home broadband
> connection and laptop machine, these days.
>
> Best, Richard
>

GraphPusher! Ha, someone should get back on that ;)

I think the core of the discussion raises the following and it would be
good to have some real-world scenarios documented:

What kind of clients out there would like to be able to navigate from a
dataset to its observations, and for what purpose? (Are there any
surveys on this already or anyone planning on it? Hint hint.)

Not meant to be a serious proposal, but is there any real harm (I don't
mean costs) in having a property from a dataset to its observations
e.g., qb:obs (too bad qb:observation is taken :P ), rdfs:seeAlso, or
maybe already possible with qb:observationGroup? - I'm a bit unclear on
this one.


-Sarven
http://csarven.ca/#i

Received on Saturday, 7 December 2013 15:50:24 UTC