RE: WG discussion: proposal to remove BP 13 - Provide subsets for large spatial datasets

+1 - as Linda says.

Andrea

----
Andrea Perego, Ph.D.
Scientific / Technical Project Officer
European Commission DG JRC
Directorate B - Growth and Innovation
Unit B6 - Digital Economy
Via E. Fermi, 2749 - TP 262
21027 Ispra VA, Italy

https://ec.europa.eu/jrc/

----
The views expressed are purely those of the writer and may
not in any circumstances be regarded as stating an official
position of the European Commission.

________________________________
From: Linda van den Brink [l.vandenbrink@geonovum.nl]
Sent: 03 March 2017 11:20
To: Jeremy Tandy; Clemens Portele
Cc: SDW WG Public List
Subject: RE: WG discussion: proposal to remove BP 13 - Provide subsets for large spatial datasets

+1 from me for removing BP13, plus the changes suggested in this thread so far.

From: Jeremy Tandy [mailto:jeremy.tandy@gmail.com]
Sent: Thursday, 2 March 2017 17:10
To: Clemens Portele
Cc: SDW WG Public List
Subject: Re: WG discussion: proposal to remove BP 13 - Provide subsets for large spatial datasets

WRT the relationship between subset and source, it would be good to include the text from your email somewhere in the BP.

Thanks :)
On Thu, 2 Mar 2017 at 13:12, Clemens Portele <portele@interactive-instruments.de> wrote:
Hi Jeremy,

your suggestions make sense to me and I agree with them. If we decide to remove BP13, I could create a PR for the changes.

Regarding the relationship between the subset and the source, I agree that it would be good practice to be clear about the relationship. In HTML this could be descriptive text, or the relationship may be implicitly clear to human readers; in schema.org it could be http://schema.org/isPartOf (this is what we use for subsets in ldproxy); in RDF there is PROV-O; in ISO 19115 there is LI_Lineage; etc. Arguably this should be part of DW BP 18, but since it is missing there we should include it here, too. Paging is a very special case of a subset, and there the previous/next links etc. will provide the context, I think.
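To make the schema.org option concrete, here is a minimal sketch in Python, assuming the subset is published as JSON-LD. The example.org URIs, the `bbox` query parameter and the `describe_subset` helper are hypothetical; this is not how ldproxy generates its output, only an illustration of `schema:isPartOf` pointing from a subset back to the complete dataset.

```python
import json
from typing import Any, Dict


def describe_subset(subset_uri: str, source_uri: str, name: str) -> Dict[str, Any]:
    """Build a schema.org JSON-LD description linking a subset to its source dataset."""
    return {
        "@context": "http://schema.org",
        "@type": "Dataset",
        "@id": subset_uri,
        "name": name,
        # schema:isPartOf points from the subset back to the complete dataset
        "isPartOf": {"@type": "Dataset", "@id": source_uri},
    }


# Hypothetical URIs for a bounding-box subset of a larger dataset
doc = describe_subset(
    "https://example.org/datasets/buildings/items?bbox=7.0,50.0,7.1,50.1",
    "https://example.org/datasets/buildings",
    "Buildings within bbox 7.0,50.0,7.1,50.1",
)
print(json.dumps(doc, indent=2))
```

The same link could equally be expressed with PROV-O or ISO 19115 lineage; schema.org is used here only because it is the simplest of the options listed.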

Thanks,
Clemens

On 2 Mar 2017, at 13:33, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:

Clemens

+1 from me.

I would suggest the following changes to accommodate the removal of BP13 ...

In the §12.6 introduction, where you refer to DWBP's BP18, add a comment about why subsetting spatial data is often necessary. The "Why" section of BP13 already says:

```
Spatial datasets, particularly coverages such as satellite imagery, sensor measurement time-series and climate prediction data, are often very large. In these cases it is useful to provide subsets by having identifiers for conveniently sized subsets of large datasets that Web applications can work with.
```

Effectively, breaking up a large coverage into pre-defined lumps that you can access via HTTP GET requests is a _very simple_ API!
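As an illustration of such a "very simple API", the following Python sketch splits a gridded coverage into fixed-size chunks and gives each chunk its own URI that a client could fetch with a plain HTTP GET. The `/chunks/{row}/{col}` path layout, the example.org base URI and the grid dimensions are all hypothetical.

```python
from typing import Iterator, NamedTuple, Tuple


class Chunk(NamedTuple):
    uri: str
    cols: Tuple[int, int]  # half-open column range [start, end)
    rows: Tuple[int, int]  # half-open row range [start, end)


def chunk_coverage(base_uri: str, width: int, height: int, tile: int) -> Iterator[Chunk]:
    """Split a width x height grid into tile x tile chunks, each with its own URI."""
    for row in range(0, height, tile):
        for col in range(0, width, tile):
            yield Chunk(
                # hypothetical URI pattern: one identifier per pre-defined chunk
                uri=f"{base_uri}/chunks/{row // tile}/{col // tile}",
                # edge chunks are clipped to the grid extent
                cols=(col, min(col + tile, width)),
                rows=(row, min(row + tile, height)),
            )


chunks = list(chunk_coverage("https://example.org/coverages/sst", width=2500, height=1000, tile=1000))
for chunk in chunks:
    print(chunk.uri, chunk.cols, chunk.rows)
```

Fixed chunks are easy to cache and to link to; the trade-off is that a client may fetch more data than it actually needs.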

In the examples for SDW BP13 we refer to DataCube slices. This is already covered in DWBP, so we can ditch that. Another of the [suggested] examples is "Mapping a URI template (as specified in [RFC6570]) to a WCS or OPeNDAP service end-point". Reflecting on this, I wonder whether this approach should be listed as a mechanism that can help to "Reuse your existing spatial data infrastructure", as stated in BP11. You already mention a "wrapper, proxy or a shim layer", but mentioning the URI template would be useful.

Alternatively, Example 22 (about the Environment Agency Bathing Water Quality API and the Linked Data API) might be a good place for it too, as the Linked Data API configuration uses URI templates to provide RESTful access to SPARQL queries, thereby relieving the user of the challenge of writing generalised SPARQL queries and understanding the underpinning data model. In fact, I think it would be worth fleshing out this example anyway.
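A minimal Python sketch of the URI-template mapping mentioned above, assuming a WCS 2.0 server that accepts KVP GetCoverage requests with `subset` parameters. It hand-rolls only RFC 6570 Level 1 (simple string) expansion; the endpoint, the coverage identifier `sst_2017` and the axis labels `Lat`/`Long` are hypothetical, and real axis labels depend on how the server describes the coverage.

```python
import re
from typing import Dict
from urllib.parse import quote

# Hypothetical WCS 2.0 KVP GetCoverage request expressed as an RFC 6570 URI template.
# Endpoint, coverage id and the axis labels "Lat"/"Long" are assumptions.
WCS_TEMPLATE = (
    "https://example.org/wcs?service=WCS&version=2.0.1&request=GetCoverage"
    "&coverageId={coverage}"
    "&subset=Lat({lat_min},{lat_max})&subset=Long({lon_min},{lon_max})"
)


def expand_level1(template: str, variables: Dict[str, str]) -> str:
    """Minimal RFC 6570 Level 1 (simple string) expansion.

    Each {var} is replaced by its value with every character outside the
    unreserved set (ALPHA / DIGIT / "-" / "." / "_" / "~") percent-encoded;
    undefined variables expand to the empty string.
    """
    def substitute(match: "re.Match[str]") -> str:
        value = variables.get(match.group(1))
        return "" if value is None else quote(value, safe="")

    return re.sub(r"\{([A-Za-z0-9_]+)\}", substitute, template)


url = expand_level1(
    WCS_TEMPLATE,
    {"coverage": "sst_2017", "lat_min": "40", "lat_max": "50",
     "lon_min": "-10", "lon_max": "0"},
)
print(url)
```

A shim layer could expose a friendlier public template (for example with the four bounds as path segments) and fill in this WCS template behind the scenes; a production shim would more likely rely on a complete RFC 6570 implementation than on this minimal one.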

(For reference, documentation on Epimorphics' implementation, "ELDA", can be found here: http://epimorphics.github.io/elda/current/index.html)


Finally, I wonder whether we have a gap. Currently BP13 talks about using "PROV-O to describe the relationship between the subset, the original large dataset and the mechanism used to derive the subset". I'm not so worried about PROV-O, but I think it would be worth asserting that it is useful to relate the subset to the complete resource from which it came. Re-reading your edits to BP11, I think that we may have this covered where you talk about "paging" responses (using LDP or Hydra pagination).
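For the pagination case, a sketch of what a Hydra-style page might look like, generated in Python: the page is described as a `PartialCollectionView` of the complete collection, so the relationship between the subset (the page) and its source is explicit. The collection URI, the `page` query parameter and the `page_document` helper are hypothetical; the term names follow the Hydra Core Vocabulary context.

```python
import json
from typing import Any, Dict, List

HYDRA_CONTEXT = "http://www.w3.org/ns/hydra/context.jsonld"


def page_document(collection_uri: str, page: int, page_size: int,
                  total_items: int, items: List[Any]) -> Dict[str, Any]:
    """Describe one page of a large collection as a Hydra PartialCollectionView."""
    last_page = max(1, -(-total_items // page_size))  # ceiling division, at least 1

    def page_uri(n: int) -> str:
        return f"{collection_uri}?page={n}"  # hypothetical paging parameter

    view: Dict[str, Any] = {
        "@id": page_uri(page),
        "@type": "PartialCollectionView",
        "first": page_uri(1),
        "last": page_uri(last_page),
    }
    # previous/next links are only emitted where such a page exists
    if page > 1:
        view["previous"] = page_uri(page - 1)
    if page < last_page:
        view["next"] = page_uri(page + 1)

    return {
        "@context": HYDRA_CONTEXT,
        "@id": collection_uri,  # the complete dataset this page is a subset of
        "@type": "Collection",
        "totalItems": total_items,
        "member": items,
        "view": view,
    }


doc = page_document("https://example.org/datasets/stations", page=2,
                    page_size=100, total_items=250, items=[])
print(json.dumps(doc, indent=2))
```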

Hope that helps.

Jeremy

On Wed, 1 Mar 2017 at 17:41, Clemens Portele <portele@interactive-instruments.de> wrote:
Hi all,

in the BP call today [1] we discussed whether BP 13 [2] could or should be removed.

The rationale would be:
* DWBP now has BP 18 ("Provide Subsets for Large Datasets") [3] which has almost the same name and already covers most of the aspects. It also mentions the RDF Data Cube Vocabulary.
* DW BP 18 is referenced and discussed in the introduction of section 12.6 and in BP 11 [4].
* Currently it feels as if there is not enough content left to keep a separate BP providing actionable guidance (beyond what is already in DW BP 18 and SDW BP 11 on that topic).
* If any content from BP13 is to be kept, it could be integrated into BP 11.

Any thoughts?

Clemens

[1] https://www.w3.org/2017/03/01-sdwbp-minutes.html
[2] http://w3c.github.io/sdw/bp/#ids-for-chunks
[3] https://www.w3.org/TR/dwbp/#ProvideSubsets
[4] http://w3c.github.io/sdw/bp/#bp-exposing-via-api

Received on Friday, 3 March 2017 10:38:38 UTC