Re: Re-write of use case 48 from Annette Greiner on 2017-08-26 (public-dxwg-wg@w3.org from August 2017)

From: Annette Greiner <amgreiner@lbl.gov>
Date: Fri, 25 Aug 2017 17:36:24 -0700
To: public-dxwg-wg@w3.org
Message-ID: <c801f41c-b928-15c7-fdec-5634711e0ca1@lbl.gov>
I think this discussion would be enhanced by starting from actual use 
cases rather than requirements. I have a hard time seeing how these 
really are requirements without use cases behind them. You mention scaling as an issue. 
What scaling problem are you trying to solve?

My concern here is that interpreting a profile might end up requiring 
data sharers to have expertise in each step up the hierarchy. If I tell 
a researcher that they need to follow a certain profile, but doing so 
requires that they track down and interpret all the parent profiles and 
their versions, they are not going to be happy to do that. On the other 
hand, we might be able to include inheritance information in such a way 
that it is informational rather than definitional. For example, a 
profile might include big sections labeled as themselves conformant to 
another profile, but those sections still could include all the 
information they would include were they not inherited. The one use case 
that makes sense to me is that a server may be asked to return metadata 
matching one of several profiles derived from each other, and it could 
programmatically determine that it has metadata matching a more general 
profile if it knows it can match a more specific child profile. But it 
might be easier to simply list which profiles a set of metadata conforms to.
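
The server-side case above can be sketched in Python. It assumes that each profile records the profiles it directly extends, and that a child profile only adds constraints, so conforming to the child implies conforming to every ancestor. The profile names, `PARENTS`, and both helper functions are hypothetical. The `CONFORMS_TO` mapping at the end shows the simpler alternative of listing conformance explicitly.

```python
# Hypothetical profile identifiers, each mapped to the profiles it directly extends.
PARENTS: dict[str, list[str]] = {
    "base": [],
    "regional": ["base"],
    "project-x": ["regional"],
}


def implied_profiles(profile: str, parents: dict[str, list[str]]) -> set[str]:
    """Return `profile` plus every profile it extends, directly or transitively."""
    seen: set[str] = set()
    stack = [profile]
    while stack:
        current = stack.pop()
        if current in seen:  # guards against cycles and shared ancestors
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen


def can_serve(requested: str, available: str, parents: dict[str, list[str]]) -> bool:
    """True if metadata matching the `available` profile also matches `requested`."""
    return requested in implied_profiles(available, parents)


# The flat alternative: each record simply lists every profile it conforms to.
CONFORMS_TO: dict[str, list[str]] = {"record-1": ["project-x", "regional", "base"]}
```

Here `can_serve("base", "project-x", PARENTS)` is true, but `can_serve("project-x", "base", PARENTS)` is not. The flat list gives the same answer without anyone walking the hierarchy. The cost is that it must be kept up to date by hand.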

To address the numbered requirements, 1 can be done with a diff; 2 can 
be done with the informal inheritance we've discussed; 3 can be done 
with a list; 4 is not always what we want; and 5 creates the usability 
issue mentioned above.

-Annette


On 8/22/17 9:16 PM, Rob Atkinson wrote:
>
> I think "cascading profiles" does need more discussion. In one sense 
> it's a mechanism for meeting requirements, and lives in the solution 
> space. We should not deny requirements just because at this stage we 
> haven't collectively agreed on a solution.
>
> The underlying requirements IMHO are:
>
> 1) to be able to define what is common and what is different between 
> two profiles
> 2) to be able to easily extend an existing profile to add additional 
> constraints
> 3) to be able to declare conformance to multiple profiles, including 
> the baseline profiles that a profile is derived from or extends
> 4) to be able to detect change to the original profile so that the 
> intent to extend and remain conformant with it as a baseline can be 
> managed
> 5) to be able to inherit additional validation rules and other 
> supporting resources from a profile into derived profiles
>
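
Requirement 4 can be illustrated without settling the inheritance question. When a derived profile is created, it records a digest of the baseline it extends. A tool can later compare that stored digest against the baseline's current content. This is a minimal Python sketch that assumes profiles can be serialized as JSON. The profile structure, the `extends`/`sha256` fields, and the helper names are hypothetical, not part of any existing vocabulary.

```python
import hashlib
import json


def fingerprint(profile: dict) -> str:
    """SHA-256 of a canonical JSON form (sorted keys, no extra whitespace)."""
    canonical = json.dumps(profile, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical baseline profile, and a derived profile that pins its fingerprint.
baseline = {"id": "base", "rules": ["title-required", "license-required"]}
derived = {
    "id": "project-x",
    "extends": {"id": "base", "sha256": fingerprint(baseline)},
    "rules": ["theme-required"],
}


def baseline_changed(derived: dict, current_baseline: dict) -> bool:
    """True if the baseline no longer matches the fingerprint pinned at extension time."""
    return derived["extends"]["sha256"] != fingerprint(current_baseline)
```

Key order does not affect the digest, but list order does. Reordering the baseline's rules would therefore be reported as a change.
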
> Whilst it's hard to point at existing distributed systems based on HTTP 
> services to evidence these as requirements, we see them supported in 
> every modern closed system where components interact (e.g. programming 
> languages), and we see major scalability issues in catalogs of 
> web-based data and services where these requirements are not 
> supported. We see inheritance emerging in non-machine-readable 
> profiles already, so using machine readability to support tools that 
> help us is actually a way to simplify the end-user interaction. An 
> analysis of such challenges around infrastructure for supporting 
> citizen science (i.e. a Use Case) undertaken by the OGC [1] [2] has 
> identified that both hierarchy and polymorphism (i.e. multiple 
> inheritance) are direct requirements for realising distributed system 
> interoperability, given the underlying governance constraints on the 
> stakeholders involved.
>
> I suspect that inheritance will prove by far the easiest solution to 
> meet these requirements. However, other approaches, such as providing 
> infrastructure services to perform comparisons, may be tractable, and 
> even pushing it to the client to do all the hard work is possible, but 
> only if we end up with very tight and comprehensive standards for 
> describing every aspect of each resource that a client will need to 
> interpret in order to infer the underlying inheritance patterns from a 
> corpus of "flattened" descriptions. Maybe there is some other pattern 
> that can be demonstrated to work?
>
>
> [1] https://cobwebproject.eu/about/swe4citizenscience-github
> [2] https://www.slideshare.net/CobwebFP7/cobweb-towards-an-optimised-interoperability-framework-for-citizen-science
>
>
> On Wed, 23 Aug 2017 at 13:40 Karen Coyle <kcoyle@kcoyle.net> wrote:
>
>     Thanks, Rob. Note below under 1)
>
>     On 8/22/17 6:44 PM, Rob Atkinson wrote:
>     > Hi Karen
>     >
>     > That's now formed as a Use Case :-) We might need to think about
>     > the specific implementations as examples, rather than solutions.
>     >
>     > a few points to consider:
>     >
>     > 1) it is unlikely that a dataset conforms to "a profile"; rather,
>     > it conforms to 0-N profiles, and these are sometimes "hierarchical"
>     > specialisations. Take for example the NetCDF conventions: CF
>     > extends COARDS [1].
>
>     This came up at the f2f, and probably needs more discussion. In
>     general, my understanding is that the group was not comfortable with
>     "cascading profiles" because of considerations such as the risk that
>     a change to a profile fragment could mean it no longer describes
>     your (static) dataset. However, it was agreed that there would be
>     "copying" of profiles or profile fragments into new profiles based
>     on them.
>
>     Discussion is at https://www.w3.org/2017/07/18-dxwg-minutes#item06,
>     under use case 37.
>
>     This probably also relates to #4, below. I know that SHACL is
>     developing re-usable common patterns, but I haven't looked into how
>     one is expected to integrate those. In general, I think that SHACL
>     (and ShEx) work fine as atomistic rules, and I am hoping that
>     profiles will provide a coherent view of a usable set of data. The
>     purposes are different.
>
>     (p.s. I referred to SHACL because it came up in discussion, and I
>     had the impression that some folks see it as a key aspect of
>     profiles due to its validation-like functions. I'm happy to not
>     include it in this use case.)
>
>     kc
>
>     > 2) distributions also conform to profiles, both for content and
>     > potentially service behaviours
>     > 3) SHACL is an RDF technology, but I don't see any reason why it
>     > cannot be applied to any structure if the mapping to RDF is
>     > predictable.
>     > 4) multiple SHACL rules can co-exist and be re-used across
>     > different profiles; I would expect a profile to bind a set of
>     > appropriate SHACL rules, rather than be a large, complex,
>     > monolithic artefact
>     > 5) there is no reason to restrict validation rules to SHACL; some
>     > rules may be better expressed other ways. Thus a profile is a
>     > collection of rules, and each rule should be associated with
>     > explanatory text.
>     > 6) rules should have identity, and if two different languages are
>     > used to express the same rules, this should be easily detectable
>     > 7) rules may span multiple requirements from the profile, since it
>     > may be inconvenient or inefficient to test related requirements
>     > separately
>     > 8) The minimum case is a single text document describing the
>     > convention/profile (cf. the NetCDF conventions and DCAT-AP
>     > profiles); these still have useful semantics of declaring
>     > conformance with the profile via a machine-readable identifier.
>     > 9) SHACL is a potential solution; I don't see a strong requirement
>     > for it as a specific choice yet
>     >
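
Points 4 to 6 describe a profile as a binding of identified rules. Each rule carries explanatory text and may be encoded in more than one language. The Python sketch below illustrates this under those assumptions only. The class names, fields, and rule identifiers are hypothetical, and the rule bodies are elided. Because the same rule identifier is shared across encodings, a SHACL and a Schematron version of one rule become detectable as the same rule.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RuleExpression:
    rule_id: str   # identity of the rule, shared by every encoding of it
    language: str  # e.g. "SHACL", "Schematron", "prose"
    body: str      # the rule text in that language (elided below)


@dataclass
class Profile:
    profile_id: str
    explanations: dict[str, str] = field(default_factory=dict)  # rule_id -> explanatory text
    expressions: list[RuleExpression] = field(default_factory=list)

    def languages_by_rule(self) -> dict[str, set[str]]:
        """Group encodings by rule identity, exposing rules expressed in several languages."""
        grouped: dict[str, set[str]] = {}
        for expr in self.expressions:
            grouped.setdefault(expr.rule_id, set()).add(expr.language)
        return grouped

    def unexplained_rules(self) -> set[str]:
        """Rule identifiers that have an encoding but no explanatory text."""
        return {expr.rule_id for expr in self.expressions} - set(self.explanations)


# Hypothetical profile binding three encodings of two rules.
profile = Profile(
    profile_id="project-x",
    explanations={"rule:title": "Every dataset has a title."},
    expressions=[
        RuleExpression("rule:title", "SHACL", "..."),
        RuleExpression("rule:title", "Schematron", "..."),
        RuleExpression("rule:license", "SHACL", "..."),
    ],
)
```

In this example, `rule:title` is reported with two encodings, and `rule:license` is flagged as lacking explanatory text.
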
>     > Please do a sanity check on this reasoning. If you disagree, let's
>     > discuss the specific issue, and if I've missed anything, let's
>     > capture it. Then perhaps review the wording against the
>     > constraints, and I will undertake to ensure the first draft of the
>     > requirements is properly expressed to capture the intent.
>     >
>     > [1] ftp://ftp.unidata.ucar.edu/pub/netcdf/Conventions/README
>     >
>     > On Wed, 23 Aug 2017 at 09:24 Karen Coyle <kcoyle@kcoyle.net> wrote:
>     >
>     >     (Just the description portion)
>     >
>     >     Project X has decided to make its datasets available as
>     >     open-access downloads. They do not know who will find the
>     >     datasets useful, but they assume that some potential users are
>     >     outside of Project X's immediate community. They need a way to
>     >     describe their metadata and its usage such that anyone can work
>     >     with the datasets, and they hope to do this with a profile that
>     >     is machine-readable, human-understandable, and defines the
>     >     criteria for valid data.
>     >
>     >     Some of their datasets are in RDF, and Project X could
>     >     potentially provide a SHACL document that fulfills the functions
>     >     above, either instead of or in addition to a profile. However,
>     >     they also have many datasets that are in metadata schemas for
>     >     which there is no standard validation language. For those
>     >     datasets, the profile will need to suffice.
>     >
>     >     Note that there is also a question about the RDF datasets and
>     >     SHACL. If one expects users of the datasets to be fully
>     >     conversant in SHACL and to have SHACL tools, then it isn't clear
>     >     whether a profile will provide any information beyond a SHACL
>     >     validation document. There may, however, be users who wish to
>     >     work with Project X's RDF data but who are not (yet) using SHACL.
>     >     There could be both a profile for that RDF data and a SHACL
>     >     document, but the programmers at Project X are wary of having two
>     >     entirely separate definitions of the data, since it may be
>     >     difficult to guarantee that they are 100% equivalent.
>     >
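
For datasets in schemas without a standard validation language, the use case asks the profile itself to define the criteria for valid data. The Python sketch below shows what a machine-readable part of such a profile might look like. It assumes records have already been parsed into dictionaries (for example, from JSON or XML). The field names, the licence list, and the `check_record` helper are all hypothetical.

```python
# Hypothetical machine-readable portion of a profile for a non-RDF metadata schema.
PROFILE = {
    "required": ["title", "creator", "license"],
    "allowed_values": {"license": {"CC-BY-4.0", "CC0-1.0"}},
}


def check_record(record: dict, profile: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record meets the criteria."""
    problems = [f"missing field: {name}" for name in profile["required"] if name not in record]
    for name, allowed in profile["allowed_values"].items():
        if name in record and record[name] not in allowed:
            problems.append(f"{name}: {record[name]!r} is not one of {sorted(allowed)}")
    return problems
```

Returning readable messages, rather than a bare pass/fail, keeps the check usable by people who are not running dedicated validation tools.
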
>     >
>     >     --
>     >     Karen Coyle
>     >     kcoyle@kcoyle.net http://kcoyle.net
>     >     m: 1-510-435-8234 (Signal)
>     >     skype: kcoylenet/+1-510-984-3600
>     >
>
>

-- 
Annette Greiner
NERSC Data and Analytics Services
Lawrence Berkeley National Laboratory
Received on Saturday, 26 August 2017 00:36:55 UTC