Re: Re-write of use case 48

Thanks Antoine.

Please review the extracted requirements in the UCR doc and make sure these
meet those goals and don't prematurely introduce specific solutions.

Rob Atkinson

On Mon, 28 Aug 2017 at 18:14 Rob Atkinson <robatkinson101@gmail.com> wrote:

> Thanks Antoine.
>
> Please review the extracted requirements in the UCR doc and make sure
> these meet those goals and don't prematurely introduce specific solutions.
>
> Rob Atkinson
> UCR editing team.
>
>
> On 28 Aug 2017 17:30, "Antoine Isaac" <aisaac@few.vu.nl> wrote:
>
>> Hi everyone,
>>
>> If I understand correctly what is meant by 'cascading' profiles, then in
>> the Europeana use case we have a similar requirement (it would apply
>> especially to 'refining' profiles).
>>
>> What it brings to someone building an AP on top of others:
>> - less work, merely piling new constraints on top of others instead of
>> re-defining everything;
>> - the benefit of 'more authoritative' work done somewhere else.
>> And I agree with Annette: it requires good knowledge of the re-used
>> profile, if only to know that the re-used profile has such status in a
>> given community.
>>
>> Granted, there are drawbacks. In particular, it may create issues if the
>> re-used profile changes - which is why we are very careful to stay
>> backwards-compatible when we update bits that are close to 'core'. But it
>> seems quite desirable within a community that's eager to homogenize its
>> practices.
>>
>> Cheers,
>>
>> Antoine
>>
>> On 26/08/17 02:36, Annette Greiner wrote:
>>
>>> I think this discussion would be enhanced by starting from actual use
>>> cases rather than requirements. I have a hard time seeing how these really
>>> are requirements without a use case behind them. You mention scaling as an
>>> issue. What scaling problem are you trying to solve?
>>>
>>> My concern here is that interpreting a profile might end up requiring
>>> data sharers to have expertise in each step up the hierarchy. If I tell a
>>> researcher that they need to follow a certain profile, but doing so
>>> requires that they track down and interpret all the parent profiles and
>>> their versions, they are not going to be happy to do that. On the other
>>> hand, we might be able to include inheritance information in such a way
>>> that it is informational rather than definitional. For example, a profile
>>> might include big sections labeled as themselves conformant to another
>>> profile, but those sections still could include all the information they
>>> would include were they not inherited. The one use case that makes sense to
>>> me is that a server may be asked to return metadata matching one of several
>>> profiles derived from each other, and it could programmatically determine
>>> that it has metadata matching a more general profile if it knows it can
>>> match a more specific child profile. But it might be easier to simply list
>>> which profiles a set of metadata conforms to.
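The server scenario described above can be sketched in a few lines. This is a hypothetical illustration only: the profile URIs, the `PARENT` map, and the `conforms` function are all invented here, and they assume the parent links between profiles are machine-readable.

```python
# Hypothetical sketch: a server holding metadata tagged with its most
# specific profile can answer a request for any more general (ancestor)
# profile by walking machine-readable parent links.
# The profile URIs and hierarchy below are invented for illustration.
PARENT = {"ex:CF": "ex:COARDS", "ex:COARDS": None}

def conforms(specific: str, requested: str) -> bool:
    """True if `requested` is `specific` itself or one of its ancestors."""
    profile = specific
    while profile is not None:
        if profile == requested:
            return True
        profile = PARENT.get(profile)
    return False
```

Under the alternative Annette mentions - simply listing every profile a set of metadata conforms to - the same question becomes a plain membership test, with no walking required.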
>>>
>>> To address the numbered requirements, 1 can be done with a diff; 2 can
>>> be done with the informal inheritance we've discussed; 3 can be done with a
>>> list; 4 is not always what we want; and 5 creates the usability issue
>>> mentioned above.
>>>
>>> -Annette
>>>
>>>
>>> On 8/22/17 9:16 PM, Rob Atkinson wrote:
>>>
>>>>
>>>> I think "cascading profiles" does need more discussion.  In one sense
>>>> it's a mechanism for meeting requirements, and lives in the solution space.
>>>> We should not deny requirements just because at this stage we haven't
>>>> collectively agreed on a solution.
>>>>
>>>> The underlying requirements IMHO are:
>>>>
>>>> 1) to be able to define what is common and what is different between
>>>> two profiles
>>>> 2) to be able to easily extend an existing profile to add additional
>>>> constraints
>>>> 3) to be able to declare conformance to multiple profiles, including
>>>> the baseline profiles that a profile derives from or extends
>>>> 4) to be able to detect changes to the original profile, so that the
>>>> intent to extend it and remain conformant with it as a baseline can be
>>>> managed
>>>> 5) to be able to inherit additional validation rules and other
>>>> supporting resources from a profile into derived profiles
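A minimal sketch of how these requirements fit together, assuming a toy in-memory model of profiles. All class, property, and profile names here are invented for illustration and are not a proposed solution.

```python
# Toy model: requirements 1-3 and 5 over a chain of profiles.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Profile:
    uri: str
    constraints: dict          # property -> rule, e.g. "dct:title" -> "required"
    parent: Optional["Profile"] = None

    def effective_constraints(self) -> dict:
        # Requirement 5: inherit rules from the baseline; the child overrides.
        base = self.parent.effective_constraints() if self.parent else {}
        return {**base, **self.constraints}

    def conformance_chain(self) -> list:
        # Requirement 3: conformance to a profile implies conformance to
        # every baseline profile it derives from.
        return [self.uri] + (self.parent.conformance_chain() if self.parent else [])

def diff(a: Profile, b: Profile):
    # Requirement 1: what is common and what differs between two profiles.
    ca, cb = a.effective_constraints(), b.effective_constraints()
    common = {k: v for k, v in ca.items() if cb.get(k) == v}
    only_a = {k: v for k, v in ca.items() if cb.get(k) != v}
    only_b = {k: v for k, v in cb.items() if ca.get(k) != v}
    return common, only_a, only_b

# Requirement 2: extend an existing profile by adding constraints
# (cf. the NetCDF example below, where CF extends COARDS).
coards = Profile("ex:COARDS", {"units": "required"})
cf = Profile("ex:CF", {"standard_name": "required"}, parent=coards)
```

Requirement 4 (detecting change in the baseline) is the one piece this sketch omits; it would need some versioning or checksum of the parent profile rather than a live object reference.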
>>>>
>>>> Whilst it's hard to point at existing distributed systems based on HTTP
>>>> services to evidence these as requirements, we see every modern
>>>> closed system where components interact (e.g. a programming language)
>>>> supporting them, and we see major scalability issues in catalogs of web
>>>> based data and services where these requirements are not supported. We see
>>>> inheritance emerging in non-machine-readable profiles already, so the idea
>>>> of using machine readability to support tools to help us is actually a way
>>>> to simplify the end-user interaction.  An analysis of such challenges
>>>> around infrastructure for supporting citizen science (i.e. a Use Case)
>>>> undertaken by the OGC [1] [2] has identified that both hierarchy and
>>>> polymorphism (i.e. multiple inheritance) are direct requirements for
>>>> realising distributed-system interoperability, given the underlying
>>>> governance constraints on the stakeholders involved.
>>>>
>>>> I suspect that inheritance will prove by far the easiest solution to
>>>> meet these requirements. However, other approaches such as provision of
>>>> infrastructure services to perform comparisons may be tractable, and even
>>>> pushing it to the client to do all the hard work is possible - but only if
>>>> we end up with very, very tight and comprehensive standards for describing
>>>> all the aspects of each resource that a client will need to be able to
>>>> interpret in order to infer the underlying inheritance patterns from a
>>>> corpus of "flattened" descriptions. Maybe there is some other pattern that
>>>> can be demonstrated to work?
>>>>
>>>>
>>>> [1] https://cobwebproject.eu/about/swe4citizenscience-github
>>>> [2]
>>>> https://www.slideshare.net/CobwebFP7/cobweb-towards-an-optimised-interoperability-framework-for-citizen-science
>>>>
>>>>
>>>> On Wed, 23 Aug 2017 at 13:40 Karen Coyle <kcoyle@kcoyle.net> wrote:
>>>>
>>>>     Thanks, Rob. Note below under 1)
>>>>
>>>>     On 8/22/17 6:44 PM, Rob Atkinson wrote:
>>>>     > Hi Karen
>>>>     >
>>>>     > That's now formed as a Use Case :-)  We might need to think
>>>>     > about the specific implementations as examples, rather than
>>>>     > solutions.
>>>>     >
>>>>     > A few points to consider:
>>>>     >
>>>>     > 1) it is unlikely that a dataset conforms to "a profile" - rather
>>>>     > it conforms to 0-N profiles - and these are sometimes
>>>>     > "hierarchical" specialisations - take for example the NetCDF
>>>>     > conventions: CF extends COARDS [1]
>>>>
>>>>     This came up at the f2f, and probably needs more discussion. In
>>>>     general, my understanding is that the group was not comfortable with
>>>>     "cascading profiles" due to considerations like: having a profile
>>>>     fragment change could mean it no longer describes your (static)
>>>>     dataset. However, it was agreed that there would be "copying" of
>>>>     profiles or profile fragments into new profiles that would be based
>>>>     on them.
>>>>
>>>>     Discussion is at
>>>>     https://www.w3.org/2017/07/18-dxwg-minutes#item06
>>>>     use case 37.
>>>>
>>>>     This probably also relates to #4, below. I know that SHACL is
>>>>     developing re-usable common patterns, but I haven't looked into how
>>>>     one is expected to integrate those. In general, I think that SHACL
>>>>     (and ShEx) work fine as atomistic rules, and I am hoping that
>>>>     profiles will provide a coherent view of a usable set of data. The
>>>>     purposes are different.
>>>>
>>>>     (p.s. I referred to SHACL because it came up in discussion, and I
>>>>     had the impression that some folks see it as a key aspect of
>>>>     profiles due to the validation-like functions. I'm happy to not
>>>>     include it in this use case.)
>>>>
>>>>     kc
>>>>
>>>>     > 2) distributions also conform to profiles - both for content and
>>>>     > potentially for service behaviours
>>>>     > 3) SHACL is an RDF technology, but I don't see any reason why it
>>>>     > cannot be applied to any structure if the mapping to RDF is
>>>>     > predictable.
>>>>     > 4) multiple SHACL rules can co-exist and be re-used across
>>>>     > different profiles - I would expect a profile to bind a set of
>>>>     > appropriate SHACL rules, rather than be a large, complex,
>>>>     > monolithic artefact
>>>>     > 5) there is no reason to restrict validation rules to SHACL - some
>>>>     > rules may be better expressed in other ways - thus a profile is a
>>>>     > collection of rules, and each rule should be associated with
>>>>     > explanatory text.
>>>>     > 6) rules should have identity, and if two different languages are
>>>>     > used to express the same rule this should be easily detectable
>>>>     > 7) rules may span multiple requirements from the profile - it may
>>>>     > be inconvenient or inefficient to test related requirements
>>>>     > separately
>>>>     > 8) the minimum case is a single text document describing the
>>>>     > convention/profile (cf. the NetCDF conventions and DCAT-AP
>>>>     > profiles) - these still have the useful semantics of declaring
>>>>     > conformance with the profile via a machine-readable identifier.
>>>>     > 9) SHACL is a potential solution - I don't see a strong
>>>>     > requirement for it as a specific choice yet
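Points 4-6 can be sketched as a small data model. This is purely illustrative: every identifier and the `Rule` structure itself are invented here, and the SHACL/ShEx sources are placeholders, not real rule text.

```python
# Illustrative sketch of points 4-6: a profile binds a set of rules by
# identifier; each rule carries explanatory text and may be expressed in
# more than one language (SHACL, ShEx, prose, ...). All names invented.
from dataclasses import dataclass, field

@dataclass
class Rule:
    rule_id: str                                     # point 6: rules have identity
    explanation: str                                 # point 5: explanatory text
    expressions: dict = field(default_factory=dict)  # language -> rule source

title_rule = Rule(
    rule_id="ex:rule/mandatory-title",
    explanation="Every dataset must have exactly one title.",
    expressions={
        "shacl": "ex:TitleShape ...",  # placeholder, not real SHACL
        "shex": "<TitleShape> ...",    # same rule in a second language (point 6)
    },
)

# Point 4: a profile binds a set of rules rather than being one monolith.
profile_bindings = {"ex:profile/dcat-ap": {title_rule.rule_id}}
```

Because both language variants hang off one rule identity, detecting that two expressions are meant to encode the same rule reduces to comparing identifiers rather than comparing rule texts.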
>>>>     >
>>>>     > Please do a sanity check on this reasoning; if you disagree let's
>>>>     > discuss the specific issue, and if I've missed anything let's
>>>>     > capture it. Then perhaps review the wording against the
>>>>     > constraints, and I will undertake to ensure the first draft of the
>>>>     > requirements is properly expressed to capture the intent.
>>>>     >
>>>>     > [1] ftp://ftp.unidata.ucar.edu/pub/netcdf/Conventions/README
>>>>     >
>>>>     > On Wed, 23 Aug 2017 at 09:24 Karen Coyle <kcoyle@kcoyle.net> wrote:
>>>>     >
>>>>     >     (Just the description portion)
>>>>     >
>>>>     >     Project X has decided to make its datasets available as open
>>>>     >     access, downloadable. They do not know who will find the
>>>>     >     datasets useful but assume that some potential users are
>>>>     >     outside of Project X's immediate community. They need a way to
>>>>     >     describe their metadata and its usage such that anyone can
>>>>     >     work with the datasets, and they hope to do this with a
>>>>     >     profile that is machine-readable, human-understandable, and
>>>>     >     that defines the criteria for valid data.
>>>>     >
>>>>     >     Some of their datasets are in RDF and Project X could
>>>>     >     potentially provide a SHACL document that fulfills the
>>>>     >     functions above, either instead of or in addition to a
>>>>     >     profile. However, they also have many datasets that are in
>>>>     >     metadata schemas for which there is no standard validation
>>>>     >     language. For those datasets, the profile will need to
>>>>     >     suffice.
>>>>     >
>>>>     >     Note that there is also a question about the RDF datasets
>>>>     >     and SHACL. If one expects users of the datasets to be fully
>>>>     >     conversant in SHACL and to have SHACL tools, then it isn't
>>>>     >     clear if a profile will provide any additional information
>>>>     >     beyond a SHACL validation document. There may, however, be
>>>>     >     users who wish to work with Project X's RDF data but who are
>>>>     >     not (yet) using SHACL. There could be both a profile for that
>>>>     >     RDF data as well as a SHACL document, but the programmers at
>>>>     >     Project X are wary of having two entirely separate definitions
>>>>     >     of the data, since it may be difficult to guarantee that they
>>>>     >     are 100% equivalent.
>>>>     >
>>>>     >
>>>>     >     --
>>>>     >     Karen Coyle
>>>>     >     kcoyle@kcoyle.net http://kcoyle.net
>>>>     >     m: 1-510-435-8234 (Signal)
>>>>     >     skype: kcoylenet/+1-510-984-3600
>>>>     >
>>>>
>>>>     --
>>>>     Karen Coyle
>>>>     kcoyle@kcoyle.net http://kcoyle.net
>>>>     m: 1-510-435-8234 (Signal)
>>>>     skype: kcoylenet/+1-510-984-3600
>>>>
>>>>
>>> --
>>> Annette Greiner
>>> NERSC Data and Analytics Services
>>> Lawrence Berkeley National Laboratory
>>>
>>>
>>

Received on Monday, 28 August 2017 10:40:41 UTC