Re: ID48 - Relationship of profile to validation

"Profile" is IMHO a general term: a profile contains any constraints
(including extension options) needed for any specification - whether it's
for DCAT entities, datasets, distributions, service APIs, schemas, methods etc.

All a profile needs formally is an identifier - with the semantics of
comparability.

There may be many different validation resources for a profile - each with
its own expressive scope. SHACL can tell us more than schema validation,
but it's still one of many options.
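
To make that concrete, here is a rough sketch (Python, purely illustrative)
of a profile that is formally just an identifier, with any number of
validation resources hanging off it. The class names, field names and
example.org URLs are all made up for the example and are not taken from
any spec or from anything the group has agreed on.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ValidationResource:
    """One way of checking conformance to a profile (hypothetical model)."""
    kind: str                # e.g. "SHACL", "XML Schema", "text"
    location: str            # placeholder URL, not a real resource
    machine_checkable: bool  # False for prose guidance or manual inspection


@dataclass
class Profile:
    """A profile is identified by its identifier and nothing else."""
    identifier: str
    validation_resources: List[ValidationResource] = field(default_factory=list)

    def is_same_profile(self, other: "Profile") -> bool:
        # Comparability rests on the identifier alone, not on the
        # validation resources that happen to be attached.
        return self.identifier == other.identifier

    def resources_of_kind(self, kind: str) -> List[ValidationResource]:
        # Select the validation resources expressed in one formalism.
        return [r for r in self.validation_resources if r.kind == kind]


# Hypothetical profile with three validation resources of differing scope.
profile = Profile(
    identifier="http://example.org/profile/my-dcat-profile",
    validation_resources=[
        ValidationResource("SHACL", "http://example.org/profile/shapes.ttl", True),
        ValidationResource("XML Schema", "http://example.org/profile/schema.xsd", True),
        ValidationResource("text", "http://example.org/profile/guidelines.html", False),
    ],
)
```

A consumer that understands SHACL can pick out that resource and ignore
the rest; one that doesn't can still fall back to the textual guidance,
and the profile identifier stays the same either way.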

In a W3C context I can envisage that a "W3C profile" of the abstract
Profile concept would require at least a SHACL expression of _those parts
of formal profiles of W3C specifications that can be expressed in SHACL_.
Transforming SHACL formalism into other descriptions of validation
mechanisms may be lossy, and other validation methods may require
statistical measures of quality or manual inspection, and be described in
text. In practice, however, I suspect SHACL will provide a fair degree of
utility.

Note that a definition of Profile is a Requirement now. Let's get it done
early and refine it if we come across cases where it doesn't work well.

Rob



On Tue, 15 Aug 2017 at 04:02 Karen Coyle <kcoyle@kcoyle.net> wrote:

> All,
>
> Given the discussion we had today about ID46 I thought I should give
> more background for ID48. I think a lot of the confusion comes from some
> significant differences between the realities of GLAM (galleries,
> libraries, archives, museums) and more coordinated data arenas.
>
> Someone mentioned today that validation rules may be carried in a
> validation language such as SHACL or ShEx. That is definitely one
> possibility, and it clearly should be possible to link from a profile to
> the validation rules in one of those languages (kind of "XML document to
> related schema"). However, use of SHACL or ShEx is not, so far, a
> possibility for many players in the GLAM community, and is unlikely to
> penetrate that community in the near future.
>
> In the GLAM community some providers of metadata have a very minimal
> workflow which will not include a formal validation language. For
> example, there are thousands of small libraries and archives that create
> only a small number of metadata records supporting individual
> projects. At times this metadata flows into a larger data store like
> Europeana or DPLA[1] where it is validated, but these validation
> functions are not attached to the local workflow. The profiles currently
> used in BIBFRAME[2] and Dublin Core's DSP[3] are examples of
> profiles that document a metadata set (for humans and machines) and
> include basic rules of usage. These profiles can generate simple input
> forms and/or documentation for metadata instance creators. We should
> also look at the Wikidata metadata definitions[4, example], as they
> appear to have arrived at some similar functionality.
>
> The lack of a formal validation application based on a standard language
> can also be a hindrance for small ingestors of metadata. If a data
> provider links to a SHACL or ShEx document instead of providing
> information in the profile, then we have to consider that everyone using
> that data must be able to work with the provided validation language.
> That may eventually be a reasonable requirement, but as SHACL and ShEx
> are both very new we need to think about how that coordinates the DX
> profiles.
>
> Another question is where we see the function of the profile in general
> in relation to metadata definition and use. In our near past, we have
> worked with data and record definitions that include the same types of
> rules that we see today in SHACL and ShEx - a listing of valid terms or
> properties, cardinality rules, and definitions of valid values. These
> are the classic elements of a data dictionary, for example. Having been
> on the SHACL working group and having looked at ShEx I'm not fully
> convinced that these languages fully replace that documentation. I see a
> difference between *defining* and the code that effects actual
> validation (and I think this needs more study).(*) I also think that we
> should look at the Wikidata metadata definitions[4, example], as they
> appear to have arrived at some similar functionality.
>
> While I come to this from the GLAM perspective, I suspect that some of
> these concerns also arise in the schema.org camp, which is designed not
> only for well-functioning enterprise situations but must also
> accommodate the occasional user who has minimal technical support. In
> another similarity with schema.org, GLAM data providers and users may
> have little direct contact with each other and therefore need a way to
> communicate the shape of their data to an unknown user base.
>
> So the question that is brought up in this use case is: how do we
> separate profiles from validation, when there is clearly a great deal of
> overlap between them? What can we realistically assume for today and for
> the near future? Is there a way to accommodate all of the needs I've
> listed here? If not, where do we need to compromise?
>
> kc
> (*) I can well imagine situations where there is no need to validate a
> particular value and therefore the property is not included in the
> validation document. Validation may also be relative to certain
> application functions, and I haven't seen an example that would show
> bringing all such segments together such that it documents a coherent
> whole.
>
>
> [1] Digital Public Library of America http://dp.la
> [2] http://www.loc.gov/bibframe/docs/bibframe-profiles.html#grammar
> [3] http://dublincore.org/documents/dc-dsp/
> [4] https://www.wikidata.org/wiki/Property:P21
>
> --
> Karen Coyle
> kcoyle@kcoyle.net http://kcoyle.net
> m: 1-510-435-8234 (Signal)
> skype: kcoylenet/+1-510-984-3600
>
>

Received on Monday, 14 August 2017 21:36:12 UTC