ID48 - Relationship of profile to validation

All,

Given the discussion we had today about ID46 I thought I should give
more background for ID48. I think a lot of the confusion comes from some
significant differences between the realities of GLAM (galleries,
libraries, archives, museums) and more coordinated data arenas.

Someone mentioned today that validation rules may be carried in a
validation language such as SHACL or ShEx. That is definitely one
possibility, and it clearly should be possible to link from a profile to
the validation rules in one of those languages (kind of "XML document to
related schema"). However, use of SHACL or ShEx is not, so far, a
possibility for many players in the GLAM community, and is unlikely to
penetrate that community in the near future.

In the GLAM community some providers of metadata have a very minimal
workflow which will not include a formal validation language. For
example, there are thousands of small libraries and archives that create
only a small number of metadata records supporting individual
projects. At times this metadata flows into a larger data store like
Europeana or DPLA[1] where it is validated, but these validation
functions are not attached to the local workflow. The profiles currently
used in BIBFRAME[2] and Dublin Core's DSP[3] are examples of
profiles that document a metadata set (for humans and machines) and
include basic rules of usage. These profiles can generate simple input
forms and/or documentation for metadata instance creators. We should
also look at the Wikidata metadata definitions[4, example], as they
appear to have arrived at some similar functionality.
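
To illustrate what "generating simple input forms and/or documentation"
from a profile could look like, here is a minimal sketch in Python. It is
not how BIBFRAME or DSP profiles are actually encoded; the profile
structure, the property names, and the two helper functions (describe and
form_fields) are all hypothetical and invented for this example.

```python
# Hypothetical profile: property names, labels, and usage rules are
# invented for illustration and do not come from BIBFRAME or DSP.
PROFILE = [
    {"property": "title", "label": "Title", "required": True,
     "repeatable": False, "usage": "Transcribe from the item."},
    {"property": "creator", "label": "Creator", "required": False,
     "repeatable": True, "usage": "One entry per person or body."},
]


def describe(profile):
    """Render human-readable documentation, one line per property."""
    lines = []
    for p in profile:
        obligation = "required" if p["required"] else "optional"
        repeat = "repeatable" if p["repeatable"] else "not repeatable"
        lines.append(
            f"{p['label']} ({p['property']}): {obligation}, {repeat}. {p['usage']}"
        )
    return "\n".join(lines)


def form_fields(profile):
    """Return (form label, property) pairs; required fields get a ' *' marker."""
    return [
        (p["label"] + (" *" if p["required"] else ""), p["property"])
        for p in profile
    ]


if __name__ == "__main__":
    print(describe(PROFILE))
    print(form_fields(PROFILE))
```

The point of the sketch is only that the same small description can serve
both the person filling in a form and the person reading the
documentation, without any validation language being involved.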

The lack of a formal validation application based on a standard language
can also be a hindrance for small ingestors of metadata. If a data
provider links to a SHACL or ShEx document instead of providing
information in the profile, then we have to consider that everyone using
that data must be able to work with the provided validation language.
That may eventually be a reasonable requirement, but as SHACL and ShEx
are both very new, we need to think about how that coordinates with the DX
profiles.

Another question is where we see the function of the profile in general
in relation to metadata definition and use. In the recent past, we have
worked with data and record definitions that include the same types of
rules that we see today in SHACL and ShEx - a listing of valid terms or
properties, cardinality rules, and definitions of valid values. These
are the classic elements of a data dictionary, for example. Having been
on the SHACL working group and having looked at ShEx, I'm not
convinced that these languages fully replace that documentation. I see a
difference between *defining* the metadata and the code that effects
actual validation (and I think this needs more study).(*)
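
As a rough illustration of the data-dictionary style of rules described
above (valid properties, cardinality, valid values), here is a minimal
sketch in Python. It is not SHACL or ShEx, and it is not taken from any
existing profile: the PROFILE structure, the property names, and the
validate function are hypothetical. Note that the "note" property is
defined and documented but carries no constraint to check, which is the
kind of gap between definition and validation I have in mind.

```python
# Hypothetical data dictionary: names, definitions, and rules are invented
# for illustration only.
PROFILE = {
    "title": {"definition": "Name given to the resource.",
              "min": 1, "max": 1, "values": None},
    "language": {"definition": "Language of the resource.",
                 "min": 0, "max": None, "values": {"en", "fr", "de"}},
    "note": {"definition": "Free-text remark; documented but unconstrained.",
             "min": 0, "max": None, "values": None},
}


def validate(record, profile):
    """Check a record (property -> list of values) against the profile.

    Returns a list of problems; an empty list means no rule was violated.
    """
    problems = []
    # Valid terms: every property used must be defined in the profile.
    for prop in record:
        if prop not in profile:
            problems.append(f"{prop}: not defined in the profile")
    for prop, rule in profile.items():
        values = record.get(prop, [])
        # Cardinality rules.
        if len(values) < rule["min"]:
            problems.append(
                f"{prop}: expected at least {rule['min']} value(s), found {len(values)}"
            )
        if rule["max"] is not None and len(values) > rule["max"]:
            problems.append(
                f"{prop}: expected at most {rule['max']} value(s), found {len(values)}"
            )
        # Valid values, where the profile lists them.
        if rule["values"] is not None:
            for value in values:
                if value not in rule["values"]:
                    problems.append(
                        f"{prop}: value {value!r} is not in the list of valid values"
                    )
    return problems


if __name__ == "__main__":
    print(validate({"title": ["A Book"], "language": ["en"]}, PROFILE))
    print(validate({"language": ["xx"], "creator": ["K"]}, PROFILE))
```

Even in a toy like this, the definitions live in the profile and are
useful on their own, while the validation code only exercises the subset
of them that can be expressed as checks.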

While I come to this from the GLAM perspective, I suspect that some of
these concerns also arise for schema.org, which must serve not only
well-functioning enterprise situations but also the occasional user
who has minimal technical support. In
another similarity with schema.org, GLAM data providers and users may
have little direct contact with each other and therefore need a way to
communicate the shape of their data to an unknown user base.

So the question that is brought up in this use case is: how do we
separate profiles from validation, when there is clearly a great deal of
overlap between them? What can we realistically assume for today and for
the near future? Is there a way to accommodate all of the needs I've
listed here? If not, where do we need to compromise?

kc
(*) I can well imagine situations where there is no need to validate a
particular value and therefore the property is not included in the
validation document. Validation may also be relative to certain
application functions, and I haven't seen an example that would show
bringing all such segments together such that it documents a coherent
whole.


[1] Digital Public Library of America http://dp.la
[2] http://www.loc.gov/bibframe/docs/bibframe-profiles.html#grammar
[3] http://dublincore.org/documents/dc-dsp/
[4] https://www.wikidata.org/wiki/Property:P21

-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
m: 1-510-435-8234 (Signal)
skype: kcoylenet/+1-510-984-3600

Received on Monday, 14 August 2017 18:04:02 UTC