Re: ID48 - Relationship of profile to validation

Hi Karen,

I would be interested in learning more about informal approaches to documenting requirements for metadata and whether this could pave the way for simplifying the generation of validation rules. One approach I’ve looked at is graphical validation languages inspired by work from the 1970s on augmented transition networks. Other ideas relate to how to deal with vagueness, uncertainty, inconsistency, etc., see:

       https://en.wikipedia.org/wiki/Semantic_Web#Challenges

A lot of this depends upon the context ...
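
As a rough illustration of how an informal, data-dictionary style description might drive simple validation without SHACL or ShEx, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `PropertyRule` class, the `validate` function and the sample property names are hypothetical and are not taken from BIBFRAME, DSP, SHACL or ShEx. It covers only the three kinds of rule mentioned in the quoted message below: a list of valid properties, cardinality, and valid values.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class PropertyRule:
    """One row of a hypothetical data dictionary."""
    name: str
    min_count: int = 0                              # 0 means optional
    max_count: Optional[int] = None                 # None means unbounded
    allowed_values: Optional[FrozenSet[str]] = None  # None means any value


# Illustrative profile only; these property names are made up.
PROFILE: List[PropertyRule] = [
    PropertyRule("title", min_count=1, max_count=1),
    PropertyRule("subject"),
    PropertyRule("type", max_count=1,
                 allowed_values=frozenset({"Text", "Image", "Sound"})),
]


def validate(record: Dict[str, List[str]],
             profile: List[PropertyRule]) -> List[str]:
    """Return human-readable problems; an empty list means the record conforms."""
    errors: List[str] = []
    known = {rule.name for rule in profile}

    # Valid terms: every property used must be listed in the profile.
    for name in record:
        if name not in known:
            errors.append(f"{name}: not defined in profile")

    for rule in profile:
        values = record.get(rule.name, [])
        # Cardinality rules.
        if len(values) < rule.min_count:
            errors.append(f"{rule.name}: expected at least {rule.min_count} "
                          f"value(s), found {len(values)}")
        if rule.max_count is not None and len(values) > rule.max_count:
            errors.append(f"{rule.name}: expected at most {rule.max_count} "
                          f"value(s), found {len(values)}")
        # Valid values.
        if rule.allowed_values is not None:
            for value in values:
                if value not in rule.allowed_values:
                    errors.append(f"{rule.name}: value {value!r} not allowed")
    return errors


good = {"title": ["Annual report"], "type": ["Text"]}
bad = {"type": ["Video"], "colour": ["red"]}
print(validate(good, PROFILE))
print(validate(bad, PROFILE))
```

The same list of rules could equally be used to print documentation or generate an input form, which is the sort of dual use I have in mind.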

> On 14 Aug 2017, at 19:02, Karen Coyle <kcoyle@kcoyle.net> wrote:
> 
> All,
> 
> Given the discussion we had today about ID46 I thought I should give
> more background for ID48. I think a lot of the confusion comes from some
> significant differences between the realities of GLAM (galleries,
> libraries, archives, museums) and more coordinated data arenas.
> 
> Someone mentioned today that validation rules may be carried in a
> validation language such as SHACL or ShEx. That is definitely one
> possibility, and it clearly should be possible to link from a profile to
> the validation rules in one of those languages (kind of "XML document to
> related schema"). However, use of SHACL or ShEx is not, so far, a
> possibility for many players in the GLAM community, and is unlikely to
> penetrate that community in the near future.
> 
> In the GLAM community some providers of metadata have a very minimal
> workflow which will not include a formal validation language. For
> example, there are thousands of small libraries and archives that create
> only a small number of metadata records supporting individual
> projects. At times this metadata flows into a larger data store like
> Europeana or DPLA[1] where it is validated, but these validation
> functions are not attached to the local workflow. The current use of
> profiles in BIBFRAME[2] and Dublin Core's DSP[3] are examples of
> profiles that document a metadata set (for humans and machines) and
> include basic rules of usage. These profiles can generate simple input
> forms and/or documentation for metadata instance creators. We should
> also look at the Wikidata metadata definitions[4, example], as they
> appear to have arrived at some similar functionality.
> 
> The lack of a formal validation application based on a standard language
> can also be a hindrance for small ingestors of metadata. If a data
> provider links to a SHACL or ShEx document instead of providing
> information in the profile, then we have to consider that everyone using
> that data must be able to work with the provided validation language.
> That may eventually be a reasonable requirement, but as SHACL and ShEx
> are both very new we need to think about how that coordinates with the DX
> profiles.
> 
> Another question is where we see the function of the profile in general
> in relation to metadata definition and use. In our near past, we have
> worked with data and record definitions that include the same types of
> rules that we see today in SHACL and ShEx - a listing of valid terms or
> properties, cardinality rules, and definitions of valid values. These
> are the classic elements of a data dictionary, for example. Having been
> on the SHACL working group and having looked at ShEx I'm not fully
> convinced that these languages fully replace that documentation. I see a
> difference between *defining* and the code that effects actual
> validation (and I think this needs more study).(*) I also think that we
> should look at the Wikidata metadata definitions[4, example], as they
> appear to have arrived at some similar functionality.
> 
> While I come to this from the GLAM perspective, I suspect that some of
> these concerns also arise in the schema.org camp, which is designed not
> only for well-functioning enterprise situations but must also
> accommodate the occasional user who has minimal technical support. In
> another similarity with schema.org, GLAM data providers and users may
> have little direct contact with each other and therefore need a way to
> communicate the shape of their data to an unknown user base.
> 
> So the question that is brought up in this use case is: how do we
> separate profiles from validation, when there is clearly a great deal of
> overlap between them? What can we realistically assume for today and for
> the near future? Is there a way to accommodate all of the needs I've
> listed here? If not, where do we need to compromise?
> 
> kc
> (*) I can well imagine situations where there is no need to validate a
> particular value and therefore the property is not included in the
> validation document. Validation may also be relative to certain
> application functions, and I haven't seen an example that would show
> bringing all such segments together such that it documents a coherent
> whole.
> 
> 
> [1] Digital Public Library of America http://dp.la
> [2] http://www.loc.gov/bibframe/docs/bibframe-profiles.html#grammar
> [3] http://dublincore.org/documents/dc-dsp/
> [4] https://www.wikidata.org/wiki/Property:P21
> 
> -- 
> Karen Coyle
> kcoyle@kcoyle.net http://kcoyle.net
> m: 1-510-435-8234 (Signal)
> skype: kcoylenet/+1-510-984-3600
> 

Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
W3C champion for the Web of things & W3C Data Activity Lead

Received on Tuesday, 15 August 2017 08:50:46 UTC