Re: Questions about ShEx from Eric Prud'hommeaux on 2013-11-18 (public-rdf-shapes@w3.org from November 2013)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Mon, 18 Nov 2013 13:07:14 -0500
To: "Solbrig, Harold R." <Solbrig.Harold@mayo.edu>
Cc: "public-rdf-shapes@w3.org" <public-rdf-shapes@w3.org>, Sławek Staworko <slawomir.staworko@inria.fr>, Iovka Boneva <iovka.boneva@univ-lille1.fr>, Iovka Boneva <iovka.boneva@inria.fr>, Radu Ciucanu <radu.ciucanu@inria.fr>
Message-ID: <20131118180712.GA23854@w3.org>

* Solbrig, Harold R. <Solbrig.Harold@mayo.edu> [2013-11-18 16:55+0000]
> Hello,
> 
> I’ve been watching the evolution of the ShEx (http://www.w3.org/2013/ShEx/Primer.html) model with considerable interest.   I’m curious, however, how ShEx is envisioned to fit into the larger RDF ‘validation’ (publication of invariants + validation) use cases.  Questions:
> 
> 
>   1.  This model is “… constrained scenario with a starting point in both the graph and the schema.”   — how does one determine such a starting point?  Whether one is validating the contents of a (potentially huge) triple store or a single set of triples, it seems like one first needs to evaluate some set of preconditions to determine what (or whether) the graph should be validated using the given shape expression.

I've been operating under the model that there's a defined behavior for validating a particular node in a graph with respect to a shape expression and that validating a dataset involves:

  1. Select the nodes of interest.
  2. Decide what to do with the errors.

I wanted to drive this with a constrained set of candidate nodes 'cause that was the most restrictive case and it enabled the most specific user feedback (i.e. I expect <foo> to be an <Xshape>; tell me exactly how it does not comply). There are a lot of choices for what you might want when you validate/type check a dataset. Here's my guess at steps 1 and 2 for a few examples:

  protocol document validation:
    optimistically, 1: the protocol selects one node, 2: accept or reject the protocol interaction based on that result.
    otherwise, 1: select each subject node, 2: accept or reject based on exactly one successful result.

  (Some protocols may expect, tolerate, or reject the existence of triples not covered in the validation.)

  mass validation of existing store:
    1: select a set of nodes based on, e.g., some type or membership in a list. 2: validate each according to the optimistic rules above.
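
A minimal Python sketch of the two steps above (select nodes, then decide what to do with the errors). Everything in it is an assumption made for illustration: triples are plain string tuples, a "shape" is reduced to a predicate-to-cardinality table rather than a real shape expression, and the function names are hypothetical, not part of any ShEx implementation.

```python
from math import inf

Triple = tuple[str, str, str]
Graph = set[Triple]
# Toy stand-in for a shape expression: predicate -> (min, max) cardinality.
Shape = dict[str, tuple[int, float]]


def validate_node(graph: Graph, node: str, shape: Shape) -> list[str]:
    """Return the list of errors for one node; an empty list means it conforms."""
    errors = []
    for predicate, (lo, hi) in shape.items():
        count = sum(1 for s, p, _ in graph if s == node and p == predicate)
        if not lo <= count <= hi:
            errors.append(
                f"{node}: expected {lo}..{hi} values for {predicate}, found {count}"
            )
    return errors


def select_by_type(graph: Graph, rdf_type: str) -> set[str]:
    """Step 1 for mass validation: pick the nodes of interest by type."""
    return {s for s, p, o in graph if p == "rdf:type" and o == rdf_type}


def validate_dataset(graph: Graph, nodes: set[str], shape: Shape) -> dict[str, list[str]]:
    """Step 2 for mass validation: collect per-node errors for later reporting."""
    return {node: validate_node(graph, node, shape) for node in nodes}


def accept_single(graph: Graph, node: str, shape: Shape) -> bool:
    """Optimistic protocol case: the protocol names the one node to check."""
    return not validate_node(graph, node, shape)


def accept_exactly_one(graph: Graph, shape: Shape) -> bool:
    """Fallback protocol case: try every subject, accept on exactly one success."""
    subjects = {s for s, _, _ in graph}
    passing = [n for n in subjects if not validate_node(graph, n, shape)]
    return len(passing) == 1


graph: Graph = {
    ("ex:issue1", "rdf:type", "ex:Issue"),
    ("ex:issue1", "ex:state", "ex:open"),
    ("ex:issue2", "rdf:type", "ex:Issue"),  # no ex:state, so it should fail
}
issue_shape: Shape = {"rdf:type": (1, inf), "ex:state": (1, 1)}

report = validate_dataset(graph, select_by_type(graph, "ex:Issue"), issue_shape)
for node in sorted(report):
    print(node, report[node] or "ok")
```

Returning the errors per node, rather than a single boolean, keeps the "tell me exactly how it does not comply" feedback available to whichever policy step 2 applies.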


>   2.  Some of the use cases presented in the workshop involved two graphs — one containing the information to be added (or removed?) and a second the current state of an RDF dataset.  Evaluation of validity involved the combination of the two — whether, together, they met the requirements.  As an example, an instance issue report may well reference an EmployeeShape in an existing triple store, or it may extend an existing UserShape to include the information needed to qualify it as a UserShape.  Is it envisioned that ShEx could be applied to two (or more) graphs concurrently?

I can trivially meet that requirement by creating a virtual combined store for the validation and electing whether to e.g. incorporate the candidate data based on the results of that validation. Does that seem like it would meet the use cases?
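
A rough Python sketch of the combined-store idea, under loud assumptions: graphs are sets of string triples, the "virtual" store is just a set union built for the check, and `require_predicates` is a hypothetical toy validator standing in for real shape-expression validation.

```python
Triple = tuple[str, str, str]
Graph = frozenset[Triple]


def require_predicates(graph: Graph, node: str, predicates: list[str]) -> list[str]:
    """Toy validator (assumption): the node must have each listed predicate."""
    present = {p for s, p, _ in graph if s == node}
    return [f"{node}: missing {p}" for p in predicates if p not in present]


def validate_then_merge(
    store: Graph, candidate: Graph, node: str, predicates: list[str]
) -> tuple[Graph, list[str]]:
    """Validate the union of store and candidate; merge only if it conforms."""
    combined = store | candidate  # the existing store is never modified
    errors = require_predicates(combined, node, predicates)
    if errors:
        return store, errors  # reject: keep the existing store as it was
    return combined, []  # accept: the candidate data is incorporated


store: Graph = frozenset({("ex:bob", "foaf:name", "Bob")})
candidate: Graph = frozenset({("ex:bob", "foaf:mbox", "mailto:bob@example.org")})
user_predicates = ["foaf:name", "foaf:mbox"]

# Neither graph conforms alone, but the combination does.
alone = require_predicates(candidate, "ex:bob", user_predicates)
new_store, errors = validate_then_merge(store, candidate, "ex:bob", user_predicates)
print(alone, errors, len(new_store))
```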


>   3.  Traditional state transforms involve a combination of invariants (things that are always true about a given RDF Dataset), preconditions (things that must be true before an RDF dataset can undergo a transformation from one state to another) and postconditions (things that are true once a given transformation takes place).   Is it envisioned that ShEx could be used for all three aspects?

There's always a balance between simplicity for deployment and expressivity for use cases. I can imagine packaging three shape expressions to meet those use cases. If there are a sufficient number of them, we could add that to the spec, or write it into another spec which could be used separately (packaging issue). I would like to make progress on decisions like this by seeing use cases and getting some sense of the energy behind any particular extension of expressivity and the ROI on incorporating it.
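
One way such a package of three checks might look, sketched in Python purely as an illustration. The `TransitionContract` class, the `requires` helper, and the single-triple checks are all hypothetical; a real version would carry shape expressions instead of these toy checks.

```python
from dataclasses import dataclass
from typing import Callable

Triple = tuple[str, str, str]
Graph = frozenset[Triple]
Check = Callable[[Graph], list[str]]  # returns errors; empty means satisfied


def requires(s: str, p: str, o: str) -> Check:
    """Toy check (assumption): the graph must contain one specific triple."""
    def check(graph: Graph) -> list[str]:
        return [] if (s, p, o) in graph else [f"missing {s} {p} {o}"]
    return check


@dataclass
class TransitionContract:
    invariant: Check      # must hold before and after the transformation
    precondition: Check   # must hold before the transformation
    postcondition: Check  # must hold after the transformation

    def apply(
        self, graph: Graph, transform: Callable[[Graph], Graph]
    ) -> tuple[Graph, list[str]]:
        errors = self.invariant(graph) + self.precondition(graph)
        if errors:
            return graph, errors  # refuse to transform
        result = transform(graph)
        errors = self.invariant(result) + self.postcondition(result)
        if errors:
            return graph, errors  # roll back to the original state
        return result, []


def close_issue(graph: Graph) -> Graph:
    """Example transformation: replace the issue's state with ex:closed."""
    kept = frozenset(t for t in graph if t[1] != "ex:state")
    return kept | {("ex:issue1", "ex:state", "ex:closed")}


contract = TransitionContract(
    invariant=requires("ex:issue1", "rdf:type", "ex:Issue"),
    precondition=requires("ex:issue1", "ex:state", "ex:open"),
    postcondition=requires("ex:issue1", "ex:state", "ex:closed"),
)
open_graph: Graph = frozenset({
    ("ex:issue1", "rdf:type", "ex:Issue"),
    ("ex:issue1", "ex:state", "ex:open"),
})

closed_graph, first_errors = contract.apply(open_graph, close_issue)
# Closing an already closed issue violates the precondition.
unchanged, second_errors = contract.apply(closed_graph, close_issue)
print(first_errors, second_errors)
```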


> Harold Solbrig

-- 
-ericP

office: +1.617.599.3509
mobile: +33.6.80.80.35.59

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

There are subtle nuances encoded in font variation and clever layout
which can only be seen by printing this message on high-clay paper.

Received on Monday, 18 November 2013 18:07:49 UTC