Re: preliminary analysis of shape expressions proposal from Iovka Boneva on 2015-05-07 (public-data-shapes-wg@w3.org from May 2015)

From: Iovka Boneva <iovka.boneva@univ-lille1.fr>
Date: Thu, 07 May 2015 15:02:32 +0200
To: RDF Data Shapes Working Group <public-data-shapes-wg@w3.org>
Message-ID: <554B6268.4050701@univ-lille1.fr>
Le 06/05/2015 07:34, Peter F. Patel-Schneider a écrit :
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
>> Dear Peter, all,
>>
>> Thank you for your comments, here are my answers.
>>
>> Le 01/05/2015 15:13, Peter F. Patel-Schneider a écrit :
>>> I spent some time yesterday and today looking over the Core SHACL
>>> Semantics documenthttp://w3c.github.io/data-shapes/semantics/
>>>
>>> This is a new formal basis for a shapes/constraint language.
>>>
>>> The language (in Section 2) is quite limited.  It includes closed
>>> shape expressions and open shape expressions.  It has disjunction and a
>>> construct called grouping.  It has cardinality ranges.  It has values
>>> and node kinds and datatyping and facets.  It has inverse properties
>>> (which are not handled in the semantics).  That's it.
>> Indeed, that's it! Because these few constructs allow to express complex
>> constraints on the neighbourhood of a node, we didn't consider it
>> necessary to add anything else for now. By the way, the unique difficulty
>> in integrating inverse constraints is that it makes the definitions more
>> verbose, handling two distinct cases each time. We added explanations how
>> it works, to be found in the actual version of the document.
> A few more things have been added since my initial analysis, including
> inverse triple constraints and a partial treatment of the association of
> data with shapes, but there is still no negation, no conjunction, no
> keys, no transitive traversal, and no extensibility.  That's quite a bit
> of stuff that is not handled.

We did define how extensions are to be handled in the semantics, see end 
of Sect. 2 for the syntax and item 7. in Definition [Typing, Valid typing].
As I said previously, conjunction can be added if required. I however 
think that we can meet most of the use cases w/o conjunction, thanks to 
the grouping operator.
Some sort of negation can be modelled with maximum cardinality of 0. For 
now it is not clear whether more involved negation would be useful 
(negation of types).


>>> In particular there is no conjunction of shape expressions, and adding
>>> this conjunction would break a property of the semantics.
>> Adding conjunction would not break anything in the semantics.  We
>> already have conjunction of types in triple constraints. It is actually
>> very easy to add general conjunction as construct of shape expressions.
>> If a user case justifies the need of conjunction (i.e. cannot be handled
>> with what is already in the language), then conjunction is a an evolution
>> to be considered.
> Adding a normal conjunction would break the property that each triple in
> Neigh is introduced exactly once in a proof tree.

Indeed. This however is not an issue: if a triple is used several times 
in the proof tree (against several triple constraints), then all the 
shapes of all these triple constraints will be required for the object 
node of the triple.


> [Section eliminated. See previous message for my response.]
>
>>> There are some minor problems with the proposal.
>>>
>>> The language (Section 2) is very ambiguous.  The productions for
>>> disjunction and grouping need to be modified to require them to be
>>> disjunctions or groupings, respectively.
>> I do not understand this remark. Does this relate to the abstract syntax
>> ?
> Yes it does.  In the abstract syntax a simple shape expression can be
> just a shape expression, or it can be a disjunctive shape, or it can be
> a group shape, or a disjunctive shape made up of a group shape, or lots
> of other possibilities.   There are probably no ill-effects of this
> ambiguity.
>
>>> The abstract syntax used in Section 5 is different from that in Section
>>> 2.
>>>
>>>
>>> There are some broken parts of the proposal.
>>>
>>> There is a claim that disjunctive shape expressions are similar to
>>> logical disjunction.  However disjunctive shape expressions here end up
>>> being quite different from logical disjunction.  See below for more on
>>> this.
>>>
>>>
>>> There are some serious problems with the proposal
>>>
>>> There is no connection between how shapes are associated with data
>>> (Section 4) and the main part of the semantics (Section 5).
>> Fixed: 2 lines in the end of the document.
> That covers only part of Section 4, which itself only covers two simple
> ways to associate data with shapes.
>
>>> The semantics in Section 5 does not cover the full language from
>>> Section 2. It is missing inverse triple constraints.  Adding inverse
>>> triple constraints to the semantics appears to be problematic.
>>>
>>> The semantics depends on witness mappings.  The definition of witness
>>> mappings talks about triples appearing in the conclusion of
>>> applications of the rule for the empty shape expression.  However,
>>> applications of this rule do not include any triples at all.  I think
>>> that there is a simple fix---using the rule for triple constraints
>>> instead---but that remains to be verified.
>> I do not think there is a problem with the empty shape. Actually, I am
>> not sure I understand your remark. In any case, a non-empty set of
>> triples does not satisfy the empty rule.
> You are misunderstanding the error that I pointed out.  The document
> states:
>    Given such proof tree, it can be shown that every triple (n, p, u) in
>    Neigh appears in the conclusion of exactly one application of
>    rule-empty.
> However, rule-empty does not involve triples at all so its applications
> will not have any triples in their conclusion.

You are right ! This is a typo, s/rule-empty/rule-triple-constraint/
Fixed.


>>> Disjunctive shape expressions
>>>
>>> Disjunctive shape expressions work differently from what one might
>>> expect.
>> Unfortunately, users' expectations are subjective, and differ depending
>> on the user's experience. Our proposal is closer to the intuition of
>> users of structural schemas (like XMLSchema, DTD, ...). By giving precise
>> semantics of a language, one allows to users to agree on a shared
>> meaning, rather than relying on their personal expectations. Let's focus
>> more on the capability of the language to meet the use cases, and less on
>> the (necessary subjective) expectations.
> Well I would say that disjunction is more closely associated with
> standard propositional, predicate, and modal logic, and disjunctive
> shapes here work differently than disjunction in these logics.  I agree
> that there needs to be a precise semantics to support language features,
> but I also believe that the semantics should generally correspond with
> the expectations of users.   All I was doing was pointing out that there
> is a potential mismatch between user expectations and the semantics of
> disjunction here.

The vocabulary that we use can clearly be improved, so that it better 
matches user expectations.

>> Follows a more technical explanation trying to give a better intuition of
>> our disjunction. Stating that disjunction is similar to logical
>> disjunction probably gives a wrong intuition. It would have been more
>> correct to say that disjunction is similar to disjunction in regular
>> languages, or to choice in XMLSchema which is a restriction of the
>> former. The difference with the choice in XMLSchema is that, for defining
>> the semantics, we do not require something like the Unique Particle
>> Attribution (UPA) of XMLSchema (which makes that our disjunction is not
>> an exclusive choice, but real disjunction like in regular languages).
>> Restrictions like UPA are useful for efficient validation, but not for
>> defining a proper semantics.
> I'm not in favour of calling alternation in regular expressions
> disjunction, partly because it can lead to this mismatch of
> expectations.
>
>
> What I find most disturbing about | here is how it interacts with open
> shapes.  I find the results very unintuitive.  I find the interaction
> between open shapes and triple matching similarly unintuitive.
I see. This will be improved by giving more explanations about the 
design choices around open shapes.


>>> For example, there is no valid typing for the shape expression schema S
>>> open { p , q } p E [1;1] | q E [1;1] E open { } empty on the graph a p
>>> b . a q c . that includes S in the shapes of a.
>>>
>>>
>>> Open shape expressions
>>>
>>> Open shape expressions work differently from what one might expect.
>>>
>>> For example, there is a valid typing for the shape expression schema S
>>> open { p } p { b } [1;1] on the graph a p b . a p c . that includes S
>>> in the shapes for a.  However, there is no such valid typing for either
>>> the shape expression schema S open { p } p { b , c } [1;1] or the shape
>>> expression schema S open { p } p E [1;1] E open { } empty


-- 
Iovka Boneva
Associate professor (MdC) Université de Lille
http://www.cristal.univ-lille.fr/~boneva/
+33 6 95 75 70 25
Received on Thursday, 7 May 2015 13:03:20 UTC