differing accounts of Shape Expressions from Peter F. Patel-Schneider on 2015-02-25 (public-data-shapes-wg@w3.org from February 2015)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Wed, 25 Feb 2015 10:04:36 -0800
To: RDF Data Shapes Working Group <public-data-shapes-wg@w3.org>
Message-ID: <54EE0EB4.9040508@gmail.com>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

There are three quite different accounts of Shape Expressions available off
of http://www.w3.org/2001/sw/wiki/ShEx


There is a primer and denotational semantics which appear to be early
versions of the W3C submission http://www.w3.org/Submission/2014/03/, with
the Primer by Eric Prud'hommeaux at www.w3.org/Submission/shex-primer/ and
the definition by Harold Solbrig and Eric Prud'hommeaux at
www.w3.org/Submission/shex-defn/.  There is also
https://github.com/w3c/ShEx/blob/master/ShExZ/ShExZ.pdf?raw=true which is
close to the definition in the submission.  Let's call this SE1.


There is "Validating RDF with Shape Expressions" by Iovka Boneva, Jose
Emilio Labra Gayo, Samuel Hym, Eric G. Prud'hommeau, Harold Solbrig, Sławek
Staworko at arxiv.org/abs/1404.1270.  A paper with the same treatment is
"Complexity and Expressiveness of ShEx for RDF" by Sławek Staworko , Iovka
Boneva, Jose E. Labra Gayo, Samuel Hym, Eric G. Prud’hommeaux, and Harold
Solbrig at www.grappa.univ-lille3.fr/~staworko/papers/staworko-icdt15a.pdf
Let's call these SE2.

There is the ShEx/Operational Semantics at
http://www.w3.org/2001/sw/wiki/ShEx/OperationalSemantics_Operational_semantics
This appears similar in spirit to an axiomatic semantics for SHACL prepared
by Jose Emilio Labra Gayo.  Let's call this SE3.


SE1:

The syntax of SE1 uses rules like:
<IssueShape> {
    ex:state (ex:unassigned ex:assigned),
    ex:reportedBy @<UserShape>,
    ex:reportedOn xsd:dateTime,
    ( ex:reproducedBy @<EmployeeShape>,
      ex:reproducedOn xsd:dateTime      )?,
    ex:related @<IssueShape>*
}

SE1 works on RDF graphs.  SE1 has conjunction, counting, cyclicity,
exclusive disjunction, and a number of atomic constructs that involve node
labels.

The semantics of SE1 is given in an extension of Z.  The semantics is based
on judgements of how well a node in an RDF graph matches a shape in a set of
SE1 rules.  The semantics of SE1 is inherently an open semantics, i.e.,
adding irrelevant edges to a graph does not affect results.  A closed
semantics for SE1 woulld require a complete rewrite, as there is no
determination of relevancy in the semantics.  The semantics of SE1 requires
negative judgements, to handle the exclusive disjunction.

SE1 is problematic because it uses an ill-founded extension to Z.  The
meaning of cyclic references in rules cannot be determined in many
situations, even if the cycles are all positive.


SE2:

SE2 works on labelled-edge graphs, not RDF graphs.  The language of SE2 uses
rules like:
   t0 -> a::t1 || (b::t2)[2;4]
SE1 has disjunction, unordered concatenation, and cyclicity.  The language
of SE2 is missing many of the features of SE1.

The semantics of SE2 is in part an algebraic semantics, building up bag
languages.  The other part of the semantics of SE2 is based on assignments.
The semantics of SE2 is a closed semantics, i.e., all outgoing edges from a
node have to be matched.  A syntactic transform can be used to open up the
semantics, adding in universal types to soak up extra edges.

SE2 is problematic because its language is missing many of the features in
SE1.


SE3:

The semantics in SE3 is an axiomatic semantics.  It works on a version of
RDF graphs.  It appears to be non-additive, in that the matched portions of a
graph are unioned together, but the account is missing many details.
Apparently adding the missing details results in a semantics that is
non-additive, i.e., conjoining something with itself requires two copies of
whatever is matched by the conjunct.  The semantics also appears to be
closed, i.e., every edge in the graph has to be used.

SE3 is problematic because there are many missing portions of the
axiomatization.



Differences between SE1 and SE2:

SE1 and SE2 are very different, even aside from the differences in surface
syntax.

SE1 works on RDF graphs; SE2 works on edge-labelled graphs, which don't have
node labels.

SE1 has conjunction; in
  ex:a ex:p ex:v .
with
  <S0> { ex:p (ex:v) , ex:p (ex:v) }
ex:a has type <S0>
SE2 has unordered concatenation, which is not conjunction; in
  { < a p b > }
with
   t0 -> p::e || p::e
   e ->
a cannot have type t0.

The basic edge construct in SE1 applies to all the outgoing edges for a
particular property; in
  ex:a ex:p ex:v .
  ex:a ex:p ex:w .
  ex:a ex:p ex:x .
with
  <S0> { ex:p (ex:v ex:w)>{2,3} }
ex:a does not have type <S0>.
The basic edge construct in SE2 does not need to apply to all of the
outgoing edges for a particular property; in
  { < a p b >, <b q x>, <a p c>, <c r y> }
with
  t0 -> (p::t1)+ || (p::t2)+
  t1 -> q::e
  t2 -> r::e
  e ->
a has type t0.

SE1 can require that nodes have multiple types; in
  ex:a ex:p ex:c .
  ex:c ex:q ex:d .
  ex:c ex:r ex:e .
with
  <S0> { ex:p <S1>, ex:p <S2> }
  <S1> { ex:q ( ex:d ) }
  <S2> { ex:r ( ex:e ) }
ex:a having type <S0> requires that ex:c has both type <S1> and <S2>.




Where does SE3 fit:

SE3 appears to be an axiomatic semantics for the language of SE1 but it is
hard to figure out just what is going on.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBAgAGBQJU7g60AAoJECjN6+QThfjzxBAH/1/CpTLE/ZCIbs/J+pdf4DA6
+9tjYRjK+IXtJzSSu8gBilzpaCxlUo7/bP1iwsIxCUQ5EZ+V3vMDYgegivmPfzeb
+rKx0S3dv30iBKpgmDk0yagErL+/vz//h6y51aG0Uwi/S0TYyB+BuwRuiad5+HXk
96x9E+L6d1tRKSYnzOeP7fiKqz1BaZ8yASiyYzPWcc+7rN457eofACXhu3anQeuz
4lHMILUCLw3IFfhJSFFNYYFgeam8jTSl5XRbkLMCMi4Ergle5sd1VKNCY0/muHCg
zQMBXseISexrz83XDHQhty7Ju7x7UY5p8sTqrYkRLBDw76CFy6ocEsSSqMeuq0Q=
=gafT
-----END PGP SIGNATURE-----
Received on Wednesday, 25 February 2015 18:05:09 UTC