on shape recognition

The first example in http://www.w3.org/Submission/shex-primer/ is
IssueShape.  The core of this shape is, roughly, "an issue has a state that
is either unassigned or assigned and everything that it is related to
also matches the shape."  In ShEx this is
<IssueShape> {
  ex:state (ex:unassigned ex:assigned),
  ex:related @<IssueShape>* }


Consider this shape on the following graph:

ex:i1 ex:state ex:unassigned .

ex:i2 ex:state ex:unassigned ;
      ex:related ex:i2 .

ex:i31 ex:state ex:unassigned ;
       ex:related ex:i32 .
ex:i32 ex:state ex:assigned .

ex:i41 ex:state ex:unassigned ;
       ex:related ex:i42 .
ex:i42 ex:state ex:solved .

ex:i53 foaf:name "Peter F. Patel-Schneider" ;
       ex:related ex:i54 ;
       ex:state ex:content .
ex:i54 foaf:name "Sandy Patel-Schneider" ;
       ex:related ex:i53 ;
       ex:state ex:happy .


It seems that the shape recognition is going wrong here.  It correctly
recognizes (at least under the newer semantics for shapes) nodes that should
be issues, namely ex:i1, ex:i2, ex:i31, and ex:i32.  However, it treats nodes
that appear to be "incorrect" issues, namely ex:i41 and ex:i42, the same as
nodes that do not appear to be issues at all, namely ex:i53 and ex:i54.
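
The recognition result described above can be reproduced with a small
sketch. It is not a ShEx implementation: the triples are held as plain
Python tuples, the recursive reference is read as a greatest fixed point
(my reading of the "newer semantics"), and the helper names `objects` and
`recognize_issue_shape` are made up for illustration.

```python
# Triples of the example graph, with prefixed names kept as plain strings.
TRIPLES = [
    ("ex:i1", "ex:state", "ex:unassigned"),
    ("ex:i2", "ex:state", "ex:unassigned"),
    ("ex:i2", "ex:related", "ex:i2"),
    ("ex:i31", "ex:state", "ex:unassigned"),
    ("ex:i31", "ex:related", "ex:i32"),
    ("ex:i32", "ex:state", "ex:assigned"),
    ("ex:i41", "ex:state", "ex:unassigned"),
    ("ex:i41", "ex:related", "ex:i42"),
    ("ex:i42", "ex:state", "ex:solved"),
    ("ex:i53", "foaf:name", "Peter F. Patel-Schneider"),
    ("ex:i53", "ex:related", "ex:i54"),
    ("ex:i53", "ex:state", "ex:content"),
    ("ex:i54", "foaf:name", "Sandy Patel-Schneider"),
    ("ex:i54", "ex:related", "ex:i53"),
    ("ex:i54", "ex:state", "ex:happy"),
]
ALLOWED_STATES = {"ex:unassigned", "ex:assigned"}


def objects(node, predicate):
    """Return all objects of (node, predicate, ?) in the graph."""
    return [o for s, p, o in TRIPLES if s == node and p == predicate]


def recognize_issue_shape():
    """Nodes matching the shape, computed as a greatest fixed point."""
    # Start by assuming every subject matches, then discard nodes that
    # fail the shape until nothing changes any more.
    candidates = {s for s, _, _ in TRIPLES}
    changed = True
    while changed:
        changed = False
        for node in sorted(candidates):
            states = objects(node, "ex:state")
            ok = (
                len(states) == 1
                and states[0] in ALLOWED_STATES
                and all(r in candidates for r in objects(node, "ex:related"))
            )
            if not ok:
                candidates.discard(node)
                changed = True
    return candidates


print(sorted(recognize_issue_shape()))
```

The only output is the set of matching nodes. ex:i41, ex:i42, ex:i53 and
ex:i54 are all simply absent from it, with nothing to tell them apart.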

It seems to me that, even for this initial example, shape recognition is
only suitable under very limited circumstances,
namely when there is a very strong closure situation, i.e., each node in the
input must belong to a shape.  In such situations, shape recognition appears
to be viable, but I don't think that even here it is preferable to other
methods.


I think that a much better way of validating inputs is via constraints,
which take the form "anything that satisfies some condition must also
meet some other condition."  The constraint view would slightly modify the
above shape, producing (in this sparse setup) "an issue is something with a
state of either unassigned or assigned, and everything that it is related to
is also an issue."  This is a recursive constraint, but it can be easily
turned into a non-recursive constraint by using the satisfaction condition
for issues instead of issue itself, ending up with "an issue is something
with a state of either unassigned or assigned, and everything that it is
related to has a state of either unassigned or assigned."
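
A minimal sketch of this non-recursive constraint, again over plain Python
tuples rather than a real RDF store. The function names are hypothetical,
and "has a state of either unassigned or assigned" is read here as "has at
least one such ex:state value", which is an assumption.

```python
# Triples of the example graph, with prefixed names kept as plain strings.
TRIPLES = [
    ("ex:i1", "ex:state", "ex:unassigned"),
    ("ex:i2", "ex:state", "ex:unassigned"),
    ("ex:i2", "ex:related", "ex:i2"),
    ("ex:i31", "ex:state", "ex:unassigned"),
    ("ex:i31", "ex:related", "ex:i32"),
    ("ex:i32", "ex:state", "ex:assigned"),
    ("ex:i41", "ex:state", "ex:unassigned"),
    ("ex:i41", "ex:related", "ex:i42"),
    ("ex:i42", "ex:state", "ex:solved"),
    ("ex:i53", "foaf:name", "Peter F. Patel-Schneider"),
    ("ex:i53", "ex:related", "ex:i54"),
    ("ex:i53", "ex:state", "ex:content"),
    ("ex:i54", "foaf:name", "Sandy Patel-Schneider"),
    ("ex:i54", "ex:related", "ex:i53"),
    ("ex:i54", "ex:state", "ex:happy"),
]
ALLOWED_STATES = {"ex:unassigned", "ex:assigned"}


def objects(node, predicate):
    """Return all objects of (node, predicate, ?) in the graph."""
    return [o for s, p, o in TRIPLES if s == node and p == predicate]


def has_issue_state(node):
    """Satisfaction condition for issues: a state of unassigned or assigned."""
    return any(s in ALLOWED_STATES for s in objects(node, "ex:state"))


def check_constraint():
    """Map each violating issue to the related nodes that break the constraint."""
    violations = {}
    for node in sorted({s for s, _, _ in TRIPLES}):
        if not has_issue_state(node):
            continue  # not an issue, so the constraint does not apply
        bad = [r for r in objects(node, "ex:related") if not has_issue_state(r)]
        if bad:
            violations[node] = bad
    return violations


print(check_constraint())
```

Here ex:i41 is reported as a violation because of ex:i42, while ex:i53 and
ex:i54 are never in scope, so the two cases are no longer lumped together.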

In the more normal case, I think, there would be a class for issue, where
this would be written instead as "issues have a state of either unassigned
or assigned, and everything that an issue is related to is also an issue."
This ends up with better discriminatory power and better explanatory power,
in my opinion.
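
The class-based version can be sketched the same way. The example graph
has no type information, so the rdf:type ex:Issue assertions below are
purely hypothetical additions for illustration (I assume ex:i1 through
ex:i42 were meant as issues), and the function names are likewise made up.

```python
ALLOWED_STATES = {"ex:unassigned", "ex:assigned"}

# Triples of the example graph, with prefixed names kept as plain strings.
TRIPLES = [
    ("ex:i1", "ex:state", "ex:unassigned"),
    ("ex:i2", "ex:state", "ex:unassigned"),
    ("ex:i2", "ex:related", "ex:i2"),
    ("ex:i31", "ex:state", "ex:unassigned"),
    ("ex:i31", "ex:related", "ex:i32"),
    ("ex:i32", "ex:state", "ex:assigned"),
    ("ex:i41", "ex:state", "ex:unassigned"),
    ("ex:i41", "ex:related", "ex:i42"),
    ("ex:i42", "ex:state", "ex:solved"),
    ("ex:i53", "foaf:name", "Peter F. Patel-Schneider"),
    ("ex:i53", "ex:related", "ex:i54"),
    ("ex:i53", "ex:state", "ex:content"),
    ("ex:i54", "foaf:name", "Sandy Patel-Schneider"),
    ("ex:i54", "ex:related", "ex:i53"),
    ("ex:i54", "ex:state", "ex:happy"),
]

# ASSUMPTION: these type assertions are not in the original graph.
ASSUMED_ISSUES = ["ex:i1", "ex:i2", "ex:i31", "ex:i32", "ex:i41", "ex:i42"]
TRIPLES += [(node, "rdf:type", "ex:Issue") for node in ASSUMED_ISSUES]


def objects(node, predicate):
    """Return all objects of (node, predicate, ?) in the graph."""
    return [o for s, p, o in TRIPLES if s == node and p == predicate]


def is_issue(node):
    """True if the node is explicitly typed as ex:Issue."""
    return "ex:Issue" in objects(node, "rdf:type")


def check_issue_class():
    """Map each violating issue to human-readable reasons."""
    problems = {}
    for node in sorted({s for s, _, _ in TRIPLES}):
        if not is_issue(node):
            continue  # constraints on issues say nothing about non-issues
        reasons = []
        states = objects(node, "ex:state")
        if not states or any(s not in ALLOWED_STATES for s in states):
            reasons.append("state is not unassigned or assigned")
        for related in objects(node, "ex:related"):
            if not is_issue(related):
                reasons.append(f"related node {related} is not an issue")
        if reasons:
            problems[node] = reasons
    return problems


print(check_issue_class())
```

Under these assumed types the check points at ex:i42 and says why it fails,
and ex:i53 and ex:i54 are left alone because nothing claims they are issues.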


So I'm even more puzzled as to why one would want to use something like
Shape Expressions.


Peter F. Patel-Schneider
Nuance Communications

Received on Monday, 8 June 2015 19:46:13 UTC