Re: Readability in the face of complexity and internationalisation of human friendly syntaxes for constraints and documentation from Karen Coyle on 2014-08-06 (public-rdf-shapes@w3.org from August 2014)

From: Karen Coyle <kcoyle@kcoyle.net>
Date: Wed, 06 Aug 2014 16:58:54 -0700
To: public-rdf-shapes@w3.org
Message-ID: <53E2C13E.6050405@kcoyle.net>

I like the phrase "Controlled natural language" [1]. This is defined as:

"Controlled Natural Languages are subsets of natural languages whose 
grammars and dictionaries have been restricted in order to reduce or 
eliminate both ambiguity and complexity. Traditionally, controlled 
natural languages fall into two major categories: those that improve the 
readability for human readers, in particular for non-native speakers, 
and those that improve the computational processing of a text."

It's *like* natural language, and can be understood (with a little 
instruction, perhaps) by anyone who knows that language, but it is also 
precise enough to express formal statements. It can, of course, be 
internationalised. No one is expected to know all "human-readable 
languages," and not everything that one human can read can be expected 
to be readable by everyone, to wit:

هذا هو بيان

కాబట్టి ఈ ఉంది

kc

[1] https://sites.google.com/site/controllednaturallanguage/

On 8/6/14, 3:41 PM, Kendall Clark wrote:
> It's kind of weird that SPARQL isn't considered a "human readable
> syntax" by the charter; and it'll be interesting to see this WG do
> better than DAWG did making *another* syntax for all of SPARQL.
>
> Good luck!
>
> Cheers,
> Kendall
>
> On Wednesday, August 6, 2014, Jerven Bolleman
> <jerven.bolleman@isb-sib.ch> wrote:
>
>     Hi All,
>
>     This is just food for thought to be considered by the WG when it forms.
>     The key ideas are:
>       * complexity needs to be managed not ignored.
>       * internationalisation must be considered when talking about human
>     friendly syntaxes.
>
>
>     For those worried about human readable syntax feel free to skip to
>     the bottom, before deciding
>     if you want to read all of this long mail.
>
>     Recently we had a small example of a validation/documentation
>     requirement.
>     I showed how it looked in a number of syntaxes.
>
>
>     In ShEx Compact
>
>     <WebServicePersonShape> {
>        rdf:type foaf:Person ,
>        foaf:name xsd:string ,
>        foo:email xsd:string ,
>        foo:phone xsd:string *
>     }
>
>     In ShEx RDF
>     <WebServicePersonShape> a rs:ResourceShape ;
>         rs:property  [ rs:occurs              rs:Exactly-one ;
>                        rs:propertyDefinition  foaf:name ;
>                        rs:valueShape          foaf:Person ],
>                       [ rs:occurs              rs:Exactly-one ;
>                        rs:propertyDefinition  foo:email ;
>                        rs:valueShape          foaf:Person ] ,
>                       [ rs:occurs              rs:Zero-or-many ;
>                        rs:propertyDefinition  foo:phone ;
>                        rs:valueShape          foaf:Person ] .
>
>     In SPIN turtle
>
>     foaf:Person a owl:Class;
>     spin:constraint [ spl:predicate foaf:name ;
>                        spl:count 1 ] ,
>                      [ spl:predicate foo:email ;
>                        spl:count 1 ] ,
>                      [ spl:predicate foo:phone ;
>                        spl:minCount 0 ] .
>
>     Now we change the requirements very slightly. Instead of any string
>     being acceptable as e-mail we need to ensure
>     it is actually a valid e-mail for an employee in our organisation.
>
>     How would you determine what is a valid e-mail for an employee in an
>     organisation?
>     The first approach is to say that it has to match a specific regex,
>     e.g. for the university I went to, any e-mail
>     address of an employee needs to match the regex ".+@pl.hanze.nl$".
>
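A minimal sketch of the regex approach described above, in Python rather than SPARQL. It assumes the pattern is meant to anchor at the end of the string with `$`, and it escapes the dots in the domain, which the pattern in the mail does not. The helper name `is_employee_email` is hypothetical. Like SPARQL's REGEX, `search` looks for a match anywhere in the string, so the anchor is what keeps a longer domain from passing.

```python
import re

# Assumption: the domain is the one used as the example in the mail.
# The dots are escaped so they match a literal "." only, and "$" anchors
# the match at the end of the address.
EMPLOYEE_EMAIL = re.compile(r".+@pl\.hanze\.nl$")


def is_employee_email(mbox: str) -> bool:
    """Return True if mbox ends in the employee domain (hypothetical helper)."""
    return EMPLOYEE_EMAIL.search(mbox) is not None
```

This only checks the shape of the address. It cannot tell an employee mailbox from a mailing list in the same domain.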
>     Let's try that in ShEx compact, shall we?
>
>     <WebServicePersonShape> {
>        rdf:type foaf:Person ,
>        foaf:name xsd:string ,
>        foo:email IRI %sparql{ ?s foo:email ?mbox . FILTER
>     (REGEX(str(?mbox), ".+@pl.hanze.nl$")) %} ,
>        foo:phone xsd:string *
>     }
>
>     And in SPIN turtle
>     foaf:Person a owl:Class;
>     spin:constraint [ spl:predicate foaf:name ;
>                        spl:count 1 ] ,
>                      [ spl:predicate foo:email ;
>                        spl:count 1 ] ,
>                      [ sp:text """ASK { ?s foo:email ?mbox . FILTER
>     (!(REGEX(str(?mbox), ".+@pl.hanze.nl$"))) }""" ] ,
>                      [ spl:predicate foo:phone ;
>                        spl:minCount 0 ] .
>
>     This can be simplified again with a template library, e.g. similar to
>     the ones given at [1].
>
>     our-company:employee-email
>            a       spin:Template ;
>            rdfs:comment "This template checks an IRI to be a valid
>     employee e-mail property ?arg1."@en ;
>            rdfs:label "syntax check in all instances:  valid employee
>     email"@en ;
>            spin:body
>                    [ sp:text """CONSTRUCT {
>                       _:b0 a spin:ConstraintViolation .
>                       _:b0 spin:violationRoot ?s .
>                       _:b0 spin:violationPath ?arg1 .
>                      }
>                      WHERE {
>                       ?arg1 foo:email ?mbox .
>                       FILTER (!(REGEX(str(?mbox), ".+@pl.hanze.nl$")))
>                      }""" ] .
>
>     So now the SPIN turtle becomes
>
>     foaf:Person a owl:Class;
>     spin:constraint [ spl:predicate foaf:name ;
>                        spl:count 1 ] ,
>                      [ spl:predicate foo:email ;
>                        spl:count 1 ;
>                        a our-company:employee-email] ,
>                      [ spl:predicate foo:phone ;
>                        spl:minCount 0 ] .
>
>     Which is more likely to be readable to the “business” users.
>
>     The second advantage of this kind of approach is that these
>     templates are maintainable independently from the rules themselves.
>
>     This becomes even more the case when the validation starts to be
>     more complicated.
>     Few companies have a clean separation between e-mail addresses for
>     employees and mailing lists.
>     However, most do have an LDAP or equivalent system. Assuming Squirrel
>     RDF for LDAP [2] is still working,
>     a system can adapt that as an information source.
>
>     ShEx now turns into this.
>
>     <WebServicePersonShape> {
>        rdf:type foaf:Person ,
>        foaf:name xsd:string ,
>        foo:email IRI %sparql{ ?s foo:email ?mbox .
>     SERVICE <ourcompanyLdap> { ?employee foaf:mbox ?mbox ; a :Employee } %} ,
>        foo:phone xsd:string *
>     }
>
>     While the SPIN solution stays like this.
>
>     foaf:Person a owl:Class;
>     spin:constraint [ spl:predicate foaf:name ;
>                        spl:count 1 ] ,
>                      [ spl:predicate foo:email ;
>                        spl:count 1 ;
>                        a our-company:employee-email] ,
>                      [ spl:predicate foo:phone ;
>                        spl:minCount 0 ] .
>
>     The SPIN solution can be made more compact if it had more predicates
>     e.g.
>
>     foaf:Person a owl:Class;
>     shape:hasOne  [ spl:predicate foaf:name  ] ,
>                    [ a our-company:employee-email ] ;
>     shape:some    [ spl:predicate foo:phone ] .
>
>     Compare this to the compact ShEx form. Is the SPIN one that bad?
>
>     Now on internationalisation. I suggest the following strawman
>     syntax: replace each IRI with the rdfs:label in
>     the user's language, e.g. the above SPIN syntax “translated” into English.
>
>     “A person” “is a” “Class” ;
>                 “has one” [ “property” “name” ] ,
>                           [ “is a” “Has a company e-mail address” ] ;
>                 “might have” [ “property” “Phone number” ] .
>
>     And Dutch
>
>     “Een persoon” “is een” “Klasse” ;
>                    “heeft een” [ “eigenschap” “naam” ] ,
>                                [ “is een” “Heeft een bedrijf e-mail
>     adres” ] ;
>                    “heeft mogelijk” [“eigenschap” “telefoon nummer” ] .
>
>     A consequence of this syntax-per-language choice is that it cannot
>     be used for interchange,
>     i.e. it's only a UI, never a storage format.
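The label substitution suggested above can be sketched roughly as follows. This is only an illustration in Python. The `LABELS` table and the `render` function are hypothetical, and the labels are taken from the English and Dutch examples in the mail. A real system would read rdfs:label values from the vocabulary instead.

```python
# Hypothetical label table: term -> {language tag -> rdfs:label}.
LABELS = {
    "foaf:Person": {"en": "A person", "nl": "Een persoon"},
    "rdf:type": {"en": "is a", "nl": "is een"},
    "owl:Class": {"en": "Class", "nl": "Klasse"},
}


def render(triple, lang, fallback="en"):
    """Replace each term with its quoted label in `lang`.

    Falls back to the `fallback` language, then to the bare term itself
    when no label is known.
    """
    parts = []
    for term in triple:
        labels = LABELS.get(term, {})
        label = labels.get(lang) or labels.get(fallback)
        parts.append(f'"{label}"' if label else term)
    return " ".join(parts)
```

For example, `render(("foaf:Person", "rdf:type", "owl:Class"), "nl")` gives `"Een persoon" "is een" "Klasse"`. The mapping goes one way only, from IRIs to labels, which fits the point that such a syntax is a display layer and not an interchange format.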
>
>     Regards,
>     Jerven
>
>     [1] http://semwebquality.org/documentation/primer/20101124/index.html
>     [2] http://www.thefigtrees.net/lee/blog/2006/07/im_a_sparql_junkie.html
>
>
>     -------------------------------------------------------------------
>     Jerven Bolleman Jerven.Bolleman@isb-sib.ch
>     SIB Swiss Institute of Bioinformatics      Tel: +41 (0)22 379 58 85
>     CMU, rue Michel Servet 1               Fax: +41 (0)22 379 58 58
>     1211 Geneve 4,
>     Switzerland www.isb-sib.ch - www.uniprot.org
>     Follow us at https://twitter.com/#!/uniprot
>     -------------------------------------------------------------------
>
>

-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
m: 1-510-435-8234
skype: kcoylenet
Received on Wednesday, 6 August 2014 23:59:24 UTC