Re: Readability in the face of complexity and internationalisation of human friendly syntaxes for constraints and documentation

It's kind of weird that SPARQL isn't considered a "human readable syntax"
by the charter; and it'll be interesting to see this WG do better than DAWG
did making *another* syntax for all of SPARQL.

Good luck!

Cheers,
Kendall

On Wednesday, August 6, 2014, Jerven Bolleman <jerven.bolleman@isb-sib.ch>
wrote:

> Hi All,
>
> This is just food for thought to be considered by the WG when it forms.
> The key ideas are:
>  * complexity needs to be managed not ignored.
>  * internationalisation must be considered when talking about human
> friendly syntaxes.
>
>
> For those worried about human readable syntax, feel free to skip to the
> bottom before deciding
> if you want to read all of this long mail.
>
> Recently we had a small example of a validation/documentation requirement.
> I showed how it looked in a number of syntaxes.
>
>
> In ShEx Compact
>
> <WebServicePersonShape> {
>   rdf:type foaf:Person ,
>   foaf:name xsd:string ,
>   foo:email xsd:string ,
>   foo:phone xsd:string *
> }
>
> In ShEx RDF
> <WebServicePersonShape> a rs:ResourceShape ;
>    rs:property  [ rs:occurs              rs:Exactly-one ;
>                   rs:propertyDefinition  foaf:name ;
>                   rs:valueShape          foaf:Person ],
>                  [ rs:occurs              rs:Exactly-one ;
>                   rs:propertyDefinition  foo:email ;
>                   rs:valueShape          foaf:Person ] ,
>                  [ rs:occurs              rs:Zero-or-many ;
>                   rs:propertyDefinition  foo:phone ;
>                   rs:valueShape          foaf:Person ] .
>
> In SPIN turtle
>
> foaf:Person a owl:Class;
> spin:constraint [ spl:predicate foaf:name ;
>                   spl:count 1 ] ,
>                 [ spl:predicate foo:email ;
>                   spl:count 1 ] ,
>                 [ spl:predicate foo:phone ;
>                   spl:minCount 0 ] .
>
> Now we change the requirements very slightly. Instead of any string being
> acceptable as e-mail we need to ensure
> it is actually a valid e-mail for an employee in our organisation.
>
> How would you determine what is a valid e-mail for an employee in an
> organisation?
> The first approach is to say that it has to match a specific regex, e.g.
> for the university I went to, any e-mail
> address of an employee needs to match the regex “.+@pl.hanze.nl$”.
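
A quick way to sanity-check such a pattern outside SPARQL is sketched below in Python. It assumes the intent is "any local part, then @pl.hanze.nl at the end of the string", so the dots are escaped and the pattern is anchored at the end; the helper name is_employee_email is made up for illustration.

```python
import re

# Assumed intent: any local part followed by the literal domain pl.hanze.nl.
# Dots are escaped so they match only a literal '.', and '$' anchors the end.
EMPLOYEE_EMAIL = re.compile(r".+@pl\.hanze\.nl$")


def is_employee_email(address: str) -> bool:
    """Return True if the address ends in @pl.hanze.nl (hypothetical helper)."""
    return EMPLOYEE_EMAIL.search(address) is not None
```

Without the escapes, a string such as "x@plXhanzeXnl" would also be accepted, because an unescaped dot matches any character.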
>
> Let's try that in ShEx compact, shall we?
>
> <WebServicePersonShape> {
>   rdf:type foaf:Person ,
>   foaf:name xsd:string ,
>   foo:email IRI %sparql{ ?s foo:email ?mbox . FILTER (REGEX(str(?mbox),
> ".+@pl.hanze.nl$")) %} ,
>   foo:phone xsd:string *
> }
>
> And in SPIN turtle
> foaf:Person a owl:Class;
> spin:constraint [ spl:predicate foaf:name ;
>                   spl:count 1 ] ,
>                 [ spl:predicate foo:email ;
>                   spl:count 1 ] ,
>                 [ sp:text
>                     """ASK { ?s foo:email ?mbox .
>                            FILTER (!(REGEX(str(?mbox), ".+@pl.hanze.nl$"))) }""" ] ,
>                 [ spl:predicate foo:phone ;
>                   spl:minCount 0 ] .
>
> This can be simplified again with a template library, e.g. similar to
> the ones given at [1].
>
> our-company:employee-email
>       a       spin:Template ;
>       rdfs:comment "This template checks that the IRI ?arg1 has a valid employee e-mail."@en ;
>       rdfs:label "syntax check in all instances: valid employee email"@en ;
>       spin:body
>               [ sp:text """CONSTRUCT {
>                  _:b0 a spin:ConstraintViolation .
>                  _:b0 spin:violationRoot ?s .
>                  _:b0 spin:violationPath ?arg1 .
>                 }
>                 WHERE {
>                  ?arg1 foo:email ?mbox .
>                  FILTER (!(REGEX(str(?mbox), ".+@pl.hanze.nl$")))
>                 }""" ] .
>
> So now the spin turtle becomes
>
> foaf:Person a owl:Class;
> spin:constraint [ spl:predicate foaf:name ;
>                   spl:count 1 ] ,
>                 [ spl:predicate foo:email ;
>                   spl:count 1 ;
>                   a our-company:employee-email] ,
>                 [ spl:predicate foo:phone ;
>                   spl:minCount 0 ] .
>
> Which is more likely to be readable to the “business” users.
>
> The second advantage of this kind of approach is that these templates are
> maintainable independently from the rules themselves.
>
> This becomes even more the case when the validation starts to be more
> complicated.
> Few companies have a clean separation between e-mail addresses for
> employees and mailing lists.
> However, most do have an LDAP or equivalent system. Assuming Squirrel RDF
> for LDAP [2] is still working,
> a system can use that as an information source.
>
> ShEx now turns into this.
>
> <WebServicePersonShape> {
>   rdf:type foaf:Person ,
>   foaf:name xsd:string ,
>   foo:email IRI %sparql{ ?s foo:email ?mbox . SERVICE <ourcompanyLdap> {
> ?employee foaf:mbox ?mbox ; a :Employee } %} ,
>   foo:phone xsd:string *
> }
>
> While the SPIN solution stays like this.
>
> foaf:Person a owl:Class;
> spin:constraint [ spl:predicate foaf:name ;
>                   spl:count 1 ] ,
>                 [ spl:predicate foo:email ;
>                   spl:count 1 ;
>                   a our-company:employee-email] ,
>                 [ spl:predicate foo:phone ;
>                   spl:minCount 0 ] .
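
The point that the shape stays the same while the check behind the template changes can be sketched in Python. Everything here is hypothetical: the TEMPLATES registry, the PERSON_SHAPE dictionary, and the in-memory DIRECTORY set standing in for an LDAP lookup are illustrative names, not part of SPIN or ShEx, and the mailbox in DIRECTORY is invented.

```python
import re
from typing import Callable, Dict, Set

# Hypothetical stand-in for an LDAP directory of employee mailboxes.
DIRECTORY: Set[str] = {"employee@pl.hanze.nl"}


def regex_check(mbox: str) -> bool:
    """First version: accept anything ending in @pl.hanze.nl."""
    return re.search(r".+@pl\.hanze\.nl$", mbox) is not None


def directory_check(mbox: str) -> bool:
    """Later version: accept only mailboxes listed in the directory."""
    return mbox in DIRECTORY


# Named templates, maintained separately from the shapes that use them.
TEMPLATES: Dict[str, Callable[[str], bool]] = {
    "our-company:employee-email": regex_check,
}

# The shape only names the template; it is never edited below.
PERSON_SHAPE: Dict[str, str] = {"foo:email": "our-company:employee-email"}


def validate(record: Dict[str, str]) -> bool:
    """Check every constrained property of the record against its template."""
    return all(
        prop in record and TEMPLATES[template](record[prop])
        for prop, template in PERSON_SHAPE.items()
    )
```

Switching to the directory lookup is then a one-line change to the registry, TEMPLATES["our-company:employee-email"] = directory_check, while PERSON_SHAPE is left alone. A mailing-list address that passed the regex version is rejected afterwards.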
>
> The SPIN solution could be made more compact if it had more predicates, e.g.
>
> foaf:Person a owl:Class;
> shape:hasOne  [ spl:predicate foaf:name  ] ,
>               [ a our-company:employee-email ] ;
> shape:some    [ spl:predicate foo:phone ] .
>
> Compare this to the compact ShEx form. Is the SPIN one that bad?
>
> Now on internationalisation. I suggest the following strawman syntax:
> replace each IRI with its rdfs:label in
> the user's language, e.g. the above SPIN syntax “translated” into English.
>
> “A person” “is a” “Class” ;
>            “has one” [ “property” “name” ] ,
>                      [ “is a” “Has a company e-mail address” ] ;
>            “might have” [ “property” “Phone number” ] .
>
> And Dutch
>
> “Een persoon” “is een” “Klasse” ;
>               “heeft een” [ “eigenschap” “naam” ] ,
>                           [ “is een” “Heeft een bedrijf e-mail adres” ] ;
>               “heeft mogelijk” [“eigenschap” “telefoon nummer” ] .
>
> A consequence of this syntax-per-language choice is that it cannot be
> used for interchange,
> i.e. it is only a UI, never a storage format.
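
The label substitution described above can be sketched in Python. The LABELS table and the render function are hypothetical; the labels are copied from the English and Dutch examples, and a real implementation would presumably read rdfs:label values from the data rather than from a hard-coded dictionary.

```python
from typing import Dict, List

# Labels per IRI and language tag, copied from the two examples above.
LABELS: Dict[str, Dict[str, str]] = {
    "foaf:Person": {"en": "A person", "nl": "Een persoon"},
    "rdf:type": {"en": "is a", "nl": "is een"},
    "owl:Class": {"en": "Class", "nl": "Klasse"},
    "shape:hasOne": {"en": "has one", "nl": "heeft een"},
    "spl:predicate": {"en": "property", "nl": "eigenschap"},
    "foaf:name": {"en": "name", "nl": "naam"},
}


def render(tokens: List[str], lang: str) -> str:
    """Replace each known IRI by its quoted label; leave other tokens as is."""
    out = []
    for token in tokens:
        label = LABELS.get(token, {}).get(lang)
        out.append(f'"{label}"' if label is not None else token)
    return " ".join(out)
```

The mapping only goes from IRI to label, and two IRIs could share a label, which is one reason the rendered text works as a display form and not as an interchange format.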
>
> Regards,
> Jerven
>
> [1] http://semwebquality.org/documentation/primer/20101124/index.html
> [2] http://www.thefigtrees.net/lee/blog/2006/07/im_a_sparql_junkie.html
>
>
> -------------------------------------------------------------------
> Jerven Bolleman                        Jerven.Bolleman@isb-sib.ch
> SIB Swiss Institute of Bioinformatics      Tel: +41 (0)22 379 58 85
> CMU, rue Michel Servet 1               Fax: +41 (0)22 379 58 58
> 1211 Geneve 4,
> Switzerland     www.isb-sib.ch - www.uniprot.org
> Follow us at https://twitter.com/#!/uniprot
> -------------------------------------------------------------------
>
>
>

Received on Wednesday, 6 August 2014 22:42:23 UTC