Readability in the face of complexity and internationalisation of human friendly syntaxes for constraints and documentation

Hi All,

This is just food for thought to be considered by the WG when it forms.
The key ideas are:
 * Complexity needs to be managed, not ignored.
 * Internationalisation must be considered when talking about human-friendly syntaxes.


Those worried about human-readable syntax should feel free to skip to the bottom before deciding
whether to read all of this long mail.

Recently we had a small example of a validation/documentation requirement.
I showed how it looked in a number of syntaxes.


In ShEx Compact

<WebServicePersonShape> {
  rdf:type foaf:Person ,
  foaf:name xsd:string ,
  foo:email xsd:string ,
  foo:phone xsd:string *
}

In ShEx RDF
<WebServicePersonShape> a rs:ResourceShape ;
   rs:property  [ rs:occurs              rs:Exactly-one ;
                  rs:propertyDefinition  foaf:name ;
                  rs:valueShape          foaf:Person ],
                 [ rs:occurs              rs:Exactly-one ;
                  rs:propertyDefinition  foo:email ;
                  rs:valueShape          foaf:Person ] ,
                 [ rs:occurs              rs:Zero-or-many ;
                  rs:propertyDefinition  foo:phone ;
                  rs:valueShape          foaf:Person ] .

In SPIN Turtle

foaf:Person a owl:Class ;
spin:constraint [ spl:predicate foaf:name ;
                  spl:count 1 ] ,
                [ spl:predicate foo:email ;
                  spl:count 1 ] ,
                [ spl:predicate foo:phone ;
                  spl:minCount 0 ] .

Now we change the requirements very slightly. Instead of any string being acceptable as an e-mail, we need to ensure
it is actually a valid e-mail for an employee in our organisation.

How would you determine what is a valid e-mail for an employee in an organisation?
The first approach is to say that it has to match a specific regex, e.g. at the university I went to, any e-mail
address of an employee needs to match the regex ".+@pl.hanze.nl$".
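As a minimal sketch of that first approach outside of any shape language, the check can be written in Python. It assumes the intent is "the address ends in @pl.hanze.nl"; the function name is hypothetical, and the dots are escaped here so that they only match a literal ".".

```python
import re

# Assumption: the pattern is meant to anchor the domain at the end of the
# address. The dots are escaped so that "." only matches a literal dot.
EMPLOYEE_EMAIL = re.compile(r".+@pl\.hanze\.nl$")


def is_employee_email(mbox: str) -> bool:
    """Return True if mbox matches the employee e-mail pattern.

    re.search is used because SPARQL's REGEX also matches anywhere in the
    string unless the pattern itself is anchored.
    """
    return EMPLOYEE_EMAIL.search(mbox) is not None


if __name__ == "__main__":
    for candidate in ("j.doe@pl.hanze.nl", "j.doe@example.org"):
        print(candidate, is_employee_email(candidate))
```

Note that such a pattern only checks the form of the address, not whether the mailbox belongs to an actual employee.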

Let's try that in ShEx Compact, shall we?

<WebServicePersonShape> {
  rdf:type foaf:Person ,
  foaf:name xsd:string ,
  foo:email IRI %sparql{ ?s foo:email ?mbox . FILTER (REGEX(str(?mbox), ".+@pl.hanze.nl$")) %} ,
  foo:phone xsd:string *
}

And in SPIN Turtle
foaf:Person a owl:Class ;
spin:constraint [ spl:predicate foaf:name ;
                  spl:count 1 ] ,
                [ spl:predicate foo:email ;
                  spl:count 1 ] ,
                [ sp:text """ASK { ?s foo:email ?mbox . FILTER (!(REGEX(str(?mbox), ".+@pl.hanze.nl$"))) }""" ] ,
                [ spl:predicate foo:phone ;
                  spl:minCount 0 ] .

This can be simplified again with a template library, e.g. one similar to those given at [1].

our-company:employee-email 
      a       spin:Template ;
      rdfs:comment "This template checks that every value of the property ?arg1 is a valid employee e-mail address."@en ;
      rdfs:label "syntax check in all instances:  valid employee email"@en ;
      spin:body
              [ sp:text """CONSTRUCT {
                 _:b0 a spin:ConstraintViolation .
                 _:b0 spin:violationRoot ?s .
                 _:b0 spin:violationPath ?arg1 .
                }
                WHERE {
                 ?s ?arg1 ?mbox .
                 FILTER (!(REGEX(str(?mbox), ".+@pl.hanze.nl$")))
                }""" ] .

So now the SPIN Turtle becomes

foaf:Person a owl:Class ;
spin:constraint [ spl:predicate foaf:name ;
                  spl:count 1 ] ,
                [ spl:predicate foo:email ;
                  spl:count 1 ;
                  a our-company:employee-email ] ,
                [ spl:predicate foo:phone ;
                  spl:minCount 0 ] .

This is more likely to be readable for “business” users.

The second advantage of this kind of approach is that these templates are maintainable independently from the rules themselves.

This becomes even more the case when the validation starts to be more complicated.
Few companies have a clean separation between e-mail addresses for employees and for mailing lists.
However, most do have an LDAP or equivalent system. Assuming SquirrelRDF for LDAP [2] still works,
a system can use that as an information source.

ShEx now turns into this.

<WebServicePersonShape> {
  rdf:type foaf:Person ,
  foaf:name xsd:string ,
  foo:email IRI %sparql{ ?s foo:email ?mbox . SERVICE <ourcompanyLdap> { ?employee foaf:mbox ?mbox ; a :Employee } %} ,
  foo:phone xsd:string *
}

While the SPIN solution stays like this.

foaf:Person a owl:Class ;
spin:constraint [ spl:predicate foaf:name ;
                  spl:count 1 ] ,
                [ spl:predicate foo:email ;
                  spl:count 1 ;
                  a our-company:employee-email ] ,
                [ spl:predicate foo:phone ;
                  spl:minCount 0 ] .
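The reason the constraints can stay unchanged is the indirection through a named template: the rule only refers to the name, and what the name means can be swapped. A rough Python sketch of that idea follows. Everything in it is hypothetical: the registry, the function names and the in-memory set that stands in for a real LDAP lookup.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical registry mapping a template name to the check it performs.
TEMPLATES: Dict[str, Callable[[str], bool]] = {}


def regex_check(mbox: str) -> bool:
    """First version of the template: purely syntactic check."""
    return re.search(r".+@pl\.hanze\.nl$", mbox) is not None


# Stand-in for a directory lookup (assumption: a real system would query LDAP).
KNOWN_EMPLOYEE_MAILBOXES = {"j.doe@pl.hanze.nl"}


def directory_check(mbox: str) -> bool:
    """Second version of the template: the mailbox must belong to an employee."""
    return mbox in KNOWN_EMPLOYEE_MAILBOXES


# The rules refer to the template by name only, so they never change.
PERSON_RULES: List[Tuple[str, str]] = [("foo:email", "our-company:employee-email")]


def violations(resource: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (property, value) pairs that fail the template named in the rules."""
    failed = []
    for prop, template in PERSON_RULES:
        for value in resource.get(prop, []):
            if not TEMPLATES[template](value):
                failed.append((prop, value))
    return failed


# Initially the template is backed by the regex.
TEMPLATES["our-company:employee-email"] = regex_check
```

Re-pointing `TEMPLATES["our-company:employee-email"]` at `directory_check` changes what counts as valid (a mailing-list address in the right domain would now be rejected) without touching `PERSON_RULES`.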

The SPIN solution could be made more compact if it had more predicates, e.g.

foaf:Person a owl:Class ;
shape:hasOne  [ spl:predicate foaf:name  ] ,
              [ a our-company:employee-email ] ;
shape:some    [ spl:predicate foo:phone ] .

Compare this to the compact ShEx form. Is the SPIN one that bad?

Now on internationalisation. I suggest the following strawman syntax: replace each IRI with its rdfs:label in
the user's language, e.g. the above SPIN syntax “translated” into English.

“A person” “is a” “Class” ;
           “has one” [ “property” “name” ] ,
                     [ “is a” “Has a company e-mail address” ] ;
           “might have” [ “property” “Phone number” ] .

And Dutch

“Een persoon” “is een” “Klasse” ;
              “heeft een” [ “eigenschap” “naam” ] ,
                          [ “is een” “Heeft een bedrijf e-mail adres” ] ;
              “heeft mogelijk” [“eigenschap” “telefoon nummer” ] .

A consequence of this syntax-per-language choice is that it cannot be used for interchange,
i.e. it is only a UI, never a storage format.
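The label substitution itself is mechanical, which is why it suits a UI layer. A minimal Python sketch, assuming the labels have already been collected into a dictionary (a real system would read them from rdfs:label triples; the table and function name here are hypothetical):

```python
from typing import Dict, List

# Hypothetical label table: IRI -> language tag -> label.
# The labels are the ones used in the English and Dutch examples above.
LABELS: Dict[str, Dict[str, str]] = {
    "foaf:Person": {"en": "A person", "nl": "Een persoon"},
    "rdf:type": {"en": "is a", "nl": "is een"},
    "owl:Class": {"en": "Class", "nl": "Klasse"},
    "shape:hasOne": {"en": "has one", "nl": "heeft een"},
    "spl:predicate": {"en": "property", "nl": "eigenschap"},
    "foaf:name": {"en": "name", "nl": "naam"},
}


def render(iris: List[str], lang: str) -> str:
    """Replace each IRI with its quoted label in lang.

    Falls back to the IRI itself when no label is known for that language.
    """
    return " ".join('"%s"' % LABELS.get(iri, {}).get(lang, iri) for iri in iris)


if __name__ == "__main__":
    statement = ["foaf:Person", "rdf:type", "owl:Class"]
    print(render(statement, "en"))
    print(render(statement, "nl"))
```

Because the mapping from labels back to IRIs is not guaranteed to be unique, the rendered text only works as a view over the stored RDF.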

Regards,
Jerven

[1] http://semwebquality.org/documentation/primer/20101124/index.html
[2] http://www.thefigtrees.net/lee/blog/2006/07/im_a_sparql_junkie.html


-------------------------------------------------------------------
Jerven Bolleman                        Jerven.Bolleman@isb-sib.ch
SIB Swiss Institute of Bioinformatics      Tel: +41 (0)22 379 58 85
CMU, rue Michel Servet 1               Fax: +41 (0)22 379 58 58
1211 Geneve 4,
Switzerland     www.isb-sib.ch - www.uniprot.org
Follow us at https://twitter.com/#!/uniprot
-------------------------------------------------------------------

Received on Wednesday, 6 August 2014 22:37:44 UTC