- From: Jerven Bolleman <jerven.bolleman@isb-sib.ch>
- Date: Thu, 7 Aug 2014 00:37:11 +0200
- To: public-rdf-shapes@w3.org
Hi All,

This is just food for thought for the WG to consider once it forms. The key ideas are:

 * complexity needs to be managed, not ignored;
 * internationalisation must be considered when talking about human-friendly syntaxes.

Those mainly interested in human-readable syntax can skip to the bottom before deciding whether to read all of this long mail.

Recently we had a small example of a validation/documentation requirement, and I showed how it looked in a number of syntaxes.

In ShEx compact syntax:

    <WebServicePersonShape> {
        rdf:type foaf:Person ,
        foaf:name xsd:string ,
        foo:email xsd:string ,
        foo:phone xsd:string *
    }

In ShEx RDF:

    <WebServicePersonShape> a rs:ResourceShape ;
        rs:property [
            rs:occurs rs:Exactly-one ;
            rs:propertyDefinition foaf:name ;
            rs:valueType xsd:string
        ] , [
            rs:occurs rs:Exactly-one ;
            rs:propertyDefinition foo:email ;
            rs:valueType xsd:string
        ] , [
            rs:occurs rs:Zero-or-many ;
            rs:propertyDefinition foo:phone ;
            rs:valueType xsd:string
        ] .

In SPIN Turtle:

    foaf:Person a owl:Class ;
        spin:constraint
            [ spl:predicate foaf:name ; spl:count 1 ] ,
            [ spl:predicate foo:email ; spl:count 1 ] ,
            [ spl:predicate foo:phone ; spl:minCount 0 ] .

Now we change the requirements very slightly: instead of accepting any string as an e-mail address, we need to ensure that it actually is a valid e-mail address for an employee in our organisation. How would you determine what is a valid e-mail address for an employee in an organisation?

The first approach is to say that it has to match a specific regex. For example, at the university I went to, every employee e-mail address must match the regex ".+@pl[.]hanze[.]nl$". Let's try that in ShEx compact syntax:

    <WebServicePersonShape> {
        rdf:type foaf:Person ,
        foaf:name xsd:string ,
        foo:email IRI %sparql{
            ?s foo:email ?mbox .
            FILTER (REGEX(str(?mbox), ".+@pl[.]hanze[.]nl$"))
        %} ,
        foo:phone xsd:string *
    }

And in SPIN Turtle:

    foaf:Person a owl:Class ;
        spin:constraint
            [ spl:predicate foaf:name ; spl:count 1 ] ,
            [ spl:predicate foo:email ; spl:count 1 ] ,
            [ a sp:Ask ;
              sp:text """ASK { ?this foo:email ?mbox .
                              FILTER (!REGEX(str(?mbox), ".+@pl[.]hanze[.]nl$")) }""" ] ,
            [ spl:predicate foo:phone ; spl:minCount 0 ] .

This can be simplified again with a template library, e.g. one similar to those given at [1]:

    our-company:employee-email a spin:Template ;
        rdfs:comment "This template checks that every value of the property ?arg1 is a valid employee e-mail address."@en ;
        rdfs:label "syntax check in all instances: valid employee e-mail"@en ;
        spin:body [
            a sp:Construct ;
            sp:text """CONSTRUCT {
                           _:b0 a spin:ConstraintViolation .
                           _:b0 spin:violationRoot ?this .
                           _:b0 spin:violationPath ?arg1 .
                       }
                       WHERE {
                           ?this ?arg1 ?mbox .
                           FILTER (!REGEX(str(?mbox), ".+@pl[.]hanze[.]nl$"))
                       }"""
        ] .

So now the SPIN Turtle becomes:

    foaf:Person a owl:Class ;
        spin:constraint
            [ spl:predicate foaf:name ; spl:count 1 ] ,
            [ spl:predicate foo:email ; spl:count 1 ; a our-company:employee-email ] ,
            [ spl:predicate foo:phone ; spl:minCount 0 ] .

This is more likely to be readable by "business" users. The second advantage of this approach is that the templates can be maintained independently of the rules themselves. This matters even more as the validation becomes more complicated.

Few companies have a clean separation between e-mail addresses for employees and those for mailing lists. However, most do have an LDAP directory or an equivalent system. Assuming SquirrelRDF for LDAP [2] still works, a system can use it as an information source. The ShEx version now becomes:

    <WebServicePersonShape> {
        rdf:type foaf:Person ,
        foaf:name xsd:string ,
        foo:email IRI %sparql{
            ?s foo:email ?mbox .
            SERVICE <ourcompanyLdap> { ?employee foaf:mbox ?mbox ; a :Employee }
        %} ,
        foo:phone xsd:string *
    }

The SPIN solution, in contrast, stays exactly the same.
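Stripped of RDF syntax, the employee e-mail rule is a single check whose implementation can change (first a regex, then a directory lookup) without the shape that uses it changing. Below is a minimal Python sketch of that idea, not part of SPIN or ShEx: `employee_email_check`, `violations`, `EMPLOYEE_EMAIL_PATTERN` and the in-memory `directory` set are hypothetical stand-ins for the our-company:employee-email template and the LDAP lookup.

```python
import re
from typing import Callable, Iterable, List, Optional

# Hypothetical stand-in for the regex rule: the same pattern as in the shapes,
# anchored at both ends here by re.fullmatch.
EMPLOYEE_EMAIL_PATTERN = re.compile(r".+@pl[.]hanze[.]nl")


def _strip_mailto(mbox: str) -> str:
    """Drop a leading 'mailto:' so plain strings and mailto: IRIs compare alike."""
    return mbox[len("mailto:"):] if mbox.startswith("mailto:") else mbox


def employee_email_check(directory: Optional[Iterable[str]] = None) -> Callable[[str], bool]:
    """Return the current employee e-mail check.

    Without a directory the check is the regex rule; with one (an assumed,
    pre-fetched set of addresses standing in for the LDAP lookup) it is a
    membership test. Callers only ever see a str -> bool function.
    """
    if directory is None:
        return lambda mbox: EMPLOYEE_EMAIL_PATTERN.fullmatch(_strip_mailto(mbox)) is not None
    known = {_strip_mailto(address) for address in directory}
    return lambda mbox: _strip_mailto(mbox) in known


def violations(emails: Iterable[str], check: Callable[[str], bool]) -> List[str]:
    """Collect the e-mail values that fail the check (cf. spin:ConstraintViolation)."""
    return [mbox for mbox in emails if not check(mbox)]


if __name__ == "__main__":
    values = ["mailto:jan@pl.hanze.nl", "info@example.org", "list@pl.hanze.nl"]
    print(violations(values, employee_email_check()))                     # regex rule
    print(violations(values, employee_email_check({"jan@pl.hanze.nl"})))  # directory rule
```

Run as a script, this prints ['info@example.org'] and then ['info@example.org', 'list@pl.hanze.nl']: the mailing-list address passes the regex but is not in the (hypothetical) directory, which is the case the LDAP-based rule is meant to catch. The code calling `violations` does not change when the check does, which is the same separation the template gives the SPIN shape. That shape, repeated here, is therefore unchanged: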
    # Unchanged from the previous SPIN version
    foaf:Person a owl:Class ;
        spin:constraint
            [ spl:predicate foaf:name ; spl:count 1 ] ,
            [ spl:predicate foo:email ; spl:count 1 ; a our-company:employee-email ] ,
            [ spl:predicate foo:phone ; spl:minCount 0 ] .

The SPIN solution could be made even more compact with a few more predicates, e.g.:

    foaf:Person a owl:Class ;
        shape:hasOne [ spl:predicate foaf:name ] ,
                     [ a our-company:employee-email ] ;
        shape:some [ spl:predicate foo:phone ] .

Compare this to the compact ShEx form. Is the SPIN one really that bad?

Now, on internationalisation. As a strawman syntax, I suggest replacing each IRI with its rdfs:label in the user's language. For example, the SPIN syntax above "translated" into English:

    "A person" "is a" "Class" ;
        "has one" [ "property" "name" ] ,
                  [ "is a" "Has a company e-mail address" ] ;
        "might have" [ "property" "Phone number" ] .

And into Dutch:

    "Een persoon" "is een" "Klasse" ;
        "heeft een" [ "eigenschap" "naam" ] ,
                    [ "is een" "Heeft een bedrijf e-mail adres" ] ;
        "heeft mogelijk" [ "eigenschap" "telefoon nummer" ] .

A consequence of choosing a syntax per language is that it cannot be used for interchange, i.e. it is only ever a UI, never a storage format.

Regards,
Jerven

[1] http://semwebquality.org/documentation/primer/20101124/index.html
[2] http://www.thefigtrees.net/lee/blog/2006/07/im_a_sparql_junkie.html

-------------------------------------------------------------------
Jerven Bolleman                        Jerven.Bolleman@isb-sib.ch
SIB Swiss Institute of Bioinformatics  Tel: +41 (0)22 379 58 85
CMU, rue Michel Servet 1               Fax: +41 (0)22 379 58 58
1211 Geneve 4, Switzerland             www.isb-sib.ch - www.uniprot.org
Follow us at https://twitter.com/#!/uniprot
-------------------------------------------------------------------
Received on Wednesday, 6 August 2014 22:37:44 UTC