Re: an alternative proposal for partition

From: Eric Prud'hommeaux <eric@w3.org>
Date: Thu, 11 Aug 2016 21:01:06 -0400
To: Holger Knublauch <holger@topquadrant.com>
Cc: public-data-shapes-wg@w3.org
Message-ID: <20160812010104.GD11245@w3.org>
* Holger Knublauch <holger@topquadrant.com> [2016-08-11 17:14+1000]
> This looks like quite a mega feature, if sh:and and sh:or become overloaded
> with very different meaning, requiring a new execution algorithm etc. What
> about spawning this off into an extension, just like the SPARQL stuff is in
> an extension?
> 
> Another option is to handle this on the Compact Syntax level, and produce
> QCRs under the hood. Are there any scenarios where QCRs could not (in
> principle) express your use cases?

The QCRs are relatively simple to generate but the universal
constraint is problematic. Taking the 2nd example below with a

shexc:
  <S> {
    (   dc:creator PATTERN "^mailto:.*@a.gov"  # either creator a.gov
      | dc:creator {                           # or a creator some node
          foaf:mbox PATTERN "^mailto:.*@a.gov" # with a foaf:mbox of a.gov
        }
    ) ;
    dc:creator PATTERN "^mailto:.*@b.mil"     # and one b.mil creator
  }

dc:creator which may be an email or a bnode with a foaf:mbox, we can
compose an additional universal constraint which limits the objects
of dc:creator to the three enumerated forms:

shacl:
  <S> 
    sh:and (
      [ sh:or (
  #   dc:creator PATTERN "^mailto:.*@a.gov"  # either creator a.gov
        [ sh:property
          [ sh:predicate dc:creator ; sh:pattern "^mailto:.*@a.gov" ;
            sh:qualifiedMinCount 1; sh:qualifiedMaxCount 1
        ] ]
  # | dc:creator {                           # or a creator some node
  #     foaf:mbox PATTERN "^mailto:.*@a.gov" # with a foaf:mbox of a.gov
  #   }
        [ sh:property
          [ sh:predicate dc:creator ; sh:shape [
              sh:property
                [ sh:predicate foaf:mbox ; sh:pattern "^mailto:.*@a.gov" ;
                  sh:minCount 1; sh:maxCount 1
            ] ] ; sh:qualifiedMinCount 1; sh:qualifiedMaxCount 1
        ] ]
      ) ]
  #   dc:creator PATTERN "^mailto:.*@b.mil"     # and one b.mil creator
      [ sh:property
        [ sh:predicate dc:creator ; sh:pattern "^mailto:.*@b.mil" ;
          sh:qualifiedMinCount 1; sh:qualifiedMaxCount 1
      ] ]
  # universal constraint to handle closure
      [ sh:or (
        [ sh:property
          [ sh:predicate dc:creator ; sh:pattern "^mailto:.*@a.gov" ;
            sh:qualifiedMinCount 1; sh:qualifiedMaxCount 1
        ] ]
        [ sh:property
          [ sh:predicate dc:creator ; sh:shape [
              sh:property
                [ sh:predicate foaf:mbox ; sh:pattern "^mailto:.*@a.gov" ;
                  sh:minCount 1; sh:maxCount 1
            ] ] ; sh:qualifiedMinCount 1; sh:qualifiedMaxCount 1
        ] ]
        [ sh:property
          [ sh:predicate dc:creator ; sh:pattern "^mailto:.*@b.mil" ;
            sh:qualifiedMinCount 1; sh:qualifiedMaxCount 1
        ] ]
      ) ]
    ) .

This gets us part way there, but round-tripping may be impossible and
it doesn't provide the OneOf OR behavior as described in the
foaf:name example below. A typical example from clinical data (FHIR)
analogous to the (name|givenName,familyName) example would involve
making sure that an osteoporotic spiral fracture had either a single
component with a compound code ("46675001|73737008") or two
components, each with one of those codes. For instance, ShEx's
partitioning semantics lets us easily catch this error:

  <Obs1> a fhir:Observation ;
    fhir:component
      [ fhir:code "46675001|73737008" ],
      [ fhir:code "46675001" ].
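
To make the check concrete, here is a minimal sketch in Python (not ShEx
or SHACL) of how partitioning rejects <Obs1>. It assumes each
fhir:component is reduced to a dict with a "code" key and that every
constraint has cardinality exactly one; the names matches_exactly,
satisfies_one_of and code_is are made up for illustration and come from
no spec or library.

```python
from itertools import permutations

def matches_exactly(values, constraints):
    """True if every value can be paired with a distinct constraint it
    satisfies, leaving no value and no constraint unused."""
    if len(values) != len(constraints):
        return False
    return any(
        all(check(value) for check, value in zip(constraints, ordering))
        for ordering in permutations(values)
    )

def satisfies_one_of(values, alternatives):
    """True if some alternative accounts for all of the values."""
    return any(matches_exactly(values, alt) for alt in alternatives)

def code_is(expected):
    """Hypothetical constraint: the component carries exactly this code."""
    return lambda component: component.get("code") == expected

COMPOUND = "46675001|73737008"

# Either one component with the compound code,
# or two components, one per individual code.
fracture = [
    [code_is(COMPOUND)],
    [code_is("46675001"), code_is("73737008")],
]

# The components of <Obs1> from the data above.
obs1 = [{"code": COMPOUND}, {"code": "46675001"}]

obs1_ok = satisfies_one_of(obs1, fracture)
compound_ok = satisfies_one_of([{"code": COMPOUND}], fracture)
pair_ok = satisfies_one_of(
    [{"code": "73737008"}, {"code": "46675001"}], fracture
)
```

The first alternative leaves the second component of <Obs1> unassigned,
and the second alternative has no component carrying "73737008", so
neither mapping covers all of the triples.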


> Holger
> 
> 
> On 11/08/2016 11:32, Eric Prud'hommeaux wrote:
> >The current partition meets some additive use cases like:
> >   <S> {
> >     dc:creator PATTERN "^mailto:a.gov" ; # one a.gov creator
> >     dc:creator PATTERN "^mailto:b.mil"   # and one b.mil creator
> >   }
> >
> >but not ones with any algebraic operators like:
> >   <S> {
> >     (   dc:creator PATTERN "^mailto:.*@a.gov"  # either creator a.gov
> >       | dc:creator {                           # or a creator some node
> >           foaf:mbox PATTERN "^mailto:.*@a.gov" # with a foaf:mbox of a.gov
> >         }
> >     ) ;
> >     dc:creator PATTERN "^mailto:.*@b.mil"     # and one b.mil creator
> >   }
> >
> >An alternative would be to create a syntax to capture ShEx's
> >partition semantics which say:
> >   Map the triples to the triple patterns with the same predicate.
> >   The node is valid with respect to a triple expression if there
> >   is a mapping of triple to triple pattern which satisfies the
> >   expression.
> >For instance, the data
> >   <s> dc:creator <mailto:a@b.mil> .
> >   <s> dc:creator _:b1 .
> >   _:b1 foaf:mbox <mailto:b@a.gov> .
> >satisfies the above pattern.
> >
> >I propose leveraging the current partition but allowing expressions:
> >   <S> sh:partition [
> >     sh:and (
> >       [ sh:property [
> >           sh:predicate ex:creator ; sh:minCount 1 ; sh:maxCount 1 ;
> >	  sh:pattern "^mailto:.*@a.gov" ] ]
> >       [ sh:property [
> >           sh:predicate ex:creator ; sh:minCount 1 ; sh:maxCount 1 ;
> >	  sh:pattern "^mailto:.*@b.mil" ] ]
> >     ) ] .
> >
> >This also handily provides a semantics with a disjunctive OR so e.g.
> >   <EmployeeShape> {
> >       foaf:name .          # either a foaf:name
> >     | ( foaf:givenName . ; # or a pair of givenName
> >         foaf:familyName .  # and familyName
> >       )
> >   }
> >would not be satisfied with a partial pair:
> >   <emp1> foaf:name "Alice Cooper" .
> >   <emp1> foaf:familyName "Cooper" .
> >because the 1st disjunct doesn't use the 2nd triple and the 2nd
> >disjunct is missing a givenName.
> >
> >Users wanting either additive properties or disjunctive OR could
> >use the sh:partition operator.
> >
> 
> 

-- 
-ericP

office: +1.617.599.3509
mobile: +33.6.80.80.35.59

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

There are subtle nuances encoded in font variation and clever layout
which can only be seen by printing this message on high-clay paper.
Received on Friday, 12 August 2016 01:01:10 UTC
