Re: an alternative proposal for partition

From: Holger Knublauch <holger@topquadrant.com>
Date: Fri, 12 Aug 2016 16:05:25 +1000
To: public-data-shapes-wg@w3.org
Message-ID: <6b061484-b01a-98d0-4d25-9e56fde40cff@topquadrant.com>
I think this will come down to a general design choice. Do we want to 
add this very complex feature to the SHACL Core, or do we simply point 
people at the SPARQL extension mechanism? QCRs cover quite a number of 
use cases. Not every use case will be nicely expressible this way, but 
then OTOH many people already know SPARQL and use it every day for these 
very kinds of complex queries. Why do we need to reinvent everything 
in a higher-level language, especially given that, without doubt, someone 
else will request yet another design pattern that is not covered by our 
core language even with the most general sh:partition feature? We need 
to stop somewhere.

In SPARQL, the check for your scenario could be something like (untested):

SELECT ?this
WHERE {
     FILTER NOT EXISTS {
         ?this dc:creator ?mil .
         FILTER regex(str(?mil), "^mailto:.*@b.mil") .
         ?this dc:creator ?gov .
         FILTER (regex(str(?gov), "^mailto:.*@a.gov") ||
                 EXISTS { ?gov foaf:mbox ?mbox .
                          FILTER regex(str(?mbox), "^mailto:.*@a.gov") })
     }
}

and we don't need to reinvent further wheels. You can even combine these 
with existing SHACL shapes, using sh:hasShape. Or you can make this 
nicer with SHACL functions, e.g.

SELECT ?this
WHERE {
     FILTER NOT EXISTS {
         ?this dc:creator ?mil .
         FILTER ex:isMilEmail(?mil) .
         ?this dc:creator ?gov .
         FILTER (ex:isGovEmail(?gov) ||
                 EXISTS { ?gov foaf:mbox ?mbox . FILTER ex:isGovEmail(?mbox) })
     }
}

which is IMHO quite an acceptable Compact Syntax, only far more general.
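
To make the logic of the two queries concrete, here is a small Python emulation of the FILTER NOT EXISTS check over an in-memory set of triples. It is a sketch, not a SPARQL engine: the tuple-based graph, the node names ex:s and ex:t, and the helpers is_mil_email/is_gov_email (standing in for the hypothetical ex:isMilEmail and ex:isGovEmail functions) are assumptions for illustration. IRIs are stored in their str() form, and a regex that cannot match (e.g. on a blank node label) simply counts as false.

```python
from __future__ import annotations

import re

# Hypothetical in-memory graph of (subject, predicate, object) strings,
# built from the example data in this thread.
data = {
    ("ex:s", "dc:creator", "mailto:a@b.mil"),
    ("ex:s", "dc:creator", "_:b1"),
    ("_:b1", "foaf:mbox", "mailto:b@a.gov"),
    ("ex:t", "dc:creator", "mailto:c@b.mil"),  # no a.gov creator at all
}


def is_mil_email(node: str) -> bool:
    """Stand-in for the hypothetical ex:isMilEmail SHACL function."""
    return re.search(r"^mailto:.*@b.mil", node) is not None


def is_gov_email(node: str) -> bool:
    """Stand-in for the hypothetical ex:isGovEmail SHACL function."""
    return re.search(r"^mailto:.*@a.gov", node) is not None


def objects(graph: set[tuple[str, str, str]], s: str, p: str) -> list[str]:
    """All objects of triples matching (s, p, ?o)."""
    return [o for (s2, p2, o) in graph if s2 == s and p2 == p]


def is_reported(graph: set[tuple[str, str, str]], this: str) -> bool:
    """True if the SELECT would return ?this, i.e. the FILTER NOT EXISTS
    pattern finds no (b.mil creator, a.gov creator) combination."""
    creators = objects(graph, this, "dc:creator")
    for mil in creators:
        if not is_mil_email(mil):
            continue
        for gov in creators:
            if is_gov_email(gov) or any(
                is_gov_email(mbox) for mbox in objects(graph, gov, "foaf:mbox")
            ):
                return False  # the inner pattern has a solution
    return True
```

With this data, ex:s is not reported (its blank-node creator has an a.gov mbox), while ex:t is reported because it has no a.gov creator.
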

Holger


On 12/08/2016 11:01, Eric Prud'hommeaux wrote:
> * Holger Knublauch <holger@topquadrant.com> [2016-08-11 17:14+1000]
>> This looks like quite a mega feature, if sh:and and sh:or become overloaded
>> with very different meaning, requiring a new execution algorithm etc. What
>> about spawning this off into an extension, just like the SPARQL stuff is in
>> an extension?
>>
>> Another option is to handle this on the Compact Syntax level, and produce
>> QCRs under the hood. Are there any scenarios where QCRs could not (in
>> principle) express your use cases?
> The QCRs are relatively simple to generate but the universal
> constraint is problematic. Taking the 2nd example below with a
>
> shexc:
>    <S> {
>      (   dc:creator PATTERN "^mailto:.*@a.gov"  # either creator a.gov
>        | dc:creator {                           # or a creator some node
>            foaf:mbox PATTERN "^mailto:.*@a.gov" # with a foaf:mbox of a.gov
>          }
>      ) ;
>      dc:creator PATTERN "^mailto:.*@b.mil"     # and one b.mil creator
>    }
>
> dc:creator that may be either an email or a bnode with a foaf:mbox, we
> can compose an additional universal constraint that limits the objects
> of dc:creator to the three enumerated forms:
>
> shacl:
>    <S>
>      sh:and (
>        [ sh:or (
>    #   dc:creator PATTERN "^mailto:.*@a.gov"  # either creator a.gov
>          [ sh:property
>            [ sh:predicate dc:creator ; sh:pattern "^mailto:.*@a.gov" ;
>              sh:qualifiedMinCount 1 ; sh:qualifiedMaxCount 1
>            ] ]
>    # | dc:creator {                           # or a creator some node
>    #     foaf:mbox PATTERN "^mailto:.*@a.gov" # with a foaf:mbox of a.gov
>    #   }
>          [ sh:property
>            [ sh:predicate dc:creator ; sh:shape [
>                sh:property
>                  [ sh:predicate foaf:mbox ; sh:pattern "^mailto:.*@a.gov" ;
>                    sh:minCount 1 ; sh:maxCount 1
>                  ] ] ;
>              sh:qualifiedMinCount 1 ; sh:qualifiedMaxCount 1
>            ] ]
>        ) ]
>    #   dc:creator PATTERN "^mailto:.*@b.mil"     # and one b.mil creator
>        [ sh:property
>          [ sh:predicate dc:creator ; sh:pattern "^mailto:.*@b.mil" ;
>            sh:qualifiedMinCount 1 ; sh:qualifiedMaxCount 1
>          ] ]
>    # universal constraint to handle closure
>        [ sh:or (
>          [ sh:property
>            [ sh:predicate dc:creator ; sh:pattern "^mailto:.*@a.gov" ;
>              sh:qualifiedMinCount 1 ; sh:qualifiedMaxCount 1
>            ] ]
>          [ sh:property
>            [ sh:predicate dc:creator ; sh:shape [
>                sh:property
>                  [ sh:predicate foaf:mbox ; sh:pattern "^mailto:.*@a.gov" ;
>                    sh:minCount 1 ; sh:maxCount 1
>                  ] ] ;
>              sh:qualifiedMinCount 1 ; sh:qualifiedMaxCount 1
>            ] ]
>          [ sh:property
>            [ sh:predicate dc:creator ; sh:pattern "^mailto:.*@b.mil" ;
>              sh:qualifiedMinCount 1 ; sh:qualifiedMaxCount 1
>            ] ]
>        ) ]
>      ) .
>
> This gets us part way there, but round-tripping may be impossible and
> it doesn't provide the OneOf OR behavior as described in the
> foaf:name example below. A typical example from clinical data (FHIR)
> analogous to the (name|givenName,familyName) example would involve making
> sure that an osteoporotic spiral fracture had either a single
> component with a compound code ("46675001|73737008") or two
> components, each with one of those codes. For instance, ShEx's
> partitioning semantics allows us to easily capture this error:
>
>    <Obs1> a fhir:Observation ;
>      fhir:component
>        [ fhir:code "46675001|73737008" ],
>        [ fhir:code "46675001" ].
>
>
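The difference can be sketched in Python by reducing each fhir:component of <Obs1> to its code string. The function names and the list-of-codes representation are hypothetical; naive_or stands in for branches evaluated independently of one another, not for any particular SHACL translation, while partition_one_of requires the chosen branch to account for every component.

```python
from collections import Counter

COMPOUND = "46675001|73737008"
PART_CODES = ["46675001", "73737008"]


def naive_or(codes: list[str]) -> bool:
    """Each branch only asks whether its own components are present,
    ignoring any other components."""
    return COMPOUND in codes or all(c in codes for c in PART_CODES)


def partition_one_of(codes: list[str]) -> bool:
    """OneOf under partition semantics: the chosen branch must consume
    every component, so leftover components make the branch fail."""
    single = Counter(codes) == Counter([COMPOUND])
    pair = Counter(codes) == Counter(PART_CODES)
    return single or pair


obs1 = [COMPOUND, "46675001"]  # the erroneous <Obs1> from the message
```

naive_or accepts obs1 because the compound branch is satisfied on its own, whereas partition_one_of rejects it: the single-component branch leaves "46675001" unconsumed and the two-component branch lacks "73737008".
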
>> Holger
>>
>>
>> On 11/08/2016 11:32, Eric Prud'hommeaux wrote:
>>> The current partition meets some additive use cases like:
>>>    <S> {
>>>      dc:creator PATTERN "^mailto:.*@a.gov" ; # one a.gov creator
>>>      dc:creator PATTERN "^mailto:.*@b.mil"   # and one b.mil creator
>>>    }
>>>
>>> but not ones with any algebraic operators like:
>>>    <S> {
>>>      (   dc:creator PATTERN "^mailto:.*@a.gov"  # either creator a.gov
>>>        | dc:creator {                           # or a creator some node
>>>            foaf:mbox PATTERN "^mailto:.*@a.gov" # with a foaf:mbox of a.gov
>>>          }
>>>      ) ;
>>>      dc:creator PATTERN "^mailto:.*@b.mil"     # and one b.mil creator
>>>    }
>>>
>>> An alternative would be to create a syntax capturing ShEx's
>>> partition semantics, which say:
>>>    Map the triples to the triple patterns with the same predicate.
>>>    The node is valid with respect to a triple expression if there
>>>    is a mapping of triple to triple pattern which satisfies the
>>>    expression.
>>> For instance, the data
>>>    <s> dc:creator <mailto:a@b.mil> .
>>>    <s> dc:creator _:b1 .
>>>    _:b1 foaf:mbox <mailto:b@a.gov> .
>>> satisfies the above pattern.
>>>
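The partition semantics described above can be sketched as a brute-force matcher in Python: each of the node's triples whose predicate the shape mentions is handed to exactly one triple constraint, and the node conforms if some assignment satisfies the expression. This is a hypothetical illustration, not ShEx's reference algorithm; the TC/EachOf/OneOf classes, the string-based triples and the unescaped regexes copied from the example are assumptions. It is exponential in the number of triples, which is acceptable only for toy examples like this one.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from itertools import product
from typing import Callable


@dataclass
class TC:
    """Triple constraint: consumes exactly one triple with predicate
    `pred` whose object passes `test(graph, obj)`."""
    pred: str
    test: Callable[[set, str], bool]


@dataclass
class EachOf:
    exprs: tuple  # every sub-expression must match a disjoint triple subset


@dataclass
class OneOf:
    exprs: tuple  # some single sub-expression must match all the triples


def preds(e) -> set[str]:
    if isinstance(e, TC):
        return {e.pred}
    return set().union(*(preds(x) for x in e.exprs))


def matches(graph: set, e, triples: list) -> bool:
    if isinstance(e, TC):
        return (len(triples) == 1 and triples[0][1] == e.pred
                and e.test(graph, triples[0][2]))
    if isinstance(e, OneOf):
        return any(matches(graph, x, triples) for x in e.exprs)
    # EachOf: try every assignment of triples to sub-expressions.
    for choice in product(range(len(e.exprs)), repeat=len(triples)):
        parts = [[t for t, k in zip(triples, choice) if k == i]
                 for i in range(len(e.exprs))]
        if all(matches(graph, x, p) for x, p in zip(e.exprs, parts)):
            return True
    return False


def conforms(graph: set, node: str, e) -> bool:
    # Only the node's triples whose predicate appears in `e` are partitioned.
    mine = sorted(t for t in graph if t[0] == node and t[1] in preds(e))
    return matches(graph, e, mine)


def pattern(rx: str) -> Callable[[set, str], bool]:
    return lambda graph, obj: re.search(rx, obj) is not None


def shape(e) -> Callable[[set, str], bool]:
    return lambda graph, obj: conforms(graph, obj, e)


# <S>: (a.gov creator | creator with an a.gov mbox) ; one b.mil creator
S = EachOf((
    OneOf((TC("dc:creator", pattern(r"^mailto:.*@a.gov")),
           TC("dc:creator",
              shape(TC("foaf:mbox", pattern(r"^mailto:.*@a.gov")))))),
    TC("dc:creator", pattern(r"^mailto:.*@b.mil")),
))

data = {
    ("s", "dc:creator", "mailto:a@b.mil"),
    ("s", "dc:creator", "_:b1"),
    ("_:b1", "foaf:mbox", "mailto:b@a.gov"),
}
```

Here conforms(data, "s", S) holds via the assignment that gives _:b1 to the OneOf and mailto:a@b.mil to the b.mil constraint. Adding a third creator such as mailto:x@c.org makes it fail, since no assignment can absorb the extra dc:creator triple.
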
>>> I propose leveraging the current partition but allowing expressions:
>>>    <S> sh:partition [
>>>      sh:and (
>>>        [ sh:property [
>>>            sh:predicate dc:creator ; sh:minCount 1 ; sh:maxCount 1 ;
>>>            sh:pattern "^mailto:.*@a.gov" ] ]
>>>        [ sh:property [
>>>            sh:predicate dc:creator ; sh:minCount 1 ; sh:maxCount 1 ;
>>>            sh:pattern "^mailto:.*@b.mil" ] ]
>>>      )
>>>    ] .
>>>
>>> This also handily provides semantics for a disjunctive OR, so e.g.
>>>    <EmployeeShape> {
>>>        foaf:name .          # either a foaf:name
>>>      | ( foaf:givenName . ; # or a pair of givenName
>>>          foaf:familyName .  # and familyName
>>>        )
>>>    }
>>> would not be satisfied by a name plus a partial pair:
>>>    <emp1> foaf:name "Alice Cooper" .
>>>    <emp1> foaf:familyName "Cooper" .
>>> because the 1st disjunct doesn't use the 2nd triple and the 2nd
>>> disjunct is missing a givenName.
>>>
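A minimal Python sketch of that contrast, reducing each triple of <emp1> to its predicate because every triple constraint in <EmployeeShape> uses "." (any value). independent_or is a hypothetical stand-in for an OR whose branches are checked in isolation; partition_or follows the partition semantics described above, where the chosen branch must consume every triple whose predicate the shape mentions.

```python
from collections import Counter

MENTIONED = {"foaf:name", "foaf:givenName", "foaf:familyName"}


def independent_or(preds: list[str]) -> bool:
    """Each branch inspects only its own predicates and ignores the rest."""
    c = Counter(preds)
    return c["foaf:name"] == 1 or (
        c["foaf:givenName"] == 1 and c["foaf:familyName"] == 1)


def partition_or(preds: list[str]) -> bool:
    """The chosen branch must account for every mentioned-predicate triple."""
    c = Counter(p for p in preds if p in MENTIONED)
    return c == Counter(["foaf:name"]) or c == Counter(
        ["foaf:givenName", "foaf:familyName"])


emp1 = ["foaf:name", "foaf:familyName"]  # predicates of <emp1>'s triples
```

independent_or accepts emp1 because the foaf:name branch is satisfied on its own, while partition_or rejects it: the foaf:familyName triple is left over in the first branch, and the second branch has no foaf:givenName.
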
>>> Users wanting either additive properties or disjunctive OR could
>>> use the sh:partition operator.
>>>
>>
Received on Friday, 12 August 2016 06:06:01 UTC