Re: ISSUE-137: Proposal to add sh:langShape from Holger Knublauch on 2016-09-13 (public-data-shapes-wg@w3.org from September 2016)

From: Holger Knublauch <holger@topquadrant.com>
Date: Tue, 13 Sep 2016 14:34:55 +1000
To: public-data-shapes-wg@w3.org
Message-ID: <d44ac53b-7aa4-b8f9-b57d-55e9b15180a4@topquadrant.com>

On 12/09/2016 20:07, Andy Seaborne wrote:
>
>
> On 12/09/16 00:30, Holger Knublauch wrote:
>> Taking this and Andy's input into consideration, maybe sh:langShape is
>> an overkill and all we really need is a new parameter such as
>> sh:languageIn which takes a node and, if it has a language tag, verifies
>> that it matches one of the provided languages following the SPARQL
>> langMatches semantics. For example:
>>
>> ex:MyShape
>>     a sh:Shape ;
>>     sh:property [
>>         sh:predicate skos:prefLabel ;
>>         sh:or ( [ sh:datatype xsd:string ] [ sh:datatype rdf:langString
>> ] ) ;
>>         sh:langMatches ( "en" "fr" "de" ) .
>
> A note: this is a slightly different operation to sparql:langMatches 
> which takes a language tag and a language match, not a literal and 
> language match.  Some people prefer that local names are not reused to 
> mean slightly different things where possible.

Oops, yes. I intended to use sh:languageIn but forgot to update the 
example. So here it is again:

ex:MyShape
     a sh:Shape ;
     sh:property [
         sh:predicate skos:prefLabel ;
         sh:or ( [ sh:datatype xsd:string ]
                 [ sh:datatype rdf:langString ] ) ;
         sh:languageIn ( "en" "fr" "de" ) ;
     ] .


>
>>     ] .
>>
>> langMatches could be for just a single language, but having a list is
>> shorter for this (apparently) common case in multi-lingual countries
>> such as Belgium. I didn't know the RFC supports wildcards - this should
>> hopefully be flexible enough to cover all given use cases, but others may
>> need to confirm.
>>
>> Regards,
>> Holger
>>
>> PS: Andy, I prefer sh:datatype rdf:langString because it would be one
>> thing less to check (by form builders etc), and furthermore I believe
>> the semantics of sh:langMatches needs to be that it only does something
>> if the literal really has a language tag. Otherwise it would be harder
>> to express mixed cases of either string or langString (which I believe
>> is quite common).
>
> Consider
>
>      sh:property [
>          sh:predicate skos:prefLabel ;
>          sh:langMatches ( "en" "fr" "de" ) ;
>      ] .
>
> with data:
>
>     <uri> skos:prefLabel 123 .
>
> which is a violation when sh:langMatches requires the language tag but 
> passes if sh:langMatches only triggers if there is a language tag at 
> all.  I find the latter a strange natural interpretation of the shape.
>
> String or language match would be:
>
>   sh:or ( [ sh:datatype xsd:string ]
>           [ sh:langMatches ( "en" "fr" "de" ) ] ) ;
>
> There is no need to test for [ sh:datatype rdf:langString ] as well as 
> it is implicit in having any language tag so it happens when 
> sh:langMatches requires the language tag.
>
> For error checking:
>
> This data:
>
>    "abcde"^^rdf:langString
>
> is malformed and not in the value-space of rdf:langString; it is like 
> writing
>
>    "abcde"^^xsd:integer
>
> It does have the datatype - it does not represent a legal value.
>
>
> Another way: make language match "" mean xsd:string. (cf. XML, where
> xml:lang="" means no language tag, although with slightly different
> implications).
>
>   sh:property [
>      sh:or ( [ sh:datatype xsd:string ]
>              [ sh:langMatches ( "en" "fr" "de" ) ] ) ;
>    ] .
>
> vs
>
>    sh:property [
>      sh:predicate skos:prefLabel ;
>      sh:langMatches ( "" "en" "fr" "de" ) ;
>    ] .

So the change you seem to be advocating is to make sh:languageIn produce
violations if the value node is not a literal, or is a literal that does
not have any language tag. As you point out, this means that
sh:datatype rdf:langString can be omitted from an sh:or, because
sh:languageIn already requires a language tag. The meaning of sh:datatype
would not change, and people can still state sh:datatype rdf:langString
for the (common) case in which any language is permitted. I believe I
would be OK with that interpretation.

Here is a SPARQL ASK validator query that passes the "English" and
"Francais" target nodes below and fails the other three:


         ASK {
             BIND (lang($value) AS ?valueLang) .
             FILTER (bound(?valueLang) && EXISTS {
                 GRAPH $shapesGraph {
                     $languageIn (rdf:rest*)/rdf:first ?lang .
                     FILTER (langMatches(?valueLang, ?lang))
                 } } )
         }


ex:TestShape
   rdf:type sh:Shape ;
   rdfs:label "Test shape" ;
   sh:languageIn (
       "en"
       "fr"
     ) ;
   sh:targetNode "English"@en ;
   sh:targetNode "Francais"@fr ;
   sh:targetNode rdfs:Resource ;     # Fails
   sh:targetNode "Deutsch"@de ;    # Fails
   sh:targetNode "Plain String" ;      # Fails
.
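
For anyone who wants to experiment outside a SPARQL engine, here is a
rough Python sketch of the same check. It is only an illustration: the
function names are made up, the matching is my reading of the basic
filtering that langMatches uses (RFC 4647), and a non-literal is
modelled as None because lang() would raise an error on it.

```python
from typing import Iterable, Optional


def lang_matches(tag: str, lang_range: str) -> bool:
    """Approximate SPARQL langMatches using RFC 4647 basic filtering."""
    if not tag:
        # A literal without a language tag never matches, not even "*".
        return False
    if lang_range == "*":
        return True
    tag, lang_range = tag.lower(), lang_range.lower()
    # Either an exact match or the range is a prefix ending at a "-" boundary.
    return tag == lang_range or tag.startswith(lang_range + "-")


def conforms_to_language_in(value_lang: Optional[str],
                            languages: Iterable[str]) -> bool:
    """Hypothetical helper mirroring the ASK query above.

    value_lang is None for a non-literal and "" for a literal that has
    no language tag; both cases are violations under this interpretation.
    """
    if value_lang is None:
        return False
    return any(lang_matches(value_lang, r) for r in languages)


allowed = ["en", "fr"]
for label, lang in [("English", "en"), ("Francais", "fr"),
                    ("rdfs:Resource", None), ("Deutsch", "de"),
                    ("Plain String", "")]:
    print(label, conforms_to_language_in(lang, allowed))
```

Under these assumptions a tag such as "en-GB" would also be accepted
by the "en" entry, which I think is what people would expect.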

Holger
Received on Tuesday, 13 September 2016 04:35:26 UTC