- From: Holger Knublauch <holger@topquadrant.com>
- Date: Thu, 15 Sep 2016 10:25:33 +1000
- To: public-data-shapes-wg@w3.org
As resolved today, I have integrated sh:languageIn into the spec:
http://w3c.github.io/data-shapes/shacl/#LanguageInConstraintComponent
Any glitches in there?
Holger
On 13/09/2016 14:34, Holger Knublauch wrote:
>
>
> On 12/09/2016 20:07, Andy Seaborne wrote:
>>
>>
>> On 12/09/16 00:30, Holger Knublauch wrote:
>>> Taking this and Andy's input into consideration, maybe sh:langShape is
>>> overkill and all we really need is a new parameter such as
>>> sh:languageIn which takes a node and, if it has a language tag,
>>> verifies that it matches one of the provided languages following the
>>> SPARQL langMatches semantics. For example:
>>>
>>> ex:MyShape
>>> a sh:Shape ;
>>> sh:property [
>>> sh:predicate skos:prefLabel ;
>>> sh:or ( [ sh:datatype xsd:string ] [ sh:datatype rdf:langString ] ) ;
>>> sh:langMatches ( "en" "fr" "de" ) ;
>>
>> A note: this is a slightly different operation from sparql:langMatches,
>> which takes a language tag and a language range, not a literal and a
>> language range. Some people prefer that local names are not reused
>> to mean slightly different things where possible.
>
> Oops, yes. I intended to use sh:languageIn but forgot to update the
> example. So here it is again:
>
> ex:MyShape
> a sh:Shape ;
> sh:property [
> sh:predicate skos:prefLabel ;
> sh:or ( [ sh:datatype xsd:string ] [ sh:datatype rdf:langString ] ) ;
> sh:languageIn ( "en" "fr" "de" ) ;
> ] .
>
>
>>
>>> ] .
>>>
>>> langMatches could be for just a single language, but having a list is
>>> shorter for this (apparently) common case in multi-lingual countries
>>> such as Belgium. I didn't know the RFC supports wildcards - this should
>>> hopefully be flexible enough to cover all given use cases, but others may
>>> need to confirm.
>>>
>>> Regards,
>>> Holger
>>>
>>> PS: Andy, I prefer sh:datatype rdf:langString because it would be one
>>> thing less to check (by form builders etc), and furthermore I believe
>>> the semantics of sh:langMatches needs to be that it only does something
>>> if the literal really has a language tag. Otherwise it would be harder
>>> to express mixed cases of either string or langString (which I believe
>>> is quite common).
>>
>> Consider
>>
>> sh:property [
>> sh:predicate skos:prefLabel ;
>> sh:langMatches ( "en" "fr" "de" ) ;
>> ] .
>>
>> with data:
>>
>> <uri> skos:prefLabel 123 .
>>
>> which is a violation when sh:langMatches requires the language tag
>> but passes if sh:langMatches only triggers when there is a language
>> tag at all. I find the latter strange as a natural reading of the
>> shape.
>>
>> String or language match would be:
>>
>> sh:or ( [ sh:datatype xsd:string ]
>> [ sh:langMatches ( "en" "fr" "de" ) ] ) ;
>>
>> There is no need to test for [ sh:datatype rdf:langString ] as well
>> as it is implicit in having any language tag so it happens when
>> sh:langMatches requires the language tag.
>>
>> For error checking:
>>
>> This data:
>>
>> "abcde"^^rdf:langString
>>
>> is malformed and not in the value-space of rdf:langString; it is like
>> writing
>>
>> "abcde"^^xsd:integer
>>
>> It does have the datatype - it does not represent a legal value.
>>
>>
>> Another way: make language match "" mean xsd:string (cf. XML, where
>> xml:lang="" means no language tag, although with slightly different
>> implications).
>>
>> sh:property [
>> sh:or ( [ sh:datatype xsd:string ]
>> [ sh:langMatches ( "en" "fr" "de" ) ] ) ;
>> ] .
>>
>> vs
>>
>> sh:property [
>> sh:predicate skos:prefLabel ;
>> sh:langMatches ( "" "en" "fr" "de" ) ;
>> ] .
>
> So the change you seem to be advocating is to make sh:languageIn
> produce violations if the value node is not a literal, or a literal
> that does not have any language tag. As you point out, this would lead
> to situations in which the sh:datatype rdf:langString can be omitted
> in an sh:or. The meaning of sh:datatype would not change, and people
> can still state sh:datatype rdf:langString for the (common) case in
> which any language is permitted. I believe I would be OK with that
> interpretation.
>
> Here is a SPARQL ASK validator query that passes the English and
> Francais cases below:
>
>
> ASK {
>     BIND (lang($value) AS ?valueLang) .
>     FILTER (bound(?valueLang) && EXISTS {
>         GRAPH $shapesGraph {
>             $languageIn (rdf:rest*)/rdf:first ?lang .
>             FILTER (langMatches(?valueLang, ?lang))
>         }
>     })
> }
>
>
> ex:TestShape
> rdf:type sh:Shape ;
> rdfs:label "Test shape" ;
> sh:languageIn (
> "en"
> "fr"
> ) ;
> sh:targetNode "English"@en ;
> sh:targetNode "Francais"@fr ;
> sh:targetNode rdfs:Resource ; # Fails
> sh:targetNode "Deutsch"@de ; # Fails
> sh:targetNode "Plain String" ; # Fails
> .
>
> Holger
>
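For readers following the thread, the behaviour of the ASK query quoted above can be sketched outside SPARQL. The Python below is only an illustration under stated assumptions: the function names lang_matches and language_in are hypothetical, matching is reduced to RFC 4647 basic filtering (the scheme SPARQL langMatches is defined on), and a value node is represented only by its language tag (None for a non-literal, "" for a literal without a tag). It is not the validator used in the spec.

```python
from typing import Iterable, Optional


def lang_matches(tag: str, lang_range: str) -> bool:
    """Simplified RFC 4647 basic filtering, case-insensitive."""
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        # The wildcard range matches any non-empty language tag.
        return tag != ""
    # A range matches the identical tag or any tag that extends it
    # with a "-" subtag, e.g. "en" matches "en" and "en-GB".
    return tag == lang_range or tag.startswith(lang_range + "-")


def language_in(value_lang: Optional[str], allowed: Iterable[str]) -> bool:
    """Assumed reading of sh:languageIn from the thread: the value must
    be a literal with a language tag matching one of the allowed ranges.

    value_lang is None for non-literals and "" for untagged literals.
    """
    if not value_lang:
        # Non-literals and literals without a language tag are violations.
        return False
    return any(lang_matches(value_lang, r) for r in allowed)


if __name__ == "__main__":
    allowed = ["en", "fr"]
    # Mirrors the sh:targetNode values of ex:TestShape quoted above.
    targets = [
        ('"English"@en', "en"),
        ('"Francais"@fr', "fr"),
        ("rdfs:Resource", None),
        ('"Deutsch"@de', "de"),
        ('"Plain String"', ""),
    ]
    for label, lang in targets:
        verdict = "passes" if language_in(lang, allowed) else "fails"
        print(f"{label}: {verdict}")
```

Under these assumptions the first two target nodes pass and the remaining three fail, which is the outcome annotated in the test shape.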
Received on Thursday, 15 September 2016 00:26:04 UTC