
Re: ISSUE-137: Proposal to add sh:langShape

From: Andy Seaborne <andy.seaborne@topquadrant.com>
Date: Mon, 12 Sep 2016 11:07:56 +0100
To: public-data-shapes-wg@w3.org
Message-ID: <da52690f-5896-946a-6b65-327fe6363a8e@topquadrant.com>


On 12/09/16 00:30, Holger Knublauch wrote:
> Taking this and Andy's input into consideration, maybe sh:langShape is
> overkill and all we really need is a new parameter such as
> sh:languageIn which takes a node and, if it has a language tag, verifies
> that it matches one of the provided languages following the SPARQL
> langMatches semantics. For example:
>
> ex:MyShape
>     a sh:Shape ;
>     sh:property [
>         sh:predicate skos:prefLabel ;
>         sh:or ( [ sh:datatype xsd:string ]
>                 [ sh:datatype rdf:langString ] ) ;
>         sh:langMatches ( "en" "fr" "de" ) ;

A note: this is a slightly different operation from sparql:langMatches, 
which takes a language tag and a language range, not a literal and a 
language range.  Some people prefer that local names are not reused to 
mean slightly different things where possible.

>     ] .
>
> langMatches could be for just a single language, but having a list is
> shorter for this (apparently) common case in multi-lingual countries
> such as Belgium. I didn't know the RFC supports wildcards - this should
> hopefully be flexible enough to cover all given use cases, but others may
> need to confirm.
>
> Regards,
> Holger
>
> PS: Andy, I prefer sh:datatype rdf:langString because it would be one
> thing less to check (by form builders etc), and furthermore I believe
> the semantics of sh:langMatches needs to be that it only does something
> if the literal really has a language tag. Otherwise it would be harder
> to express mixed cases of either string or langString (which I believe
> is quite common).

Consider

      sh:property [
          sh:predicate skos:prefLabel ;
          sh:langMatches ( "en" "fr" "de" ) ;
      ] .

with data:

     <uri> skos:prefLabel 123 .

which is a violation if sh:langMatches requires a language tag, but 
passes if sh:langMatches only triggers when there is a language tag at 
all.  I find the latter a strange interpretation of the shape; it is 
not the natural reading.
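
The two readings can be compared with a minimal Python sketch. Everything in it is an assumption made for illustration: the Literal class, the function names check_strict and check_lenient, and the use of RFC 4647 basic filtering for the match itself. It is not the SHACL definition of any constraint.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

XSD = "http://www.w3.org/2001/XMLSchema#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal."""
    lexical: str
    datatype: str
    lang: Optional[str] = None


def lang_matches(tag: str, lang_range: str) -> bool:
    """Basic filtering in the style of SPARQL langMatches (tag vs. range)."""
    if not tag:
        return False
    if lang_range == "*":
        return True
    tag, lang_range = tag.lower(), lang_range.lower()
    return tag == lang_range or tag.startswith(lang_range + "-")


def check_strict(value: Literal, ranges: Sequence[str]) -> bool:
    """Reading 1: the value must carry a language tag that matches."""
    return value.lang is not None and any(
        lang_matches(value.lang, r) for r in ranges
    )


def check_lenient(value: Literal, ranges: Sequence[str]) -> bool:
    """Reading 2: only values that carry a language tag are checked."""
    if value.lang is None:
        return True
    return any(lang_matches(value.lang, r) for r in ranges)


ranges = ("en", "fr", "de")
number = Literal("123", XSD + "integer")      # the value 123 from the data
label = Literal("chat", RDF + "langString", "fr-BE")

print(check_strict(number, ranges), check_lenient(number, ranges))
print(check_strict(label, ranges), check_lenient(label, ranges))
```

Under the strict reading the integer is rejected; under the lenient reading it passes unchecked, which is the behaviour questioned above.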

String or language match would be:

   sh:or ( [ sh:datatype xsd:string ]
           [ sh:langMatches ( "en" "fr" "de" ) ] ) ;

There is no need to test for [ sh:datatype rdf:langString ] as well: 
it is implicit in having any language tag, so it is already covered 
when sh:langMatches requires the language tag.
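
A small Python sketch of why the extra datatype test adds nothing, assuming well-formed RDF 1.1 literals (a literal has a language tag exactly when its datatype is rdf:langString) and the strict reading of sh:langMatches. The Literal class and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal."""
    lexical: str
    datatype: str
    lang: Optional[str] = None


def lang_matches(tag: str, lang_range: str) -> bool:
    """Basic filtering of a language tag against a language range."""
    if not tag:
        return False
    if lang_range == "*":
        return True
    tag, lang_range = tag.lower(), lang_range.lower()
    return tag == lang_range or tag.startswith(lang_range + "-")


def strict_lang(value: Literal, ranges: Sequence[str]) -> bool:
    """Strict reading: a matching language tag is required."""
    return value.lang is not None and any(
        lang_matches(value.lang, r) for r in ranges
    )


def string_or_lang(value: Literal, ranges: Sequence[str]) -> bool:
    """sh:or ( [ datatype xsd:string ] [ langMatches ... ] )"""
    return value.datatype == XSD_STRING or strict_lang(value, ranges)


def string_or_lang_redundant(value: Literal, ranges: Sequence[str]) -> bool:
    """Same shape with an explicit rdf:langString test added."""
    return value.datatype == XSD_STRING or (
        value.datatype == RDF_LANGSTRING and strict_lang(value, ranges)
    )


ranges = ("en", "fr", "de")
samples = [
    Literal("label", XSD_STRING),
    Literal("etiquette", RDF_LANGSTRING, "fr"),
    Literal("etiqueta", RDF_LANGSTRING, "es"),
    Literal("123", XSD_INTEGER),
]

# Both formulations give the same verdict for every well-formed sample.
for s in samples:
    print(s.lexical, string_or_lang(s, ranges), string_or_lang_redundant(s, ranges))
```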

For error checking:

This data:

    "abcde"^^rdf:langString

is malformed and not in the value-space of rdf:langString; it is like 
writing

    "abcde"^^xsd:integer

It does have the datatype - it does not represent a legal value.
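
The distinction between "has the datatype" and "is a legal value" can be sketched in Python. The Literal class and the is_well_formed helper are hypothetical, and only the two datatypes mentioned above are checked; a real processor would consult the full lexical-space rules.

```python
import re
from dataclasses import dataclass
from typing import Optional

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal."""
    lexical: str
    datatype: str
    lang: Optional[str] = None


def has_datatype(value: Literal, datatype: str) -> bool:
    """Purely syntactic: compares the declared datatype IRI."""
    return value.datatype == datatype


def is_well_formed(value: Literal) -> bool:
    """Does the literal denote a value of its declared datatype?"""
    if value.datatype == RDF_LANGSTRING:
        # rdf:langString values are (string, language tag) pairs.
        return bool(value.lang)
    if value.datatype == XSD_INTEGER:
        return re.fullmatch(r"[+-]?[0-9]+", value.lexical) is not None
    return True  # other datatypes are not checked in this sketch


bad_lang = Literal("abcde", RDF_LANGSTRING)   # "abcde"^^rdf:langString
bad_int = Literal("abcde", XSD_INTEGER)       # "abcde"^^xsd:integer

print(has_datatype(bad_lang, RDF_LANGSTRING), is_well_formed(bad_lang))
print(has_datatype(bad_int, XSD_INTEGER), is_well_formed(bad_int))
```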


Another way: make language match "" mean xsd:string (cf. XML, where 
xml:lang="" means no language tag, although with slightly different 
implications).

    sh:property [
      sh:predicate skos:prefLabel ;
      sh:or ( [ sh:datatype xsd:string ]
              [ sh:langMatches ( "en" "fr" "de" ) ] ) ;
    ] .

vs

    sh:property [
      sh:predicate skos:prefLabel ;
      sh:langMatches ( "" "en" "fr" "de" ) ;
    ] .
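
A rough Python sketch of the "" convention, under the assumption that "" in the list admits plain xsd:string literals and that every other entry is matched by basic filtering. The Literal class and the function lang_matches_with_empty are invented for this illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal."""
    lexical: str
    datatype: str
    lang: Optional[str] = None


def lang_matches(tag: str, lang_range: str) -> bool:
    """Basic filtering of a language tag against a language range."""
    if not tag:
        return False
    if lang_range == "*":
        return True
    tag, lang_range = tag.lower(), lang_range.lower()
    return tag == lang_range or tag.startswith(lang_range + "-")


def lang_matches_with_empty(value: Literal, ranges: Sequence[str]) -> bool:
    """Assumed semantics: "" in the list stands for xsd:string."""
    if value.lang is None:
        return "" in ranges and value.datatype == XSD_STRING
    return any(lang_matches(value.lang, r) for r in ranges if r != "")


with_empty = ("", "en", "fr", "de")
without_empty = ("en", "fr", "de")
plain = Literal("label", XSD_STRING)

print(lang_matches_with_empty(plain, with_empty))
print(lang_matches_with_empty(plain, without_empty))
print(lang_matches_with_empty(Literal("123", XSD_INTEGER), with_empty))
print(lang_matches_with_empty(Literal("colour", RDF_LANGSTRING, "en-GB"), with_empty))
```

With this convention the single list covers the same cases as the sh:or form, while a non-string value such as 123 is still rejected.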


	Andy

>
>
> On 9/09/2016 23:02, Dimitris Kontokostas wrote:
>> What Holger proposes is flexible and we have the option to reuse some
>> existing constructs, but I have some concerns about this design.
>>
>> The reason is that we currently have focus node constraints and
>> property (path) constraints.
>> With this approach we create a new construct, only for languages, whose
>> nature and operation are not clear, e.g.
>>  - whether there are any differences in the meaning of e.g. sh:in when it
>> is used in a language context and when not
>>  - how sh:langShape interoperates with the extension mechanism and
>>  - what it means to have e.g. sh:class in a sh:langShape (do all
>> constraints apply in all places?)
>>
>> I would prefer the creation of a few new constraint components e.g.
>> sh:languageIn that allows us to enable (if we want) the RFCs Andy
>> suggested.
>>
>> Another option would be to generalize the mechanism Holger suggested
>> and provide transformation functions on the focus nodes / values a
>> shape selects.
>> This way we would be able to e.g. create sets/lists of language
>> tags, unwrap RDF lists, etc. and apply the SHACL core components on the
>> transformed values.
>> However, I think it is a bit late to try something in this direction.
>>
>> Best,
>> Dimitris
>>
>> On Fri, Sep 9, 2016 at 2:58 AM, Holger Knublauch
>> <holger@topquadrant.com> wrote:
>>
>>     I was given the task of writing up sh:langShape today. I already
>>     did a few months back:
>>
>>     https://lists.w3.org/Archives/Public/public-data-shapes-wg/2016Mar/0262.html
>>
>>     From the list of requirements at
>>
>>     https://www.w3.org/2014/data-shapes/wiki/Proposals#ISSUE-137_Missing_constraint_for_language_tag
>>
>>       * In SKOS, there can be only one prefLabel per language tag
>>
>>     Already exists: sh:uniqueLang true
>>
>>       * Constrain the valid language tags to a provided set, e.g.
>>         (@en, @de, @fr)
>>
>>     See my email, sh:langShape [ sh:in ( "en" "de" "fr" ) ]
>>
>>       * Require that all literals have/do not have a language tag
>>
>>     Already exists: sh:datatype rdf:langString
>>
>>       * Require that a particular property have a set of literals, one
>>         each language tag, e.g. "there must be 3 instances of
>>         dct:abstract; the values must be literals; there must be one
>>         literal for each valid language code (@en, @de, @fr)"
>>
>>     Can be expressed through a combination of sh:minCount = 3,
>>     sh:maxCount = 3, sh:uniqueLang. (What are "instances of
>>     dct:abstract"?)
>>
>>       * Check that the language tag is 2-letter | 3-letter | does/does
>>         not have hyphens
>>
>>     sh:langShape [ sh:minLength 2 ; sh:maxLength 2 ; or: sh:pattern
>>     "... regex ..." ]
>>
>>       * Check that the 2 or 3-letter tag is valid
>>
>>
>>     Assuming that the list of valid tags is stored somewhere, e.g. in
>>     an rdf:List iso:ValidLanguages:
>>
>>     sh:langShape [ sh:in iso:ValidLanguages ]
>>
>>     I don't think maintaining such a list ourselves is within the
>>     scope of the WG, yet it could be expressed in the Core language.
>>
>>
>>     PROPOSAL: Add sh:langShape as outlined. Meaning: if a value node
>>     has a language tag then the string of the language tag itself
>>     needs to have the given sh:Shape.
>>
>>
>>     Holger
>>
>>
>>
>>
>> --
>> Dimitris Kontokostas
>> Department of Computer Science, University of Leipzig & DBpedia
>> Association
>> Projects: http://dbpedia.org, http://rdfunit.aksw.org,
>> http://aligned-project.eu
>> Homepage: http://aksw.org/DimitrisKontokostas
>> Research Group: AKSW/KILT http://aksw.org/Groups/KILT
>>
>
Received on Monday, 12 September 2016 10:08:28 UTC
