Re: ISSUE-137: Proposal to add sh:langShape from Arnaud Le Hors on 2016-09-12 (public-data-shapes-wg@w3.org from September 2016)

From: Arnaud Le Hors <lehors@us.ibm.com>
Date: Mon, 12 Sep 2016 10:55:48 +0200
To: Holger Knublauch <holger@topquadrant.com>
Cc: "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
Message-Id: <OF0F7960CB.70CF5F29-ONC125802C.00308CAA-C125802C.00310D77@notes.na.collabserv.c>
Having just seen another WG struggle to get to CR because of I18N-related 
issues, I strongly suggest we get whatever solution we come up with 
reviewed by the I18N WG sooner rather than later.
--
Arnaud  Le Hors - Senior Technical Staff Member, Open Web Technologies - 
IBM Cloud


Holger Knublauch <holger@topquadrant.com> wrote on 09/12/2016 01:30:02 AM:

> From: Holger Knublauch <holger@topquadrant.com>
> To: "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
> Date: 09/12/2016 01:31 AM
> Subject: Re: ISSUE-137: Proposal to add sh:langShape
> 
> Taking this and Andy's input into consideration, maybe sh:langShape 
> is overkill and all we really need is a new parameter such as 
> sh:languageIn which takes a node and, if it has a language tag, 
> verifies that it matches one of the provided languages following the
> SPARQL langMatches semantics. For example:
> 
> ex:MyShape
>     a sh:Shape ;
>     sh:property [
>         sh:predicate skos:prefLabel ;
>         sh:or ( [ sh:datatype xsd:string ] [ sh:datatype rdf:langString ] ) ;
>         sh:langMatches ( "en" "fr" "de" ) ;
>     ] .
> 
> langMatches could be for just a single language, but having a list 
> is shorter for this (apparently) common case in multi-lingual 
> countries such as Belgium. I didn't know the RFC supports wildcards; 
> this should hopefully be flexible enough to cover all given use 
> cases, but others may need to confirm.
> 
> Regards,
> Holger
> 
> PS: Andy, I prefer sh:datatype rdf:langString because it would be 
> one thing less to check (by form builders etc), and furthermore I 
> believe the semantics of sh:langMatches needs to be that it only 
> does something if the literal really has a language tag. Otherwise 
> it would be harder to express mixed cases of either string or 
> langString (which I believe is quite common).
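
The semantics described above (match against a list of language ranges, and only when the literal actually carries a language tag) can be sketched in a few lines of Python. This is a minimal illustration, not a proposed implementation: the function names lang_matches and language_in are hypothetical, and the matching assumes the basic filtering scheme of RFC 4647 that SPARQL langMatches is defined in terms of.

```python
from typing import Optional, Sequence


def lang_matches(tag: str, lang_range: str) -> bool:
    """Basic filtering (RFC 4647, section 3.3.1), as used by SPARQL langMatches."""
    if tag == "":
        return False  # an empty tag matches nothing, not even "*"
    if lang_range == "*":
        return True  # wildcard: any non-empty language tag
    tag, lang_range = tag.lower(), lang_range.lower()
    # Match on the whole tag or on a prefix that ends at a subtag boundary.
    return tag == lang_range or tag.startswith(lang_range + "-")


def language_in(lang_tag: Optional[str], ranges: Sequence[str]) -> bool:
    """Hypothetical check for the proposed list-valued parameter.

    Assumption taken from the message above: a literal without a
    language tag is not constrained at all.
    """
    if not lang_tag:
        return True
    return any(lang_matches(lang_tag, r) for r in ranges)
```

With the list from the example, language_in("fr-BE", ["en", "fr", "de"]) holds, language_in("nl", ["en", "fr", "de"]) does not, and a plain xsd:string (tag None) passes untouched, which is what keeps the mixed string/langString case expressible.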
> 

> On 9/09/2016 23:02, Dimitris Kontokostas wrote:
> What Holger proposes is flexible and we have the option to reuse 
> some existing constructs, but I have some concerns about this design.
> 
> The reason is that we currently have focus node constraints and 
> property (path) constraints. 
> With this approach we create a new construct only for languages, and
> it is not clear what it is and how it operates, e.g. 
>  - whether there are any differences in the meaning of e.g. sh:in when it
> is used in a language context and when not
>  - how sh:langShape interoperates with the extension mechanism, and 
>  - what it means to have e.g. sh:class in a sh:langShape (do 
> all constraints apply in all places?)
> 
> I would prefer the creation of a few new constraint components, e.g. 
> sh:languageIn, that allow us to enable (if we want) the RFCs Andy 
> suggested.
> 
> Another option would be to generalize the mechanism Holger suggested
> and provide transformation functions on the focus nodes / values a 
> shape selects. 
> This way we would be able to e.g. create sets/lists of language 
> tags, unwrap RDF lists, etc. and apply the SHACL core components to 
> the transformed values.
> However, I think it is a bit late to try something in this direction.
> 
> Best,
> Dimitris
> 
> On Fri, Sep 9, 2016 at 2:58 AM, Holger Knublauch <holger@topquadrant.com> wrote:
> I was given the task of writing up sh:langShape today. I already did
> a few months back:
> 
> 
> https://lists.w3.org/Archives/Public/public-data-shapes-wg/2016Mar/0262.html
> 
> From the list of requirements at
> 
> https://www.w3.org/2014/data-shapes/wiki/Proposals#ISSUE-137_Missing_constraint_for_language_tag
> 
>  - In SKOS, there can be only one prefLabel per language tag
>      Already exists: sh:uniqueLang true
>  - Constrain the valid language tags to a provided set, e.g. (@en, @de, @fr)
>      See my email, sh:langShape [ sh:in ( "en" "de" "fr" ) ]
>  - Require that all literals have/do not have a language tag
>      Already exists: sh:datatype rdf:langString
>  - Require that a particular property have a set of literals, one for each 
>    language tag, e.g. "there must be 3 instances of dct:abstract; the 
>    values must be literals; there must be one literal for each valid 
>    language code (@en, @de, @fr)"
>      Can be expressed through a combination of sh:minCount = 3, 
>      sh:maxCount = 3, sh:uniqueLang. (What are "instances of dct:abstract"?)
>  - Check that the language tag is 2-letter | 3-letter | does/does not 
>    have hyphens
>      sh:langShape [ sh:minLength 2 ; sh:maxLength 2 ; or: sh:pattern "... regex ..." ]
>  - Check that the 2 or 3-letter tag is valid
> 
> Assuming that the list of valid tags is stored somewhere, e.g. in an
> rdf:List iso:ValidLanguages:
> 
> sh:langShape [ sh:in iso:ValidLanguages ]
> 
> I don't think maintaining such a list ourselves is within the scope 
> of the WG, yet it could be expressed in the Core language.
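
As a rough illustration of the sh:minCount / sh:maxCount / sh:uniqueLang combination mentioned in the list above, here is a small Python sketch. The helper names and the (lexical form, language tag) tuple representation are assumptions made for this example only; they are not part of any SHACL API.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

# Hypothetical stand-in for an RDF literal: (lexical form, language tag or None).
Literal = Tuple[str, Optional[str]]


def unique_lang(values: Sequence[Literal]) -> bool:
    """True if no language tag is used by more than one value."""
    tags = Counter(tag.lower() for _, tag in values if tag)
    return all(count == 1 for count in tags.values())


def exactly_n_unique_langs(values: Sequence[Literal], n: int = 3) -> bool:
    """minCount = n, maxCount = n and uniqueLang, checked together."""
    return len(values) == n and unique_lang(values)
```

In this sketch the count and uniqueness checks alone say nothing about which tags appear, so a set tagged en, de and it also passes; pinning the tags to (@en, @de, @fr) would additionally need the language-set constraint from the second requirement.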
> 
> 
> PROPOSAL: Add sh:langShape as outlined. Meaning: if a value node has
> a language tag then the string of the language tag itself needs to 
> have the given sh:Shape.
> 
> 
> Holger
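
To make the proposed meaning concrete (the language tag string itself is checked against a shape, and only when a tag is present), here is a hedged Python sketch. The function lang_shape_ok and its keyword parameters are hypothetical names mirroring sh:in, sh:minLength, sh:maxLength and sh:pattern; Python's re module is used as a stand-in for the regex dialect, which is an assumption.

```python
import re
from typing import Optional, Sequence


def lang_shape_ok(
    lang_tag: Optional[str],
    *,
    allowed: Optional[Sequence[str]] = None,  # mirrors sh:in
    min_length: Optional[int] = None,         # mirrors sh:minLength
    max_length: Optional[int] = None,         # mirrors sh:maxLength
    pattern: Optional[str] = None,            # mirrors sh:pattern
) -> bool:
    """Apply string constraints to the language tag of a value node."""
    if not lang_tag:
        return True  # no language tag: the proposal says nothing is checked
    if allowed is not None and lang_tag not in allowed:
        return False
    if min_length is not None and len(lang_tag) < min_length:
        return False
    if max_length is not None and len(lang_tag) > max_length:
        return False
    # Unanchored search, assuming SPARQL REGEX-style matching for sh:pattern.
    if pattern is not None and re.search(pattern, lang_tag) is None:
        return False
    return True
```

One difference from the langMatches approach shows up directly: with allowed=("en", "de", "fr") the tag "en-GB" is rejected, because membership here is exact string equality rather than range matching.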

> 

> 
> -- 
> Dimitris Kontokostas
> Department of Computer Science, University of Leipzig & DBpedia Association
> Projects: http://dbpedia.org, http://rdfunit.aksw.org, http://aligned-project.eu
> Homepage: http://aksw.org/DimitrisKontokostas
> Research Group: AKSW/KILT http://aksw.org/Groups/KILT
Received on Monday, 12 September 2016 08:56:24 UTC