RE: SKOS concept scheme URIs as values for constraints

So we probably agree that the high-level requirement 
> "what they say over there is right therefore if I validate against that, I'm good"
is often encountered, i.e. there is interest in validating against artefacts published at more than one URL, or even on more than one domain. More specifically, many data validation scenarios involve checking whether a property value is taken from a specified set that is published separately from the structural model. Working out what this means for SHACL is challenging. 

I had identified the narrow issue that the way set membership is expressed is highly variable, and not even consistent between sources using the same RDF vocabulary. Phil focusses more on the question of how much caching (pre-loading) is required, how onerous that is, and on whom the implied burden would fall. On both points I must agree with Holger that, at first glance at least, it is difficult to see how a general solution could be devised. At the very least it would depend on deployment patterns and best practices which are clearly out of scope for SHACL. But it is a genuine requirement, so I wonder if we could at least walk through how it might look if there were a suitable practice for publishing vocabularies and registers. INSPIRE is doing it, and so is WMO (see http://codes.wmo.int/ - we also have a test deployment using the same technology, see http://registry.it.csiro.au/def ). There are many other vocabulary services around. 
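
To make the variability point concrete, here is a minimal sketch in Python of a membership test that tolerates several of the conventions mentioned in this thread (skos:inScheme, rdfs:isDefinedBy, and a namespace-prefix fallback), plus skos:topConceptOf and skos:hasTopConcept from SKOS. It is not a proposal for SHACL. The triples are plain string tuples, and the example.org scheme, its concepts and the function names are all hypothetical.

```python
# Hypothetical sketch: triples are plain (subject, predicate, object) string tuples.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Predicates that point from a member to its scheme (or defining document).
MEMBER_TO_SCHEME = {SKOS + "inScheme", SKOS + "topConceptOf", RDFS + "isDefinedBy"}
# Predicates that point from a scheme to one of its members.
SCHEME_TO_MEMBER = {SKOS + "hasTopConcept"}


def scheme_members(triples, scheme):
    """Collect every resource linked to `scheme` by any supported convention."""
    members = set()
    for s, p, o in triples:
        if p in MEMBER_TO_SCHEME and o == scheme:
            members.add(s)
        elif p in SCHEME_TO_MEMBER and s == scheme:
            members.add(o)
    return members


def is_allowed(value, triples, scheme, namespace_fallback=False):
    """True if `value` is a stated member; optionally accept a URI prefix match."""
    if value in scheme_members(triples, scheme):
        return True
    # Fallback in the spirit of the STRSTARTS idea: no membership triple needed.
    return namespace_fallback and value.startswith(scheme)


# Made-up scheme and concepts, one per membership convention.
SCHEME = "http://example.org/codelist/Designation/"
EXAMPLE_TRIPLES = [
    (SCHEME + "a", SKOS + "inScheme", SCHEME),
    (SCHEME + "b", RDFS + "isDefinedBy", SCHEME),
    (SCHEME, SKOS + "hasTopConcept", SCHEME + "c"),
]

if __name__ == "__main__":
    print(sorted(scheme_members(EXAMPLE_TRIPLES, SCHEME)))
    print(is_allowed(SCHEME + "d", EXAMPLE_TRIPLES, SCHEME))
```

The namespace fallback is off by default because it accepts any URI under the prefix, whether or not the publisher ever minted it.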

Simon

-----Original Message-----
From: Phil Archer [mailto:phila@w3.org] 
Sent: Monday, 10 August 2015 6:48 PM
To: Irene Polikoff <irene@topquadrant.com>; Cox, Simon (L&W, Highett) <Simon.Cox@csiro.au>; martynas@graphity.org; lehors@us.ibm.com; holger@topquadrant.com
Cc: public-data-shapes-wg@w3.org; public-rdf-shapes@w3.org
Subject: Re: SKOS concept scheme URIs as values for constraints

Thanks for more discussion on this.

It's true that concept schemes, and RDF in general, are produced inconsistently. The concept scheme at http://inspire.ec.europa.eu/codelist/AdministrativeHierarchyLevel/, for example, does include skos:inScheme links but there's no guarantee that such properties will be included. And, for the use case at hand, it wouldn't matter either way.

That is, even if the European Commission hadn't included skos:inScheme properties, actually no matter how crappy the RDF was, I'd still want to be able to say "if it's in that list it's valid, if it ain't, it ain't" because it's an authoritative list of allowed values published by the organisation that makes those decisions.

Holger is, of course, perfectly correct in saying that the way to do this is to download the relevant triples, add them to the store and go from there, or to use the enumerated values. Both of those work, yes. The issue though is one of workflow, not semantics.
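
As a rough illustration of the download-then-validate workflow, the Python sketch below assumes the allowed values have already been fetched and reduced to a local set of URIs; the fetching and RDF parsing are deliberately left out. The property URI, the site URIs and the concept names are hypothetical stand-ins, not real INSPIRE or ps: identifiers.

```python
# Hypothetical sketch: validate data against a locally held snapshot of allowed values.
# Data triples are plain (subject, predicate, object) string tuples.

# Stand-in for a property such as ps:siteDesignation (this URI is an assumption).
SITE_DESIGNATION = "http://example.org/ps#siteDesignation"

# Snapshot of the authoritative list, assumed to have been downloaded earlier.
ALLOWED_SNAPSHOT = {
    "http://example.org/codelist/Designation/a",
    "http://example.org/codelist/Designation/b",
}

# Made-up instance data to be validated.
DATA = [
    ("http://example.org/site/1", SITE_DESIGNATION, "http://example.org/codelist/Designation/a"),
    ("http://example.org/site/2", SITE_DESIGNATION, "http://example.org/codelist/Designation/zzz"),
    ("http://example.org/site/2", "http://example.org/ps#name", "Some site"),
]


def find_violations(data_triples, predicate, allowed_values):
    """Return (subject, value) pairs whose value is not in the local snapshot."""
    return [
        (s, o)
        for s, p, o in data_triples
        if p == predicate and o not in allowed_values
    ]


if __name__ == "__main__":
    for subject, value in find_violations(DATA, SITE_DESIGNATION, ALLOWED_SNAPSHOT):
        print(f"{subject}: {value} is not in the allowed list")
```

The check itself is trivial. The workflow cost is in producing and refreshing the snapshot, which is the part this sketch assumes away.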

It means that if I want to validate data that refers to the INSPIRE Registry (or the topic lists that, as Karen points out, are important in LODLAM etc.) then I have to do that downloading and ingesting. It also means that the publishers of those authoritative lists need to be encouraged to make their concept schemes available as a bulk download. That's easier said than done in some circumstances.

The INSPIRE Registry's underlying data is in XML. The SKOS Concept scheme is auto-generated from that (as is the Atom, CSV etc.) so making a bulk download is another task. The burden falls on the person wanting to use the Registry to do the regular checking that the reference URIs are still correct. That's what, if I could wave a magic wand, I'd want to avoid.
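
For what that regular checking might involve, here is a small Python sketch under the assumption that both the cached copy and a freshly fetched copy of a code list are available as sets of concept URIs. The function names and the example URIs are hypothetical.

```python
# Hypothetical sketch: detect drift between a cached code list and a fresh copy.


def diff_snapshots(cached, fresh):
    """Report which concept URIs were added or withdrawn since the cached copy."""
    cached, fresh = set(cached), set(fresh)
    return {"added": fresh - cached, "withdrawn": cached - fresh}


def stale_references(used_values, fresh):
    """Return, sorted, the values used in local data that the fresh list lacks."""
    return sorted(set(used_values) - set(fresh))


# Made-up URIs standing in for a cached and a re-fetched code list.
BASE = "http://example.org/codelist/Designation/"
CACHED = {BASE + "a", BASE + "b"}
FRESH = {BASE + "b", BASE + "c"}

if __name__ == "__main__":
    print(diff_snapshots(CACHED, FRESH))
    print(stale_references([BASE + "a", BASE + "b"], FRESH))
```

Someone has to run this on a schedule against every list they depend on, which is the burden described above.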

Each of the URIs of the concepts in the INSPIRE Registry does dereference. And you can get the SKOS concept scheme as an RDF/XML file, but you can't get *all* the concept schemes in the registry from one place. Is that a deficient implementation of Linked Data by the EC? Isn't that using the distributed Web?

Bulk downloads are provided so that people can download data and use it internally. It's what happens, it's a fact of life and we specifically recommend it in the Data on the Web Best Practices [1], but I'd hate to discourage the people behind the INSPIRE Registry (who Simon and I know well) - it's a whole lot better than many, many other services. What those people need is confirmation that what they have done already is useful and worth the effort, not demands for more work.

The provision of that Registry has greatly simplified the RDF modelling of spatial and environmental data [2] in one of my projects - from which I really must get some data that I can post/point to as Karen asked.

I fully understand that:
1. this sounds like an unreasonable expectation for the WG;
2. it is semantically suspect;
3. the aim can already be achieved by other means.

But I think that the closer we are to being able to effectively say "what they say over there is right therefore if I validate against that, I'm good" the more easily we cover some of these cases.

Cheers

Phil

[1] http://www.w3.org/TR/dwbp/#BulkAccess
[2] http://www.w3.org/2015/03/inspire/




On 10/08/2015 03:10, Irene Polikoff wrote:
> Yes, this was my thought as well and the use case described is not 
> looking for a definition, it is looking for inclusion as in "is this 
> concept in this scheme". If so, then it is valid.
>
> skos:inScheme "Relates a resource (for example a concept) to a concept 
> scheme in which it is included." and "A concept may be a member of 
> more than one concept scheme."
>
> It is not clear though that the URI of an instance of 
> skos:ConceptScheme would resolve to return a graph with all the 
> resources contained in the scheme as opposed to a graph containing all 
> triples that have the identified ConceptScheme URI as their subject.
>
> Further, many ontologies don't use either rdfs:isDefinedBy or skos:inScheme.
>
> Irene Polikoff, CEO
> TopQuadrant, Inc. www.topquadrant.com <http://www.topquadrant.com/> 
> Technology providers making enterprise information meaningful Blogs -- 
> http://www.topquadrant.com/the-semantic-ecosystems-journal/,
> http://www.topquadrant.com/composing-the-semantic-web/
> LinkedIn -- https://www.linkedin.com/company/topquadrant
> Twitter - https://twitter.com/topquadrant
>
>
> From:  <Simon.Cox@csiro.au>
> Date:  Sunday, August 9, 2015 at 9:43 PM
> To:  <martynas@graphity.org>, <lehors@us.ibm.com>, 
> <holger@topquadrant.com>, <phila@w3.org>
> Cc:  <public-data-shapes-wg@w3.org>, <public-rdf-shapes@w3.org>
> Subject:  RE: SKOS concept scheme URIs as values for constraints
> Resent-From:  <public-data-shapes-wg@w3.org>
> Resent-Date:  Mon, 10 Aug 2015 01:44:13 +0000
>
> rdfs:isDefinedBy has inconsistently interpreted semantics. Some like 
> to use it to link to an OWL ontology, some to another kind of 
> document. My observation is that the community is split.
>
> In the case of skos:Concept the native predicate would be skos:inScheme.
>
> Simon
>
> From: Martynas Jusevičius [mailto:martynas@graphity.org]
> Sent: Saturday, 8 August 2015 2:55 AM
> To: Arnaud Le Hors <lehors@us.ibm.com>; Holger Knublauch 
> <holger@topquadrant.com>; Phil Archer <phila@w3.org>
> Cc: public-data-shapes-wg@w3.org; public-rdf-shapes@w3.org
> Subject: Re: SKOS concept scheme URIs as values for constraints
>
>
> Phil,
>
> why are you basing your design on the namespace URI? I think a more 
> semantic way would be to allow all values of ?concept, where ?concept 
> rdfs:isDefinedBy ?ontology, and ?ontology is the vocabulary you want to use.
>
>
> Martynas
> graphityhq.com <http://graphityhq.com>
>
>
> On Fri 7 Aug 2015 at 18:48 Phil Archer <phila@w3.org> wrote:
>> Thanks for the replies everyone.
>>
>> Hmm... templates, special code, DIY... Meh. In short, the use case is 
>> not covered out of the box.
>>
>> To be useful, I'd expect the validator to go and fetch the SKOS 
>> concept scheme and check that the value of a property is valid. So I 
>> guess the questions would be:
>>
>> 1. Does the URI given as the value of a property dereference?
>> 2. Does the type of that resource match what I expect (is it typed as 
>> a SKOS Concept in this case)?
>>
>> Of course, that's a heavy burden, I well understand that, and the 
>> burden may be more than is needed in many cases, and too much in 
>> others, but authoritative lists of allowed values are not uncommon.
>>
>> If this is out of scope for the work, OK, that's my answer. If the 
>> answer is "you can bolt something on the side that does it" then, 
>> well, I'd likely not bother with the bolt and just do it myself 
>> anyway - which kind of defeats the object.
>>
>> Karen's Use Case 37 does indeed seem very similar and, yes, SHACL has 
>> regEx matching, enumerated lists and so on, so a lot of what I'm 
>> asking can be done - and that may be sufficient (or that may have to 
>> be sufficient), but without fetching the authoritative list of 
>> allowed values from an external source, the issue of synchronising 
>> will always come up.
>>
>> I should indeed have some test data imminently, if it's wanted.
>>
>> Thanks
>>
>> Phil.
>>
>> PS. I'm very likely to join the f2f in Lille next month as I'll be 
>> passing through on my way home from Brussels. Looking forward to 
>> catching up with the wider work of the group.
>>
>> On 05/08/2015 01:01, Holger Knublauch wrote:
>>>> This is correct and thanks for highlighting this. I wanted to be 
>>>> brief and could elaborate or even implement the template as an 
>>>> example. I was hoping that my statement "using a template" would 
>>>> have been sufficiently clear, but maybe it wasn't. Yes, there needs 
>>>> to be at least one person on the planet, knowledgeable of SPARQL 
>>>> and SHACL, who needed this feature to cast it into a template and publish it for everyone else to use.
>>>>
>>>> (BTW I later noticed that the original requirement may have been 
>>>> about checking for the presence of URIs in a certain named graph. 
>>>> In that case, the SPARQL GRAPH keyword could be used, assuming the 
>>>> named graphs are present in the same dataset, or SERVICE for 
>>>> external graphs. There are all kinds of variations here, which is 
>>>> why my inclination is to leave this as an opportunity for 
>>>> third-party templates, not the core language.)
>>>>
>>>> Regards,
>>>> Holger
>>>>
>>>>
>>>> On 8/5/2015 9:29, Arnaud Le Hors wrote:
>>>>>> Holger,
>>>>>>
>>>>>> I think we ought to clarify that what you present here isn't all 
>>>>>> it takes because it relies on having shx:allowedValueNamespaces 
>>>>>> defined somewhere, presumably using the SPARQL extension.
>>>>>>
>>>>>> I know you wrote "an end-user syntax" and the implication is that 
>>>>>> some advanced-user has defined such a template for the end-user 
>>>>>> but we need to be careful not to set the wrong expectation.
>>>>>>
>>>>>> Regards.
>>>>>> --
>>>>>> Arnaud Le Hors - Senior Technical Staff Member, Open Web 
>>>>>> Technologies - IBM Software Group
>>>>>>
>>>>>>
>>>>>> Holger Knublauch <holger@topquadrant.com> wrote on 08/03/2015 03:29:13 PM:
>>>>>>
>>>>>>>> From: Holger Knublauch <holger@topquadrant.com>
>>>>>>>> To: public-data-shapes-wg@w3.org, "public-rdf-shapes@w3.org"
>>>>>>>> <public-rdf-shapes@w3.org>
>>>>>>>> Date: 08/03/2015 03:30 PM
>>>>>>>> Subject: Re: SKOS concept scheme URIs as values for constraints
>>>>>>>>
>>>>>>>> This could be represented in SHACL using a template, with an 
>>>>>>>> end-user syntax such as
>>>>>>>>
>>>>>>>> ex:MyShape
>>>>>>>>       a sh:Shape ;
>>>>>>>>       sh:property [
>>>>>>>>           a shx:AllowedValueNamespacesConstraint ;
>>>>>>>>           sh:predicate ps:siteDesignation ;
>>>>>>>>           shx:allowedValueNamespaces ( 
>>>>>>>> "http://inspire.ec.europa.eu/codelist/DesignationValue/" ) ;
>>>>>>>>           sh:valueClass skos:Concept ;
>>>>>>>>       ] .
>>>>>>>>
>>>>>>>> In the above scenario I am assuming that the algorithm will 
>>>>>>>> check that all values of the given property must be URIs 
>>>>>>>> starting with one of the enumerated strings (using STRSTARTS in 
>>>>>>>> SPARQL). It would not go to the web to check whether there is 
>>>>>>>> actually a Graph at that namespace - this
>>>>>>>> would be outside of what SPARQL can do right now.
>>>>>>>>
>>>>>>>> I cannot comment on whether this particular pattern should 
>>>>>>>> become part of the Core vocabulary too, but the whole point of 
>>>>>>>> the extension mechanism is to allow anyone to represent and 
>>>>>>>> publish their own favorite
>>>>>>>> constraint design patterns, so that they don't rely on the 
>>>>>>>> choices made
>>>>>>>> by a particular working group in the year 2015.
>>>>>>>>
>>>>>>>> Holger
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 8/4/2015 5:39, Karen Coyle wrote:
>>>>>>>>>> Phil,
>>>>>>>>>>
>>>>>>>>>> Thanks for bringing this up. I thought that I had covered 
>>>>>>>>>> this in use
>>>>>>>>>> case #34 [1], and at one point I asked if all of these 
>>>>>>>>>> criteria were
>>>>>>>>>> met by the requirements and I was assured that they were. 
>>>>>>>>>> This is a key use case for the cultural heritage community, 
>>>>>>>>>> so if there are any
>>>>>>>>>> doubts that these requirements can be met we need to address this.
>>>>>>>>>> Perhaps the way to resolve this is to provide test cases. 
>>>>>>>>>> There seem
>>>>>>>>>> to be some functional versions of SHACL that could be used to 
>>>>>>>>>> test this, if I'm not mistaken. Would you be able to provide 
>>>>>>>>>> some test data?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> kc
>>>>>>>>>> [1] http://w3c.github.io/data-shapes/data-shapes-ucr/#uc37-defining-allowed-required-values
>>>>>>>>>>
>>>>>>>>>> On 8/3/15 9:48 AM, Phil Archer wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I've had an opportunity to take a look at the SHACL work 
>>>>>>>>>>>> today and I
>>>>>>>>>>>> notice one of the use cases looks set to be missed - 
>>>>>>>>>>>> although only just.
>>>>>>>>>>>>
>>>>>>>>>>>> The UCR doc includes the one about self-describing Linked 
>>>>>>>>>>>> Data [1] which
>>>>>>>>>>>> talks about the value of a property being a skos:Concept. 
>>>>>>>>>>>> Are you considering making this a little tougher, i.e. that 
>>>>>>>>>>>> the value of a given
>>>>>>>>>>>> property is a concept defined in a specific scheme?
>>>>>>>>>>>>
>>>>>>>>>>>> I see that SHACL allows the enumeration of values [2], but 
>>>>>>>>>>>> I want to be
>>>>>>>>>>>> able to say "any value from the SKOS Concept scheme at 
>>>>>>>>>>>> <foo>". It looks
>>>>>>>>>>>> like SHACL won't support that?
>>>>>>>>>>>>
>>>>>>>>>>>> Use Case: INSPIRE
>>>>>>>>>>>>
>>>>>>>>>>>> INSPIRE [0] - the European Union's obligatory set of 
>>>>>>>>>>>> standards for
>>>>>>>>>>>> environmental and geospatial data - has a handy registry of 
>>>>>>>>>>>> SKOS concept
>>>>>>>>>>>> schemes [3]. In one of my projects, I've been working on 
>>>>>>>>>>>> creating RDF
>>>>>>>>>>>> vocabularies that are compatible with the INSPIRE data 
>>>>>>>>>>>> model, such as
>>>>>>>>>>>> the one about protected sites [4]. That has a property 
>>>>>>>>>>>> ps:siteDesignation for which the range is defined as 
>>>>>>>>>>>> skos:Concept but
>>>>>>>>>>>> really what it should say is:
>>>>>>>>>>>>
>>>>>>>>>>>> the value of this property should be a skos:Concept in the 
>>>>>>>>>>>> scheme at
>>>>>>>>>>>> http://inspire.ec.europa.eu/codelist/DesignationValue/.
>>>>>>>>>>>>
>>>>>>>>>>>> It would be inappropriate to enumerate the concepts in that concept
>>>>>>>>>>>> scheme (there are 6 of them) since it is under a different 
>>>>>>>>>>>> organisation's change control.
>>>>>>>>>>>>
>>>>>>>>>>>> I recognise that this leads to the possibility that a graph 
>>>>>>>>>>>> that is
>>>>>>>>>>>> valid today may become invalid if the INSPIRE Registry were 
>>>>>>>>>>>> to be amended but that's a management task for the European 
>>>>>>>>>>>> Commission to
>>>>>>>>>>>> worry about (i.e. the people responsible for the INSPIRE 
>>>>>>>>>>>> data model) and
>>>>>>>>>>>> they would need to be mindful of such situations which 
>>>>>>>>>>>> would occur
>>>>>>>>>>>> whether we were talking about RDF graphs or dollops of GML, 
>>>>>>>>>>>> so I don't
>>>>>>>>>>>> think that's a show stopper here.
>>>>>>>>>>>>
>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>
>>>>>>>>>>>> Phil.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [0] http://inspire.ec.europa.eu/
>>>>>>>>>>>>
>>>>>>>>>>>> [1] http://w3c.github.io/data-shapes/data-shapes-ucr/#uc28-self-describing-linked-data-resources
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [2] http://w3c.github.io/data-shapes/shacl/#AbstractAllowedValuesPropertyConstraint
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [3] http://inspire.ec.europa.eu/registry/
>>>>>>>>>>>>
>>>>>>>>>>>> [4] http://www.w3.org/2015/03/inspire/ps
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>>
>>
>> --
>>
>>
>> Phil Archer
>> W3C Data Activity Lead
>> http://www.w3.org/2013/data/
>>
>> http://philarcher.org
>> +44 (0)7887 767755
>> @philarcher1
>
>
>

-- 


Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1

Received on Monday, 10 August 2015 11:14:37 UTC