Organizing the requirements (was: First Telecon for RDF Data Shapes WG)

On 10/14/2014 9:20, Eric Prud'hommeaux wrote:
> I propose that folks review the Dublin Core Application Profiles 
> requirements to get some shared terminology. I've done my best to 
> create a hierarchical version faithful to their database. I hope this 
> is a bit easier to review <http://www.w3.org/2014/10/rdfvalreqs/>. The 
> js I used to make expandable lists disables clicking on links so you 
> have to e.g. shift click in order to open in a new tab (sorry!).

Thanks for preparing this tree view, Eric! As was stated in the 
meeting yesterday, it would be easy to spend many months on this topic 
alone and eventually get lost in a deluge of details without ever 
seeing the big picture.

Pragmatically speaking, I believe we should aim to settle a key 
question early on: the role of SPARQL versus any alternatives. Judging 
from the discussions on the old mailing list, many people agree that 
SPARQL is the most suitable existing language in terms of 
expressivity: it is a general RDF pattern-matching language and covers 
the most common operations with its arithmetic and string-manipulation 
functions. I don't really see alternatives.
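
To make this concrete, here is roughly what such a check could look 
like in SPARQL 1.1. This is a minimal sketch only; the ex: namespace 
and the Person/email terms are made up for illustration:

    PREFIX ex: <http://example.org/ns#>

    # Report every ex:Person that has no ex:email value -
    # effectively a closed-world minimum-cardinality check.
    SELECT ?person
    WHERE {
      ?person a ex:Person .
      FILTER NOT EXISTS { ?person ex:email ?email }
    }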

The current requirements catalog already lists dozens of "expressivity" 
items, and we could easily add many more from our practical experience 
(if that's desired, where should we add them?). But again, pragmatically 
speaking, the main thing we need to agree on is to allow the constraint 
language to use any SPARQL expression; we can then move on and 
outsource the general expressivity question to the SPARQL standard 
(and its future versions).

A more interesting question then becomes which expressivity patterns 
are the most common and can be generalized into a higher-level 
vocabulary that doesn't require users to understand SPARQL (and serves 
as a more formal representation mechanism). The catalog of requirements 
could capture the general relevance of each such pattern to inform the 
future deliverables. Quite possibly there could be multiple Shapes 
profiles, where the most basic profile includes things like cardinality 
and range restrictions, and more complex profiles include other 
patterns such as primary keys or lexical patterns. These profiles could 
be captured in an ontology, and applications could then declare which 
profile they understand - e.g. by hard-coding against the URIs of the 
cardinality restrictions. Ontologies with constraints outside of such a 
profile could then be rejected by an application. A generic SPARQL-based 
constraint processor would obviously support all profiles, as well as 
custom constraints defined directly in SPARQL.
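
As a sketch of what such a profile ontology could look like (all terms 
in the shp: namespace are hypothetical, purely to illustrate the idea):

    @prefix shp: <http://example.org/shapes-profiles#> .

    # A basic profile covering the most common restriction types.
    shp:CoreProfile a shp:Profile ;
        shp:includesConstraintType shp:CardinalityRestriction ,
                                   shp:RangeRestriction .

    # A richer profile that extends the basic one.
    shp:ExtendedProfile a shp:Profile ;
        shp:extends shp:CoreProfile ;
        shp:includesConstraintType shp:PrimaryKey ,
                                   shp:LexicalPattern .

    # An application declares the profile it was hard-coded against.
    shp:MyValidator shp:supportsProfile shp:CoreProfile .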

I second the view expressed by others yesterday that we should try to 
associate the requirements with corresponding SPARQL queries (and OWL 
equivalents where possible), in the hope that we can later reuse those 
formal representations for the actual implementation of the standard 
Shapes. Of course, if a pattern already has an equivalent in OSLC 
Resource Shapes, ShEx or OWL Closed World, then this should be recorded 
as an annotation on the pattern too.
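
For the cardinality example above, the OWL equivalent would be a plain 
owl:minCardinality restriction, read with closed-world semantics 
(again just a sketch, reusing the hypothetical ex: terms):

    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
    @prefix ex:   <http://example.org/ns#> .

    # Under a closed-world reading this expresses the same constraint
    # as the SPARQL query above: every ex:Person needs an ex:email.
    ex:Person rdfs:subClassOf [
        a owl:Restriction ;
        owl:onProperty ex:email ;
        owl:minCardinality "1"^^xsd:nonNegativeInteger
    ] .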

To make this more specific: would it make sense to move from 
"expressivity" items to "shapes" or "patterns" and collect those in a 
semi-formal format? Other specs have a button that lets you view a 
snippet in Turtle, RDF/XML, JSON-LD etc., and we could do something 
similar. Each such Shape candidate would have (a Turtle sketch of one 
possible entry follows the list):

- name (including what could become the local name of a URI for the pattern)
- arguments (e.g. cardinality shape has min/max and a property as arguments)
- description
- generality/reuse/importance level (could later lead to "profiles")
- example(s) in prose and RDF instances
- SPARQL implementation of the constraint check
- OWL-Closed-World implementation
- ShExC implementation
- pointers to ShEx, Resource Shapes etc. equivalents
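
Here is a rough Turtle sketch of one such entry. Every term in the 
req: namespace is invented for illustration, and the exact property 
names would of course be up for discussion:

    @prefix req:  <http://example.org/shape-requirements#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    req:MinCardinality a req:ShapePattern ;  # name = local name of URI
        rdfs:label "Minimum cardinality" ;
        req:argument "property" , "minCount" ;  # arguments
        rdfs:comment "Each focus node must have at least minCount values for the given property." ;
        req:importance "core" ;  # could later map to a profile
        req:example "Every Person must have at least one email address." ;
        # SPARQL implementation of the check (minCount = 1 case):
        req:sparqlCheck """SELECT ?this
            WHERE {
              ?this a ?targetClass .
              FILTER NOT EXISTS { ?this ?property ?value }
            }""" ;
        req:owlEquivalent "owl:minCardinality (closed-world reading)" ;
        req:seeAlso "ShExC / OSLC Resource Shapes equivalents" .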

Such a model may also better address Peter's request for measurable and 
comparable evaluation criteria, and the requirements work would then 
become the first step towards the formal Shapes declarations. In fact, 
we could start to encode them as an RDF model (e.g. SPIN templates) and 
try them out in real time. We could collectively edit a Turtle file and 
generate human-readable documents from it.

Holger

Received on Thursday, 16 October 2014 01:20:49 UTC