Re: shapes-ISSUE-30 (shape-and-data-graphs): Are shapes and data in the same graph? [SHACL Spec] from Holger Knublauch on 2015-04-14 (public-data-shapes-wg@w3.org from April 2015)

From: Holger Knublauch <holger@topquadrant.com>
Date: Tue, 14 Apr 2015 15:25:37 +1000
To: public-data-shapes-wg <public-data-shapes-wg@w3.org>
Message-ID: <552CA4D1.1080501@topquadrant.com>

On 4/14/2015 0:34, Peter F. Patel-Schneider wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 04/12/2015 05:01 PM, Holger Knublauch wrote:
>> One of the main selling points of RDF technology has always been the fact
>> that instance and schema are represented uniformly. RDF Schema and OWL
>> class definitions are instances (of metaclasses) themselves. This means
>> that such data can not only be stored and shared together, but also be
>> queried uniformly. In general, SPARQL queries can freely walk between
>> meta-levels.
>>
>> Many other formalisms such as XML and SQL databases have a stricter
>> separation between those levels. If we agree on a similarly strict
>> separation by making it impossible to query the shapes graph from the
>> instances graph (and vice versa), then we may throw away a unique
>> advantage that RDF technology has. I am generally not in favor of
>> selecting the lowest common denominator for all use cases, only because
>> certain cases may not have the best performance.
>>
>> I understand that we need to maintain good performance, including the
>> ability to use native query optimizations on database level where
>> possible. Also there are cases where the shapes model is really totally
>> separate from the database. Yet I believe there are also cases where
>> being able to access the shapes definitions at runtime is beneficial.
>>
>> In this discussion here, I believe we should distinguish between what we
>> use in the SPARQL queries of the specification versus what optimized
>> implementations may do. I believe it should be doable to assume that - in
>> the context of the spec - the shapes graph can be in the same dataset as
>> the actual data. So by default we would have a single dataset and
>> validation gets two parameters:
>>
>> - the URI of the "instances" data graph (default graph)
>> - the URI of the shapes graph
> I would put this exactly the other way around, namely
>
>    I believe that it should be doable to assume that - in the context of the
>    spec - the shapes graph can be completely separate from the actual data. So
>    validation has two parameters:
>   - the "instances" data graph
>   - the shapes graph
>
> I believe that this setup is much cleaner than a design that *needs* to do
> something special when the shapes graph is inaccessible from the data graph.
> In this setup special things *can* but do *not* need to be done when the
> shapes graph is accessible from the data graph.
>
> [...]

First and foremost, we need a resolution on whether SHACL constraints can
query other shape definitions consistently. Only once we have this
option does the question become which variant of SPARQL mapping we select
for the formal specification. Quite possibly the SHACL spec could define
sh:allowedValues by a string generation algorithm that turns the
rdf:List of allowed values into a FILTER NOT IN clause. Yet once we
have the possibility to query the named graph of shape definitions, plus
the template mechanism, I'd argue that it will be more elegant and
more compact to use the mechanism that we already have, as long as we
make sure that string generation remains a possibility for
implementers. (A consequence of that would be to specify that the
members of sh:allowedValues cannot be blank nodes, because those could
not be mapped into a proper SPARQL string, but I assume we agree on that
anyway.)
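
To make the string generation variant concrete, here is a rough sketch
in Python of what such an algorithm might look like. It is only an
illustration under my own assumptions, not proposed spec text: the
classes IRI, Literal and BlankNode and the function
allowed_values_filter are hypothetical names, and the rdf:List is
assumed to have already been read into a plain Python list.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


# Hypothetical minimal term classes (assumptions, not part of any SHACL API).
@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None  # datatype IRI, if any
    lang: Optional[str] = None      # language tag, if any


@dataclass(frozen=True)
class BlankNode:
    label: str


Term = Union[IRI, Literal, BlankNode]


def term_to_sparql(term: Term) -> str:
    """Serialize one RDF term as a SPARQL constant."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    if isinstance(term, Literal):
        # Escape backslashes first, then quotes and line breaks.
        escaped = (
            term.lexical.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        text = f'"{escaped}"'
        if term.lang:
            return f"{text}@{term.lang}"
        if term.datatype:
            return f"{text}^^<{term.datatype}>"
        return text
    # A blank node has no stable identity inside a query string, which is
    # why the members of sh:allowedValues would have to exclude them.
    raise ValueError("blank nodes cannot be mapped into a SPARQL string")


def allowed_values_filter(var: str, allowed: Sequence[Term]) -> str:
    """Build a FILTER that matches values NOT among the allowed ones."""
    items = ", ".join(term_to_sparql(t) for t in allowed)
    return f"FILTER (?{var} NOT IN ({items}))"


if __name__ == "__main__":
    members = [
        IRI("http://example.org/Red"),
        Literal("green"),
        Literal("blau", lang="de"),
    ]
    print(allowed_values_filter("value", members))
```

The generated clause would be pasted into the query that looks for
violating values. The ValueError branch shows where the blank node
restriction comes from.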

Regards,
Holger

Received on Tuesday, 14 April 2015 05:27:07 UTC