Re: Signalling entailment in queries from Andy Seaborne on 2010-07-21 (public-rdf-dawg@w3.org from July to September 2010)

From: Andy Seaborne <andy.seaborne@talis.com>
Date: Wed, 21 Jul 2010 19:27:10 +0100
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4C473BFE.3050905@talis.com>
On 20/07/2010 10:45 PM, Kendall Clark wrote:
 > I don't think this distinction is that useful (KR, plain SPARQL); but
 > since we have customers & apps in both spaces, I'll say that this
 > 'signal inference' thing is not a problem in practice. At least, we
 > haven't ever encountered it in 5+ years.
 >
 > If the SD says 'no inference' at 5pm and you turn on inference at 6pm,
 > then update the SD at 5:59pm. Seriously, this would just *never*
 > happen in practice in my experience. Changing how the system works in
 > this way is a configuration management issue & would be handled as
 > such (i.e., an entry in a changelog for the new version, a blog post
 > on the SaaS blog explaining the new version, etc).
 >
 > Of course, our experience is not universal, but I suggest that this is
 > more of a corner than a core use case and we shouldn't do very much,
 > if anything, at this relatively late date about this issue.
 >
 > Just update the SD before you change the service that it describes.
 > Easy to do, simple to explain, good enough. :>
 >
 > Our two cents and probably worth at least 1 cent at this point. :>
 >
 > Cheers,
 > Kendall

I'm not sure which message to "reply to" but Kendall's points bring 
deployed experience into the picture which I think is important here as 
it calibrates whether there is a problem now or a potential issue later.


I see a difference between the case where the dataset is defined by 
the service and the case where it is defined by the query or protocol 
(client chosen).

If it is a fixed dataset for the service, the service description can 
describe the graphs.  It's a property of the graph, not of the access to 
the graph via GRAPH nor even an option for the client.  It's an offer 
the client can choose to accept or not.

The offer can be multi-aspect.  Different names can be given to the same 
data with different entailment levels (which is especially useful 
because the name is capturing the data and the process so you can talk 
about "<g1> derived from <g2> by doing process <x>").
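
A minimal sketch of the "name captures data and process" idea, in 
Python rather than in any service description vocabulary.  Everything 
here is an assumption for illustration: the `offer` structure, the 
process label "rdfs-subclass", and the tiny closure function, which 
covers only type propagation through rdfs:subClassOf, not full RDFS.

```python
# Hypothetical model: each graph name in the offer records the data it was
# derived from and the process applied. All names are illustrative only.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_subclass_closure(triples):
    """Add the rdf:type triples implied by rdfs:subClassOf (a fragment of RDFS)."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = {(s, o) for (s, p, o) in inferred if p == SUBCLASS}
        for (s, p, o) in list(inferred):
            if p != RDF_TYPE:
                continue
            for (sub, sup) in subclass_pairs:
                if sub == o and (s, RDF_TYPE, sup) not in inferred:
                    inferred.add((s, RDF_TYPE, sup))
                    changed = True
    return inferred


g2 = {
    ("foaf:Person", SUBCLASS, "foaf:Agent"),
    ("ex:andy", RDF_TYPE, "foaf:Person"),
}

# Two names for the same underlying data at different entailment levels.
offer = {
    "<g2>": {"derived_from": None, "process": "none", "triples": g2},
    "<g1>": {
        "derived_from": "<g2>",
        "process": "rdfs-subclass",
        "triples": rdfs_subclass_closure(g2),
    },
}


def agents(graph_name):
    """Return the subjects typed as foaf:Agent in the named graph."""
    triples = offer[graph_name]["triples"]
    return sorted(s for (s, p, o) in triples if p == RDF_TYPE and o == "foaf:Agent")
```

Here `agents("<g2>")` finds nothing while `agents("<g1>")` finds 
ex:andy, and the client picks between them purely by graph name.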


Sandro's examples use FROM NAMED, where the client is describing the RDF 
dataset.  It is the FROM and FROM NAMED that matter, not the GRAPH 
access to the data, if we give different names to different views of the 
same underlying data.

If data is described in FROM / FROM NAMED and loaded [*], then service 
description could be applied through sd:feature, which has a domain of 
service, but we haven't defined the details.  It does give the client 
some control, though not as much as I think is behind Sandro's 
concerns: the service could say "I apply OWL-DL to anything I see".  
(Naming is a potential problem - but the default graph isn't named.)

In DAWG, FROM / FROM NAMED was just a description of the dataset, and 
how the data was obtained was not defined.  You could reasonably bind 
<http://here/g1> to data from <http://there/g2> with expansion done by 
parser (Steve's point).  None of SPARQL's business how the dataset gets 
formed - it is a declaration.  This is a bit black/white and the 
situation is more complex - DAWG ducked the issue but it was discussed.
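
To illustrate the "it is a declaration" reading, here is a rough 
Python sketch in which the query only names graphs and the service 
decides how each name is bound to data.  The `local_pool`, `bindings` 
and `build_dataset` names are hypothetical, as is the choice to treat 
an unknown name as an empty graph; a real system might fetch from the 
web or raise an error instead.  Set union stands in for RDF merge and 
ignores blank nodes.

```python
# Hypothetical service-side state: a pool of graphs the service already holds.
local_pool = {
    "http://there/g2": {("ex:s", "ex:p", "ex:o")},
}

# Hypothetical configuration: a declared name bound to data held under another name.
bindings = {"http://here/g1": "http://there/g2"}


def resolve(name):
    """Map a declared graph name to triples; unknown names give an empty graph."""
    source = bindings.get(name, name)
    return set(local_pool.get(source, set()))


def build_dataset(from_graphs, from_named):
    """Form (default graph, named graphs) from the FROM / FROM NAMED declarations."""
    default_graph = set()
    for name in from_graphs:
        default_graph |= resolve(name)  # stand-in for RDF merge
    named_graphs = {name: resolve(name) for name in from_named}
    return default_graph, named_graphs
```

For example, `build_dataset(["http://here/g1"], [])` yields a default 
graph holding the data stored under <http://there/g2>, without the 
query saying anything about where that data came from.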

If it's the graph description to be modified, not the access, I would 
expect:

     FROM <http://host/g1> USING RDFS NAMED <tag:mylocalname>

This is closer to the examples on
http://www.w3.org/2009/sparql/wiki/Feature:ParameterizedInference
e.g.

SELECT ?X
FROM <http://xmlns.com/foaf/spec/index.rdf>
FROM <http://www.polleres.net/foaf.rdf>
USING RULESET RDFS
WHERE { ?X a foaf:Agent. }


The other area I have a problem with is the notion that it's just 
entailment.  Entailment (or rules) is just one process that can be 
applied on loading.  I don't see a clear dividing line with 
client-supplied rules (i.e. inline premises), graph building, or data 
cleaning and mangling stages and ETL.  For example, to query the union 
of graph G1 and G2 but without the data from G3 (yes - someone has asked 
for this recently).  It's process description that's needed; inference 
is one example but just addressing that without seeing an existing 
pain-point on the web worries me.
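
The "G1 and G2 but without G3" request can be sketched as a process 
description written as data.  This is only an illustration in Python: 
the step names "union" and "minus", the `run_process` function and the 
sample triples are all made up here, and graphs are modelled as plain 
sets of triples with no blank node handling.

```python
# Illustrative graphs as sets of (subject, predicate, object) triples.
g1 = {("ex:a", "ex:p", "ex:b"), ("ex:c", "ex:p", "ex:d")}
g2 = {("ex:c", "ex:p", "ex:d"), ("ex:e", "ex:p", "ex:f")}
g3 = {("ex:c", "ex:p", "ex:d")}

# A process description: an ordered list of (operation, graph name) steps.
process = [("union", "G1"), ("union", "G2"), ("minus", "G3")]


def run_process(steps, graphs):
    """Apply each step in order to build the graph that will be queried."""
    result = set()
    for op, name in steps:
        if op == "union":
            result |= graphs[name]
        elif op == "minus":
            result -= graphs[name]
        else:
            raise ValueError("unknown step: " + op)
    return result


view = run_process(process, {"G1": g1, "G2": g2, "G3": g3})
```

An entailment step would be one more operation in the same list, 
which is the sense in which inference is only one example of process.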

 Andy

[*] The FROM/FROM NAMED clauses are for the description of the dataset 
and do not necessarily imply loading from the web.  Some systems (TDB; 
Glitter, I 
believe) pick them out of a pool of available graphs but let's just 
think of that as a web cache if the names are truly unique and global.
Received on Wednesday, 21 July 2010 18:27:39 UTC