- From: Tim Berners-Lee <timbl@w3.org>
- Date: Mon, 22 Nov 2004 12:15:39 -0500
- To: public-rdf-dawg-comments@w3.org
Reading the draft of 2004-10-13
The current specification of SOURCE assumes a particular sort of
application, which will not necessarily be more common than any other.
As a result, SPARQL as a query language lacks the flexibility to do the
general job of giving or querying metadata about the source of
information.
SOURCE and FROM are muddled, and bite off part of a general question
without solving it in general.
Behind the SOURCE feature is the implicit notion that the database
being queried is a conjunction of graphs each corresponding to web
resources. The concept of the graph itself is not surfaced, but the
URI of the graph is the thing bound to. Meanwhile, servers have the
option of ignoring that structure and ignoring the binding of the
SOURCE variable. This seems to me fuzzy.
In fact, the database being queried may be generated in many ways, in
particular a triple may have arisen from a combination of triples in
different databases.
Random example 0:
foo.rdf: mary foaf:phone 1234.
bar.rdf: mary owl:sameAs maryJ
query includes
SOURCE ?s { maryJ foaf:phone ?y }.
The natural result is to bind ?s to a bnode expressing the virtual
graph which was formed:
<foo.rdf> log:semantics ?f.
<bar.rdf> log:semantics ?g.
( ?f ?g ) log:conjunction ?h.
?h owldl:closure ?s.
There are a lot of combinations possible here of course, and many
complex things which will happen in the future.
That sort of graph could be returned in the query. It could also be
sent with the query to describe what has to be done. If you like, it
is a clear RDF expression of the sort of thing which will otherwise get
relegated to more and more complex non-RDF syntax or server command
line out of band forms.
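As a rough illustration of Random example 0, the conjunction-and-closure construction can be sketched in Python. Everything here is an assumption made for the sketch: graphs are plain sets of (subject, predicate, object) tuples, `conjunction` stands in for log:conjunction, and `same_as_closure` covers only the owl:sameAs fragment of what owldl:closure would compute. None of these function names come from the draft.

```python
# Illustrative sketch only: a graph is a Python set of (s, p, o) tuples.
SAME_AS = "owl:sameAs"


def conjunction(*graphs):
    """Merge graphs by set union (a stand-in for log:conjunction)."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def same_as_closure(graph):
    """Copy triples across owl:sameAs aliases until nothing new appears.

    This is a tiny fragment of a real OWL closure, enough for the example.
    """
    closed = set(graph)
    changed = True
    while changed:
        changed = False
        # Treat owl:sameAs as symmetric.
        aliases = set()
        for s, p, o in closed:
            if p == SAME_AS:
                aliases.add((s, o))
                aliases.add((o, s))
        for a, b in aliases:
            for s, p, o in list(closed):
                if p == SAME_AS:
                    continue  # do not rewrite the alias statements themselves
                new = (b if s == a else s, p, b if o == a else o)
                if new not in closed:
                    closed.add(new)
                    changed = True
    return closed


def phones_of(graph, subject):
    """Answer the pattern { subject foaf:phone ?y } against one graph."""
    return {o for s, p, o in graph if s == subject and p == "foaf:phone"}


foo = {("mary", "foaf:phone", "1234")}     # contents of foo.rdf
bar = {("mary", SAME_AS, "maryJ")}         # contents of bar.rdf
virtual = same_as_closure(conjunction(foo, bar))
```

Neither foo.rdf nor bar.rdf alone answers the query about maryJ; only the derived graph does, which is why the natural binding for the SOURCE variable is that derived graph rather than the URI of either document.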
There is an assumption, in the SOURCE feature, that when multiple
graphs exist, then they are all believed. This is IMHO a major and
quite unnecessary flaw. Many systems will need to be distrustful of
most data. So I'd like to be able to use the SOURCE feature, which
overlaps with the FROM feature, so that *either* one is talking about
explicitly mentioned resources as the source to be queried, *OR* there
is a default knowledge base for the service.
When both are used, then the default KB can be a meta-kb which allows
the kbs being processed to be constrained and defined.
The feature of returning NULL but continuing should be dropped. The
whole idea of having things continue when a requested feature wasn't
implemented is, I think, asking for interoperability problems.
One way to clean it up is to require that a SOURCE variable be bound
elsewhere. This would mean that the set of resources which are queried
becomes explicit.
Otherwise we have added two implicit things to the SPARQL service --
the implicit set of sources and the implicit kb.
Random Example 1:
SELECT ?x ,...
WHERE
?y roogle:search "Mary".
SOURCE ?y { ?x firstName "Mary" ...
So the default KB is defined for this server to know about
roogle:search which relates documents which contain strings to those
strings.
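A minimal Python sketch of Random Example 1, under stated assumptions: the default KB is a set of tuples holding only roogle:search statements, the document names (doc1.rdf and so on) and their contents are invented, and `UnboundSourceError` is a hypothetical way of refusing an unbound SOURCE variable instead of returning NULL and continuing.

```python
# Hypothetical default KB: which documents contain which strings.
DEFAULT_KB = {
    ("doc1.rdf", "roogle:search", "Mary"),
    ("doc2.rdf", "roogle:search", "John"),
}

# Hypothetical document contents. doc3.rdf is unknown to the default KB.
DOCUMENTS = {
    "doc1.rdf": {("p1", "firstName", "Mary")},
    "doc2.rdf": {("p2", "firstName", "John")},
    "doc3.rdf": {("p3", "firstName", "Mary")},
}


class UnboundSourceError(Exception):
    """Raised when a SOURCE variable has not been bound elsewhere."""


def query_source(source, predicate, obj):
    """Evaluate { ?x predicate obj } inside one explicitly named source."""
    if source is None:
        # Fail loudly rather than returning NULL and carrying on.
        raise UnboundSourceError("SOURCE variable must be bound elsewhere")
    graph = DOCUMENTS[source]
    return {s for s, p, o in graph if p == predicate and o == obj}


def find_first_name(name):
    """?y roogle:search name . SOURCE ?y { ?x firstName name }"""
    results = set()
    for doc, p, text in DEFAULT_KB:
        if p == "roogle:search" and text == name:
            # ?y is now bound to doc, so the source queried is explicit.
            for x in query_source(doc, "firstName", name):
                results.add((doc, x))
    return results
```

In this sketch `find_first_name("Mary")` consults only doc1.rdf, because that is the only binding the default KB supplies for ?y; doc3.rdf is never read even though it would match the pattern.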
Random Example 2:
SELECT ?x, ...
WHERE
?x rdf:type QualifiedIndividual.
?x address:countrycode "fr".
...
?x foaf:personalProfile ?p.
SOURCE ?p { ?x diet:preference ?z }
...
Here the main database is trusted. The mass of FOAF out there isn't.
Just for one item, the query tests the person's personal profile to see
whether they declare themselves a vegetarian. The bulk of the query is
on a trusted database, and by default only that database is trusted.
This is an application where the idea that all known graphs are trusted
by default breaks.
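The trust boundary in Random Example 2 can be sketched the same way. The data below is invented for illustration (alice, bob, mallory and the file names are assumptions), and the elided parts of the query are simply left out. The point is only that every pattern except diet:preference is matched against the trusted database, and each untrusted document is consulted solely about the person whose profile it is.

```python
# Hypothetical trusted database (the default KB of the service).
TRUSTED_DB = {
    ("alice", "rdf:type", "QualifiedIndividual"),
    ("alice", "address:countrycode", "fr"),
    ("alice", "foaf:personalProfile", "alice-foaf.rdf"),
    ("bob", "rdf:type", "QualifiedIndividual"),
    ("bob", "address:countrycode", "fr"),
    ("bob", "foaf:personalProfile", "bob-foaf.rdf"),
}

# Hypothetical untrusted FOAF documents found on the web.
UNTRUSTED_DOCS = {
    "alice-foaf.rdf": {("alice", "diet:preference", "vegetarian")},
    # bob's profile makes a claim about alice, which is never consulted.
    "bob-foaf.rdf": {("alice", "diet:preference", "carnivore")},
    # mallory asserts her own qualification; nothing trusted points here.
    "mallory.rdf": {
        ("mallory", "rdf:type", "QualifiedIndividual"),
        ("mallory", "address:countrycode", "fr"),
        ("mallory", "diet:preference", "vegetarian"),
    },
}


def objects(graph, subject, predicate):
    """All ?o such that (subject, predicate, ?o) is in graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def diet_preferences():
    """Return (?x, ?z) pairs for the query in Random Example 2."""
    results = set()
    people = {
        s for s, p, o in TRUSTED_DB
        if p == "rdf:type" and o == "QualifiedIndividual"
    }
    for x in people:
        if "fr" not in objects(TRUSTED_DB, x, "address:countrycode"):
            continue
        for profile in objects(TRUSTED_DB, x, "foaf:personalProfile"):
            # SOURCE ?p { ?x diet:preference ?z }: only this pattern is
            # evaluated against the untrusted document bound to ?p.
            doc = UNTRUSTED_DOCS.get(profile, set())
            for z in objects(doc, x, "diet:preference"):
                results.add((x, z))
    return results
```

Under these assumptions the only result is alice with "vegetarian": mallory's self-asserted qualification and bob's statement about alice never enter the answer, which a model that believes all known graphs by default could not guarantee.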
Conclusion:
The current specification of SOURCE assumes a particular sort of
application, which will not necessarily be more common than any other.
As a result, SPARQL as a query language lacks the flexibility to do the
general job of giving or querying metadata about the source of
information.
A better solution is to use RDF graphs for the metadata in the query
and/or in the returned information.
Received on Monday, 22 November 2004 17:15:43 UTC