- From: Tim Berners-Lee <timbl@w3.org>
- Date: Mon, 22 Nov 2004 12:15:39 -0500
- To: public-rdf-dawg-comments@w3.org
Reading the draft of 2004-10-13:

The current specification of SOURCE assumes a particular sort of application, which will not necessarily be more common than any other. As a result, SPARQL as a query language lacks the flexibility to do the general job of giving or querying metadata about the source of information. SOURCE and FROM are muddled, and bite off part of a general question without solving it in general.

Behind the SOURCE feature is the implicit notion that the database being queried is a conjunction of graphs, each corresponding to a web resource. The concept of the graph itself is not surfaced, but the URI of the graph is the thing bound to. Meanwhile, servers have the option of ignoring that structure and ignoring the binding of the SOURCE variable. This seems to me fuzzy.

In fact, the database being queried may be generated in many ways. In particular, a triple may have arisen from a combination of triples in different databases.

Random example 0:

    foo.rdf:  mary foaf:phone 1234.
    bar.rdf:  mary owl:sameAs maryJ

    query includes  SOURCE ?s { maryJ foaf:phone ?y }.

The natural result is to bind ?s to a bnode expressing the virtual graph which was formed:

    <foo.rdf> log:semantics ?f.
    <bar.rdf> log:semantics ?g.
    ( ?f ?g ) log:conjunction ?h.
    ?h owldl:closure ?s.

There are a lot of combinations possible here of course, and many complex things which will happen in the future. That sort of graph could be returned in the query. It could also be sent with the query to describe what has to be done. If you like, it is a clear RDF expression of the sort of thing which will otherwise get relegated to more and more complex non-RDF syntax or out-of-band forms such as server command lines.

There is an assumption, in the SOURCE feature, that when multiple graphs exist, they are all believed. This is IMHO a major and quite unnecessary flaw. Many systems will need to be distrustful of most data.
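Random example 0 above can be sketched in a few lines of Python. This is only an illustration, not anything from the draft: graphs are modelled as plain sets of (subject, predicate, object) tuples, the helper names (conjunction, same_as_closure, query_phone) are hypothetical, and the "closure" handles owl:sameAs substitution only, which is far less than a real OWL DL closure.

```python
# Hypothetical in-memory model: a graph is a set of (s, p, o) tuples.
foo = {("mary", "foaf:phone", "1234")}      # assumed contents of foo.rdf
bar = {("mary", "owl:sameAs", "maryJ")}     # assumed contents of bar.rdf


def conjunction(*graphs):
    """Union of the triples of several graphs (the role of log:conjunction)."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def same_as_closure(graph):
    """Tiny stand-in for a closure: only owl:sameAs substitution is applied."""
    closed = set(graph)
    changed = True
    while changed:
        changed = False
        pairs = [(s, o) for (s, p, o) in closed if p == "owl:sameAs"]
        for a, b in pairs:
            for (s, p, o) in list(closed):
                if p == "owl:sameAs":
                    continue
                # Substitute in both directions, since sameAs is symmetric.
                for x, y in ((a, b), (b, a)):
                    new = (y if s == x else s, p, y if o == x else o)
                    if new not in closed:
                        closed.add(new)
                        changed = True
    return closed


def query_phone(graph, who):
    """Match the pattern { who foaf:phone ?y } against one graph."""
    return sorted(o for (s, p, o) in graph if s == who and p == "foaf:phone")


derived = same_as_closure(conjunction(foo, bar))

# Description of how the derived graph was formed, as in the message;
# the _:x labels are placeholder blank nodes chosen for this sketch.
provenance = [
    ("<foo.rdf>", "log:semantics", "_:f"),
    ("<bar.rdf>", "log:semantics", "_:g"),
    ("(_:f _:g)", "log:conjunction", "_:h"),
    ("_:h", "owldl:closure", "_:s"),
]

phones = query_phone(derived, "maryJ")
print(phones, "found in the graph described by", provenance[-1][2])
```

Neither foo.rdf nor bar.rdf alone matches the pattern; only the derived graph does, which is why binding ?s to the URI of a single document cannot describe where the answer came from.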
So I'd like to be able to use the SOURCE feature, which overlaps with the FROM feature, so that *either* one is talking about explicitly mentioned resources as the source to be queried, *OR* there is a default knowledge base for the service. When both are used, then the default KB can be a meta-kb which allows the kbs being processed to be constrained and defined.

The feature of returning NULL but continuing should be dropped. The whole idea of having things continue when a requested feature wasn't implemented is, I think, asking for interoperability problems.

One way to clean it up is to require that a SOURCE variable be bound elsewhere. This would mean that the set of resources which are queried becomes explicit. Otherwise we have added two implicit things to the SPARQL service -- the implicit set of sources and the implicit kb.

Random Example 1:

    SELECT ?x ,...
    WHERE ?y roogle:search "Mary".
          SOURCE ?y { ?x firstName "Mary" ...

So the default KB is defined for this server to know about roogle:search, which relates documents which contain strings to those strings.

Random Example 2:

    SELECT ?x, ...
    WHERE ?x rdf:type QualifiedIndividual.
          ?x address:countrycode "fr".
          ...
          ?x foaf:personalProfile ?p.
          SOURCE ?p { ?x diet:preference ?z }
          ...

Here the main database is trusted. The mass of FOAF out there isn't. Just for one item, the query tests the person's personal profile to see whether they declare themselves as a vegetarian. The bulk of the query is on a trusted database, and by default only that database is trusted. This is an application where the idea that all known graphs are trusted by default breaks.

Conclusion: The current specification of SOURCE assumes a particular sort of application, which will not necessarily be more common than any other. As a result, SPARQL as a query language lacks the flexibility to do the general job of giving or querying metadata about the source of information.
A better solution is to use RDF graphs for the metadata in the query and/or in the returned information.
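Random Example 2 above, where the SOURCE variable must be bound by the trusted default KB, might be sketched in Python as follows. Everything here is assumed for illustration: the people, the example.org URIs, the document contents, and the helper names (objects, diet_preferences) are hypothetical, and graphs are again plain sets of tuples rather than a real RDF store.

```python
# Trusted default KB (hypothetical data).
default_kb = {
    ("alice", "rdf:type", "QualifiedIndividual"),
    ("alice", "address:countrycode", "fr"),
    ("alice", "foaf:personalProfile", "http://example.org/alice.rdf"),
    ("bob", "rdf:type", "QualifiedIndividual"),
    ("bob", "address:countrycode", "fr"),
    ("bob", "foaf:personalProfile", "http://example.org/bob.rdf"),
}

# Untrusted documents, keyed by URI. bob.rdf also makes a claim about alice.
documents = {
    "http://example.org/alice.rdf": {
        ("alice", "diet:preference", "vegetarian"),
    },
    "http://example.org/bob.rdf": {
        ("bob", "diet:preference", "vegan"),
        ("alice", "diet:preference", "omnivore"),
    },
}


def objects(graph, subject, predicate):
    """All ?o such that graph contains (subject, predicate, ?o)."""
    return sorted(o for (s, p, o) in graph if s == subject and p == predicate)


def diet_preferences(kb, docs):
    """Bind ?p from the trusted KB first, then match inside that source only."""
    results = []
    people = sorted(
        s for (s, p, o) in kb
        if p == "rdf:type" and o == "QualifiedIndividual"
    )
    for x in people:
        if "fr" not in objects(kb, x, "address:countrycode"):
            continue
        for profile in objects(kb, x, "foaf:personalProfile"):
            source = docs.get(profile, set())   # SOURCE ?p { ... }
            for z in objects(source, x, "diet:preference"):
                results.append((x, z))
    return results


results = diet_preferences(default_kb, documents)
print(results)
```

Because ?p is bound by the trusted KB before any document is consulted, the claim about alice found in bob's profile is never used: the set of sources queried is explicit rather than implied by the service.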
Received on Monday, 22 November 2004 17:15:43 UTC