- From: Simon Raboczi <raboczi@tucanatech.com>
- Date: Tue, 24 Aug 2004 23:29:36 +1000
- To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Incorporating Dan's suggestions brings us to the following state of play: [[ 4.5 Querying multiple sources It should be possible for a query to specify which of the available RDF graphs it is to be executed against. If more than one RDF graph is specified, the result is as it the query had been executed against the merge[1] of the specified RDF graphs. Some services only offer to query one graph; they are considered to trivially satisfy this objective. While a variety of use cases motivate this feature, it is not a requirement because it is not clear whether this feature can be implemented in a generally scalable fashion. ]] Much as Requirement 3.1 "RDF Graph Pattern Matching -- Conjunction" and 3.13 "RDF Graph Pattern Matching -- Disjunction" each introduce a single operator (conjunction and disjunction respectively) into the WHERE clause, this proposal would introduce a set union/graph merge operator into the SOURCE/FROM clause. (The current BRQL grammar[2] in fact already covers this -- the SOURCE/FROM clause can take a list of documents to be merged.) The argument I'm about to make in favor of multiple sources is that it's going to make the query model simpler rather than more complicated. This is because 4.5 has the power to satisfy several other requirements and objectives simultaneously. The simplifying principle is that we should never need to deal with anything that isn't graph. When we do this, we have to add new grammar and query modeling to deal with these non-graph entities. Rather, make everything the query language needs to deal with into a graph that the WHERE clause can deal with. These are some of the other requirements and objectives we could satisfy purely by defining graphs and querying merges of these graphs with the base facts, rather than by adding grammar: * 3.3 "Extensible Value Testing" A monadic domain-specific function can be represented as a property taking its argument as the subject and returning its result as the object. Graph patterns can then be used to evaluate the function or its inverse. For example, the graph pattern { ?angle trig:cosine "0.5"^^xsd:double } could bind ?angle to "60"^^trig:degrees and "300"^^trig:degrees. Conceptually a trigonometry library is just a graph containing an infinite number of triples (including { "60"^^trig:degrees trig:cosine "0.5"^^xsd:double } and { "300"^^trig:degrees trig:cosine "0.5"^^xsd:double }). In practice, constraints resolved against the "infinite" graph produce finite variable bindings by algorithmic means rather than by consulting a store. Note that absolutely no special case grammatical support is required -- extensibility is just a matter of the graph that represents the extended function being made available to the query service. The query processor knows which extensions are required by a query because the graph which implements the extension appears explicitly in the SOURCE/FROM clause. One thing we do have to deal with once we introduce graphs of infinite size is safety -- the possibility that a query might not be constrained to a finite number of variable bindings. For example, the constraint { ?angle trig:cosine ?cos } is unsafe and unable to be converted into a finite set of variable bindings. What will normally happen during query resolution is that some of the variables in the unsafe constraint will become bound by others constraints, reducing the unsafe constraint to a safe form. If this doesn't occur, I think it'd be quite acceptable for the query processor to simply tell the user that the query is underconstrained. Dyadic and higher functions are admittedly less pleasant to deal with, although there are solutions (currying[4], or constructing topic map -style association within the query spring to mind as possibilities). * 3.7 "Limited Datatype Support" Datatype support can be almost entirely considered as a kind of extensible value testing. Datatypes require the following functions to be defined[3]: - the membership of its lexical space - the membership of its value space - the lexical-to-value mapping - domain-specific functions (e.g. signum, length) So our limited support for XSD could notionally be a graph asserting an infinite number of triples, including the following: xsd:double x:lexicalMember "3.14" # lexical space xsd:double x:valueMember "3.14"^^xsd:double # value space "3.14"^^xsd:double x:lexicalForm "3.14" # L2V mapping "3"^^xsd:integer x:lessThan "8"^^xsd:integer # domain-specific "3.14"^^xsd:double x:signum "1"^^xsd:integer # domain-specific The separate AND clause with its own grammar in BRQL has always bugged me. Datatyping constraints make perfect sense as first-class citizens in the WHERE clause -- the predicate ought to be enough to distinguish whether a constraint needs to be resolved from the triple store or the datatype processor. Note that to make this work, the graph has to permit literals as subjects. (Can someone explain to me why normal RDF graphs don't permit this? I've never seen an explanation of this restriction.) * 4.8 "Literal Search" Like datatype support, literal search can just be a specific instance of extensible value testing. Provide a graph that defines the substring predicate on plain literals: "cat" x:substring "c" "cat" x:substring "a" "cat" x:substring "t" "cat" x:substring "ca" "cat" x:substring "at" "cat" x:substring "cat" ... etc ... It would seem most convenient to include these triples as part of the same graph that provides the limited XSD support, forming something similar to the "standard library" in a programming language. [1] http://www.w3.org/TR/rdf-mt/#graphdefs [2] http://www.w3.org/2001/sw/DataAccess/rq23/ [3] http://www.w3.org/TR/rdf-mt/#dtype_interp [4] http://computing-dictionary.thefreedictionary.com/Curried%20function
Received on Tuesday, 24 August 2004 13:30:32 UTC