ACTION: discuss & promote union query (Was: ACTION: a replacement for 4.5 focussed on union query)

Incorporating Dan's suggestions brings us to the following state of 
play:

[[
4.5 Querying multiple sources

It should be possible for a query to specify which of the available RDF 
graphs it is to be executed against.  If more than one RDF graph is 
specified, the result is as it the query had been executed against the 
merge[1] of the specified RDF graphs.

Some services only offer to query one graph; they are considered to 
trivially satisfy this objective.

While a variety of use cases motivate this feature, it is not a 
requirement because it is not clear whether this feature can be 
implemented in a generally scalable fashion.
]]

Much as Requirement 3.1 "RDF Graph Pattern Matching -- Conjunction" and 
3.13 "RDF Graph Pattern Matching -- Disjunction" each introduce a 
single operator (conjunction and disjunction respectively) into the 
WHERE clause, this proposal would introduce a set union/graph merge 
operator into the SOURCE/FROM clause.  (The current BRQL grammar[2] in 
fact already covers this -- the SOURCE/FROM clause can take a list of 
documents to be merged.)

The argument I'm about to make in favor of multiple sources is that 
it's going to make the query model simpler rather than more 
complicated.  This is because 4.5 has the power to satisfy several 
other requirements and objectives simultaneously.  The simplifying 
principle is that we should never need to deal with anything that isn't 
graph.  When we do this, we have to add new grammar and query modeling 
to deal with these non-graph entities.   Rather, make everything the 
query language needs to deal with into a graph that the WHERE clause 
can deal with.

These are some of the other requirements and objectives we could 
satisfy purely by defining graphs and querying merges of these graphs 
with the base facts, rather than by adding grammar:


* 3.3 "Extensible Value Testing"

   A monadic domain-specific function can be represented as a property 
taking its argument as the subject and returning its result as the 
object.  Graph patterns can then be used to evaluate the function or 
its inverse.  For example, the graph pattern { ?angle trig:cosine 
"0.5"^^xsd:double } could bind ?angle to "60"^^trig:degrees and 
"300"^^trig:degrees.  Conceptually a trigonometry library is just a 
graph containing an infinite number of triples (including { 
"60"^^trig:degrees trig:cosine "0.5"^^xsd:double } and { 
"300"^^trig:degrees trig:cosine "0.5"^^xsd:double }).  In practice, 
constraints resolved against the "infinite" graph produce finite 
variable bindings by algorithmic means rather than by consulting a 
store.  Note that absolutely no special case grammatical support is 
required -- extensibility is just a matter of the graph that represents 
the extended function being made available to the query service.  The 
query processor knows which extensions are required by a query because 
the graph which implements the extension appears explicitly in the 
SOURCE/FROM clause.

   One thing we do have to deal with once we introduce graphs of 
infinite size is safety -- the possibility that a query might not be 
constrained to a finite number of variable bindings.  For example, the 
constraint { ?angle trig:cosine ?cos } is unsafe and unable to be 
converted into a finite set of variable bindings.  What will normally 
happen during query resolution is that some of the variables in the 
unsafe constraint will become bound by others constraints, reducing the 
unsafe constraint to a safe form.  If this doesn't occur, I think it'd 
be quite acceptable for the query processor to simply tell the user 
that the query is underconstrained.

   Dyadic and higher functions are admittedly less pleasant to deal 
with, although there are solutions (currying[4], or constructing topic 
map -style association within the query spring to mind as 
possibilities).


* 3.7 "Limited Datatype Support"

   Datatype support can be almost entirely considered as a kind of 
extensible value testing.  Datatypes require the following functions to 
be defined[3]:

   - the membership of its lexical space
   - the membership of its value space
   - the lexical-to-value mapping
   - domain-specific functions (e.g. signum, length)

   So our limited support for XSD could notionally be a graph asserting 
an infinite number of triples, including the following:

   xsd:double          x:lexicalMember  "3.14"              # lexical 
space
   xsd:double          x:valueMember    "3.14"^^xsd:double  # value space
   "3.14"^^xsd:double  x:lexicalForm    "3.14"              # L2V mapping
   "3"^^xsd:integer    x:lessThan       "8"^^xsd:integer    # 
domain-specific
   "3.14"^^xsd:double  x:signum         "1"^^xsd:integer    # 
domain-specific

   The separate AND clause with its own grammar in BRQL has always 
bugged me.  Datatyping constraints make perfect sense as first-class 
citizens in the WHERE clause -- the predicate ought to be enough to 
distinguish whether a constraint needs to be resolved from the triple 
store or the datatype processor.

   Note that to make this work, the graph has to permit literals as 
subjects.  (Can someone explain to me why normal RDF graphs don't 
permit this?  I've never seen an explanation of this restriction.)


* 4.8 "Literal Search"

   Like datatype support, literal search can just be a specific instance 
of extensible value testing.  Provide a graph that defines the 
substring predicate on plain literals:

   "cat" x:substring "c"
   "cat" x:substring "a"
   "cat" x:substring "t"
   "cat" x:substring "ca"
   "cat" x:substring "at"
   "cat" x:substring "cat"
   ... etc ...

   It would seem most convenient to include these triples as part of the 
same graph that provides the limited XSD support, forming something 
similar to the "standard library" in a programming language.


[1] http://www.w3.org/TR/rdf-mt/#graphdefs
[2] http://www.w3.org/2001/sw/DataAccess/rq23/
[3] http://www.w3.org/TR/rdf-mt/#dtype_interp
[4] http://computing-dictionary.thefreedictionary.com/Curried%20function

Received on Tuesday, 24 August 2004 13:30:32 UTC