RE: ACTION: discuss & promote union query (Was: ACTION: a replacement for 4.5 focussed on union query) from Seaborne, Andy on 2004-08-24 (public-rdf-dawg@w3.org from July to September 2004)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Tue, 24 Aug 2004 17:30:22 +0100
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>, Simon Raboczi <raboczi@tucanatech.com>
Message-ID: <E864E95CB35C1C46B72FEA0626A2E80803E3BF8C@0-mail-br1.hpl.hp.com>
-------- Original Message --------
> From: Simon Raboczi <>
> Date: 24 August 2004 14:31
> 
> Incorporating Dan's suggestions brings us to the following state of
> play:
> 
> [[
> 4.5 Querying multiple sources
> 
> It should be possible for a query to specify which of the available RDF
> graphs it is to be executed against.  If more than one RDF graph is
> specified, the result is as if the query had been executed against the
> merge[1] of the specified RDF graphs.

OK - that text says to me that it is not about named graphs / SOURCE
(provenance) issues.  It is a straight query over the merge as defined in
the RDF model theory.  It is a query service implementation issue
whether to allow it or bounce it with an error.
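For reference, the merge in [1] is a union of the triples after blank nodes have been standardized apart, so that no blank node is shared between the input graphs. Below is a minimal Python sketch. It assumes, purely for illustration, that a graph is a set of (s, p, o) string tuples and that blank nodes are strings beginning with "_:":

```python
# Sketch of an RDF graph merge in the model-theory sense: rename every blank
# node apart per input graph, then take the union of the triples.
# Representation (sets of string tuples, "_:" blank nodes) is an assumption.
from itertools import count
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def merge(graphs: Iterable[Set[Triple]]) -> Set[Triple]:
    fresh = count()
    result: Set[Triple] = set()
    for graph in graphs:
        # One renaming table per input graph: "_:b" in two different
        # graphs ends up as two different blank nodes in the merge.
        renaming: Dict[str, str] = {}
        for triple in graph:
            new_terms = []
            for term in triple:
                if is_blank(term):
                    if term not in renaming:
                        renaming[term] = f"_:m{next(fresh)}"
                    term = renaming[term]
                new_terms.append(term)
            result.add((new_terms[0], new_terms[1], new_terms[2]))
    return result
```

Ground triples simply union. The renaming is why merging a graph that contains blank nodes with itself gives a bigger graph, not the same one.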

> 
> Some services only offer to query one graph; they are considered to
> trivially satisfy this objective.

Yes - but only if there is no interaction with SOURCE/provenance in the
query.

> 
> While a variety of use cases motivate this feature, it is not a
> requirement because it is not clear whether this feature can be
> implemented in a generally scalable fashion.
> ]]
> 
> Much as Requirement 3.1 "RDF Graph Pattern Matching -- Conjunction" and
> 3.13 "RDF Graph Pattern Matching -- Disjunction" each introduce a
> single operator (conjunction and disjunction respectively) into the
> WHERE clause, this proposal would introduce a set union/graph merge
> operator into the SOURCE/FROM clause.  (The current BRQL grammar[2] in
> fact already covers this -- the SOURCE/FROM clause can take a list of
> documents to be merged.)

This seems different - this is about merging etc. for parts of a query, not
the target of the whole query.  e.g. can the output of one stage be a
CONSTRUCT for further query (that's called a rule system!)?  It's an
interesting possibility but doesn't it get us outside RDF?  It's more a
computational model of query and (with the restriction later on) is going to
depend on the order of query execution (whether things are bound or not at a
given point in the query).

> 
> The argument I'm about to make in favor of multiple sources is that
> it's going to make the query model simpler rather than more
> complicated.  This is because 4.5 has the power to satisfy several
> other requirements and objectives simultaneously.  The simplifying
> principle is that we should never need to deal with anything that isn't
> a graph.

There isn't this restriction at the moment - see below.

> When we do this, we have to add new grammar and query modeling
> to deal with these non-graph entities.   Rather, make everything the
> query language needs to deal with into a graph that the WHERE clause
> can deal with.
> 
> These are some of the other requirements and objectives we could
> satisfy purely by defining graphs and querying merges of these graphs
> with the base facts, rather than by adding grammar:

"grammar", as in syntax, is also about convenience of expression.  It is
nice to have a simple, pure model of query execution but it's also nice to be
able to write it down clearly.

> 
> 
> * 3.3 "Extensible Value Testing"
> 
>    A monadic domain-specific function can be represented as a property
> taking its argument as the subject and returning its result as the
> object.  Graph patterns can then be used to evaluate the function or
> its inverse.  For example, the graph pattern { ?angle trig:cosine
> "0.5"^^xsd:double } could bind ?angle to "60"^^trig:degrees and
> "300"^^trig:degrees.  Conceptually a trigonometry library is just a
> graph containing an infinite number of triples (including {
> "60"^^trig:degrees trig:cosine "0.5"^^xsd:double } and {
> "300"^^trig:degrees trig:cosine "0.5"^^xsd:double }).  In practice,
> constraints resolved against the "infinite" graph produce finite
> variable bindings by algorithmic means rather than by consulting a
> store.  Note that absolutely no special case grammatical support is
> required -- extensibility is just a matter of the graph that represents
> the extended function being made available to the query service.  The
> query processor knows which extensions are required by a query because
> the graph which implements the extension appears explicitly in the
> SOURCE/FROM clause.
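To make the "infinite graph" idea concrete, here is a Python sketch of trig:cosine as a computed property. It runs forwards, runs inversely, or refuses, depending on which ends of the pattern are bound. Three choices are assumptions made here so that the inverse is finite: angles are plain floats in degrees, they are restricted to [0, 360), and results are rounded to 9 places:

```python
# Sketch: trig:cosine as a computed property rather than stored triples.
# Assumptions: angles are floats in degrees within [0, 360); results are
# rounded to absorb floating-point noise (e.g. cos(60 deg) = 0.5000000000000001).
import math
from typing import List, Optional, Tuple

PRECISION = 9  # decimal places kept when rounding (an arbitrary choice)


def cosine_pattern(angle: Optional[float],
                   cos: Optional[float]) -> List[Tuple[float, float]]:
    """Solutions (angle, cos) of { angle trig:cosine cos }; None = unbound."""
    if angle is not None and cos is not None:
        # Both ends bound: a membership test against the "infinite" graph.
        ok = math.isclose(math.cos(math.radians(angle)), cos, abs_tol=1e-9)
        return [(angle, cos)] if ok else []
    if angle is not None:
        # Forward evaluation: compute the object from the subject.
        return [(angle, round(math.cos(math.radians(angle)), PRECISION))]
    if cos is not None:
        # Inverse evaluation: at most two angles in [0, 360) share a cosine.
        if not -1.0 <= cos <= 1.0:
            return []
        a = round(math.degrees(math.acos(cos)), PRECISION)
        b = round((360.0 - a) % 360.0, PRECISION)
        return [(x, cos) for x in sorted({a, b})]
    # Neither end bound: infinitely many solutions, so refuse.
    raise ValueError("unsafe pattern: { ?angle trig:cosine ?cos }")
```

For example, cosine_pattern(None, 0.5) gives [(60.0, 0.5), (300.0, 0.5)], which matches the two bindings described above.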

Just to avoid confusion: FROM is about naming the sources over which the
whole query executes (and is not required); SOURCE is about quads and
accessing the 4th slot.

(SOURCE is a synonym for FROM in RDQL - the meaning changed in BRQL).

> 
>    One thing we do have to deal with once we introduce graphs of
> infinite size is safety -- the possibility that a query might not be
> constrained to a finite number of variable bindings.  For example, the
> constraint { ?angle trig:cosine ?cos } is unsafe and unable to be
> converted into a finite set of variable bindings.  What will normally
> happen during query resolution is that some of the variables in the
> unsafe constraint will become bound by other constraints, reducing the
> unsafe constraint to a safe form.  If this doesn't occur, I think it'd
> be quite acceptable for the query processor to simply tell the user
> that the query is underconstrained.

Would that be determined statically, at query parse/compile time, or
dynamically, at query execution time?  With OPTIONALs, it seems that static
determination is not always possible.  Hmm - for some functions it may be
value-dependent.
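One way a static check could work, leaving OPTIONAL aside, is sketched below in Python. Each computed property gets "modes": the ends that must already be bound for a finite answer. Patterns are then ordered greedily, and the query is reported as underconstrained if at some point no remaining pattern is safe. The MODES table and the trig:/x: predicates are illustrative assumptions:

```python
# Sketch of a static safety check by binding propagation.
# Assumption: each computed predicate lists the positions ("s", "o") that
# must be bound for a finite answer; stored predicates are always finite.
from typing import Dict, FrozenSet, List, Set, Tuple

Pattern = Tuple[str, str, str]  # (subject, predicate, object)

MODES: Dict[str, List[FrozenSet[str]]] = {
    # trig:cosine is finite if either end is bound (angles kept in [0, 360)).
    "trig:cosine": [frozenset({"s"}), frozenset({"o"})],
    # x:lessThan is only finite when both ends are bound (a pure filter).
    "x:lessThan": [frozenset({"s", "o"})],
}


def is_var(term: str) -> bool:
    return term.startswith("?")


def safe_now(p: Pattern, bound: Set[str]) -> bool:
    s, pred, o = p
    if pred not in MODES:
        return True  # stored triples: always a finite match
    ready = {pos for pos, t in (("s", s), ("o", o))
             if not is_var(t) or t in bound}
    return any(mode <= ready for mode in MODES[pred])


def order_or_reject(patterns: List[Pattern]) -> List[Pattern]:
    """Return a safe evaluation order, or raise if the query is unsafe."""
    bound: Set[str] = set()
    remaining = list(patterns)
    plan: List[Pattern] = []
    while remaining:
        nxt = next((p for p in remaining if safe_now(p, bound)), None)
        if nxt is None:
            raise ValueError(f"query is underconstrained: {remaining}")
        remaining.remove(nxt)
        plan.append(nxt)
        bound |= {t for t in (nxt[0], nxt[2]) if is_var(t)}
    return plan
```

Under OPTIONAL, a variable may or may not be bound on a given row, so a check like this can only be conservative. The value-dependent cases would still need a run-time test.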

> 
>    Dyadic and higher functions are admittedly less pleasant to deal
> with, although there are solutions (currying[4], or constructing
> topic-map-style associations within the query, spring to mind as
> possibilities).
> 
> 
> * 3.7 "Limited Datatype Support"
> 
>    Datatype support can be almost entirely considered as a kind of
> extensible value testing.  Datatypes require the following functions to
> be defined[3]:
> 
>    - the membership of its lexical space
>    - the membership of its value space
>    - the lexical-to-value mapping
>    - domain-specific functions (e.g. signum, length)
> 
>    So our limited support for XSD could notionally be a graph asserting
> an infinite number of triples, including the following:
> 
>    xsd:double          x:lexicalMember  "3.14"              # lexical space
>    xsd:double          x:valueMember    "3.14"^^xsd:double  # value space
>    "3.14"^^xsd:double  x:lexicalForm    "3.14"              # L2V mapping
>    "3"^^xsd:integer    x:lessThan       "8"^^xsd:integer    # domain-specific
>    "3.14"^^xsd:double  x:signum         "1"^^xsd:integer    # domain-specific
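The notional XSD graph can be prototyped as a membership test rather than as a stored set of triples. Below is a minimal Python sketch. It assumes typed literals are (lexical form, datatype) tuples and uses the illustrative x: property names from the triples above. It also only approximates the xsd:double lexical grammar:

```python
# Sketch: the "infinite" XSD graph as a membership test holds(s, p, o).
# Assumptions: typed literals are (lexical, datatype) tuples; plain str is a
# plain literal or a name; only xsd:double and xsd:integer are covered.
import math
import re
from typing import Tuple, Union

Typed = Tuple[str, str]   # e.g. ("3.14", "xsd:double")
Term = Union[str, Typed]

LEXICAL = {
    # Approximation of the XML Schema 1.0 lexical space of xsd:double.
    "xsd:double": re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?|INF|-INF|NaN"),
    "xsd:integer": re.compile(r"[+-]?\d+"),
}


def value(lit: Typed) -> float:
    """Lexical-to-value mapping; raises ValueError for ill-typed literals."""
    lex, dt = lit
    if dt not in LEXICAL or not LEXICAL[dt].fullmatch(lex):
        raise ValueError(f"ill-typed literal: {lit!r}")
    return float(lex) if dt == "xsd:double" else int(lex)


def same_value(a: float, b: float) -> bool:
    # NaN is not equal to itself, but is still a single value.
    return a == b or (math.isnan(a) and math.isnan(b))


def holds(s: Term, p: str, o: Term) -> bool:
    """Is the triple (s, p, o) in the notional XSD graph?"""
    try:
        if p == "x:lexicalMember":
            return (isinstance(s, str) and s in LEXICAL and isinstance(o, str)
                    and LEXICAL[s].fullmatch(o) is not None)
        if p == "x:valueMember":
            return (isinstance(o, tuple) and o[1] == s
                    and holds(s, "x:lexicalMember", o[0]))
        if p == "x:lexicalForm":
            # Every valid lexical form of the value counts: "3.14", "3.140", ...
            return (isinstance(s, tuple) and isinstance(o, str)
                    and same_value(value(s), value((o, s[1]))))
        if p == "x:lessThan":
            return value(s) < value(o)
        if p == "x:signum":
            v = value(s)
            return value(o) == (v > 0) - (v < 0)
    except (ValueError, TypeError):
        # Ill-typed or wrongly shaped terms are simply not in the graph.
        pass
    return False
```

The lexicalForm case already shows the graph is infinite: "3.14", "3.140", "3.1400" and so on are all lexical forms of the same value.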
> 
>    The separate AND clause with its own grammar in BRQL has always
> bugged me.

And it is not in the new syntax :-)
http://lists.w3.org/Archives/Public/public-rdf-dawg/2004JulSep/0265.html

Sorry it's not formally sorted out but I don't want the syntax-tail wagging
the specification-dawg.

SELECT *
WHERE { ?x :p ?v . ?v < 30 . ?x :q ?z }

could be:

SELECT *
WHERE { ?x :p ?v . ?v x:lessThan 30 . ?x :q ?z }

if the query processor so desires.  Or vice-versa.
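The rewrite is mechanical in either direction. Below is a string-level Python sketch. It assumes patterns are separated by " . " and handles only < and x:lessThan. A real processor would rewrite the parse tree instead:

```python
# Sketch of the syntactic rewrite between "?v < 30" and "?v x:lessThan 30".
# Assumptions: triple patterns are separated by " . " and contain no
# " . " inside literals; only the "<" comparison is handled.
import re

INFIX = re.compile(r"^(\S+)\s*<\s*(\S+)$")
TRIPLE = re.compile(r"^(\S+)\s+x:lessThan\s+(\S+)$")


def to_triples(where: str) -> str:
    """Turn infix comparisons into x:lessThan triple patterns."""
    parts = [p.strip() for p in where.split(" . ")]
    return " . ".join(INFIX.sub(r"\1 x:lessThan \2", p) for p in parts)


def to_infix(where: str) -> str:
    """Turn x:lessThan triple patterns back into infix comparisons."""
    parts = [p.strip() for p in where.split(" . ")]
    return " . ".join(TRIPLE.sub(r"\1 < \2", p) for p in parts)
```

Applied to the body of the first WHERE clause above, to_triples yields the body of the second, and to_infix maps it back.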

Eric and I have both implemented parsers that show there are no parsing
problems (LALR in my case - lookahead 1 but locally 2 - the usual
transformation would allow LALR(1) uniformly).

The grammar (syntax) element is to let people write things naturally.  It
also happens to avoid literals-as-subjects and predicate variables (that is,
for bound ?v, { ?v ?p 30 . } asks how ?v is related to 30).


>  Datatyping constraints make perfect sense as first-class
> citizens in the WHERE clause -- the predicate ought to be enough to
> distinguish whether a constraint needs to be resolved from the triple
> store or the datatype processor.
> 

Query has never precluded such "computed" properties - it is a matter of how
the graph is implemented, not the query language.  Example: rdf:type tends
to cause some computation on an OWL/RDFS inferencing model and it requires
no QL support.

Given URIs for properties, I don't see the need to name the graph here - by
analogy, it is more like owl:imports.  Have I missed something?
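As a sketch of that reading, a processor can route each triple pattern by its predicate URI. Computed properties go to code, and everything else goes to the store, without any graph being named in FROM. The Python below is illustrative only: x:length is a hypothetical computed property, and an in-memory set stands in for the store:

```python
# Sketch: dispatch each triple pattern by predicate URI, either to a
# computed-property function or to stored triples.
# Assumptions: x:length is hypothetical; the store is a Python set.
from typing import Callable, Dict, Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Solver = Callable[[Optional[str], Optional[str]], Iterator[Tuple[str, str]]]


def is_var(term: str) -> bool:
    return term.startswith("?")


def length_solver(s: Optional[str], o: Optional[str]) -> Iterator[Tuple[str, str]]:
    # Hypothetical computed property x:length over plain literals.
    if s is None:
        raise ValueError("unsafe: x:length needs a bound subject")
    n = str(len(s))
    if o is None or o == n:
        yield (s, n)


COMPUTED: Dict[str, Solver] = {"x:length": length_solver}


def match(pattern: Triple, store: Set[Triple],
          computed: Dict[str, Solver]) -> Iterator[Dict[str, str]]:
    """Yield variable bindings for one triple pattern."""
    s, p, o = pattern
    sv = None if is_var(s) else s
    ov = None if is_var(o) else o
    if p in computed:
        # Computed property: the function, not the store, supplies answers.
        pairs = computed[p](sv, ov)
    else:
        # Stored property: an ordinary scan over asserted triples.
        pairs = ((ts, to) for ts, tp, to in store
                 if tp == p and sv in (None, ts) and ov in (None, to))
    for ts, to in pairs:
        binding: Dict[str, str] = {}
        if sv is None:
            binding[s] = ts
        if ov is None:
            binding[o] = to
        yield binding
```

This sketch leaves out joins across patterns and patterns with a variable predicate. The latter cannot be routed this way at all.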

>    Note that to make this work, the graph has to permit literals as
> subjects.  (Can someone explain to me why normal RDF graphs don't
> permit this?  I've never seen an explanation of this restriction.)

I understand it is as much to do with serialization in RDF/XML as anything
else.  Certainly, the restriction of properties to URIs (and only certain URIs
at that) stems from that.

> 
> 
> * 4.8 "Literal Search"
> 
>    Like datatype support, literal search can just be a specific instance
> of extensible value testing.  Provide a graph that defines the
> substring predicate on plain literals:
> 
>    "cat" x:substring "c"
>    "cat" x:substring "a"
>    "cat" x:substring "t"
>    "cat" x:substring "ca"
>    "cat" x:substring "at"
>    "cat" x:substring "cat"
>    ... etc ...
> 
>    It would seem most convenient to include these triples as part of the
> same graph that provides the limited XSD support, forming something
> similar to the "standard library" in a programming language.
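Such a "standard library" graph need not be materialized. Below is a Python sketch of x:substring as a computed property. It assumes plain literals are Python str and excludes the empty string, as the example triples do. With the subject bound the answers are finite; with only the object bound they are not:

```python
# Sketch: x:substring as a computed property over plain literals.
# Assumptions: literals are Python str; the empty string is excluded.
from typing import Iterator, Optional, Tuple


def substrings(text: str) -> Iterator[str]:
    """Each distinct non-empty substring of text, once."""
    seen = set()
    for i in range(len(text)):
        for j in range(i + 1, len(text) + 1):
            sub = text[i:j]
            if sub not in seen:
                seen.add(sub)
                yield sub


def substring_pattern(s: Optional[str],
                      o: Optional[str]) -> Iterator[Tuple[str, str]]:
    """Solutions (s, o) of { s x:substring o }; None means unbound."""
    if s is None:
        # Infinitely many strings contain o, so this pattern is unsafe alone.
        raise ValueError("unsafe: { ?s x:substring o } needs a bound subject")
    if o is not None:
        # Both bound: Python's `in` tests the substring relation directly.
        if o and o in s:
            yield (s, o)
        return
    for sub in substrings(s):
        yield (s, sub)
```

In a real query, a pattern with only the object bound would be evaluated after other patterns had bound the subject to stored literals, just as with the safety discussion earlier.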
> 
> 
> [1] http://www.w3.org/TR/rdf-mt/#graphdefs
> [2] http://www.w3.org/2001/sw/DataAccess/rq23/
> [3] http://www.w3.org/TR/rdf-mt/#dtype_interp
> [4] http://computing-dictionary.thefreedictionary.com/Curried%20function
Received on Tuesday, 24 August 2004 16:31:14 UTC