RE: ACTION: discuss & promote union query (Was: ACTION: a replacement for 4.5 focussed on union query) from Seaborne, Andy on 2004-08-24 (public-rdf-dawg@w3.org from July to September 2004)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Tue, 24 Aug 2004 17:54:54 +0100
To: Rob Shearer <Rob.Shearer@networkinference.com>, Simon Raboczi <raboczi@tucanatech.com>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <E864E95CB35C1C46B72FEA0626A2E80803E3BF91@0-mail-br1.hpl.hp.com>
-------- Original Message --------
> From: Rob Shearer <>
> Date: 24 August 2004 17:27
> 
> > [[
> > 4.5 Querying multiple sources
> > 
> > It should be possible for a query to specify which of the available RDF
> > graphs it is to be executed against.  If more than one RDF graph is
> > specified, the result is as if the query had been executed against the
> > merge[1] of the specified RDF graphs.
> 
> I continue to feel that this feature, and beyond this objective the
> specific approach taken by BRQL, do more harm than good. Unlike SQL
> data, RDF is not segmented into tables and thus within a particular
> server there is absolutely no need to target a particular "piece" of
> RDF.

I agree with the sentiment here.

I'm not actually sure what approach in BRQL is being referred to as there is
confusion (in my mind at least) between FROM (merge some graphs then execute
query over that) and SOURCE (quads and provenance on a per triple, maybe per
graph pattern basis).

That said, it is the semantic WEB - RDF graphs, as documents, can live at
locations on the web, so there is some segmentation-like effect.  It does
give rise to the problems in FROM/SOURCE about repeated triples from
different sources, and inferencing over the combination.
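
To make the FROM/SOURCE distinction concrete, here is a minimal sketch in
Python.  It is only an illustration, not BRQL: the example data, the URLs and
the function names merge and quads are all made up, and blank nodes are
ignored (a real RDF merge has to keep them apart).

```python
# Hypothetical illustration: triples are plain tuples, graphs are sets.
graph_a = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", "Alice"),
}
graph_b = {
    ("ex:alice", "foaf:knows", "ex:bob"),  # same triple as in graph_a
    ("ex:bob", "foaf:name", "Bob"),
}
sources = {"http://example.org/a": graph_a, "http://example.org/b": graph_b}


def merge(graphs):
    """FROM-style: union the graphs; a repeated triple appears only once."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def quads(named_graphs):
    """SOURCE-style: keep (source, s, p, o) so provenance survives."""
    return {
        (src, s, p, o)
        for src, graph in named_graphs.items()
        for (s, p, o) in graph
    }


merged = merge(sources.values())
quad_store = quads(sources)

# Which sources assert the repeated triple?  Only the quad form can answer.
repeated = ("ex:alice", "foaf:knows", "ex:bob")
asserted_by = sorted(src for (src, s, p, o) in quad_store if (s, p, o) == repeated)
```

The merged set has lost which document said what; the quad form keeps it, at
the cost of the query having to say how repeated triples are to be treated.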

> It makes sense to me for source selection to be performed at a
> different level than the query language, such as the network protocol.

+1 - it must work correctly with the protocol, especially a SOAP-based,
service-centric approach where it is reasonable to have several target
graphs specified.

> 
> More importantly, the ability to aggregate graphs seems quite orthogonal
> to the ability to query an RDF graph. RDF aggregators will undoubtedly
> be important RDF applications, but in general I expect far more servers
> to only support queries against their one and only RDF store. Adding the
> feature can only hurt the development of truly general aggregation
> functionality.

Agreed.

> 
> > Some services only offer to query one graph; they are considered to
> > trivially satisfy this objective.
> 
> This would be a rather absurd limit to the functionality of that
> feature; what we're standardizing here is an optional feature which is
> by default completely unimplemented (but conformant) and in the vast
> majority of cases will only be partially supported (because it will only
> allow certain things to be aggregated).
> 
> An aggregator is an application in its own right. Why are we
> standardizing the functionality of aggregators? The whole point of RDF
> is that you can merge two graphs and the result is a single graph. Let
> RDF be RDF. Let aggregators be aggregators. Let a query language for RDF
> query RDF, and not some more complex data structure which partitions
> data in ways RDF does not.
> 
> > While a variety of use cases motivate this feature, it is not a
> > requirement because it is not clear whether this feature can be
> > implemented in a generally scalable fashion.
> > ]]
> > 
> > Much as Requirement 3.1 "RDF Graph Pattern Matching -- Conjunction" and
> > 3.13 "RDF Graph Pattern Matching -- Disjunction" each introduce a
> > single operator (conjunction and disjunction respectively) into the
> > WHERE clause, this proposal would introduce a set union/graph merge
> > operator into the SOURCE/FROM clause.  (The current BRQL grammar[2] in
> > fact already covers this -- the SOURCE/FROM clause can take a list of
> > documents to be merged.)
> > 
> > The argument I'm about to make in favor of multiple sources is that
> > it's going to make the query model simpler rather than more
> > complicated.  This is because 4.5 has the power to satisfy several
> > other requirements and objectives simultaneously.
> 
> I disagree with this contention. The ability to aggregate graphs is
> orthogonal to the ability to supplement a single graph with
> query-processor-specific supplementary arcs.

I agree.  I don't see it as a QL issue.

> 
> >  The simplifying
> > > principle is that we should never need to deal with anything that
> > > isn't a graph.  When we do, we have to add new grammar and query
> > > modeling
> > to deal with these non-graph entities.   Rather, make everything the
> > query language needs to deal with into a graph that the WHERE clause
> > can deal with. 
> > 
> > These are some of the other requirements and objectives we could
> > satisfy purely by defining graphs and querying merges of these graphs
> > with the base facts, rather than by adding grammar:
> > 
> > 
> > * 3.3 "Extensible Value Testing"
> > 
> >    A monadic domain-specific function can be represented as a property
> > taking its argument as the subject and returning its result as the
> > object.  Graph patterns can then be used to evaluate the function or
> > its inverse.  For example, the graph pattern { ?angle trig:cosine
> > "0.5"^^xsd:double } could bind ?angle to "60"^^trig:degrees and
> > "300"^^trig:degrees.  Conceptually a trigonometry library is just a
> > graph containing an infinite number of triples (including {
> > "60"^^trig:degrees trig:cosine "0.5"^^xsd:double } and {
> > "300"^^trig:degrees trig:cosine "0.5"^^xsd:double }).  In practice,
> > constraints resolved against the "infinite" graph produce finite
> > variable bindings by algorithmic means rather than by consulting a
> > store.  Note that absolutely no special case grammatical support is
> > required -- extensibility is just a matter of the graph that
> > represents the extended function being made available to the query
> > service.  The query processor knows which extensions are required by a
> > query because the graph which implements the extension appears
> > explicitly in the SOURCE/FROM clause.
> 
> But using SOURCE/FROM is almost certainly NOT what you'd want to do--you
> don't want to aggregate your particular RDF graph with some infinite
> graph you grab from somewhere. That infinite graph can never be
> realized. You'd need to add special functionality to your query
> processor to mimic its consequences.
> 
> Expressing such value tests as triples is certainly appealing, but its
> appeal lies in its simplicity for the language's formal model and for
> the syntax. The implementation doesn't get any easier and you don't get
> this feature "for free" just by being able to aggregate graphs.
> 
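
The "special functionality" point can be sketched as follows.  This is a
hypothetical resolver for the trig:cosine predicate from the example above,
written in Python; the names match_cosine and Underconstrained are invented
for illustration, angles are plain numbers in degrees restricted to
[0, 360), and results are rounded to hide floating-point noise.  Nothing is
looked up in a store, and nothing here comes from aggregating graphs.

```python
import math


class Underconstrained(Exception):
    """Raised when a pattern would need an infinite set of bindings."""


def match_cosine(angle=None, cos=None):
    """Resolve { ?angle trig:cosine ?cos } by computation, not lookup.

    None means the variable is unbound.  Returns a list of
    (angle_in_degrees, cosine) bindings.
    """
    if angle is None and cos is None:
        # Neither side is bound: the matching triples cannot be enumerated.
        raise Underconstrained("neither ?angle nor ?cos is bound")

    if angle is not None:
        # Forward direction: evaluate the function.
        value = math.cos(math.radians(angle))
        if cos is not None and not math.isclose(value, cos, abs_tol=1e-9):
            return []  # both bound, but the triple does not hold
        return [(angle, value if cos is None else cos)]

    # Inverse direction: find the angles whose cosine is the given value.
    if not -1.0 <= cos <= 1.0:
        return []
    base = math.degrees(math.acos(cos))  # in [0, 180]
    angles = sorted({round(base, 9), round((360.0 - base) % 360.0, 9)})
    return [(a, cos) for a in angles]


inverse = match_cosine(cos=0.5)
forward = match_cosine(angle=60)
```

Here match_cosine(cos=0.5) yields bindings for 60 and 300 degrees, matching
the example in the quoted text, while calling it with neither argument bound
raises Underconstrained instead of trying to realize the infinite graph.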
> >    One thing we do have to deal with once we introduce graphs of
> > infinite size is safety -- the possibility that a query might not be
> > constrained to a finite number of variable bindings.  For example, the
> > constraint { ?angle trig:cosine ?cos } is unsafe and unable to be
> > converted into a finite set of variable bindings.  What will normally
> > happen during query resolution is that some of the variables in the
> > unsafe constraint will become bound by other constraints, reducing the
> > unsafe constraint to a safe form.  If this doesn't occur, I think it'd
> > be quite acceptable for the query processor to simply tell the user
> > that the query is underconstrained.
> > 
> >    Dyadic and higher functions are admittedly less pleasant to deal
> > with, although there are solutions (currying[4], or constructing topic
> > map-style associations within the query, spring to mind as
> > possibilities).
> 
> The problems you bring up are issues we've faced. Data values and
> datatypes are fundamentally hard problems in RDF because RDF is so poor
> in the specifics of these things. Either you allow too little (OWL makes
> it hard to use simple ranges of numbers) or too much (full XML Schema
> datatypes are arbitrarily difficult to reason about).
> 
> > * 3.7 "Limited Datatype Support"
> > 
> >    Datatype support can be almost entirely considered as a kind of
> > extensible value testing.  Datatypes require the following functions to
> > be defined[3]:
> > 
> >    - the membership of its lexical space
> >    - the membership of its value space
> >    - the lexical-to-value mapping
> >    - domain-specific functions (e.g. signum, length)
> > 
> >    So our limited support for XSD could notionally be a graph
> > asserting an infinite number of triples, including the following:
> > 
> >    xsd:double          x:lexicalMember  "3.14"              # lexical space
> >    xsd:double          x:valueMember    "3.14"^^xsd:double  # value space
> >    "3.14"^^xsd:double  x:lexicalForm    "3.14"              # L2V mapping
> >    "3"^^xsd:integer    x:lessThan       "8"^^xsd:integer    # domain-specific
> >    "3.14"^^xsd:double  x:signum         "1"^^xsd:integer    # domain-specific
> > 
> >    The separate AND clause with its own grammar in BRQL has always
> > bugged me.  Datatyping constraints make perfect sense as first-class
> > citizens in the WHERE clause -- the predicate ought to be enough to
> > distinguish whether a constraint needs to be resolved from the triple
> > store or the datatype processor.
> > 
> >    Note that to make this work, the graph has to permit literals as
> > subjects.  (Can someone explain to me why normal RDF graphs don't
> > permit this?  I've never seen an explanation of this restriction.)
> 
> Again, simple aggregation doesn't get you any of this. If you want a
> spiffy "values as graphs" syntax, then you can do it without aggregation
> and answer queries under the assumption that your original graph was
> supplemented by all these assertions. Common graph syntax is attractive,
> but I can't imagine that demanding a user write aggregation clauses
> every time they want to test an integer can be viewed as a feature.
> 
> > * 4.8 "Literal Search"
> > 
> >    Like datatype support, literal search can just be a specific instance
> > of extensible value testing.  Provide a graph that defines the
> > substring predicate on plain literals:
> > 
> >    "cat" x:substring "c"
> >    "cat" x:substring "a"
> >    "cat" x:substring "t"
> >    "cat" x:substring "ca"
> >    "cat" x:substring "at"
> >    "cat" x:substring "cat"
> >    ... etc ...
> > 
> >    It would seem most convenient to include these triples as part of the
> > same graph that provides the limited XSD support, forming something
> > similar to the "standard library" in a programming language.
> 
> I'm a broken record: you can't get this functionality by simply
> aggregating the graphs, because the graphs are infinite.
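
As with the cosine case, the x:substring triples would have to be computed
rather than stored.  A minimal sketch in Python, assuming plain literals are
ordinary strings and that the empty string is not counted as a substring
(as in the "cat" listing above); the function names are invented for
illustration:

```python
def substrings(text):
    """All non-empty substrings of text, i.e. every o in { text x:substring o }."""
    return {
        text[i:j]
        for i in range(len(text))
        for j in range(i + 1, len(text) + 1)
    }


def match_substring(subject=None, obj=None):
    """Resolve { ?subject x:substring ?obj } without a stored graph.

    None means the variable is unbound.  Returns a list of
    (subject, obj) bindings.
    """
    if subject is None:
        # Infinitely many strings contain any given substring.
        raise ValueError("underconstrained: ?subject must be bound")
    if obj is None:
        # Finite: enumerate, shortest first, then alphabetically.
        found = sorted(substrings(subject), key=lambda s: (len(s), s))
        return [(subject, s) for s in found]
    # Both bound: a simple membership test.
    return [(subject, obj)] if obj in substrings(subject) else []


cat_bindings = [o for _, o in match_substring("cat")]
```

With the subject bound the set of answers is finite (six for "cat"); with
only the object bound it is not, so the processor has to refuse or wait for
another constraint to bind the subject.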
Received on Tuesday, 24 August 2004 16:55:37 UTC