Re: optionals, provenance and transformation into RDF queries

On Wednesday, April 23, 2003, at 07:16  PM, Seaborne, Andy wrote:

> Alberto,
>
> That's a great note!  I agree with the features you're proposing.

Andy, apologies for catching up on this so late.

>
> Overall points:
>
> 1/ Let's be clear when we are expressing something that is at the RDF 
> level
> and when there are features that are outside the current RDF.  We all 
> know
> that current RDF isn't enough as your contextual/provenance discussion
> highlights.  I find Tim's stack [1] or the original layer cake diagram
> useful here.
>
> It is important to distinguish when a feature is RDF and when a 
> feature is
> beyond RDF because there will be other systems experimenting with 
> the next
> wave of "RDF" (RDF-ng) while the core RDF is more stable.

Yes, I agree - even though the RDF specs are getting more and more 
stable these days, real-world applications need more; they need to 
layer/abstract over the core model to build something useful and 
meaningful for the user. IMO using RDF is just like using Lego bricks - 
you can build extremely flexible, fine-grained, general and complex 
things; but once you have lots of those bricks assembled you need 
something to reduce complexity, simplify access and manage things 
better. And that's why the role of RDF query languages is so important 
to me - they should simplify life for developers throughout this 
process.

> 2/ As we go beyond the bound variables/tables approach, the idea that 
> an
> SQL-like syntax is helpful decreases.  The SQL-like assumption also
> influences our processing model for the query which, for me, is the key
> discussion for next query language.

True, most of the RDF query languages today are still too DBMS-centric. 
On the other hand, relational databases have proven to be a very 
effective and efficient technology, people still love SQL, and this 
will continue to be true for the foreseeable future. RDF query 
languages should instead try to support both ways of representing 
results: graphs and tables. Graphs are needed for RDF-to-RDF 
transformations, while tables are what end-developers need - a small 
sketch of the two shapes follows below.
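
Just to make the two shapes concrete, here is a minimal Python sketch 
on top of the rdflib toolkit (the namespace and data are completely 
made up, and the loops only stand in for whatever the query language 
would express declaratively):

  from rdflib import Graph, Namespace, Literal

  EX = Namespace("http://example.org/")   # hypothetical namespace for the example
  g = Graph()
  g.add((EX["doc1"], EX["title"], Literal("Some document")))
  g.add((EX["doc1"], EX["creator"], EX["alberto"]))

  # Table-style result: a list of variable bindings, handy for a web page.
  rows = [{"doc": s, "title": o} for s, _, o in g.triples((None, EX["title"], None))]

  # Graph-style result: a new RDF graph built from the matched triples,
  # ready for further RDF-to-RDF transformation.
  out = Graph()
  for s, p, o in g.triples((None, EX["title"], None)):
      out.add((s, p, o))

The point is only that the same match can be delivered either way; the 
query language should let the application ask for whichever shape it 
needs.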

>
> 3/ I think it is time to draw a line under RDQL/SquishQL whatever.  
> Leave
> that stable and start a new query language syntax that is more 
> suitable for
> the purpose - and define the purpose first!

I completely agree - we need a newer/better query model and query 
language at this stage.

>
> 4/ "query" is overloaded: there is query to get information (base 
> values)
> out of the RDF into something an application can use in, say, a web 
> page.
> That is the role of bound variables and the SQL-like motivation is very
> good.  There is also "query" in finding a subgraph, or sequence of
> subgraphs, out of a knowledge base, for further graph-level 
> processing.  See
> [2].  Being able to separate the various transformations and 
> manipulations
> into a clear processing model would be very powerful.

SQL and JDBC/ODBC/DBI interfaces are very good for today's Web 
programmers - RDF graphs are instead needed by generic data 
manipulation and transformation tools on the Semantic Web. We need 
both views!

>
>> optional matches
>
> Very necessary.  There are some restrictions in DQL "may bind" aren't 
> there?
>
> I think that optional graph matching is also necessary (see RDF-QBE [3]
> which addresses the restricted case of trees).  Things like "get all 
> triples
> whose properties are in the Dublin Core namespace".

Such a new query language should support standard database-style query 
primitives together with the possibility to browse and navigate RDF 
data in hypertext style using very generic regular path expressions. 
Optional branches, may-bind variables and wild-card style queries must 
be supported, otherwise the poor application developer will have to 
invent those anyway to get their job done - see the sketch below for 
the kind of behaviour I mean.
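
To show what I mean by may-bind/optional behaviour, here is a 
hand-rolled Python/rdflib sketch (made-up data again; a real query 
language would express the optional branch declaratively, and this 
also happens to follow the "do the exact match, then augment with the 
optional parts" processing model discussed below):

  from rdflib import Graph, Namespace, Literal

  EX = Namespace("http://example.org/")
  g = Graph()
  g.add((EX["doc1"], EX["title"], Literal("Has a title and a date")))
  g.add((EX["doc1"], EX["date"], Literal("2003-05-14")))
  g.add((EX["doc2"], EX["title"], Literal("Has a title only")))

  solutions = []
  for doc, _, title in g.triples((None, EX["title"], None)):   # required pattern
      solution = {"doc": doc, "title": title, "date": None}
      for _, _, date in g.triples((doc, EX["date"], None)):    # optional pattern
          solution["date"] = date                               # may bind...
      solutions.append(solution)                                # ...or stay unbound

doc2 still shows up in the result set, just with the date variable left 
unbound - exactly the behaviour the application developer would 
otherwise have to hand-code.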

>
>> - implementation problems?
>
> A processing model would be the starting point here.  Something like: 
> "do
> the exact match", "augment with the optional parts" (this removes the
> problems for exact matches using values from the optional parts).

I agree again

>> provenance information
>
> I don't think there is a standard approach yet emerging - cwm formulae 
> nest
> strictly as trees as they can't be named although that seems to be a
> superficial issue and I believe the implementation and language could
> easily have named formulae (bNodes and URIs) and hence arbitrary 
> structures.
>
> Whether contexts are enough to build full provenance is unclear to me. 
>  I
> don't have a sense of the requirements for provenance.

In my experience, when triples are being added to a graph it is often 
useful to be able to track back where they came from (e.g. the 
Internet source Web site or domain), how they were added, by whom, 
why, when (e.g. date), when they will expire (e.g. a Time-To-Live) and 
so on. This requires flagging triples as belonging to different 
contexts/scopes and then describing in RDF itself the relationships 
between the contexts. At query time such information can then be used 
by the application to define a search scope and filter the results - a 
small sketch of this follows below. At the moment I am not sure the 
ideas and requirements I have for contexts/provenance match the cwm 
formulae ones, but I am sure they must be very close in scope.
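
As a rough sketch of what I have in mind, using rdflib's named 
contexts purely as a stand-in for whatever mechanism the query 
language ends up with (all URIs and properties below are made up):

  from rdflib import ConjunctiveGraph, Namespace, URIRef, Literal

  EX = Namespace("http://example.org/")
  DC = Namespace("http://purl.org/dc/elements/1.1/")

  store = ConjunctiveGraph()

  # Triples harvested from one source are flagged with their own context...
  src = URIRef("http://example.org/context/site-a")
  ctx = store.get_context(src)
  ctx.add((EX["doc1"], DC["title"], Literal("A document from site A")))

  # ...and the context itself is described in RDF: source, date, expiry etc.
  meta = store.get_context(URIRef("http://example.org/context/meta"))
  meta.add((src, DC["source"], URIRef("http://site-a.example.org/")))
  meta.add((src, DC["date"], Literal("2003-05-14")))
  meta.add((src, EX["expires"], Literal("2003-06-14")))

  # At query time the provenance metadata defines the search scope:
  for s, p, o in store.get_context(src).triples((None, None, None)):
      print(s, p, o)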

>
> And what about reification? :-)

good one! :-)

>
>> results constructors
>
> These are a good idea and make query a special case of rules 
> equivalent to
> "cwm -filter" - i.e. the output does not flow back into the original 
> data
> causing further rule firings, and specifically, no new rules are 
> generated by
> this route.

Yes, I find them very useful, especially if you want to nest queries, 
or pre- or post-process results using XML tools and so on.

>
> In the most general case, there needs to be something like a preamble, 
> a
> template for each result solution and a post-amble.  As I understand it,
> your sketch syntax does not make clear which bits get repeated for each
> solution and which are not: it looks like the rs:solution block is
> repeated and this is a
> clue - should we have special properties whose semantics indicate how 
> to
> process the template?  For HTML generation, [4] is interesting.  We 
> have a
> student here looking at something similar - a combine rules (external 
> to
> data) engine and HTML generator.

I should have explained that a bit better, sorry - what I was referring 
to was the idea of prototyping what the input and the output of the 
query should be. The XML query result constructors turned out to be a 
familiar syntax to me, and I guess it will be so for most PHP, XSLT 
and JSP programmers out there :) A toy sketch of the 
preamble/template/post-amble idea follows below.
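
Something like the following toy Python sketch, where the rs: element 
names are just placeholders echoing the earlier sketch syntax (not a 
defined vocabulary) and the split matches the preamble / per-solution 
template / post-amble structure described above:

  def construct_results(solutions):
      parts = ["<rs:results>"]                       # preamble, emitted once
      for solution in solutions:                     # template, repeated per solution
          parts.append("  <rs:solution>")
          for var, value in solution.items():
              parts.append('    <rs:binding name="%s">%s</rs:binding>' % (var, value))
          parts.append("  </rs:solution>")
      parts.append("</rs:results>")                  # post-amble, emitted once
      return "\n".join(parts)

  print(construct_results([{"doc": "doc1", "title": "Has a title only"}]))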

>
> Minor: do we need both SELECT and CONSTRUCT?

We might need both, especially if we need to map RDF graphs back to 
SQL tables and vice-versa - a rough sketch of that flattening follows 
below.
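
For instance, a CONSTRUCT-style result stays a graph, while a 
SELECT-style result has to be flattened into rows and columns - a very 
rough Python/rdflib sketch of that flattening (made-up data once more):

  from rdflib import Graph, Namespace, Literal

  EX = Namespace("http://example.org/")
  g = Graph()
  g.add((EX["doc1"], EX["title"], Literal("First")))
  g.add((EX["doc1"], EX["creator"], Literal("Alberto")))
  g.add((EX["doc2"], EX["title"], Literal("Second")))

  # Pivot the graph into one row per subject, one column per property -
  # roughly what a SELECT-style, table-shaped result has to do.
  rows = {}
  for s, p, o in g:
      rows.setdefault(s, {})[p] = o

  for subject, columns in rows.items():
      print(subject, columns.get(EX["title"]), columns.get(EX["creator"]))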

> Again - great note!

so, let's start! :-)

cheers

Alberto

Received on Wednesday, 14 May 2003 12:28:16 UTC