RE: optionals, provenance and transformation into RDF queries

Alberto,

That's a great note!  I agree with the features you're proposing.

Overall points:

1/ Let's be clear when we are expressing something that is at the RDF level
and when there are features that are outside the current RDF.  We all know
that current RDF isn't enough, as your contextual/provenance discussion
highlights.  I find Tim's stack [1] or the original layer cake diagram
useful here.

It is important to distinguish when a feature is RDF and when a feature is
beyond RDF, because there will be other systems experimenting with the next
wave of "RDF" (RDF-ng) while the core RDF is more stable.

2/ As we go beyond the bound variables/tables approach, an SQL-like syntax
becomes less helpful.  The SQL-like assumption also influences our
processing model for the query which, for me, is the key discussion for the
next query language.

3/ I think it is time to draw a line under RDQL/SquishQL, or whatever we call
it.  Leave that stable and start a new query language syntax that is more
suitable for the purpose - and define the purpose first!

4/ "query" is overloaded: there is query to get information (base values)
out of the RDF into something an application can use in, say, a web page.
That is the role of bound variables and the SQL-like motivation is very
good.  There is also "query" in finding a subgraph, or sequence of
subgraphs, out of a knowledge base, for further graph-level processing.  See
[2].  Being able to separate the various transformations and manipulations
into a clear processing model would be very powerful.
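To make the two senses of "query" concrete, here is a minimal Python sketch over a toy in-memory set of string triples. All names are hypothetical, not an RDQL or Jena API. `select` returns a table of bound values, while `subgraph` returns the matched triples themselves, so the result is still a graph that can feed further graph-level processing.

```python
# Toy store: a set of (subject, predicate, object) string triples.
# Pattern terms starting with "?" are variables. All names are hypothetical.


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None on failure."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.get(p, t) != t:  # already bound to something else
                return None
            new[p] = t
        elif p != t:
            return None
    return new


def solutions(patterns, graph):
    """Solve a conjunctive pattern, remembering which triples each solution used."""
    results = [({}, [])]  # (binding, triples used so far)
    for pat in patterns:
        extended = []
        for binding, used in results:
            for t in sorted(graph):
                nb = match(pat, t, binding)
                if nb is not None:
                    extended.append((nb, used + [t]))
        results = extended
    return results


def select(variables, patterns, graph):
    """'Extract data': one row of bound values per solution."""
    return [tuple(b[v] for v in variables) for b, _ in solutions(patterns, graph)]


def subgraph(patterns, graph):
    """'Locate data': the union of the matched triples, i.e. still a graph."""
    return {t for _, used in solutions(patterns, graph) for t in used}
```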

> optional matches

Very necessary.  There are some restrictions on DQL's "may bind", aren't there?

I think that optional graph matching is also necessary (see RDF-QBE [3],
which addresses the restricted case of trees).  Things like "get all triples
whose properties are in the Dublin Core namespace".
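The Dublin Core example is a constraint on the predicate's URI rather than a fixed triple pattern. A small Python sketch, assuming triples are plain string tuples and using a hypothetical helper name, shows the idea:

```python
# Dublin Core element set namespace.
DC = "http://purl.org/dc/elements/1.1/"


def triples_in_namespace(graph, namespace, subject=None):
    """Return the triples whose predicate URI lies in `namespace`, sorted.

    This is a test on the predicate's URI string, so it cannot be written
    as a single (s, p, o) pattern with a bound predicate. If `subject` is
    given, only that subject's triples are kept.
    """
    return sorted(
        (s, p, o)
        for s, p, o in graph
        if p.startswith(namespace) and (subject is None or s == subject)
    )
```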

> - implementation problems?

A processing model would be the starting point here.  Something like: "do
the exact match", then "augment with the optional parts" (this removes the
problems for exact matches that use values from the optional parts).
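The "do exact, then augment" outline can be sketched in Python as two phases. This is a toy sketch: triples are string tuples, "?"-prefixed terms are variables, and the function names are hypothetical.

```python
# Phase 1: exact match of the required patterns.
# Phase 2: extend each solution with an optional group where it matches.


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None on failure."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.get(p, t) != t:
                return None
            new[p] = t
        elif p != t:
            return None
    return new


def exact(patterns, graph, bindings=None):
    """All bindings satisfying every pattern, starting from `bindings`."""
    current = [{}] if bindings is None else list(bindings)
    for pat in patterns:
        nxt = []
        for b in current:
            for t in sorted(graph):
                nb = match(pat, t, b)
                if nb is not None:
                    nxt.append(nb)
        current = nxt
    return current


def augment(bindings, optional, graph):
    """Extend each solution with the optional group, or keep it unchanged."""
    result = []
    for b in bindings:
        extended = exact(optional, graph, [b])
        result.extend(extended if extended else [b])
    return result


def query(required, optional_groups, graph):
    solutions = exact(required, graph)
    for group in optional_groups:
        solutions = augment(solutions, group, graph)
    return solutions
```

Because variables such as `?z` are only bound in the second phase, an optional part that fails to match never removes a solution found by the exact match.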

> - possible syntax

I had been thinking of a new clause: the WHERE clause remains the exact
graph pattern match.  But that follows from the "do exact", "do optional"
outline above.

The optional matching is at the current RDF level so should be part of the
language that is not exposed to changes in RDF direction (assuming some sort
of backwards compatibility in RDF-ng).

> provenance information

I don't think there is a standard approach emerging yet - cwm formulae nest
strictly as trees as they can't be named, although that seems to be a
superficial issue and I believe the implementation and language could
easily have named formulae (bNodes and URIs) and hence arbitrary structures.
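Here is a toy Python sketch of why naming formulae lifts the tree restriction. It assumes a hypothetical store mapping each formula name (a URI or bNode label) to a set of triples. Once a name can appear as an ordinary term in another formula's triples, references can even form a cycle, which strictly nested anonymous formulae cannot express.

```python
from collections import defaultdict


class FormulaStore:
    """Hypothetical store of named formulae: name (URI or bNode label) -> triples."""

    def __init__(self):
        self.formulae = defaultdict(set)

    def add(self, name, s, p, o):
        self.formulae[name].add((s, p, o))

    def references(self, name):
        """Names of other formulae mentioned by the triples of formula `name`."""
        return {
            term
            for triple in self.formulae.get(name, ())
            for term in triple
            if term in self.formulae and term != name
        }
```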

Whether contexts are enough to build full provenance is unclear to me.  I
don't have a sense of the requirements for provenance.

And what about reification? :-)

> results constructors

These are a good idea and make query a special case of rules, equivalent to
"cwm -filter" - i.e. the output does not flow back into the original data
causing further rule firings, and specifically, no new rules are generated by
this route.
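Here is a toy Python contrast between the two modes, under the same string-triple assumptions as elsewhere and with hypothetical names. `filter_once` fires each rule once against the input and writes conclusions only to a separate output graph. `closure` shows ordinary rule firing, where conclusions flow back and can trigger further firings.

```python
def match(pattern, triple, binding):
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.get(p, t) != t:
                return None
            new[p] = t
        elif p != t:
            return None
    return new


def exact(patterns, graph):
    current = [{}]
    for pat in patterns:
        current = [nb for b in current for t in sorted(graph)
                   for nb in [match(pat, t, b)] if nb is not None]
    return current


def instantiate(head, binding):
    """Substitute bindings into head triples (assumes head vars occur in the body)."""
    return {tuple(binding.get(term, term) for term in triple) for triple in head}


def filter_once(rules, graph):
    """'cwm -filter' style: conclusions go to a separate output graph only."""
    out = set()
    for body, head in rules:
        for b in exact(body, graph):
            out |= instantiate(head, b)
    return out


def closure(rules, graph):
    """Ordinary rule firing: conclusions flow back until nothing new appears."""
    current = set(graph)
    while True:
        new = filter_once(rules, current) - current
        if not new:
            return current
        current |= new
```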

In the most general case, there needs to be something like a preamble, a
template for each result solution and a postamble.  As I understand it, your
sketch syntax does not make clear which bits get repeated for each solution
and which do not: it looks like the rs:solution block is repeated, and this
is a clue - should we have special properties whose semantics indicate how to
process the template?  For HTML generation, [4] is interesting.  We have a
student here looking at something similar - a combined rules (external to the
data) engine and HTML generator.
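The preamble / per-solution template / postamble structure can be sketched in a few lines of Python. This uses the standard `string.Template` and `html.escape`. The function name and the convention that `$name` in the template stands for the query variable `?name` are assumptions for illustration.

```python
from html import escape
from string import Template


def render(preamble, row_template, postamble, solutions):
    """Emit the preamble once, the row template once per solution, then the postamble.

    Each solution maps query variables ("?x") to strings; values are
    HTML-escaped and substituted for "$x" in the row template.
    """
    row = Template(row_template)
    rows = [
        row.substitute({k.lstrip("?"): escape(v) for k, v in sol.items()})
        for sol in solutions
    ]
    return preamble + "".join(rows) + postamble
```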

Minor: do we need both SELECT and CONSTRUCT?

Again - great note!

	Andy

[1] http://www.w3.org/DesignIssues/diagrams/SemWave.png
[2] http://www.w3.org/2001/sw/meetings/tech-200303/query
[3] http://www.hpl.hp.com/semweb/publications.htm#RDF-QBE
[4] http://ninebynine.org/RDFNotes/RDFForLittleLanguages.htm


-----Original Message-----
From: Alberto Reggiori [mailto:alberto@asemantics.com] 
Sent: 20 April 2003 22:14
To: 'www-rdf-rules@w3.org'
Subject: optionals, provenance and transformation into RDF queries



hi there!

I have been reading through your IRC/chump [1] emailed to www-archive
[2] - my strong feeling is that sooner or later we will need to add
support at the SquishQL/RDQL syntax level for optional matching triples
(or bindings). In addition, while developing some RDF apps I also found
out the importance and usefulness of SELECTing triples using one more
dimension/component (s,p,o + c), a.k.a. some kind of
provenance/source/context/scope/quads information (whatever that is
called or means in RDF :)

I would like to discuss the possibility of coming up with a common
(JDBC/ODBC/DBI friendly :) syntax for expressing such extensions in our
SQL-ish query languages - I am also wondering about the use of some
kind of CONSTRUCT clause a la SeRQL [3] to "transform" or format the
actual bound vars. IMO this could also help the integration of RDF
query languages with XML semi-structured ones [4] and developers would
love that :)

here, I will just quickly summarize some aspects related to these
extensions (perhaps we need to put them on the RDFQR Wiki page [5] later)

optional matches
------------------------
- as soon as you start writing real-world RDF applications you need
those, otherwise you have to go back to the API and "build the query by
hand", because RDF data is by nature generally irregular, incomplete,
perhaps expressed using different data granularity, deeply nested
- DQL supports it and perhaps others (???)
- RDF is flexible and tolerant and an RDF query language must be so too
- implementation problems?
- possible syntax

---> Andy's idea about "locating data" and "extracting data" - is it
about splitting the SELECT and WHERE parts of the RDQL/SquishQL
statement in two?
---> use some special char on SELECTed vars to say they are "optional"
(a question mark at the end is not a good choice for JDBC/ODBC
compatibility)
---> use square brackets (we are thinking about supporting this syntax,
perhaps also flagging the SELECTed vars as optional)
         WHERE
                         (?x,<some:mandatoryProp>,?y),
                         [ (?x, <some:optionalProp>, ?z)]

---> use full-blown SQL style syntax (really too verbose for me)
        WHERE
                        (  (?x,<some:mandatoryProp>,?y)  ) OR
                        (  (?x,<some:mandatoryProp>,?y),
                            (?x, <some:optionalProp>, ?z) )
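One subtlety between the last two syntaxes concerns how the OR form is read. If OR means a plain union of the solutions of its two branches (an assumption, since the semantics are not pinned down here), it is not quite the same as the square-bracket optional. Any ?x that has the optional property shows up twice, once without ?z and once with it. A toy Python sketch with hypothetical names:

```python
def match(pattern, triple, binding):
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.get(p, t) != t:
                return None
            new[p] = t
        elif p != t:
            return None
    return new


def exact(patterns, graph, bindings=None):
    current = [{}] if bindings is None else list(bindings)
    for pat in patterns:
        current = [nb for b in current for t in sorted(graph)
                   for nb in [match(pat, t, b)] if nb is not None]
    return current


def optional_match(required, optional, graph):
    """Square-bracket reading: keep each required solution, extended if possible."""
    result = []
    for b in exact(required, graph):
        extended = exact(optional, graph, [b])
        result.extend(extended if extended else [b])
    return result


def union_match(required, optional, graph):
    """OR read as a plain union of the two branches' solutions."""
    return exact(required, graph) + exact(required + optional, graph)
```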

provenance information
--------------------------------
- RDF sources, once parsed and stored in an RDF database, are flattened
and at query time you very often need to filter them based on
the "context" where they have been asserted - i.e. source URL or some
other RDF resource which could be further described. This information
cannot generally be represented with triples, perhaps with
reification, but I do not understand it much :-)
- N3 formulae are something similar
- Quads [6] use that extra component for that IMU
- possible syntax

---> Allow one more component on the triple-pattern a la Quads (we
already support such a syntax in our implementation of RDQL)

        WHERE
                         (?x, <some:prop>,?y, ?context),
                         (?context, <rdf:type>, <some:MeaningfulContext>)

---> Allow N3 style curly brackets (ugly) - or is there any better  
syntax?

          WHERE ( { (?x, <some:prop>,?y, ?context) } <rdf:type>, <some:MeaningfulContext> )

---> Use some other special CONTEXT clause
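The quad-style pattern with ?context can be sketched in Python over a toy store of (s, p, o, context) string tuples. The names are hypothetical. One assumption is labelled: a plain three-slot pattern, like the rdf:type line in that example, is given a fresh context variable, so it matches a triple in any context.

```python
def match(pattern, quad, binding):
    """Extend binding so that pattern matches quad; return None on failure."""
    new = dict(binding)
    for p, t in zip(pattern, quad):
        if p.startswith("?"):
            if new.get(p, t) != t:
                return None
            new[p] = t
        elif p != t:
            return None
    return new


def pad(pattern, index):
    """Assumption: a three-slot pattern gets a fresh context variable (any context)."""
    return pattern if len(pattern) == 4 else (*pattern, f"?_ctx{index}")


def query(variables, patterns, store):
    bindings = [{}]
    for i, pat in enumerate(patterns):
        quad_pattern = pad(pat, i)
        nxt = []
        for b in bindings:
            for q in sorted(store):
                nb = match(quad_pattern, q, b)
                if nb is not None:
                    nxt.append(nb)
        bindings = nxt
    return [tuple(b[v] for v in variables) for b in bindings]
```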

results constructors
--------------------------
- most applications need to use RDF just to "grep" the Web into some kind
of XML-ish syntax - RDBMS DBI/JDBC/ODBC are also fine. But having an
XML result allows you to do a lot more with it (e.g. XSLT) and pipe/chain
things better; and developers would feel more familiar with it.
- Andy's RDF Query result set could also benefit from this
- soon people will start to nest SquishQL/RDQL statements, i.e. RDF --->
RDF transformation
- XQuery supports it already
- SeRQL uses it
- possible syntax

---> a la XQuery using a CONSTRUCT or TRANSFORM clause

         SELECT
                ?x,?y
         WHERE
                (?res, <some:px>, ?x),
                (?res, <some:py>, ?y)
         CONSTRUCT
                <rs:ResultSet>
                   <rs:resultVariable>x</rs:resultVariable>
                   <rs:resultVariable>y</rs:resultVariable>
                   <rs:size rdf:datatype='http://www.w3.org/2001/XMLSchema#integer'>1</rs:size>
                   <rs:solution>
                      <rs:ResultSolution>
                         <rs:binding rdf:parseType='Resource'>
                            <rs:variable>x</rs:variable>
                            <rs:value rdf:datatype='{$x/rdf:datatype}'>$x</rs:value>
                         </rs:binding>
                         <rs:binding rdf:parseType='Resource'>
                            <rs:variable>y</rs:variable>
                            <rs:value rdf:resource='$y'/>
                         </rs:binding>
                      </rs:ResultSolution>
                   </rs:solution>
                </rs:ResultSet>
         USING
                some FOR <http://somevoc.org/mine/>,
                rs FOR <http://jena.hpl.hp.com/2003/03/result-set#>

---> ala SeRQL (see spec)
---> any better syntax?

IMO this TRANSFORM thingie would be extremely useful, especially to
dynamically generate XML Web content out of an RDF database - it would
also open the doors to all the other XML tools already deployed.
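As a rough idea of what such a transform would emit, here is a Python sketch using the standard `xml.etree.ElementTree`. It turns a list of bindings into RDF/XML shaped like the rs: vocabulary in the CONSTRUCT example. The function names are hypothetical. As a simplification, every value is written as a plain literal, and rs:size is computed rather than fixed.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RS = "http://jena.hpl.hp.com/2003/03/result-set#"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

ET.register_namespace("rdf", RDF)
ET.register_namespace("rs", RS)


def q(ns, local):
    """ElementTree's {namespace}local qualified-name form."""
    return f"{{{ns}}}{local}"


def result_set_xml(variables, solutions):
    """Serialise bindings as an rs:ResultSet (all values as plain literals)."""
    root = ET.Element(q(RDF, "RDF"))
    rset = ET.SubElement(root, q(RS, "ResultSet"))
    for v in variables:
        ET.SubElement(rset, q(RS, "resultVariable")).text = v
    size = ET.SubElement(rset, q(RS, "size"), {q(RDF, "datatype"): XSD_INTEGER})
    size.text = str(len(solutions))
    for sol in solutions:
        # One rs:solution property per solution, as in the CONSTRUCT example.
        holder = ET.SubElement(rset, q(RS, "solution"))
        rsol = ET.SubElement(holder, q(RS, "ResultSolution"))
        for v in variables:
            binding = ET.SubElement(rsol, q(RS, "binding"), {q(RDF, "parseType"): "Resource"})
            ET.SubElement(binding, q(RS, "variable")).text = v
            ET.SubElement(binding, q(RS, "value")).text = sol[v]
    return ET.tostring(root, encoding="unicode")
```

The output is ordinary XML, so it can be piped straight into XSLT or other XML tools.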

Just some thoughts on all this - in the meantime I will set up
some use cases for this on the RDF Query and Rules survey page [7] to
see if some developer will pick it up.

cheers

Alberto

[1] http://rdfig.xmlhack.com/2003/04/20/2003-04-20.html#1050846336.312674
[2] http://lists.w3.org/Archives/Public/www-archive/2003Apr/0052.html
[3] http://lists.w3.org/Archives/Public/www-rdf-rules/2003Apr/0013.html
[4] http://www.w3.org/TR/xquery/
[5] http://esw.w3.org/topic/RDFQueryTestcasesRequirements
[6] http://robustai.net/sailor/grammar/Quads.html
[7] http://rdfstore.sourceforge.net/2002/06/24/rdf-query/query-use-cases.html

Received on Wednesday, 23 April 2003 13:17:16 UTC