- From: Seaborne, Andy <Andy_Seaborne@hplb.hpl.hp.com>
- Date: Wed, 23 Apr 2003 18:16:53 +0100
- To: "'Alberto Reggiori'" <alberto@asemantics.com>, "'www-rdf-rules@w3.org'" <www-rdf-rules@w3.org>, "'Dave Reynolds'" <Dave_Reynolds@hplb.hpl.hp.com>
Alberto,

That's a great note! I agree with the features you're proposing.

Overall points:

1/ Let's be clear when we are expressing something that is at the RDF level and when there are features that are outside the current RDF. We all know that current RDF isn't enough, as your contextual/provenance discussion highlights. I find Tim's stack [1] or the original layer cake diagram useful here. It is important to distinguish when a feature is RDF and when a feature is beyond RDF, because there will be other systems experimenting with the next wave of "RDF" (RDF-ng) while the core RDF is more stable.

2/ As we go beyond the bound variables/tables approach, the idea that an SQL-like syntax is helpful decreases. The SQL-like assumption also influences our processing model for the query which, for me, is the key discussion for the next query language.

3/ I think it is time to draw a line under RDQL/SquishQL/whatever. Leave that stable and start a new query language syntax that is more suitable for the purpose - and define the purpose first!

4/ "query" is overloaded: there is query to get information (base values) out of the RDF into something an application can use in, say, a web page. That is the role of bound variables, and the SQL-like motivation is very good. There is also "query" in the sense of finding a subgraph, or sequence of subgraphs, out of a knowledge base, for further graph-level processing. See [2]. Being able to separate the various transformations and manipulations into a clear processing model would be very powerful.

> optional matches

Very necessary. There are some restrictions in DQL "may bind", aren't there? I think that optional graph matching is also necessary (see RDF-QBE [3], which addresses the restricted case of trees). Things like "get all triples whose properties are in the Dublin Core namespace".

> - implementation problems?

A processing model would be the starting point here.
Something like: "do the exact match", then "augment with the optional parts" (this removes the problems for exact matches using values from the optional parts).

> - possible syntax

I had been thinking of a new clause: the WHERE clause remains the exact graph pattern match. But that follows from the "do exact", "do optional" outline above. The optional matching is at the current RDF level, so it should be part of the language that is not exposed to changes in RDF direction (assuming some sort of backwards compatibility in RDF-ng).

> provenance information

I don't think there is a standard approach emerging yet. cwm formulae nest strictly as trees because they can't be named, although that seems to be a superficial issue, and I believe the implementation and language could easily have named formulae (bNodes and URIs) and hence arbitrary structures. Whether contexts are enough to build full provenance is unclear to me. I don't have a sense of the requirements for provenance. And what about reification? :-)

> results constructors

These are a good idea and make query a special case of rules, equivalent to "cwm -filter" - i.e. the output does not flow back into the original data causing further rule firings and, specifically, no new rules are generated by this route.

In the most general case, there needs to be something like a preamble, a template for each result solution and a post-amble. As I understand it, your sketch syntax does not make clear which bits get repeated for each solution and which are not: it looks like the rs:solution block is repeated, and this is a clue. Should we have special properties whose semantics indicate how to process the template?

For HTML generation, [4] is interesting. We have a student here looking at something similar - a combined rules (external to data) engine and HTML generator.

Minor: do we need both SELECT and CONSTRUCT?

Again - great note!
Andy

[1] http://www.w3.org/DesignIssues/diagrams/SemWave.png
[2] http://www.w3.org/2001/sw/meetings/tech-200303/query
[3] http://www.hpl.hp.com/semweb/publications.htm#RDF-QBE
[4] http://ninebynine.org/RDFNotes/RDFForLittleLanguages.htm

-----Original Message-----
From: Alberto Reggiori [mailto:alberto@asemantics.com]
Sent: 20 April 2003 22:14
To: 'www-rdf-rules@w3.org'
Subject: optionals, provenance and transformation into RDF queries

hi there!

I have been reading through your IRC/chump [1] emailed to www-archive [2] - my strong feeling is that sooner or later we will need to add support at the SquishQL/RDQL syntax level for optional matching triples (or bindings). In addition, while developing some RDF apps I also found out how important and useful it is to SELECT triples using one more dimension/component (s,p,o + c), aka some kind of provenance/source/context/scope/quads information (whatever that is called or means in RDF :)

I would like to discuss the possibility of coming up with a common (JDBC/ODBC/DBI friendly :) syntax for expressing such extensions in our SQL-ish query languages - I am also wondering about the use of some kind of CONSTRUCT clause ala SeRQL [3] to "transform" or format the actual bound vars. IMO this could also help the integration of RDF query languages with XML semi-structured ones [4], and developers would love that :)

here, I will just quickly summarize some aspects related to these extensions (perhaps need to put them on the RDFQR Wiki page [5] later)

optional matches
------------------------

- as soon as you start writing real-world RDF applications you need those, otherwise you have to go back to the API and "build the query by hand"; because RDF data is by nature generally irregular, incomplete, perhaps expressed using different data granularity, deeply nested
- DQL supports it and perhaps others (???)
- RDF is flexible and tolerant and an RDF query language must be so too
- implementation problems?
- possible syntax
  ---> Andy's idea about "locating data" and "extracting data" - is it about splitting up the RDQL/SquishQL statement SELECT and WHERE parts in two?
  ---> use some special char on SELECTed vars to say they are "optional" (question mark at the end is not a good choice for JDBC/ODBC compatibility)
  ---> use square brackets (we are thinking about supporting this syntax, perhaps flagging the SELECTed vars as optional too)

         WHERE (?x,<some:mandatoryProp>,?y),
               [ (?x, <some:optionalProp>, ?z)]

  ---> use full-blown SQL style syntax (really too verbose to me)

         WHERE ( (?x,<some:mandatoryProp>,?y) )
               OR
               ( (?x,<some:mandatoryProp>,?y),
                 (?x, <some:optionalProp>, ?z) )

provenance information
--------------------------------

- RDF sources, once parsed and stored into an RDF database, are flattened down, and at query time you very often need to filter them based on the "context" where they have been asserted - i.e. source URL or some other RDF resource which could be further described. This information can not generally be represented with triples, perhaps with reification, but I do not understand it much :-)
- N3 formulae are something similar
- Quads [6] use that extra component for that IMU
- possible syntax
  ---> Allow one more component on the triple-pattern ala Quads (we already support such a syntax in our implementation of RDQL)

         WHERE (?x, <some:prop>,?y, ?context),
               (?context, <rdf:type>, <some:MeaningfulContext>)

  ---> Allow N3 style curly brackets (ugly) - or is there any better syntax?

         WHERE ( { (?x, <some:prop>,?y, ?context) } <rdf:type>, <some:MeaningfulContext> )

  ---> Use some other special CONTEXT clause

results constructors
--------------------------

- most applications need to use RDF just to "grep" the Web to some kind of XML-ish syntax - RDBMS DBI/JDBC/ODBC are also fine. But having an XML result allows you to play a lot more with it (e.g. XSLT) and pipe/chain things better; and developers would feel more familiar.
- Andy's RDF Query result set could also benefit from this
- soon people will start to nest SquishQL/RDQL statements, i.e. RDF ---> RDF transformation
- XQuery supports it already
- SeRQL uses it
- possible syntax
  ---> ala XQuery using a CONSTRUCT or TRANSFORM clause

         SELECT ?x,?y
         WHERE (?res, <some:px>, ?x),
               (?res, <some:py>, ?y)
         CONSTRUCT
           <rs:ResultSet>
             <rs:resultVariable>x</rs:resultVariable>
             <rs:resultVariable>y</rs:resultVariable>
             <rs:size rdf:datatype='http://www.w3.org/2000/10/XMLSchema#integer'>1</rs:size>
             <rs:solution>
               <rs:ResultSolution>
                 <rs:binding rdf:parseType='Resource'>
                   <rs:variable>x</rs:variable>
                   <rs:value rdf:datatype='{$x/rdf:datatype}'>$x</rs:value>
                 </rs:binding>
                 <rs:binding rdf:parseType='Resource'>
                   <rs:variable>y</rs:variable>
                   <rs:value rdf:resource='$y'/>
                 </rs:binding>
               </rs:ResultSolution>
             </rs:solution>
           </rs:ResultSet>
         USING some FOR <http://somevoc.org/mine/>,
               rs FOR <http://jena.hpl.hp.com/2003/03/result-set#>

  ---> ala SeRQL (see spec)
  ---> any better syntax?

IMO this TRANSFORM thingie would be extremely useful, especially to dynamically generate XML Web content out of an RDF database - it would also open the doors to all the other XML tools already deployed.

Just give some thought to all this - in the meantime I will set up some use cases for this on the RDF Query and Rules survey page [7] to see if some developer will pick it up

cheers

Alberto

[1] http://rdfig.xmlhack.com/2003/04/20/2003-04-20.html#1050846336.312674
[2] http://lists.w3.org/Archives/Public/www-archive/2003Apr/0052.html
[3] http://lists.w3.org/Archives/Public/www-rdf-rules/2003Apr/0013.html
[4] http://www.w3.org/TR/xquery/
[5] http://esw.w3.org/topic/RDFQueryTestcasesRequirements
[6] http://robustai.net/sailor/grammar/Quads.html
[7] http://rdfstore.sourceforge.net/2002/06/24/rdf-query/query-use-cases.html
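
Illustrative sketch (not part of either message): the two-step processing model outlined in the reply, "do the exact match" and then "augment with the optional parts", might look roughly like the Python below. Everything here is an assumption made for illustration: triples are plain tuples of strings, variables are strings starting with "?", and the names match_pattern, match_all and query_with_optionals are hypothetical rather than taken from RDQL, SquishQL, Jena or any other implementation.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one triple pattern with one triple; return the extended binding or None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must agree with any value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def match_all(patterns: Sequence[Triple], graph: Sequence[Triple],
              start: Optional[Binding] = None) -> List[Binding]:
    """Exact match: every pattern in the group must match for a solution to survive."""
    solutions: List[Binding] = [dict(start or {})]
    for pattern in patterns:
        next_solutions: List[Binding] = []
        for binding in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


def query_with_optionals(mandatory: Sequence[Triple],
                         optional_groups: Sequence[Sequence[Triple]],
                         graph: Sequence[Triple]) -> List[Binding]:
    """Step 1: exact match on the mandatory patterns.
    Step 2: augment each solution with every optional group that matches,
    keeping the solution unchanged when a group does not match."""
    results: List[Binding] = []
    for solution in match_all(mandatory, graph):
        augmented = [solution]
        for group in optional_groups:
            step: List[Binding] = []
            for binding in augmented:
                more = match_all(group, graph, binding)
                step.extend(more if more else [binding])
            augmented = step
        results.extend(augmented)
    return results
```

With this shape, the square-bracket example WHERE (?x,<some:mandatoryProp>,?y), [ (?x, <some:optionalProp>, ?z)] would pass the first pattern as mandatory and the bracketed one as a single optional group; a solution with no optional match simply has no entry for ?z.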
Received on Wednesday, 23 April 2003 13:17:16 UTC