- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Thu, 9 Nov 2006 19:37:55 +0100
- To: Andrew Newman <andrewfnewman@gmail.com>
- Cc: public-rdf-dawg-comments@w3.org
On Thu, Nov 09, 2006 at 05:52:25AM +1000, Andrew Newman wrote: > > I'd like present to the DAWG public comments list my Honours thesis. > It discusses a formal model, the relational model, to SPARQL. It > builds on the work of by Cyganiak, Frasincar et al., Harris and > Shadbolt, Pérez et al. and others. Hopefully, it's appropriate to > some of the current discussions. > > It's available at (~500K): > http://jrdf.sourceforge.net/RelationalBasedSPARQL.pdf thank you very much for this work! it looks very useful to the DAWG and the community as a whole. > In it I suggest that the current SPARQL specification is directly > influenced by implementation specifics such as SQL and not RDF. It is > argued that the semantics of SQL is a poor match to the RDF data > model. Examples of this mismatch include: > * The existence of NULL (section 2.3). > * UNION and other operations may or may not return duplicates (section 2.4). > * Lack of Compositional Semantics (section 2.5). > * Order dependent OPTIONAL (like SQL's left outer join) (section 2.6). ====Duplicates==== You note that [[ SQL is often seen as an implementation of the relational model even though it has numerous incompatibilities with the relational model such as bag instead of set semantics, column ordering, duplicates and handling of nulls [6]. ]] and later state [[ As both the RDF model and the relational model are both propositional and set-based it is likely that a compatible model for querying RDF can be provided. ]] Does the relational model you propose preserve duplicates? Specifically, are the ⋈(null-accepting join) and π (sorry, I may have the wrong codepoint there, using a greek lower case pi) defined to work on sets or bags? Is this choice orthogonal to the rest of the calculus? §2.4 Duplicates implies that you favor bags. While I don't know all of Date's arguments against duplicates, I suspect that the use cases for aggregates motivate him to have duplicates in the algebra, just not in the report (guessing here). A practical implication of closure is that one should be able to implement the aggregate downstream of some report {SELECT ?type WHERE { ?employee a ?type }} , without burdening the querier to project in something distinct, like the employee Id. Since the DAWG has posponed aggregates [AG], users will need a bag semantics to be able to postprocess with their one aggregates. ====NULLs==== [[ Cyganiak says, “The SPARQL model does not, for example, distinguish between an OPTIONAL variable that is unbound in some solutions, and a variable that is not used in the query at all.” This result leads to confusion as to what an unbound result means. ]] I think this would be analogous to an SQL that allowed me to select non-existent columns and responded with NULLs. Are there use cases where this confusion matters? ====DEE and DUM==== I interprated the SPARQL Optional table as specifying the results of row heading OPT column heading. Substituting in T, F and R: ┏━━━┯━━━━━━━━┯━━━━━━━━┯━━━━━━━━━┓ ┃ │ T │ F │ R ┃ T=DEE ┃ T │ T⟕T=T │ T⟕F=T │ T⟕Ro=T ┃ F=DUM ┃ F │ F⟕T=F │ F⟕F=F │ F⟕Ro=F ┃ R=non-optional relation ┃ R │ R⟕T=Rr │ R⟕F=Rr │ R⟕Ro=Rr ┃ Ro=option relation ┗━━━┷━━━━━━━━┷━━━━━━━━┷━━━━━━━━━┛ Rr=resulting relation Searching for corresponding queries, I get arrive at: ┏━━━┯━━━━━━━━━━━━━━━━━━━━━━┯━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ │ T │ F │ R ┃ ┃ T │ { OPT {} } │ │ { OPT { ?sO ?pO ?oO } } ┃ ┃ F │ │ │ ┃ ┃ R │ { ?s ?p ?o OPT { } } │ │ { ?s ?p ?o OPT { ?sO ?pO ?oO } } ┃ ┗━━━┷━━━━━━━━━━━━━━━━━━━━━━┷━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ Basically, I didn't figure out how to write queries that used F (DUM). ====Order Independent Joins==== If the motivation for the full outer join was the freedom to optimize by re-ordering, I think we also have to look at the performance of a re-orderable full outer join vs. the non-reorderable left join. I am generally in favor of left joins over full outer joins for intuition and parallelism with SQL (which I suppose is the source of much of the "intuition"). I need to explore the subsumption scheme you outlined to see how it interacts with queris for the non-existance of a particular pattern (NAF is spelled OPTIONAL !BOUND in SPARQL). > Outcomes presented include: > * A way of mapping RDF and SPARQL operations to the relational model > (section 4). > * Using tuple subsumption to implement UNION and OPTIONAL using > previous optimisation techniques (section 2.7) that is up to twice as > fast as an alternate implementation (using join, antjoin and union) > and up to 8 times faster than ARQ (section 4.5). > * An order independent version of OPTIONAL using full outer join and > tuple subsumption (section 4.4). > > Suggested future work includes: > * Using SQL to implement tuple subsumption OPTIONAL and UNION (section 5.3) > * Alternative ways of implementing ASK and CONSTRUCT (section 5.1) > using the relational model as a basis. > * Aggregate functions (section 5.1). > * Other optimisation techniques if compositional semantics are chosen > (section 5.2). > > The current code is only available through SF subversion: > svn co https://svn.sourceforge.net/svnroot/jrdf jrdf [AG] http://www.w3.org/2001/sw/DataAccess/issues#countAggregate -- -eric home-office: +1.617.395.1213 (usually 900-2300 CET) +33.1.45.35.62.14 cell: +33.6.73.84.87.26 (eric@w3.org) Feel free to forward this message to any list for any purpose other than email address distribution.
Received on Thursday, 9 November 2006 18:37:06 UTC