Re: SPARQL Comments (Personal) [OK?] from Seaborne, Andy on 2006-02-08 (public-rdf-dawg-comments@w3.org from February 2006)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Wed, 08 Feb 2006 15:08:13 +0000
To: David Wood <dwood@softwarememetics.com>
CC: public-rdf-dawg-comments@w3.org, public-swbp-wg@w3.org
Message-ID: <43EA095D.3040109@hp.com>
-------- Original Message --------
Subject: SPARQL Comments (Personal)
Date: Sat, 5 Nov 2005 07:27:19 -0500
From: David Wood <dwood@softwarememetics.com>
To: public-rdf-dawg-comments@w3.org
CC: public-swbp-wg@w3.org


Hi all,

I have made some comments on the SPARQL language at [1].  A brief
discussion thread ensued.

Please note that DanC has already drawn my attention ([2]) to the
isLiteral filter and the state of UNSAID.

These comments should be taken as PERSONAL and do NOT represent any
W3C working group.

[1] http://lists.w3.org/Archives/Public/public-swbp-wg/2005Oct/0107.html
[2] http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0162.html

Regards,
Dave

-------- Original Message --------
      > From:
      > http://lists.w3.org/Archives/Public/public-swbp-wg/2005Oct/0107.html
      >
      > Hi all,
      >
      > I have an action item [1] to review and comment on the specification
      > for the SPARQL Query Language for RDF [2].
      >
      > I reviewed the 21 July 2005 Working Draft.
      >
      > Regards,
      > Dave
      >
      > [1] http://www.w3.org/2005/10/03-swbp-minutes.html#action17
      > [2] http://www.w3.org/TR/rdf-sparql-query/
      > -----------------------------%
      > <--------------------------------------------
      >
      > Review of SPARQL Query Language for RDF Working Draft
      >
      > OVERVIEW:
      >
      >
      > LANGUAGE ISSUES:
      >
      > 1)  SPARQL would appear not to have a complete model theoretic base.
      > Although sections of the specification are described using sets,
      > nothing is presented which hangs all of the section together.  This is
      > unfortunate and is, I think, the underlying cause of some of the language
      > features critiqued below.  (NB:  I tried to get Simon Raboczi to
      > complete and submit his model theory unifying iTQL and SPARQL, but was
      > unsuccessful in doing so.)

There is the XSLT transform that extracts the definitions.  It can be used via
W3C's XSLT service:

http://www.w3.org/2000/06/webdata/xslt?xslfile=http%3A%2F%2Fwww.w3.org%2F2001%2Fsw%2FDataAccess%2Frq23%2Fdefns.xsl&xmlfile=http%3A%2F%2Fwww.w3.org%2F2001%2Fsw%2FDataAccess%2Frq23%2F&transform=Submit

The stylesheet is also available for offline use.

      > 2)  The language is not built with extensibility in mind.  That is, it is
      > not, in my opinion, sufficiently functional.  There are several areas
      > of functionality which we already know are of interest to users of RDF
      > query languages (e.g. iTQL's 'walk' and 'trans' functions which perform
      > generic graph walking and transitive closure,
      > respectively) and it is difficult to see how one might add these
      > commands to a later SPARQL version without making wholesale changes to
      > the language.

The use cases and requirements document describes the extensibility goals for
SPARQL.  http://www.w3.org/TR/rdf-dawg-uc/

Specifically on transitive properties: a SPARQL query can be made over a graph
containing the transitive closure of some base data.  Also, the graph can
contain properties which are the transitive form of the direct property.
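
As a rough illustration of the second point (a sketch only, not taken from any
particular system), the graph can be extended before it is queried.  The
property names ex:partOf and ex:partOfTransitive are made up for the example,
and triples are modelled as plain Python tuples:

```python
# Illustrative only: triples are plain tuples and the property names are hypothetical.
def add_transitive_closure(triples, direct, transitive):
    """Return the triples plus (s, transitive, o) for every path of `direct` edges."""
    edges = {}
    for s, p, o in triples:
        if p == direct:
            edges.setdefault(s, set()).add(o)

    closure = set()
    for start in edges:
        stack = list(edges[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # already reached from `start`; also guards against cycles
            seen.add(node)
            closure.add((start, transitive, node))
            stack.extend(edges.get(node, ()))
    return set(triples) | closure


base = {
    ("ex:a", "ex:partOf", "ex:b"),
    ("ex:b", "ex:partOf", "ex:c"),
}
graph = add_transitive_closure(base, "ex:partOf", "ex:partOfTransitive")
```

A query can then use a single triple pattern over ex:partOfTransitive to ask
for indirect parts, while ex:partOf still gives the direct ones.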

      > 3)  The handling of blank nodes ("bnodes") is, again in my opinion, the
      > single greatest failure of the specification.  We have to admit that
      > RDF graphs contain bnodes and queries will run across them.  We also
      > have to admit that a querier will (not 'may') often want to
      > subsequently find information connected to those bnodes.  SPARQL's
      > insistence that bnodes' true internal identities not be returned to a
      > querier (correct in and of itself) combined with the lack of subquery
      > capability ensures that many useful RDF queries routinely performed in
      > other languages simply cannot be written in SPARQL.  OPTIONAL addresses
      > only part of that functionality.

Information attached to a bNode can be found by repeating the part of the
query that located the bNode in the first place.  This includes inverse
functional properties so if the graph node has some uniquely identifying
feature it can quickly be found by a subsequent query.
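
A minimal sketch of that idea, assuming foaf:mbox is treated as the uniquely
identifying property; the data and the helper name are invented for the
example.  The client never sees the store's internal bNode labels; it repeats
the locating pattern and joins it with the pattern for the extra information:

```python
# Illustrative sketch: the "_:" labels are local to this store and are never
# handed to the client.
TRIPLES = {
    ("_:b0", "foaf:mbox", "mailto:alice@example.org"),
    ("_:b0", "foaf:name", "Alice"),
    ("_:b1", "foaf:mbox", "mailto:bob@example.org"),
    ("_:b1", "foaf:name", "Bob"),
}


def names_for_mbox(triples, mbox):
    """Repeat the locating pattern (?p foaf:mbox mbox) and join it with ?p foaf:name ?n."""
    people = {s for s, p, o in triples if p == "foaf:mbox" and o == mbox}
    return sorted(o for s, p, o in triples if p == "foaf:name" and s in people)
```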

In RDF, bNodes do not have globally unique names.  Many systems do indeed
implement them that way (although incompatibly), but a specification for an
RDF query language can't force this on implementations.

Specifically on RDF collections (lists) which are a common source of examples
here, some systems provide property-based access to list members:

cwm/Euler: list:in
http://www.w3.org/2000/10/swap/doc/CwmBuiltins

ARQ provides list:member (and rdfs:member) for member access:
http://jena.sourceforge.net/ARQ/extension.html
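
For illustration only, the effect of such a property can be sketched as a walk
over the rdf:first/rdf:rest chain.  This is not how cwm or ARQ implement it;
the function name and the data are assumptions made for the example:

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def list_members(triples, head):
    """Walk the rdf:first/rdf:rest chain from `head` and return the members in order."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    members, node, seen = [], head, set()
    # Stop at rdf:nil, at a malformed cell, or if the chain loops back on itself.
    while node != RDF_NIL and node in first and node not in seen:
        seen.add(node)
        members.append(first[node])
        node = rest.get(node, RDF_NIL)
    return members


triples = {
    ("ex:doc", "ex:authors", "_:l1"),
    ("_:l1", RDF_FIRST, "ex:alice"), ("_:l1", RDF_REST, "_:l2"),
    ("_:l2", RDF_FIRST, "ex:bob"), ("_:l2", RDF_REST, RDF_NIL),
}
```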

Given that the best way forward is not clear currently, the working group 
recorded two relevant postponed issues:

#bnodeRef
http://www.w3.org/2001/sw/DataAccess/issues#bnodeRef

#accessingCollections
http://www.w3.org/2001/sw/DataAccess/issues#accessingCollections


      > 4)  SPARQL contains a large number of top-level commands.  This could
      > be a result again of the lack of subqueries and an underlying model
      > theory.  It is an unfortunate design choice.

The working group documented the use cases and requirements in
http://www.w3.org/TR/rdf-dawg-uc/

I assume "top-level commands" refers to the four query forms SELECT,
CONSTRUCT, DESCRIBE and ASK.

CONSTRUCT and DESCRIBE return RDF graphs (requirement 3.4)
http://www.w3.org/TR/rdf-dawg-uc/#r3.4

CONSTRUCT takes a template (cf. projection variables).
DESCRIBE lets a client inspect the graph returned, so it can be given
information without a priori knowledge of the structure.

SELECT returns a result set (streamed), with limits
http://www.w3.org/TR/rdf-dawg-uc/#r3.2

ASK returns a single boolean value.
http://www.w3.org/TR/rdf-dawg-uc/#d4.9

      > 5)  Good language design would dictate that logical opposites (e.g.
      > conjunction and disjunction) be represented in syntactically similar
      > ways, even if one is generally implicit.  Therefore, since UNION
      > (disjunction) is present (as well it should be) along with conjunction,
      > then conjunction should have an optional equivalent keyword even though
      > it is implicit in most uses.

The working group decided that the conjunctive triple patterns should be
modelled on Turtle/N3 syntax to reuse a potentially familiar concept.  This
does have "." as a syntactic element.

      > 6)  I was initially concerned about the definitions of subjects as
      > including literals in triple patterns, until Andy Seaborne pointed out
      > the differences between a triple and a triple matching pattern.
      >
      > 7)  The nulls generated by UNION and the nulls generated by OPTIONAL
      > may be distinct.  They correspond to logical true and false,
      > respectively.  That makes life a bit difficult for implementors.  It
      > may be that another form of 'null' should be considered.  Thanks to
      > Simon Raboczi for this analysis.

Strictly, SPARQL does not have nulls - variables in some solutions of a result
set may not have a value bound to them, representing the lack of need for a
value in order for the pattern to match with this solution.

If you have a use case here, it would be helpful.
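
To make the point about unbound variables concrete, here is a small sketch in
which a solution is a Python dict and an unbound variable is simply absent
from it.  The variable names and data are invented; None appears only when
the solutions are rendered as table rows:

```python
def union(left, right):
    """UNION is the concatenation of the solutions of both branches."""
    return list(left) + list(right)


def as_rows(solutions, variables):
    """Project onto `variables`; None marks 'unbound' only in this tabular rendering."""
    return [tuple(s.get(v) for v in variables) for s in solutions]


titles_10 = [{"title10": "SPARQL Tutorial"}]
titles_11 = [{"title11": "SPARQL Query Language"}]
rows = as_rows(union(titles_10, titles_11), ["title10", "title11"])
```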

      >
      > 8)  There does not seem to be any way to force a literal into the
      > variable position in a binding.  That is very useful when attempting to
      > create a result which must take a certain form (e.g. be a set of
      > triples) and occasionally mandatory if the result set must be forced
      > into triple form.

There is a FILTER builtin "isLiteral" that tests for whether an RDF term is a
literal or not.  Only literals found in the graph can be returned.

      > INTEROPERABILITY ISSUES:
      >
      > 1)  There is, to the best of my knowledge, no way for a SPARQL user to
      > command the creation of an RDF container within a data store.  The lack
      > of an explicit command will encourage implementors to create their own,
      > thereby hurting interoperability.  Similarly for deletion requests.

The working group was chartered for data access and graph update is out-of-scope.
http://www.w3.org/2003/12/swa/dawg-charter#update
This is interpreted as covering graph creation as well.

I hope that another working group will be chartered to address this when
implementation deployment experience on the web indicates ways of doing that.


      > 2)  The form and content of DESCRIBE results are left to the data
      > publisher.  It would seem that such an open-ended conversation would
      > require a human consumer in the general case.  I am left wondering why
      > the DESCRIBE functionality is not left to a more general SELECT query
      > against a describing RDF container.

The result of DESCRIBE is an RDF graph, not a result set.  The client can
inspect that graph (just as if it had read a whole graph with HTTP GET) to
determine the information provided.

      > SCALABILITY CONCERNS:
      >
      > 1)  The evaluation of regular expressions after the binding of graph
      > patterns rules out a lot of potential join optimizations.

The document covers what a client can assume of a query processor implementing
SPARQL.  It does not prescribe how that is implemented.  In particular,
systems already reorder queries to insert filter expressions (including
regular expressions) at the best point and also to push them into the graph
matching process.
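
A sketch of why such reordering is safe, under the assumption that the filter
only mentions variables bound by one of the patterns.  The data, the regular
expression and the helper names are invented for the example:

```python
import re


def join(left, right):
    """Inner join of two lists of solutions on their shared variables."""
    return [{**l, **r} for l in left for r in right
            if all(l[v] == r[v] for v in l.keys() & r.keys())]


def name_matches(solution):
    """Stands in for a regex filter over ?name."""
    return re.search(r"^Al", solution["name"]) is not None


names = [{"p": "_:a", "name": "Alice"}, {"p": "_:b", "name": "Bob"}]
mboxes = [{"p": "_:a", "mbox": "mailto:alice@example.org"},
          {"p": "_:b", "mbox": "mailto:bob@example.org"}]

# Filter applied after the join, as a naive reading of the query would suggest.
late = [s for s in join(names, mboxes) if name_matches(s)]
# Filter pushed below the join: it only needs ?name, so fewer rows reach the join.
early = join([s for s in names if name_matches(s)], mboxes)
```

Both orders give the same solutions; the second simply does less work in the
join.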

      > 2)  OPTIONAL, in its entirety.  The concern is that querying very large
      > data sets using OPTIONAL would result in very large intermediate
      > results requiring joining.  Subqueries effectively sidestep that
      > problem by allowing further restrictions against a smaller result set.

I note that "subquery" might be referring to Kowari's use of Relation-Valued
Attributes [C. J. Date, An Introduction to Database Systems, 8th ed.,
pp. 152-153, 590].  These are also described as nested tables.

The report [1] discusses some implementation possibilities, in particular using
left outer join to implement OPTIONAL.

[1] http://www.hpl.hp.com/techreports/2005/HPL-2005-170.html
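
The left outer join reading of OPTIONAL can be sketched as follows.  This is
an illustration written for this message, not code from the report; solutions
are modelled as dicts and the data is invented:

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def left_outer_join(left, right):
    """Keep every left solution; extend it with each compatible right solution, if any."""
    out = []
    for l in left:
        matches = [{**l, **r} for r in right if compatible(l, r)]
        # With no compatible right-hand solution, the left one passes through unchanged.
        out.extend(matches if matches else [dict(l)])
    return out


required = [{"person": "_:a", "name": "Alice"}, {"person": "_:b", "name": "Bob"}]
optional = [{"person": "_:a", "mbox": "mailto:alice@example.org"}]
solutions = left_outer_join(required, optional)
```

The solution for Bob is kept with ?mbox left unbound, which is the behaviour
OPTIONAL is meant to provide.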

As we proceed through CR, implementation experience will grow, but existing
prototypes are already showing efficient query execution (e.g. 3Store).

[If it was referring to SQL subqueries: The equivalent of SQL subqueries in
the form of intermediate tables are present in SPARQL through the arbitrary
composition of graph patterns into larger graph patterns.  SPARQL does not
provide aggregate operators akin to SQL's IN/ANY/SOME/ALL.]

      > 3)  UNSAID would have been of great concern with regards to
      > scalability, just in case it comes back  :)

OPTIONAL and BOUND can be used to achieve the effect in many cases.  The use
of BOUND is not actually needed as its functionality can also be achieved in
roundabout ways (e.g. FILTER ( ( ?x = 3 ) || ! ( ?x = 3 ) ) ).

http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0162
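
A simplified sketch of both idioms, with solutions as dicts and an unbound
variable absent from the dict.  It assumes that evaluating an expression over
an unbound variable is an error and that an error rejects the solution; type
errors between incomparable terms are ignored here, and all names are
invented:

```python
class FilterError(Exception):
    """Stands in for an expression error, e.g. using an unbound variable."""


def equals(solution, var, value):
    if var not in solution:
        raise FilterError(f"?{var} is unbound")
    return solution[var] == value


def bound_via_tautology(solution, var, value=3):
    """Evaluate ( ?var = value ) || ! ( ?var = value ); an error rejects the solution."""
    try:
        test = equals(solution, var, value)
        return test or not test
    except FilterError:
        return False


def unsaid(solutions, var):
    """Keep solutions where `var` was left unbound by an OPTIONAL part (the ! BOUND idiom)."""
    return [s for s in solutions if var not in s]
```

For any bound value the expression is true, and for an unbound variable the
solution is rejected, which is what BOUND would report directly.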

      > SPECIFICATION ISSUES:
      >
      > 1)  It would be really nice if the grammar was ordered for easier
      > reading, perhaps alphabetically.

The grammar is ordered roughly top-down, from the parser entry level "query"
production to tokens.  All terms are hyperlinked to their definition if they
are not inline tokens.

--------

I hope this message responds to your comments - please let us know if it does,

	Andy
Received on Wednesday, 8 February 2006 15:09:03 UTC