Partial response to: [Fwd: SPARQL Comments (Personal)]

Response to:
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Nov/0012.html
which is
http://lists.w3.org/Archives/Public/public-swbp-wg/2005Oct/0107.html

Outstanding comments 2 and 3:
   2 is about told bNodes
   3 is about transitive properties

	Andy

-------- Original Message --------
Subject: SPARQL Comments (Personal)
Resent-Date: Sat, 05 Nov 2005 12:27:28 +0000
Resent-From: public-rdf-dawg-comments@w3.org
Date: Sat, 5 Nov 2005 07:27:19 -0500
From: David Wood <dwood@softwarememetics.com>
To: public-rdf-dawg-comments@w3.org
CC: public-swbp-wg@w3.org


Hi all,

I have made some comments on the SPARQL language at [1].  A brief
discussion thread ensued.

Please note that DanC has already drawn my attention ([2]) to the
isLiteral filter and the state of UNSAID.

These comments should be taken as PERSONAL and do NOT represent any
W3C working group.

[1] http://lists.w3.org/Archives/Public/public-swbp-wg/2005Oct/0107.html
[2] http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0162.html

Regards,
Dave

-------- Original Message --------
 > From:
 > http://lists.w3.org/Archives/Public/public-swbp-wg/2005Oct/0107.html
 >
 > Hi all,
 >
 > I have an action item [1] to review and comment on the specification
 > for the SPARQL Query Language for RDF [2].
 >
 > I reviewed the 21 July 2005 Working Draft.
 >
 > Regards,
 > Dave
 >
 > [1] http://www.w3.org/2005/10/03-swbp-minutes.html#action17
 > [2] http://www.w3.org/TR/rdf-sparql-query/
 > -----------------------------%<--------------------------------------------
 >
 > Review of SPARQL Query Language for RDF Working Draft
 >
 > OVERVIEW:
 >
 >
 > LANGUAGE ISSUES:
 >
 > 1)  SPARQL would appear not to have a complete model theoretic base.
 > Although sections of the specification are described using sets,
 > nothing is presented which hangs all of the sections together.  This is
 > unfortunate and is, I think, the underlying cause of some of the language
 > features critiqued below.  (NB:  I tried to get Simon Raboczi to
 > complete and submit his model theory unifying iTQL and SPARQL, but was
 > unsuccessful in doing so.)
 >
 > 2)  The language is not built with extensibility in mind.  That is, it is
 > not, in my opinion, sufficiently functional.  There are several areas
 > of functionality which we already know are of interest to users of RDF
 > query languages (e.g. iTQL's 'walk' and 'trans' functions which perform
 > generic graph walking and transitive closure, respectively) and it is
 > difficult to see how one might add these
 > commands to a later SPARQL version without making wholesale changes to
 > the language.

I understand this comment may be submitted formally by SWBPD.

 >
 > 3)  The handling of blank nodes ("bnodes") is, again in my opinion, the
 > single greatest failure of the specification.  We have to admit that
 > RDF graphs contain bnodes and queries will run across them.  We also
 > have to admit that a querier will (not 'may') often want to
 > subsequently find information connected to those bnodes.  SPARQL's
 > insistence that bnodes' true internal identities not be returned to a
 > querier (correct in and of itself) combined with the lack of subquery
 > capability ensures that many useful RDF queries routinely performed in
 > other languages simply cannot be written in SPARQL.  OPTIONAL addresses
 > only part of that functionality.

I understand this comment may be submitted formally by SWBPD.

see also part of
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Jun/0039.html
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Jul/0006.html

and:
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Jul/0010.html

 > 4)  SPARQL contains a large number of top-level commands.  This could
 > be a result again of the lack of subqueries and an underlying model
 > theory.  It is an unfortunate design choice.

The working group documented the use cases and requirements in 
http://www.w3.org/TR/rdf-dawg-uc/

I assume "top-level commands" refers to the 4 query forms: SELECT, CONSTRUCT, 
DESCRIBE and ASK.

CONSTRUCT and DESCRIBE return RDF graphs (requirement 3.4)
http://www.w3.org/TR/rdf-dawg-uc/#r3.4

CONSTRUCT takes a template (c.f. projection variables).
DESCRIBE lets a client inspect the returned graph, so the client can be given 
information without a priori knowledge of its structure.

SELECT returns a result set (streamed), with limits
http://www.w3.org/TR/rdf-dawg-uc/#r3.2

ASK returns a single boolean value.
http://www.w3.org/TR/rdf-dawg-uc/#d4.9

 > 5)  Good language design would dictate that logical opposites (e.g.
 > conjunction and disjunction) be represented in syntactically similar
 > ways, even if one is generally implicit.  Therefore, since UNION
 > (disjunction) is present (as well it should be) along with conjunction,
 > then conjunction should have an optional equivalent keyword even though
 > it is implicit in most uses.

The working group decided that the conjunctive triple patterns should be 
modelled on Turtle/N3 syntax to reuse a potentially familiar concept.  This 
syntax does have "." as an explicit syntactic element.

 > 6)  I was initially concerned about the definitions of subjects as
 > including literals in triple patterns, until Andy Seaborne pointed out
 > the differences between a triple and a triple matching pattern.
 >
 > 7)  The nulls generated by UNION and the nulls generated by OPTIONAL
 > may be distinct.  They correspond to logical true and false,
 > respectively.  That makes life a bit difficult for implementors.  It
 > may be that another form of 'null' should be considered.  Thanks to
 > Simon Raboczi for this analysis.
 >
 > 8)  There does not seem to be any way to force a literal into the
 > variable position in a binding.  That is very useful when attempting to
 > create a result which must take a certain form (e.g. be a set of
 > triples) and occasionally mandatory if the result set must be forced
 > into triple form.

There is a FILTER builtin "isLiteral" that tests whether an RDF term is a 
literal or not.

 > INTEROPERABILITY ISSUES:
 >
 > 1)  There is, to the best of my knowledge, no way for a SPARQL user to
 > command the creation of an RDF container within a data store.  The lack
 > of an explicit command will encourage implementors to create their own,
 > thereby hurting interoperability.  Similarly for deletion requests.

The working group was chartered for data access, and graph update is out of scope.
http://www.w3.org/2003/12/swa/dawg-charter#update
This is interpreted as covering graph creation as well.

I hope that another working group will be chartered to address this when 
implementation and deployment experience on the web indicates ways of doing that.


 > 2)  The form and content of DESCRIBE results are left to the data
 > publisher.  It would seem that such an open-ended conversation would
 > require a human consumer in the general case.  I am left wondering why
 > the DESCRIBE functionality is not left to a more general SELECT query
 > against a describing RDF container.

The result of DESCRIBE is an RDF graph, not a result set.  The client can 
inspect that graph (just as if it had read a whole graph with HTTP GET) to 
determine the information provided.

 > SCALABILITY CONCERNS:
 >
 > 1)  The evaluation of regular expressions after the binding of graph
 > patterns rules out a lot of potential join optimizations.

The document covers what a client can assume of a query processor implementing 
SPARQL.  It does not prescribe how that is implemented.  In particular, 
systems already reorder queries to insert filter expressions (including 
regular expressions) at the best point and also to push them into the graph 
matching process.
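
To illustrate the point about reordering, here is a minimal sketch in Python 
(not how any particular system does it).  The triples, the pattern matcher 
"match" and the filter "name_filter" are all made up for the example.  It 
evaluates the same two triple patterns with a regular-expression filter applied 
late (after all patterns are joined) and early (as soon as ?name is bound) and 
gets the same solutions either way.

```python
import re

# Hypothetical triples: (subject, predicate, object).
TRIPLES = [
    ("_:a", "name", "Alice"),
    ("_:b", "name", "Bob"),
    ("_:a", "knows", "_:b"),
    ("_:b", "knows", "_:a"),
]


def match(pattern, solution):
    """Yield extensions of `solution` for one triple pattern.

    Terms starting with '?' are variables; anything else must match exactly.
    """
    for triple in TRIPLES:
        candidate = dict(solution)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an existing binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield candidate


def name_filter(solution):
    """Stand-in for FILTER regex(?name, "^A")."""
    return re.search("^A", solution["?name"]) is not None


P1 = ("?x", "name", "?name")
P2 = ("?x", "knows", "?y")

# Filter applied after both patterns have been joined.
late = [s2 for s1 in match(P1, {}) for s2 in match(P2, s1) if name_filter(s2)]

# Filter applied as soon as ?name is bound, before the second pattern.
early = [s2 for s1 in match(P1, {}) if name_filter(s1) for s2 in match(P2, s1)]

print(late == early)
```

The early form never tries to extend the solution for "Bob" with the second 
pattern, which is the kind of saving a query optimizer is free to make.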

 > 2)  OPTIONAL, in its entirety.  The concern is that querying very large
 > data sets using OPTIONAL would result in very large intermediate
 > results requiring joining.  Subqueries effectively sidestep that
 > problem by allowing further restrictions against a smaller result set.

I note that "subquery" might be referring to Kowari's use of Relation-Valued 
Attributes [C. J. Date, An Introduction to Database Systems, 8th ed., 
pp. 152-153, 590].  These are also described as nested tables.

The report [1] discusses some implementation possibilities, in particular using 
left outer join to implement OPTIONAL.

[1] http://www.hpl.hp.com/techreports/2005/HPL-2005-170.html
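
As a rough illustration of the left outer join idea (a sketch only, not taken 
from the report), here it is in Python over solutions represented as 
dictionaries.  The helper names "compatible" and "left_join" and the sample 
data are assumptions made for the example, and filters inside the OPTIONAL 
part are ignored.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[var] == val for var, val in a.items() if var in b)


def left_join(left, right):
    """Left outer join of two lists of solutions (dicts of variable -> value)."""
    results = []
    for l in left:
        matches = [{**l, **r} for r in right if compatible(l, r)]
        # Keep the left solution unextended when nothing on the right matches.
        results.extend(matches if matches else [dict(l)])
    return results


# Hypothetical data: every person has a name, only some have a mailbox.
names = [{"x": "_:a", "name": "Alice"}, {"x": "_:b", "name": "Bob"}]
mboxes = [{"x": "_:a", "mbox": "mailto:alice@example.org"}]

solutions = left_join(names, mboxes)
for s in solutions:
    print(s)
```

The solution for "_:b" survives with no binding for mbox, which is the 
behaviour OPTIONAL asks for.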

As we proceed through CR, implementation experience will grow, but existing 
prototypes are already showing efficient querying (e.g. 3Store).

[If it was referring to SQL subqueries: The equivalent of SQL subqueries in 
the form of intermediate tables are present in SPARQL through the arbitrary 
composition of graph patterns into larger graph patterns.  SPARQL does not 
provide aggregate operators akin to SQL's IN/ANY/SOME/ALL.]

 > 3)  UNSAID would have been of great concern with regards to
 > scalability, just in case it comes back :)

OPTIONAL and BOUND can be used to achieve the effect in many cases.  The use 
of bound is not actually needed, as its functionality can also be achieved in 
roundabout ways (e.g. FILTER ( ( ?x = 3 ) || ! ( ?x = 3 ) ) ).

http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0162
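
The usual shape of the OPTIONAL and BOUND approach is an OPTIONAL pattern 
followed by a filter that keeps only the solutions where the optional variable 
stayed unbound.  A minimal Python sketch of that evaluation follows; the 
helpers "left_join" and "bound" and the sample data are assumptions for the 
example, not part of any implementation.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[v] == val for v, val in a.items() if v in b)


def left_join(left, right):
    """OPTIONAL modelled as a left outer join over lists of solutions."""
    out = []
    for l in left:
        ext = [{**l, **r} for r in right if compatible(l, r)]
        out.extend(ext if ext else [dict(l)])
    return out


def bound(solution, var):
    """Stand-in for the bound() builtin: is the variable bound here?"""
    return var in solution


# Hypothetical data: three people, only one of whom has a mailbox.
people = [{"x": "_:a"}, {"x": "_:b"}, {"x": "_:c"}]
with_mbox = [{"x": "_:a", "mbox": "mailto:alice@example.org"}]

# "People with no known mailbox": OPTIONAL, then FILTER ( ! bound(?mbox) ).
no_mbox = [s for s in left_join(people, with_mbox) if not bound(s, "mbox")]
print(no_mbox)
```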

 > SPECIFICATION ISSUES:
 >
 > 1)  It would be really nice if the grammar was ordered for easier
 > reading, perhaps alphabetically.

The grammar is ordered roughly top-down, from the parser entry-level "query" 
production to tokens.  All terms are hyperlinked to their definition if they 
are not inline tokens.

	Andy

Received on Friday, 11 November 2005 15:06:46 UTC