Can we have interoperability without a formal semantics? (Was RE: Comments list comments) from Thompson, Bryan B. on 2005-03-22 (public-rdf-dawg@w3.org from January to March 2005)

From: Thompson, Bryan B. <BRYAN.B.THOMPSON@saic.com>
Date: Tue, 22 Mar 2005 06:27:21 -0500
To: Steve Harris <S.W.Harris@ecs.soton.ac.uk>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
Cc: "Bebee, Bradley R." <BRADLEY.R.BEBEE@saic.com>, "Personick, Michael R." <MICHAEL.R.PERSONICK@saic.com>
Message-Id: <D24D16A6707B0A4B9EF084299CE99B391E77D4A2@mcl-its-exs02.mail.saic.com>

Steve,

Don't you think that we will need to have a formal semantics in order to
achieve interoperability?  Test cases at the inputs and outputs level can
go some distance toward identifying problems, but there are always the 
possible misunderstandings and edge conditions that are not covered by the
test cases, are not part of any conformance suite, and will be the source
of interoperability failure.  Without a formal semantics for SPARQL, how
can we hope to have vendor interoperability?

Our other concern arising from the absence of a formal semantics is that
SPARQL may be difficult to optimize, as was recently raised on the comments
list[1,2], or difficult to extend.  Reliable query performance at scale is a
key concern for our customers as they are seeking to federate large
databases using semantic web technologies.  A failure to deliver here could
spoil the entire "semantic web upswell" with a few "no, you can't do that
with SPARQL" case studies.  Equally, if we do not have a formal semantics,
how can we be certain that we can extend the syntax, e.g., to new features
such as you have mentioned, without violating the existing semantics?

Thanks,

-bryan

[1] http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Mar/0035.html
[2] http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Mar/0042.html
[3] http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Mar/0048.html

-----Original Message-----
From: public-rdf-dawg-request@w3.org [mailto:public-rdf-dawg-request@w3.org] On Behalf Of Steve Harris
Sent: Tuesday, March 22, 2005 5:14 AM
To: RDF Data Access Working Group
Subject: Re: Comments list comments

This is a general comment on feature creep.

While these are useful features, they are complicated to test and
time-consuming to implement, and I don't really want a precedent for us
adding every feature that someone requests on the comments list.

If we carry on at this rate we will end up with something (eventually)
that is essentially SQL for triples, but woefully underspecified. The SQL
'92 spec is large, and still not extensive enough to give real
interoperability between SQL systems. I don't want SPARQL to fall into the
same trap when we have a chance to make a clean start with real
interoperability. Every complicated feature that gets added compounds the
risk that a significant proportion of developers will not support some part
of the specification, preventing interop.

- Steve

On Mon, Mar 21, 2005 at 04:04:29 +0000, Andy Seaborne wrote:
> 
> Matters arising from the comments list:
> 
> 
> 1/ SELECT to involve expressions
> 
> SQL allows constants and expressions in explicit projections (SQL
> SELECT in other words)
> 
>   SELECT ?x "constant" ...
>   SELECT ?x (?x+?y) ...
> 
> Combined with nested SELECTs and UNIONs, we would have a way to tag
> which branch of a union a solution came from.  This can already be done
> using different variables in each branch.
>
> This would require access to results by column number (or aliases which
> are not required by SQL) and so have impact on the results format.
> 
> At the moment, SPARQL UNION is defined without the explicit SELECT
> projection and is a graph pattern operator.  There is no assignment of
> values - it's not possible to return RDF terms that are not in the graph
> or a dataset label.
> 
> 
> 2/ GROUP BY
> 
> Request for SQL-like GROUP BY in addition to ORDER BY.  GROUP BY allows
> the application of aggregate functions, which is more problematic than
> ORDER BY (that only changes the order of solutions; it does not remove,
> add or change solutions).  It is the use with aggregation functions like
> sum() and count() that is tricky, because of defining what is being
> counted (names or individuals).
> 
> COUNT() can lead to a significant decrease in network bandwidth but I
> have not seen a proposal as to what it means for RDF query that
> explicitly addresses the closed world assumptions.
> 
> 
> 	Andy
>

Received on Tuesday, 22 March 2005 11:27:40 UTC