ACTION: propose test case for scalar constraint with syntax from Simon Raboczi on 2004-12-14 (public-rdf-dawg@w3.org from October to December 2004)

From: Simon Raboczi <raboczi@tucanatech.com>
Date: Tue, 14 Dec 2004 20:18:29 +1000
To: public-rdf-dawg@w3.org
Message-Id: <80174065-4DB9-11D9-AB85-000A95C5686E@tucanatech.com>
This message concerns my objection to the test sparql-query-example-3.

   http://www.w3.org/2001/sw/DataAccess/tests/#sparql-query-example-3

I'll be working from revision 1.156 of the SPARQL spec:

   http://www.w3.org/2001/sw/DataAccess/rq23

The SPARQL query involved in the test as it stands is as follows:

Test 3:

   PREFIX  dc:  <http://purl.org/dc/elements/1.1/>
   PREFIX  ns:  <http://example.org/ns#>
   SELECT  ?title ?price
   WHERE   ( ?x dc:title ?title )
           ( ?x ns:price ?price ) AND ?price < 30

I don't object to the test itself, but to the feature that it tests.  
Accepting this test would endorse our differentiation of a Constraint 
like "?price < 30" from a Triple Pattern like "( ?x ns:price ?price )".

The entire point of RDF is to be a common model (the RDF graph) which 
we can use to combine descriptions in different domains.  We'll cripple 
its ability to do this if we start excluding domains from this common 
model.  Arithmetic is a particularly important description domain, but 
it doesn't need to be a special case with over a third of the 
non-terminal productions in the grammar dedicated to it.  This is not 
only unnecessary; it's actively harmful to extensibility.  What happens 
when the next "special case" domain pops up?  Every domain is a 
"special case" to some community out there with an existing notation, 
and that existing notation will doubtless be prettier for its 
specialized domain than RDF/SPARQL is.  However, we can't go extending 
the grammar of SPARQL every time a new domain pops up.  The RDF way is 
to define some resources that map that domain into the RDF graph and 
thus into SPARQL's grammar.  That's the extensible way to do it.

As I see it, the current differentiation of a Constraint from a Triple 
Pattern serves two purposes:

Purpose 1: Most importantly, it indicates whether the variable 
substitutions can be solved by calculation (for a Constraint) or by 
lookup in a graph (for a Triple Pattern).  You can't answer the query 
without being able to determine this difference.  Having two different 
syntaxes makes the difference abundantly clear.

Purpose 2: Less importantly, it allows an entirely different grammar 
optimized for arithmetic to be used for Constraints.  Even though the 
grammar is fairly complex, involving a switch to infix notation and 
introducing precedence rules, it's a well-known language that users 
will already know from elsewhere.  Including chunks of other languages 
to bypass a learning curve worked well for Perl, after all.

I'll try here to show an alternative to the current design, using 
mechanisms already included in SPARQL to achieve the two abovementioned 
purposes.  Firstly, this is the test rewritten to use only Triple 
Patterns:

Test 3a:

   PREFIX  dc:  <http://purl.org/dc/elements/1.1/>
   PREFIX  ns:  <http://example.org/ns#>
   PREFIX  op:  <http://www.w3.org/2001/sw/DataAccess/operations>
   PREFIX  xsd: <http://www.w3.org/2001/XMLSchema#>
   SELECT  ?title ?price
   WHERE   ( ?x dc:title ?title )
           ( ?x ns:price ?price )
           ( ?price op:numeric-less-than "30"^^xsd:decimal )

Taking this approach generally would allow us to remove the final 
option in production 10 of the current grammar and to eliminate 
productions 18-36 entirely:

   [10]  PatternElement ::= TriplePattern
                          | GroupGraphPattern
                          | SourceGraphPattern
                          | OptionalGraphPattern
                          | 'AND'? Expression     <-- this gets removed

I'll suspend my zealotry for avoiding special cases long enough to 
admit that it's probably worth granting syntactic support to a property 
like op:numeric-less-than.  We could do this by extending production 37 
to include "<" as a synonym for "op:numeric-less-than".

   [37]  Literal ::= URI
                   | NumericLiteral
                   | TextLiteral
                   | RelationalOperator    <-- this gets added

   [??]  RelationalOperator ::= '||' | '&&' | 'EQ' | 'NE' | '=~' | '!~'    <-- new production
                              | '==' | '!=' | '<'  | '>'  | '<=' | '>='    <--

Using the special abbreviated syntax for both RelationalOperator and 
NumericLiteral, we can rewrite the test query in a form that I'd argue 
is actually easier to understand than the original 
sparql-query-example-3:

Test 3b:

   PREFIX  dc:  <http://purl.org/dc/elements/1.1/>
   PREFIX  ns:  <http://example.org/ns#>
   SELECT  ?title ?price
   WHERE   ( ?x dc:title ?title )
           ( ?x ns:price ?price )
           ( ?price < 30 )
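Under this proposal the abbreviation is purely lexical: a RelationalOperator token in predicate position is read as the corresponding op: property, so Test 3b's last pattern denotes the same thing as Test 3a's. A minimal Python sketch of that rewrite, assuming triple patterns are modelled as tuples of strings and numbers. Only the '<' to op:numeric-less-than entry comes from the text above; the '>' and '==' entries borrow the XQuery 1.0 operator names op:numeric-greater-than and op:numeric-equal purely for illustration.

```python
# Hypothetical mapping from RelationalOperator tokens to op: properties.
# Only "<" is specified in the proposal; the other entries are illustrative.
OPERATOR_SYNONYMS = {
    "<": "op:numeric-less-than",
    ">": "op:numeric-greater-than",
    "==": "op:numeric-equal",
}


def desugar(pattern):
    """Rewrite a triple pattern whose predicate is an operator token into
    the equivalent pattern that uses the op: property; leave others alone."""
    subject, predicate, obj = pattern
    return (subject, OPERATOR_SYNONYMS.get(predicate, predicate), obj)


if __name__ == "__main__":
    test_3b = [
        ("?x", "dc:title", "?title"),
        ("?x", "ns:price", "?price"),
        ("?price", "<", 30),
    ]
    for pattern in test_3b:
        print(desugar(pattern))
```

Because the rewrite happens before evaluation, the rest of the processor never needs to know that '<' was written at all.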

I should point out before proceeding further that I haven't fully dealt 
with Purpose 2 above yet.  Not everything in arithmetic is a 
RelationalOperator, and I'll need to provide a solution for N-ary 
operators ('+', '*'), binary operators ('-', '/', '%'), unary operators 
('+', '-', '!') and user-defined functions.  That's more difficult and 
I'll get back to it after treating Purpose 1.

Purpose 1 for distinguishing between Constraints and Triple Patterns 
was so that we'd know whether to solve by calculation or by consulting 
a graph.  In the absence of the AND keyword, I can imagine at least two
ways this can still be done with the remainder of SPARQL.  The first 
possibility is to distinguish based on the predicate.  Any Triple 
Pattern whose predicate happened to be in the op: namespace would have 
its solution calculated rather than depending on the graph.  Under this 
scheme Test 3a and Test 3b would be valid as written above.  If this 
default behavior wasn't desired, SOURCE could be used to specify that 
the pattern should be targeted against a graph instead.
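The first approach can be sketched as a tiny evaluator in Python. It assumes triple patterns are tuples, variables are strings beginning with '?', numeric literals are Decimal values and prefixed names stay unexpanded strings. A pattern whose predicate starts with 'op:' is checked by calculation; every other pattern is matched against the stored graph. The sample data and the set of op: properties are hypothetical.

```python
from decimal import Decimal
import operator

# Hypothetical op: properties whose solutions are calculated, not looked up.
OP_PROPERTIES = {
    "op:numeric-less-than": operator.lt,
    "op:numeric-greater-than": operator.gt,
    "op:numeric-equal": operator.eq,
}


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def substitute(term, binding):
    return binding.get(term, term) if is_var(term) else term


def match_graph(pattern, graph, binding):
    """Solve by lookup: yield one extended binding per matching triple."""
    for triple in graph:
        candidate = dict(binding)
        for term, value in zip(pattern, triple):
            term = substitute(term, candidate)
            if is_var(term):
                candidate[term] = value
            elif term != value:
                break
        else:
            yield candidate


def solve_by_calculation(pattern, binding):
    """Solve by calculation: keep the binding if the relation holds."""
    subject, predicate, obj = (substitute(t, binding) for t in pattern)
    if is_var(subject) or is_var(obj):
        raise ValueError(f"arguments of {predicate} must be bound first")
    if OP_PROPERTIES[predicate](subject, obj):
        yield binding


def query(patterns, graph):
    """Evaluate patterns left to right, dispatching on the predicate prefix."""
    bindings = [{}]
    for pattern in patterns:
        if isinstance(pattern[1], str) and pattern[1].startswith("op:"):
            bindings = [b2 for b in bindings
                        for b2 in solve_by_calculation(pattern, b)]
        else:
            bindings = [b2 for b in bindings
                        for b2 in match_graph(pattern, graph, b)]
    return bindings


# Hypothetical sample data.
GRAPH = {
    ("_:book1", "dc:title", "SPARQL Tutorial"),
    ("_:book1", "ns:price", Decimal("42")),
    ("_:book2", "dc:title", "The Semantic Web"),
    ("_:book2", "ns:price", Decimal("23")),
}

TEST_3A = [
    ("?x", "dc:title", "?title"),
    ("?x", "ns:price", "?price"),
    ("?price", "op:numeric-less-than", Decimal("30")),
]

if __name__ == "__main__":
    for row in query(TEST_3A, GRAPH):
        print(row["?title"], row["?price"])
```

One consequence shows up immediately: a calculated pattern needs its arguments bound, so this sketch solves patterns in the order written and Test 3a works only because ?price is bound before the comparison is reached.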

The second approach is to dispense with magic entirely and explicitly 
state how each Triple Pattern is to be solved.  We could introduce a 
standard graph (say, xsd: for example's sake) and use SOURCE as 
follows:

Test 3c:

   PREFIX  dc:  <http://purl.org/dc/elements/1.1/>
   PREFIX  ns:  <http://example.org/ns#>
   PREFIX  xsd: <http://www.w3.org/2001/XMLSchema#>
   SELECT  ?title ?price
   WHERE   ( ?x dc:title ?title )
           ( ?x ns:price ?price )
           SOURCE xsd: ( ?price < 30 )

This approach of introducing datatype operations using standard graphs 
and leaving the grammar purely composed of Triple Patterns is what I 
personally prefer.  This would decouple SPARQL from XML Schema (except 
for NumericLiteral) in much the same way that RDF is decoupled from any 
particular datatype (except for rdf:XMLLiteral).  The standard xsd: 
graph could be an entirely different document, and a concrete example 
to third parties of SPARQL's extension mechanism.  We'd get to choose 
whether the availability of the xsd: graph in a SPARQL processor is 
mandatory or optional, possibly lowering the implementation bar.
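One way to picture such a standard graph is as an object that answers pattern matches exactly as a stored graph does, except that its triples are computed on demand. The Python sketch below assumes a hypothetical match(pattern, binding) interface shared by stored and computed graphs, and a query form in which each pattern carries an optional SOURCE name (None meaning the default graph). The computed 'xsd:' graph here only knows the six comparison tokens; none of these class or method names come from the SPARQL draft.

```python
from decimal import Decimal
import operator


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def substitute(term, binding):
    return binding.get(term, term) if is_var(term) else term


class StoredGraph:
    """An ordinary graph: patterns are solved by looking up stored triples."""

    def __init__(self, triples):
        self.triples = set(triples)

    def match(self, pattern, binding):
        for triple in self.triples:
            candidate = dict(binding)
            for term, value in zip(pattern, triple):
                term = substitute(term, candidate)
                if is_var(term):
                    candidate[term] = value
                elif term != value:
                    break
            else:
                yield candidate


class ComparisonGraph:
    """Hypothetical standard 'xsd:' graph. It notionally contains every
    triple (a, op, b) for which the comparison holds; nothing is stored."""

    RELATIONS = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
                 ">=": operator.ge, "==": operator.eq, "!=": operator.ne}

    def match(self, pattern, binding):
        subject, predicate, obj = (substitute(t, binding) for t in pattern)
        if is_var(subject) or is_var(predicate) or is_var(obj):
            raise ValueError("this computed graph needs fully bound patterns")
        if self.RELATIONS[predicate](subject, obj):
            yield binding


def query(patterns, graphs):
    """Each entry is (source, pattern); the source selects the graph."""
    bindings = [{}]
    for source, pattern in patterns:
        graph = graphs[source]
        bindings = [b2 for b in bindings for b2 in graph.match(pattern, b)]
    return bindings


GRAPHS = {
    None: StoredGraph({  # hypothetical sample data
        ("_:book1", "dc:title", "SPARQL Tutorial"),
        ("_:book1", "ns:price", Decimal("42")),
        ("_:book2", "dc:title", "The Semantic Web"),
        ("_:book2", "ns:price", Decimal("23")),
    }),
    "xsd:": ComparisonGraph(),
}

TEST_3C = [
    (None, ("?x", "dc:title", "?title")),
    (None, ("?x", "ns:price", "?price")),
    ("xsd:", ("?price", "<", Decimal("30"))),
]

if __name__ == "__main__":
    for row in query(TEST_3C, GRAPHS):
        print(row["?title"], row["?price"])
```

Since the evaluator only sees the match interface, a third party could register another computed graph under its own name without any change to the grammar, which is the kind of extension mechanism the paragraph above has in mind.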

Now, back to the unfinished business...

I deferred dealing with operators other than comparisons.  Comparisons 
were easy because, as binary relations, they map naturally onto the 
subject and object of a Triple Pattern.  Looking at section 11.1.2, 
unary operators like op:numeric-unary-minus could also fit naturally 
into a Triple Pattern (operand as subject, result as object), but 
anything with more than one parameter, such as op:numeric-subtract, 
won't: its operands and its result need more than two positions.

A way to deal with functions of arbitrary arity is to make them 
properties of RDF Containers and/or Collections.  So if I want to add 
1+1:

   PREFIX  op:  <http://www.w3.org/2001/sw/DataAccess/operations>
   PREFIX  rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
   PREFIX  xsd: <http://www.w3.org/2001/XMLSchema#>
   SELECT  ?x
   WHERE   SOURCE xsd: ( ?args op:numeric-add ?x )
           ( ?args rdf:type rdf:Seq )
           ( ?args rdf:_1 1 )
           ( ?args rdf:_2 1 )

That's pretty horrible, aesthetically.  But Containers and Collections 
are actually part of RDF, so it's easier to justify some kind of 
notation to support them, à la RDF/XML, N3 or Turtle.  For the sake of 
example, I'll use [[ ]] to indicate a shortcut syntax that creates a 
blank node of type rdf:Seq, because all other bracketing characters 
have already been used in SPARQL.  The arithmetic example above is now 
more plausible:

   PREFIX  op:  <http://www.w3.org/2001/sw/DataAccess/operations>
   PREFIX  xsd: <http://www.w3.org/2001/XMLSchema#>
   SELECT  ?x
   WHERE   SOURCE xsd: ( [[ 1 1 ]] op:numeric-add ?x )
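A minimal Python sketch of how the [[ ]] shorthand might expand into the longhand rdf:Seq triples shown earlier, and how a calculated property could read its arguments back out of that sequence. Triples are tuples of prefixed-name strings and Decimal values; the blank-node labels, the helper names, and the choice to treat op:numeric-add as N-ary and op:numeric-subtract as strictly binary (following the operator classification earlier in this message) are assumptions for illustration.

```python
from decimal import Decimal
import itertools

_blank_ids = itertools.count(1)


def expand_seq(items):
    """Expand the proposed [[ a b ... ]] shorthand into a fresh blank node
    typed rdf:Seq, plus one rdf:_n membership triple per item."""
    node = f"_:seq{next(_blank_ids)}"
    triples = [(node, "rdf:type", "rdf:Seq")]
    triples += [(node, f"rdf:_{n}", item)
                for n, item in enumerate(items, start=1)]
    return node, triples


def seq_members(node, triples):
    """Return the rdf:_n members of node, ordered by n."""
    members = {}
    for subject, predicate, obj in triples:
        if subject == node and predicate.startswith("rdf:_"):
            members[int(predicate[len("rdf:_"):])] = obj
    return [members[n] for n in sorted(members)]


def apply_op(op, node, triples):
    """Compute the object of ( node op ?x ) from the argument sequence."""
    args = seq_members(node, triples)
    if op == "op:numeric-add":       # N-ary: sum every member
        return sum(args, Decimal(0))
    if op == "op:numeric-subtract":  # binary: unpacking fails unless 2 members
        first, second = args
        return first - second
    raise KeyError(f"no calculation defined for {op}")


if __name__ == "__main__":
    # ( [[ 1 1 ]] op:numeric-add ?x )
    args_node, triples = expand_seq([Decimal(1), Decimal(1)])
    print(apply_op("op:numeric-add", args_node, triples))
```

Keeping the arguments as ordinary RDF triples means the same data could equally be matched by a plain graph lookup; only the op: property itself needs special handling.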

Just to push the idea hard enough to see it strain, this is what the 
quadratic formula x = (-b +/- sqrt(b^2 - 4ac)) / (2a) could look like:

   PREFIX  op:     <http://www.w3.org/2001/sw/DataAccess/operations>
   PREFIX  opx:    <http://example.com/extended-operations#>
   PREFIX  sparql: <???is-this-defined-yet???>
   PREFIX  xsd:    <http://www.w3.org/2001/XMLSchema#>
   SELECT  ?x
   FROM    xsd:
   WHERE   ( ?a sparql:eq 12.3 )
           ( ?b sparql:eq 45.6 )
           ( ?c sparql:eq 789 )
           ( [[ ?top ?bottom ]] op:numeric-divide ?x )
           ( [[ ?minus_b ?det ]] op:numeric-add ?top ) OR ( [[ ?minus_b ?det ]] op:numeric-subtract ?top )
           ( [[ ?b ]] op:numeric-unary-minus ?minus_b )
           SOURCE opx: ( [[ ?det2 ]] opx:positive-square-root ?det )
           ( [[ ?b2 ?four_ac ]] op:numeric-subtract ?det2 )
           ( [[ ?b ?b ]] op:numeric-multiply ?b2 )
           ( [[ 4 ?ac ]] op:numeric-multiply ?four_ac )
           ( [[ ?a ?c ]] op:numeric-multiply ?ac )
           ( [[ 2 ?a ]] op:numeric-multiply ?bottom )

The opx:positive-square-root is an example of what a user-defined 
extension might look like.  The point isn't that this is a particularly 
attractive notation for arithmetic, but that it's a feasible notation 
for arithmetic and everything else that might come along in future.
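Each pattern in the quadratic example has a single unknown once its inputs are bound, so a processor could solve the query by evaluating the patterns in dependency order rather than in the order written. The Python sketch below hand-codes that order for illustration (a real processor would have to discover it); the OR yields one solution per branch, and the hypothetical opx:positive-square-root yields no solution for a negative argument. With the coefficients in the query (a = 12.3, b = 45.6, c = 789), b^2 - 4ac = 2079.36 - 38818.8, which is negative, so the query as written would return no rows.

```python
from decimal import Decimal


def positive_square_root(value):
    """Hypothetical opx:positive-square-root: yields nothing for a negative
    argument, like a triple pattern that has no matching solution."""
    if value >= 0:
        yield value.sqrt()


def quadratic_solutions(a, b, c):
    """Solve the query's patterns in dependency order and return every
    binding of ?x (one per branch of the OR, per square root found)."""
    solutions = []
    b2 = b * b               # ( [[ ?b ?b ]] op:numeric-multiply ?b2 )
    ac = a * c               # ( [[ ?a ?c ]] op:numeric-multiply ?ac )
    four_ac = 4 * ac         # ( [[ 4 ?ac ]] op:numeric-multiply ?four_ac )
    det2 = b2 - four_ac      # ( [[ ?b2 ?four_ac ]] op:numeric-subtract ?det2 )
    minus_b = -b             # ( [[ ?b ]] op:numeric-unary-minus ?minus_b )
    bottom = 2 * a           # ( [[ 2 ?a ]] op:numeric-multiply ?bottom )
    for det in positive_square_root(det2):
        # ( ... op:numeric-add ?top ) OR ( ... op:numeric-subtract ?top )
        for top in (minus_b + det, minus_b - det):
            solutions.append(top / bottom)  # op:numeric-divide
    return solutions


if __name__ == "__main__":
    print(quadratic_solutions(Decimal("12.3"), Decimal("45.6"), Decimal("789")))
    print(quadratic_solutions(Decimal(1), Decimal(-3), Decimal(2)))
```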

In summary, if we remove the arithmetic part of the grammar, add 
anonymous containers/collections to the grammar, and define a standard 
graph representing arithmetic on XML Schema datatypes, we retain all 
the existing functionality.  We additionally benefit from a simpler 
grammar, a simpler query model, a well-defined extension mechanism, 
easier use of containers/collections, and we minimize the coupling 
between SPARQL and XML Schema datatypes.  The disadvantage is that 
arithmetic no longer has syntax optimized just for it.

Tying this back to the original ACTION, Test 3c above would be my 
preferred alternative to sparql-query-example-3.  However, as I would 
prefer to see the arithmetic capabilities modularized out to an 
entirely separate document, I'd ideally like to see this test removed 
from the SPARQL tests entirely and become a test for the hypothetical 
xsd: standard graph specification instead.
Received on Tuesday, 14 December 2004 10:19:13 UTC