- From: Simon Raboczi <raboczi@tucanatech.com>
- Date: Tue, 14 Dec 2004 20:18:29 +1000
- To: public-rdf-dawg@w3.org
This message regards my objection to the test sparql-query-example-3.
http://www.w3.org/2001/sw/DataAccess/tests/#sparql-query-example-3
I'll be working from revision 1.156 of the SPARQL spec:
http://www.w3.org/2001/sw/DataAccess/rq23
The SPARQL query involved in the test as it stands is as follows:
Test 3:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX ns: <http://example.org/ns#>
SELECT ?title ?price
WHERE ( ?x dc:title ?title )
( ?x ns:price ?price ) AND ?price < 30
I don't object to the test itself, but to the feature that it tests.
Accepting this test would endorse our differentiation of a Constraint
like "?price < 30" from a Triple Pattern like "( ?x ns:price ?price )".
The entire point of RDF is to be a common model (the RDF graph) which
we can use to combine descriptions in different domains. We'll cripple
its ability to do this if we start excluding domains from this common
model. Arithmetic is a particularly important description domain, but
it doesn't need to be a special case with over a third of the
non-terminal productions in the grammar dedicated to it. This is not
only unnecessary; it's actively harmful to extensibility. What happens
when the next "special case" domain pops up? Every domain is a
"special case" to some community out there with an existing notation,
and that existing notation will doubtless be prettier for its
specialized domain than RDF/SPARQL is. However, we can't go extending
the grammar of SPARQL every time a new domain pops up. The RDF way is
to define some resources that map that domain into the RDF graph and
thus into SPARQL's grammar. That's the extensible way to do it.
As I see it, the current differentiation of a Constraint from a Triple
Pattern serves two purposes:
Purpose 1: Most importantly, it indicates whether the variable
substitutions can be solved by calculation (for a Constraint) or by
lookup in a graph (for a Triple Pattern). You can't answer the query
without being able to determine this difference. Having two different
syntaxes makes the difference abundantly clear.
Purpose 2: Less importantly, it allows an entirely different grammar
optimized for arithmetic to be used for Constraints. Even though the
grammar is fairly complex, involving a switch to infix notation and
introducing precedence rules, it's a well-known language that users
will already know from elsewhere. Including chunks of other languages
to bypass a learning curve worked well for Perl, after all.
I'll try here to show an alternative to the current design, using
mechanisms already included in SPARQL to achieve the two above-mentioned
purposes. Firstly, this is the test rewritten to use only Triple
Patterns:
Test 3a:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX ns: <http://example.org/ns#>
PREFIX op: <http://www.w3.org/2001/sw/DataAccess/operations>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?title ?price
WHERE ( ?x dc:title ?title )
( ?x ns:price ?price )
( ?price op:numeric-less-than "30"^^xsd:decimal )
Taking this approach generally would allow us to entirely remove the
final option in production 10 of the current grammar and eliminate
productions 18-36 entirely:
[10] PatternElement ::= TriplePattern
| GroupGraphPattern
| SourceGraphPattern
| OptionalGraphPattern
| 'AND'? Expression <-- this gets removed
I'll suspend my zealotry for avoiding special cases long enough to
admit that it's probably worth granting syntactic support to a property
like op:numeric-less-than. We could do this by extending production 37
to include "<" as a synonym for "op:numeric-less-than".
[37] Literal ::= URI
| NumericLiteral
| TextLiteral
| RelationalOperator <-- this gets added
[??] RelationalOperator ::= '||' | '&&' | 'EQ' | 'NE' | '=~' | '!~' <-- new production
| '==' | '!=' | '<' | '>' | '<=' | '>='
Using the special abbreviated syntax for both RelationalOperator and
NumericLiteral, we can rewrite the test query in a form that I'd argue
is actually easier to understand than the original
sparql-query-example-3:
Test 3b:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX ns: <http://example.org/ns#>
SELECT ?title ?price
WHERE ( ?x dc:title ?title )
( ?x ns:price ?price )
( ?price < 30 )
I should point out before proceeding further that I haven't fully dealt
with Purpose 2 above yet. Not everything in arithmetic is a
RelationalOperator, and I'll need to provide a solution for N-ary
operators ('+', '*'), binary operators ('-', '/', '%'), unary operators
('+', '-', '!') and user-defined functions. That's more difficult and
I'll get back to it after treating Purpose 1.
Purpose 1 for distinguishing between Constraints and Triple Patterns
was so that we'd know whether to solve by calculation or by consulting
a graph. In the absence of the AND keyword, I can imagine at least two
ways this can still be done with the remainder of SPARQL. The first
possibility is to distinguish based on the predicate. Any Triple
Pattern whose predicate happened to be in the op: namespace would have
its solution calculated rather than depending on the graph. Under this
scheme Test 3a and Test 3b would be valid as written above. If this
default behavior wasn't desired, SOURCE could be used to specify that
the pattern should be targeted against a graph instead.
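To make the predicate-based dispatch concrete, here is a minimal Python sketch of how a processor might evaluate Test 3a. It is only an illustration under my own assumptions: the function names (solve, match_graph, match_computed), the COMPUTED table and the sample book data are hypothetical, and the computed branch only tests values that are already bound. Only the prefixes and op:numeric-less-than come from the test itself.

```python
from decimal import Decimal

DC = "http://purl.org/dc/elements/1.1/"
NS = "http://example.org/ns#"
OP = "http://www.w3.org/2001/sw/DataAccess/operations"  # as declared in Test 3a

# Hypothetical table: predicates that are solved by calculation.
COMPUTED = {OP + "numeric-less-than": lambda left, right: left < right}

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def resolve(term, binding):
    return binding.get(term, term) if is_var(term) else term

def match_graph(pattern, binding, graph):
    """Solve a Triple Pattern by lookup in the graph."""
    for triple in graph:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            term = resolve(term, new)
            if is_var(term):
                new[term] = value      # bind a free variable
            elif term != value:
                ok = False             # constant or bound value mismatch
                break
        if ok:
            yield new

def match_computed(pattern, binding):
    """Solve a Triple Pattern by calculation (bound operands only)."""
    s, p, o = (resolve(t, binding) for t in pattern)
    if is_var(s) or is_var(o):
        raise ValueError("this sketch only tests already-bound values")
    if COMPUTED[p](s, o):
        yield binding

def solve(patterns, graph):
    bindings = [{}]
    for pattern in patterns:
        predicate = pattern[1]
        # The dispatch rule: op: predicates are calculated, others looked up.
        if isinstance(predicate, str) and predicate.startswith(OP):
            bindings = [b2 for b in bindings for b2 in match_computed(pattern, b)]
        else:
            bindings = [b2 for b in bindings for b2 in match_graph(pattern, b, graph)]
    return bindings

# Illustrative data only.
GRAPH = {
    ("book1", DC + "title", "SPARQL Tutorial"),
    ("book1", NS + "price", Decimal("42")),
    ("book2", DC + "title", "The Semantic Web"),
    ("book2", NS + "price", Decimal("23")),
}
TEST_3A = [
    ("?x", DC + "title", "?title"),
    ("?x", NS + "price", "?price"),
    ("?price", OP + "numeric-less-than", Decimal("30")),
]
rows = [(b["?title"], b["?price"]) for b in solve(TEST_3A, GRAPH)]
```

The only place the evaluator needs to know about arithmetic is the namespace check in solve; everything else treats the comparison as an ordinary Triple Pattern.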
The second approach is to dispense with magic entirely and explicitly
state how each Triple Pattern is to be solved. We could introduce a
standard graph (say, xsd: for example's sake) and use SOURCE as
follows:
Test 3c:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX ns: <http://example.org/ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?title ?price
WHERE ( ?x dc:title ?title )
( ?x ns:price ?price )
SOURCE xsd: ( ?price < 30 )
This approach of introducing datatype operations using standard graphs
and leaving the grammar purely composed of Triple Patterns is what I
personally prefer. This would decouple SPARQL from XML schema (except
for NumericLiteral) in much the same way that RDF is decoupled from any
particular datatype (except for rdf:XMLLiteral). The standard xsd:
graph could be an entirely different document, and a concrete example
to third parties of SPARQL's extension mechanism. We'd get to choose
whether the availability of the xsd: graph in a SPARQL processor is
mandatory or optional, possibly lowering the implementation bar.
Now, back to the unfinished business...
I deferred dealing with operators other than comparisons. Comparisons
are easy because they map naturally to Triple Patterns. Looking at section
11.1.2, unary operators like op:numeric-unary-minus could also fit
naturally into a Triple Pattern, but anything with more than one
parameter such as op:numeric-subtract won't.
A way to deal with functions of arbitrary arity is to make them
properties of RDF Containers and/or Collections. So if I want to add
1+1:
PREFIX op: <http://www.w3.org/2001/sw/DataAccess/operations>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?x
WHERE SOURCE xsd: ( ?args op:numeric-add ?x )
( ?args rdf:type rdf:Seq )
( ?args rdf:_1 1 )
( ?args rdf:_2 1 )
That's pretty horrible, aesthetically. But Containers and Collections
are actually part of RDF, so it's easier to justify some kind of
notation to support them, a-la RDF/XML, N3 or Turtle. For the sake of
example, I'll use [[ ]] to indicate a shortcut syntax that creates a
blank node of type rdf:Seq, because all other bracketing characters
have already been used in SPARQL. The arithmetic example above is now
more plausible:
PREFIX op: <http://www.w3.org/2001/sw/DataAccess/operations>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?x
WHERE SOURCE xsd: ( [[ 1 1 ]] op:numeric-add ?x )
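The [[ ]] shortcut is meant to be pure syntactic sugar for the longhand rdf:Seq triples. A rough Python sketch of that expansion follows; expand_seq and the blank node label "_:args" are hypothetical names chosen for the illustration, not anything defined by SPARQL or RDF.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def expand_seq(members, node="_:args"):
    """Expand a [[ m1 m2 ... ]] shortcut into the triples it abbreviates.

    `node` is a hypothetical blank node label; a real parser would
    generate a fresh one for each [[ ]] it encounters.
    """
    triples = [(node, RDF + "type", RDF + "Seq")]
    for index, member in enumerate(members, start=1):
        # rdf:_1, rdf:_2, ... are the container membership properties.
        triples.append((node, f"{RDF}_{index}", member))
    return node, triples

# [[ 1 1 ]] expands to the same three triples written out by hand earlier.
node, triples = expand_seq([1, 1])
```

The returned node then takes the place of ?args as the subject of the op:numeric-add pattern.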
Just to push the idea hard enough to see it strain, this is what the
quadratic formula x=(-b +/- sqrt(b^2-4ac))/(2a) could look like:
PREFIX op: <http://www.w3.org/2001/sw/DataAccess/operations>
PREFIX opx: <http://example.com/extended-operations#>
PREFIX sparql: <???is-this-defined-yet???>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?x
FROM xsd:
WHERE ( ?a sparql:eq 12.3 )
( ?b sparql:eq 45.6 )
( ?c sparql:eq 789 )
( [[ ?top ?bottom ]] op:numeric-divide ?x )
( [[ ?minus_b ?det ]] op:numeric-add ?top ) OR ( [[ ?minus_b ?det ]] op:numeric-subtract ?top )
( [[ ?b ]] op:numeric-unary-minus ?minus_b )
SOURCE opx: ( [[ ?det2 ]] opx:positive-square-root ?det )
( [[ ?b2 ?four_ac ]] op:numeric-subtract ?det2 )
( [[ ?b ?b ]] op:numeric-multiply ?b2 )
( [[ 4 ?ac ]] op:numeric-multiply ?four_ac )
( [[ ?a ?c ]] op:numeric-multiply ?ac )
( [[ 2 ?a ]] op:numeric-multiply ?bottom )
The opx:positive-square-root is an example of what a user-defined
extension might look like. The point isn't that this is a particularly
attractive notation for arithmetic, but that it's a feasible notation
for arithmetic and everything else that might come along in future.
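For readers who want to trace the data flow, the same decomposition can be sketched in Python, one assignment per Triple Pattern and using the same intermediate variable names. This is only an illustration: solve_quadratic is a hypothetical name, and I am assuming opx:positive-square-root is real-valued, so a negative ?det2 leaves ?det unbound and the query yields no solutions.

```python
import math

def solve_quadratic(a, b, c):
    """Mirror the Triple Patterns of the quadratic query, one step each."""
    b2 = b * b                    # ( [[ ?b ?b ]] op:numeric-multiply ?b2 )
    ac = a * c                    # ( [[ ?a ?c ]] op:numeric-multiply ?ac )
    four_ac = 4 * ac              # ( [[ 4 ?ac ]] op:numeric-multiply ?four_ac )
    det2 = b2 - four_ac           # ( [[ ?b2 ?four_ac ]] op:numeric-subtract ?det2 )
    if det2 < 0:
        return []                 # assumed: no real square root, so no binding for ?det
    det = math.sqrt(det2)         # ( [[ ?det2 ]] opx:positive-square-root ?det )
    minus_b = -b                  # ( [[ ?b ]] op:numeric-unary-minus ?minus_b )
    bottom = 2 * a                # ( [[ 2 ?a ]] op:numeric-multiply ?bottom )
    # The OR contributes one binding of ?top per branch.
    tops = [minus_b + det, minus_b - det]
    return [top / bottom for top in tops]   # ( [[ ?top ?bottom ]] op:numeric-divide ?x )

roots = solve_quadratic(1, -3, 2)
no_roots = solve_quadratic(12.3, 45.6, 789)
```

Note that with the coefficients used in the query above (12.3, 45.6, 789), ?det2 comes out negative, so under that real-valued assumption the query would return no rows; different constants would be needed to see bindings for ?x.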
In summary, if we remove the arithmetic part of the grammar, add
anonymous containers/collections to the grammar, and define a standard
graph representing arithmetic on XML Schema datatypes, we retain all
the existing functionality. We additionally benefit from a simpler
grammar, a simpler query model, a well-defined extension mechanism,
easier use of containers/collections, and we minimize the coupling
between SPARQL and XML Schema datatypes. The disadvantage is that
arithmetic no longer has syntax optimized just for it.
Tying this back to the original ACTION, Test 3c above would be my
preferred alternative to sparql-query-example-3. However, as I would
prefer to see the arithmetic capabilities modularized out to an
entirely separate document, I'd ideally like to see this test removed
from the SPARQL tests entirely and become a test for the hypothetical
xsd: standard graph specification instead.
Received on Tuesday, 14 December 2004 10:19:13 UTC