- From: Simon Raboczi <raboczi@tucanatech.com>
- Date: Tue, 14 Dec 2004 20:18:29 +1000
- To: public-rdf-dawg@w3.org
This message regards my objection to the test sparql-query-example-3:

http://www.w3.org/2001/sw/DataAccess/tests/#sparql-query-example-3

I'll be working from revision 1.156 of the SPARQL spec:

http://www.w3.org/2001/sw/DataAccess/rq23

The SPARQL query involved in the test as it stands is as follows:

Test 3:

  PREFIX dc: <http://purl.org/dc/elements/1.1/>
  PREFIX ns: <http://example.org/ns#>
  SELECT ?title ?price
  WHERE ( ?x dc:title ?title )
        ( ?x ns:price ?price ) AND ?price < 30

I don't object to the test itself, but to the feature that it tests. Accepting this test would endorse our differentiation of a Constraint like "?price < 30" from a Triple Pattern like "( ?x ns:price ?price )".

The entire point of RDF is to be a common model (the RDF graph) which we can use to combine descriptions in different domains. We'll cripple its ability to do this if we start excluding domains from this common model. Arithmetic is a particularly important description domain, but it doesn't need to be a special case with over a third of the non-terminal productions in the grammar dedicated to it. This is not only unnecessary; it's actively harmful to extensibility.

What happens when the next "special case" domain pops up? Every domain is a "special case" to some community out there with an existing notation, and that existing notation will doubtless be prettier for its specialized domain than RDF/SPARQL is. However, we can't go extending the grammar of SPARQL every time a new domain pops up. The RDF way is to define some resources that map that domain into the RDF graph and thus into SPARQL's grammar. That's the extensible way to do it.

As I see it, the current differentiation of a Constraint from a Triple Pattern serves two purposes:

Purpose 1: Most importantly, it indicates whether the variable substitutions can be solved by calculation (for a Constraint) or by lookup in a graph (for a Triple Pattern). You can't answer the query without being able to determine this difference.
Having two different syntaxes makes the difference abundantly clear.

Purpose 2: Less importantly, it allows an entirely different grammar optimized for arithmetic to be used for Constraints. Even though the grammar is fairly complex, involving a switch to infix notation and introducing precedence rules, it's a well-known language that users will already know from elsewhere. Including chunks of other languages to bypass a learning curve worked well for Perl, after all.

I'll try here to show an alternative to the current design, using mechanisms already included in SPARQL to achieve the two purposes above. Firstly, this is the test rewritten to use only Triple Patterns:

Test 3a:

  PREFIX dc:  <http://purl.org/dc/elements/1.1/>
  PREFIX ns:  <http://example.org/ns#>
  PREFIX op:  <http://www.w3.org/2001/sw/DataAccess/operations>
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
  SELECT ?title ?price
  WHERE ( ?x dc:title ?title )
        ( ?x ns:price ?price )
        ( ?price op:numeric-less-than "30"^^xsd:decimal )

Taking this approach generally would allow us to remove the final option in production 10 of the current grammar and to eliminate productions 18-36 entirely:

  [10] PatternElement ::= TriplePattern
                        | GroupGraphPattern
                        | SourceGraphPattern
                        | OptionalGraphPattern
                        | 'AND'? Expression          <-- this gets removed

I'll suspend my zealotry for avoiding special cases long enough to admit that it's probably worth granting syntactic support to a property like op:numeric-less-than. We could do this by extending production 37 to include "<" as a synonym for "op:numeric-less-than":

  [37] Literal ::= URI
                 | NumericLiteral
                 | TextLiteral
                 | RelationalOperator               <-- this gets added

  [??] RelationalOperator ::= '||' | '&&' | 'EQ' | 'NE' | '=~' | '!~'
                            | '==' | '!=' | '<'  | '>'  | '<=' | '>='
                                                    <-- new production

Using the special abbreviated syntax for both RelationalOperator and NumericLiteral, we can rewrite the test query in a form that I'd argue is actually easier to understand than the original sparql-query-example-3:

Test 3b:

  PREFIX dc: <http://purl.org/dc/elements/1.1/>
  PREFIX ns: <http://example.org/ns#>
  SELECT ?title ?price
  WHERE ( ?x dc:title ?title )
        ( ?x ns:price ?price )
        ( ?price < 30 )

I should point out before proceeding further that I haven't fully dealt with Purpose 2 above yet. Not everything in arithmetic is a RelationalOperator, and I'll need to provide a solution for N-ary operators ('+', '*'), binary operators ('-', '/', '%'), unary operators ('+', '-', '!') and user-defined functions. That's more difficult, and I'll get back to it after treating Purpose 1.

Purpose 1 for distinguishing between Constraints and Triple Patterns was so that we'd know whether to solve by calculation or by consulting a graph. In the absence of the AND keyword, I can imagine at least two ways this can still be done with the remainder of SPARQL.

The first possibility is to distinguish based on the predicate. Any Triple Pattern whose predicate happened to be in the op: namespace would have its solution calculated rather than looked up in the graph. Under this scheme, Test 3a and Test 3b would be valid as written above. If this default behavior wasn't desired, SOURCE could be used to specify that the pattern should be targeted against a graph instead.

The second approach is to dispense with magic entirely and explicitly state how each Triple Pattern is to be solved.
We could introduce a standard graph (say, xsd: for example's sake) and use SOURCE as follows:

Test 3c:

  PREFIX dc:  <http://purl.org/dc/elements/1.1/>
  PREFIX ns:  <http://example.org/ns#>
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
  SELECT ?title ?price
  WHERE ( ?x dc:title ?title )
        ( ?x ns:price ?price )
        SOURCE xsd: ( ?price < 30 )

This approach of introducing datatype operations using standard graphs, leaving the grammar composed purely of Triple Patterns, is what I personally prefer. It would decouple SPARQL from XML Schema (except for NumericLiteral) in much the same way that RDF is decoupled from any particular datatype (except for rdf:XMLLiteral). The standard xsd: graph could be specified in an entirely separate document, which would also give third parties a concrete example of SPARQL's extension mechanism. We'd get to choose whether the availability of the xsd: graph in a SPARQL processor is mandatory or optional, possibly lowering the implementation bar.

Now, back to the unfinished business... I deferred dealing with operators other than comparisons. Comparisons were easy because they map naturally to Triple Patterns: the two operands become the subject and object, and the pattern either matches or it doesn't. Looking at section 11.1.2, unary operators like op:numeric-unary-minus could also fit naturally into a Triple Pattern (operand as subject, result as object), but anything with more than one parameter, such as op:numeric-subtract, won't, because two operands plus a result need more positions than a triple has.

A way to deal with functions of arbitrary arity is to make them properties of RDF Containers and/or Collections. So if I want to add 1+1:

  PREFIX op:  <http://www.w3.org/2001/sw/DataAccess/operations>
  PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
  SELECT ?x
  WHERE SOURCE xsd: ( ?args op:numeric-add ?x )
        ( ?args rdf:type rdf:Seq )
        ( ?args rdf:_1 1 )
        ( ?args rdf:_2 1 )

That's pretty horrible, aesthetically. But Containers and Collections are actually part of RDF, so it's easier to justify some kind of notation to support them, à la RDF/XML, N3 or Turtle.
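To check that the container encoding really carries everything an N-ary operator needs, here is a minimal Python sketch of how a processor might solve ( ?args op:numeric-add ?x ) by calculation. It assumes ?args is already bound to the blank node from the 1+1 query, represents triples as tuples of prefixed-name strings, and the names N_ARY_OPS, seq_members and solve_calculated are hypothetical, not part of any SPARQL draft.

```python
from decimal import Decimal
from functools import reduce
import operator

# Hypothetical table for an "xsd:" standard graph: rather than storing triples,
# each op: predicate maps to a function of the ordered container members.
N_ARY_OPS = {
    "op:numeric-add": lambda args: sum(args, Decimal(0)),
    "op:numeric-multiply": lambda args: reduce(operator.mul, args, Decimal(1)),
    "op:numeric-subtract": lambda args: args[0] - args[1],
    "op:numeric-divide": lambda args: args[0] / args[1],
}


def seq_members(data, node):
    """Return the rdf:_1, rdf:_2, ... members of node, ordered by index."""
    members = {}
    for s, p, o in data:
        if s == node and p.startswith("rdf:_") and p[len("rdf:_"):].isdigit():
            members[int(p[len("rdf:_"):])] = o
    return [members[i] for i in sorted(members)]


def solve_calculated(data, args_node, predicate):
    """Solve ( args_node predicate ?x ) by calculation; return the value for ?x."""
    if (args_node, "rdf:type", "rdf:Seq") not in data:
        raise ValueError(f"{args_node} is not an rdf:Seq")
    args = [Decimal(str(value)) for value in seq_members(data, args_node)]
    return N_ARY_OPS[predicate](args)


# The 1+1 query, with ?args already bound to a blank node.
data = {
    ("_:args", "rdf:type", "rdf:Seq"),
    ("_:args", "rdf:_1", 1),
    ("_:args", "rdf:_2", 1),
}
print(solve_calculated(data, "_:args", "op:numeric-add"))  # prints 2
```

Reading the members by their rdf:_n index rather than in storage order is what makes non-commutative operators like op:numeric-subtract come out right; an unordered rdf:Bag wouldn't give that guarantee.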
For the sake of example, I'll use [[ ]] to indicate a shortcut syntax that creates a blank node of type rdf:Seq, because all other bracketing characters have already been used in SPARQL. The arithmetic example above is now more plausible:

  PREFIX op:  <http://www.w3.org/2001/sw/DataAccess/operations>
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
  SELECT ?x
  WHERE SOURCE xsd: ( [[ 1 1 ]] op:numeric-add ?x )

Just to push the idea hard enough to see it strain, this is what the quadratic formula x = (-b +/- sqrt(b^2 - 4ac)) / 2a could look like:

  PREFIX op:     <http://www.w3.org/2001/sw/DataAccess/operations>
  PREFIX opx:    <http://example.com/extended-operations#>
  PREFIX sparql: <???is-this-defined-yet???>
  PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>
  SELECT ?x
  FROM xsd:
  WHERE ( ?a sparql:eq 12.3 )
        ( ?b sparql:eq 45.6 )
        ( ?c sparql:eq 789 )
        ( [[ ?top ?bottom ]] op:numeric-divide ?x )
        ( [[ ?minus_b ?det ]] op:numeric-add ?top )
        OR
        ( [[ ?minus_b ?det ]] op:numeric-subtract ?top )
        ( [[ ?b ]] op:numeric-unary-minus ?minus_b )
        SOURCE opx: ( [[ ?det2 ]] opx:positive-square-root ?det )
        ( [[ ?b2 ?four_ac ]] op:numeric-subtract ?det2 )
        ( [[ ?b ?b ]] op:numeric-multiply ?b2 )
        ( [[ 4 ?ac ]] op:numeric-multiply ?four_ac )
        ( [[ ?a ?c ]] op:numeric-multiply ?ac )
        ( [[ 2 ?a ]] op:numeric-multiply ?bottom )

The opx:positive-square-root property is an example of what a user-defined extension might look like. The point isn't that this is a particularly attractive notation for arithmetic, but that it's a feasible notation for arithmetic and for everything else that might come along in future.

In summary, if we remove the arithmetic part of the grammar, add anonymous containers/collections to the grammar, and define a standard graph representing arithmetic on XML Schema datatypes, we retain all the existing functionality. We additionally benefit from a simpler grammar, a simpler query model, a well-defined extension mechanism and easier use of containers/collections, and we minimize the coupling between SPARQL and XML Schema datatypes.
The disadvantage is that arithmetic no longer has syntax optimized just for it.

Tying this back to the original ACTION, Test 3c above would be my preferred alternative to sparql-query-example-3. However, as I would prefer to see the arithmetic capabilities modularized out to an entirely separate document, I'd ideally like to see this test removed from the SPARQL tests entirely and turned into a test for the hypothetical xsd: standard graph specification instead.
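As a rough illustration of what such a test for the xsd: standard graph might exercise, here is a minimal Python sketch that evaluates Test 3c. The sample data, the tuple representation of patterns and all function names are made up for illustration: the two ordinary Triple Patterns are solved by lookup in a data graph, and the SOURCE xsd: pattern is solved by calculation.

```python
from decimal import Decimal
import operator

# Made-up sample data graph: (subject, predicate, object) triples.
DATA = [
    ("_:book1", "dc:title", "SPARQL Tutorial"),
    ("_:book1", "ns:price", Decimal("42")),
    ("_:book2", "dc:title", "The Semantic Web"),
    ("_:book2", "ns:price", Decimal("23")),
]

# Hypothetical xsd: standard graph: predicates answered by calculation.
XSD_GRAPH = {
    "<": operator.lt,
    "op:numeric-less-than": operator.lt,
}


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match_lookup(pattern, graph, binding):
    """Extend one binding with every triple in graph that matches the pattern."""
    for triple in graph:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if is_var(term):
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield new


def match_calculated(pattern, binding):
    """Solve ( s p o ) against the xsd: graph by calculation, not lookup."""
    s, p, o = (binding.get(t, t) if is_var(t) else t for t in pattern)
    if is_var(s) or is_var(o):
        raise ValueError("calculated patterns need their operands bound first")
    if XSD_GRAPH[p](s, o):
        yield binding


def solve(patterns):
    """Evaluate (source, pattern) pairs left to right, joining the bindings."""
    bindings = [{}]
    for source, pattern in patterns:
        step = []
        for b in bindings:
            if source == "xsd:":
                step.extend(match_calculated(pattern, b))
            else:
                step.extend(match_lookup(pattern, DATA, b))
        bindings = step
    return bindings


# Test 3c: two lookups, then SOURCE xsd: ( ?price < 30 ).
TEST_3C = [
    (None, ("?x", "dc:title", "?title")),
    (None, ("?x", "ns:price", "?price")),
    ("xsd:", ("?price", "<", Decimal("30"))),
]
results = [(b["?title"], b["?price"]) for b in solve(TEST_3C)]
print(results)  # [('The Semantic Web', Decimal('23'))]
```

Dispatching on the source keeps the calculate-or-look-up decision explicit, as in Test 3c; the first possibility discussed above, dispatching on the predicate, would amount to testing `pattern[1] in XSD_GRAPH` instead. Either way, this simple left-to-right evaluator needs the calculated pattern to come after the patterns that bind its operands.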
Received on Tuesday, 14 December 2004 10:19:13 UTC