Re: ACTION: propose test case for scalar constraint with syntax from Seaborne, Andy on 2004-12-15 (public-rdf-dawg@w3.org from October to December 2004)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Wed, 15 Dec 2004 10:22:29 +0000
To: Simon Raboczi <raboczi@tucanatech.com>
Cc: public-rdf-dawg@w3.org
Message-ID: <41C01065.7060101@hp.com>

Simon,

Two important goals, as I see it, are:

1/ Application writers can express in queries what they want in a clear manner.

2/ We allow for as wide a variety of implementation strategies for SPARQL 
processors as possible.

Your proposal of separating out mathematical graphs is a reasonable 
implementation approach but it is not the only one.  Some systems, like SQL, 
have efficient support for numeric expressions.  Extracting the numeric 
expressions from the triple form is hard, while turning constraint 
syntax into graph syntax is easy.

I don't find the [[ ]] syntax at all clear.

Note that section 3 of rq23 is undergoing change anyway based on Pat's comments.


Simon Raboczi wrote:
> 
> This message regards my objection to the test sparql-query-example-3.
> 
>   http://www.w3.org/2001/sw/DataAccess/tests/#sparql-query-example-3
> 
> I'll be working from revision 1.156 of the SPARQL spec:
> 
>   http://www.w3.org/2001/sw/DataAccess/rq23
> 
> The SPARQL query involved in the test as it stands is as follows:
> 
> Test 3:
> 
>   PREFIX  dc:  <http://purl.org/dc/elements/1.1/>
>   PREFIX  ns:  <http://example.org/ns#>
>   SELECT  ?title ?price
>   WHERE   ( ?x dc:title ?title )
>           ( ?x ns:price ?price ) AND ?price < 30
> 
> I don't object to the test itself, but to the feature that it tests.  
> Accepting this test would endorse our differentiation of a Constraint 
> like "?price < 30" from a Triple Pattern like "( ?x ns:price ?price )".

The differentiation is supposed to be mainly syntactic: a system may choose 
to implement this as (maybe just by rewriting the query during parsing):

    ( ?price math:lessThan 30 )

or the way Tucana does it with special datatyping graphs.
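
As a rough illustration of that parse-time rewrite, here is a minimal Python sketch.  It only handles a single binary comparison, and the operator-to-property table is an assumption made for the example: math:lessThan is the name used above, the other property names are illustrative.

```python
import re

# Hypothetical mapping from infix comparison operators to property names.
# math:lessThan is the name used in this message; the rest are illustrative.
OPERATOR_PROPERTIES = {
    "<": "math:lessThan",
    ">": "math:greaterThan",
    "<=": "math:notGreaterThan",
    ">=": "math:notLessThan",
}

# Matches a simple binary comparison such as "?price < 30".
CONSTRAINT = re.compile(r"^\s*(\S+)\s*(<=|>=|<|>)\s*(\S+)\s*$")


def constraint_to_triple(constraint: str) -> str:
    """Rewrite 'lhs op rhs' into the triple pattern '( lhs property rhs )'."""
    match = CONSTRAINT.match(constraint)
    if match is None:
        raise ValueError(f"not a simple binary comparison: {constraint!r}")
    lhs, op, rhs = match.groups()
    return f"( {lhs} {OPERATOR_PROPERTIES[op]} {rhs} )"


print(constraint_to_triple("?price < 30"))  # prints ( ?price math:lessThan 30 )
```

A real processor would do this on the parsed expression tree rather than on strings; the point is only that the direction constraint-to-triple is mechanical.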

The "mainly" qualifier is a recognition that some RDF systems will find it 
easier to implement XML datatyping (e.g. "42"^^xsd:short == 
"42"^^xsd:byte) by some means other than pure graph matching.  Tucana, for 
example, does it with special datatyping graphs.
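
To make the datatyping point concrete, here is a small Python sketch of the difference between matching literals as graph terms and comparing them as values.  The Literal type and the short list of integer datatypes are assumptions for the example; range checking and the other XSD types are left out.

```python
from typing import NamedTuple

XSD = "http://www.w3.org/2001/XMLSchema#"

# A deliberately incomplete set of integer-derived XSD datatypes.
INTEGER_TYPES = {XSD + name for name in ("byte", "short", "int", "long", "integer")}


class Literal(NamedTuple):
    """A typed literal: lexical form plus datatype URI (illustrative type)."""
    lexical: str
    datatype: str


def same_term(a: Literal, b: Literal) -> bool:
    """Pure graph matching: lexical form and datatype must both be identical."""
    return a == b


def same_value(a: Literal, b: Literal) -> bool:
    """Value comparison: integer-derived literals are compared as numbers."""
    if a.datatype in INTEGER_TYPES and b.datatype in INTEGER_TYPES:
        return int(a.lexical) == int(b.lexical)
    return same_term(a, b)


as_short = Literal("42", XSD + "short")
as_byte = Literal("42", XSD + "byte")
print(same_term(as_short, as_byte))   # prints False
print(same_value(as_short, as_byte))  # prints True
```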

There's a good discussion in:
http://www.w3.org/2001/sw/BestPractices/XSCH/xsch-sw/


> The entire point of RDF is to be a common model (the RDF graph) which we 
> can use to combine descriptions in different domains.  We'll cripple its 
> ability to do this if we start excluding domains from this common 
> model.  Arithmetic is a particularly important description domain, but 
> it doesn't need to be a special case with over a third of the 
> non-terminal productions in the grammar dedicated to it.  This is not 
> only unnecessary; it's actively harmful to extensibility.  What happens 
> when the next "special case" domain pops up?  Every domain is a "special 
> case" to some community out there with an existing notation, and that 
> existing notation will doubtless be prettier for its specialized domain 
> than RDF/SPARQL is.  However, we can't go extending the grammar of 
> SPARQL every time a new domain pops up.  The RDF way is to define some 
> resources that map that domain into the RDF graph and thus into SPARQL's 
> grammar.  That's the extensible way to do it.
> 
> As I see it, the current differentiation of a Constraint from a Triple 
> Pattern serves two purposes:
> 
> Purpose 1: Most importantly, it indicates whether the variable 
> substitutions can be solved by calculation (for a Constraint) or by 
> lookup in a graph (for a Triple Pattern).  You can't answer the query 
> without being able to determine this difference.  Having two different 
> syntaxes makes the difference abundantly clear.

A constraint is a restriction on solutions just like a triple pattern is.

You could read
      ( ?x dc:title ?title )
      ( ?x ns:price ?price ) AND ?price < 30

as saying
"""
all solutions (title, price) such that the following is true:
   x has a title
   x has a price
   price is less than 30
"""

The order of execution does not matter.  It's an implementation matter as to 
whether calculation or looking in a graph is done.  It does not matter to 
the query.

(and, yes, that makes the query

SELECT  ?title ?price
WHERE   ( ?x dc:title ?title )
         ?price < 30
         ( ?x ns:price ?price )

legal).

It is not intended to fix whether it is lookup or calculation.  I'd expect 
different systems to do it differently and we have implementations that do 
just this at the moment.
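
A small Python sketch of that reading, where triple patterns and the constraint are all just restrictions applied to candidate solutions.  The data is made up for the example and the brute-force candidate generation is only there to keep it short; no real processor would evaluate a query this way.

```python
from itertools import permutations, product

# Hypothetical data for the example: (subject, property, value) triples.
TRIPLES = {
    ("book1", "dc:title", "SPARQL Tutorial"),
    ("book1", "ns:price", 42),
    ("book2", "dc:title", "The Semantic Web"),
    ("book2", "ns:price", 23),
}
SUBJECTS = {s for s, _, _ in TRIPLES}
VALUES = {o for _, _, o in TRIPLES}

# Two triple patterns and one constraint, all expressed as restrictions.
RESTRICTIONS = [
    lambda b: (b["x"], "dc:title", b["title"]) in TRIPLES,
    lambda b: (b["x"], "ns:price", b["price"]) in TRIPLES,
    lambda b: isinstance(b["price"], int) and b["price"] < 30,
]


def solve(restrictions):
    """Apply the restrictions in the given order to every candidate binding."""
    candidates = [
        {"x": x, "title": t, "price": p}
        for x, t, p in product(SUBJECTS, VALUES, VALUES)
    ]
    for restriction in restrictions:
        candidates = [b for b in candidates if restriction(b)]
    return {(b["title"], b["price"]) for b in candidates}


# Every ordering of the three restrictions yields the same solutions.
results = [solve(order) for order in permutations(RESTRICTIONS)]
assert all(result == results[0] for result in results)
print(results[0])  # prints {('The Semantic Web', 23)}
```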

Is there text I can change in rq23 to make this clear?

> 
> Purpose 2: Less importantly, it allows an entirely different grammar 
> optimized for arithmetic to be used for Constraints.  Even though the 
> grammar is fairly complex, involving a switch to infix notation and 
> introducing precedence rules, it's a well-known language that users will 
> already know from elsewhere.  Including chunks of other languages to 
> bypass a learning curve worked well for Perl, after all.

This is the main intention, not the lesser one - to have a syntax for 
writing constraints that application writers are familiar with.  The grammar 
is conventional (the original for RDQL came from a free grammar for Java) 
and it is long, which is mainly to do with getting the usual precedence of 
operators to happen.

> 
> I'll try here to show an alternative to the current design, using 
> mechanisms already included in SPARQL to achieve the two abovementioned 
> purposes.  Firstly, this is the test rewritten to use only Triple Patterns:
> 
> Test 3a:
> 
>   PREFIX  dc:  <http://purl.org/dc/elements/1.1/>
>   PREFIX  ns:  <http://example.org/ns#>
>   PREFIX  op:  <http://www.w3.org/2001/sw/DataAccess/operations>
>   PREFIX  xsd: <http://www.w3.org/2001/XMLSchema#>
>   SELECT  ?title ?price
>   WHERE   ( ?x dc:title ?title )
>           ( ?x ns:price ?price )
>           ( ?price op:numeric-less-than "30"^^xsd:decimal )

That should be correct and (we have a dependency on F&O here) should be the 
same thing.

It is quite possible that a SPARQL processor can do a better job on that 
query than, say,

WHERE   ( ?x dc:title ?title )
         ( ?x ns:price ?price )
         ( ?price ?p "30"^^xsd:short )
         ( ?p owl:sameAs op:numeric-less-than)


We also have input from SWBPD:

http://www.w3.org/2001/sw/BestPractices/XSCH/xsch-sw/

on issues of equality.

> 
> Taking this approach generally would allow us to entirely remove the 
> final option in production 10 of the current grammar and eliminate 
> productions 18-36 entirely:
> 
>   [10]  PatternElement ::= TriplePattern
>                          | GroupGraphPattern
>                          | SourceGraphPattern
>                          | OptionalGraphPattern
>                          | 'AND'? Expression     <-- this gets removed
> 
> I'll suspend my zealotry for avoiding special cases long enough to admit 
> that it's probably worth granting syntactic support to a property like 
> op:numeric-less-than.  We could do this by extending production 37 to 
> include "<" as a synonym for "op:numeric-less-than".
> 
>   [37]  Literal ::= URI
>                   | NumericLiteral
>                   | TextLiteral
>                   | RelationalOperator    <-- this gets added
> 
>   [??]  RelationalOperator ::= '||' | '&&' | 'EQ' | 'NE' | '=~' | '!~'    <-- new production
>                              | '==' | '!=' | '<'  | '>'  | '<=' | '>='    <--
> 
> Using the special abbreviated syntax for both RelationalOperator and 
> NumericLiteral, we can rewrite the test query in a form that I'd argue 
> is actually easier to understand than the original sparql-query-example-3:
> 
> Test 3b:
> 
>   PREFIX  dc:  <http://purl.org/dc/elements/1.1/>
>   PREFIX  ns:  <http://example.org/ns#>
>   SELECT  ?title ?price
>   WHERE   ( ?x dc:title ?title )
>           ( ?x ns:price ?price )
>           ( ?price < 30 )
> 
> I should point out before proceeding further that I haven't fully dealt 
> with Purpose 2 above yet.  Not everything in arithmetic is a 
> RelationalOperator, and I'll need to provide a solution for N-ary 
> operators ('+', '*'), binary operators ('-', '/', '%'), unary operators 
> ('+', '-', '!') and user-defined functions.  That's more difficult and 
> I'll get back to it after treating Purpose 1.
> 
> Purpose 1 for distinguishing between Constraints and Triple Patterns was 
> so that we'd know whether to solve by calculation or by consulting a 
> graph.  In the absence of the AND keyword, I can imagine at least two 
> ways this can still be done with the remainder of SPARQL.  The first 
> possibility is to distinguish based on the predicate.  Any Triple 
> Pattern whose predicate happened to be in the op: namespace would have 
> its solution calculated rather than depending on the graph.  Under this 
> scheme Test 3a and Test 3b would be valid as written above.  If this 
> default behavior wasn't desired, SOURCE could be used to specify that 
> the pattern should be targeted against a graph instead.
> 
> The second approach is to dispense with magic entirely and explicitly 
> state how each Triple Pattern is to be solved.  We could introduce a 
> standard graph (say, xsd: for example's sake) and use SOURCE as follows:
> 
> Test 3c:
> 
>   PREFIX  dc:  <http://purl.org/dc/elements/1.1/>
>   PREFIX  ns:  <http://example.org/ns#>
>   PREFIX  xsd: <http://www.w3.org/2001/XMLSchema#>
>   SELECT  ?title ?price
>   WHERE   ( ?x dc:title ?title )
>           ( ?x ns:price ?price )
>           SOURCE xsd: ( ?price < 30 )
> 
> This approach of introducing datatype operations using standard graphs 
> and leaving the grammar purely composed of Triple Patterns is what I 
> personally prefer.  This would decouple SPARQL from XML schema (except 
> for NumericLiteral) in much the same way that RDF is decoupled from any 
> particular datatype (except for rdf:XMLLiteral).  The standard xsd: 
> graph could be an entirely different document, and a concrete example to 
> third parties of SPARQL's extension mechanism.  We'd get to choose 
> whether the availability of the xsd: graph in a SPARQL processor is 
> mandatory or optional, possibly lowering the implementation bar.

If the "SOURCE xsd:" form is a well known (defined in rq23) piece of syntax, 
why not switch to arithmetic syntax?  I don't see that having "SOURCE xsd:" 
changes the capability over inlining an expression - and after all the main 
graph may understand datatyping.

I don't see there is a coupling with XML schema datatypes until we define 
the operators that are required of a SPARQL processor in section 11 (Testing 
Values).

If it is not a special piece of syntax, then this can already be done in 
SPARQL using the SOURCE clause and predicates.

Your query above is legal at the moment - would you like to suggest text for 
the section on extensibility that highlights that it is possible to have 
graphs with different capabilities?

> 
> Now, back to the unfinished business...
> 
> I deferred dealing with operators other than comparisons.  Comparisons are 
> easy because they map naturally to Triple Patterns.  Looking at section 
> 11.1.2, unary operators like op:numeric-unary-minus could also fit 
> naturally into a Triple Pattern, but anything with more than one 
> parameter such as op:numeric-subtract won't.
> 
> A way to deal with functions of arbitrary arity is to make them 
> properties of RDF Containers and/or Collections.  So if I want to add 1+1:
> 
>   PREFIX  rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>   PREFIX  xsd: <http://www.w3.org/2001/XMLSchema#>
>   SELECT  ?x
>   WHERE   SOURCE xsd: ( ?args op:numeric-add ?x )
>           ( ?args rdf:type rdf:Seq )
>           ( ?args rdf:_1 1 )
>           ( ?args rdf:_2 1 )
> 
> That's pretty horrible, aesthetically.  But Containers and Collections 
> are actually part of RDF, so it's easier to justify some kind of 
> notation to support them, a-la RDF/XML, N3 or Turtle.

Did you consider RDF collections?

> For the sake of
> example, I'll use [[ ]] to indicate a shortcut syntax that creates a 
> blank node of type rdf:Seq, because all other bracketing characters have 
> already been used in SPARQL.  The arithmetic example above is now more 
> plausible:
> 
>   PREFIX  xsd: <http://www.w3.org/2001/XMLSchema#>
>   SELECT  ?x
>   WHERE   SOURCE xsd: ( [[ 1 1 ]] op:numeric-add ?x )
> 
> Just to push the idea hard enough to see it strain, this is what the 
> quadratic formula x=(-b +/- sqrt(b^2-4ac))/2a could look like:
> 
>   PREFIX  op:     <http://www.w3.org/2001/sw/DataAccess/operations>
>   PREFIX  opx:    <http://example.com/extended-operations#>
>   PREFIX  sparql: <???is-this-defined-yet???>
>   PREFIX  xsd:    <http://www.w3.org/2001/XMLSchema#>
>   SELECT  ?x
>   FROM    xsd:
>   WHERE   ( ?a sparql:eq 12.3 )
>           ( ?b sparql:eq 45.6 )
>           ( ?c sparql:eq 789 )
>           ( [[ ?top ?bottom ]] op:numeric-divide ?x )
>           ( [[ ?minus_b ?det ]] op:numeric-add ?top ) OR ( [[ ?minus_b ?det ]] op:numeric-subtract ?top )
>           ( [[ ?b ]] op:numeric-unary-minus ?minus_b )
>           SOURCE opx: ( [[ ?det2 ]] opx:positive-square-root ?det )
>           ( [[ ?b2 ?four_ac ]] op:numeric-subtract ?det2 )
>           ( [[ ?b ?b ]] op:numeric-multiply ?b2 )
>           ( [[ 4 ?ac ]] op:numeric-multiply ?four_ac )
>           ( [[ ?a ?c ]] op:numeric-multiply ?ac )
>           ( [[ 2 ?a ]] op:numeric-multiply ?bottom )
> 
> The opx:positive-square-root is an example of what a user-defined 
> extension might look like.  The point isn't that this is a particularly 
> attractive notation for arithmetic, but that it's a feasible notation 
> for arithmetic and everything else that might come along in future.

1/ This introduces bNodes into queries.

2/ This is favouring one implementation approach over another.  It would be 
very hard to extract the expression "x=(-b +/- sqrt(b^2-4ac))/2a" from a 
query such as above back into an expression.

(There is an analogy here between the abstract graph and the variety of ways 
it can be encoded in RDF/XML)

3/ We have to be able to cover "ill-formed" expressions like a Seq with two 
rdf:_1's, or with only the elements rdf:_1 and rdf:_3, or less grounded 
expressions like:

     ( [[ 15 ?a ]] op:add  ?b )

4/ It would be better to have a general P(S, O) or (P S O) with expressions 
inline.  It would then be lisp-like and provide nesting.
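
To show the easy direction from points 2/ and 4/, here is a Python sketch that walks a nested, lisp-like expression and emits flat patterns in the [[ ]] notation from the quoted proposal.  The tuple encoding and the ?t1, ?t2, ... variable names are assumptions for the example.  Going the other way means reassembling the tree from an unordered set of patterns joined only by variable names, which is the hard part.

```python
from itertools import count


def flatten(expr, patterns, fresh):
    """Return the term for expr, appending one pattern per nested operation."""
    if not isinstance(expr, tuple):
        return str(expr)  # a variable or a constant stands for itself
    op, *args = expr
    arg_terms = [flatten(arg, patterns, fresh) for arg in args]
    result = f"?t{next(fresh)}"  # fresh variable holding this sub-result
    patterns.append(f"( [[ {' '.join(arg_terms)} ]] {op} {result} )")
    return result


def to_patterns(expr):
    """Flatten a nested (operator, arg, ...) tuple into a list of patterns."""
    patterns = []
    flatten(expr, patterns, count(1))
    return patterns


# b*b - 4*a*c written as a nested expression.
discriminant = (
    "op:numeric-subtract",
    ("op:numeric-multiply", "?b", "?b"),
    ("op:numeric-multiply", 4, ("op:numeric-multiply", "?a", "?c")),
)
for pattern in to_patterns(discriminant):
    print(pattern)
```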

cwm path expressions help a little for a postfix notation:

-----------------
@prefix math: <http://www.w3.org/2000/10/swap/math#>.

{
   ((2 1).math:sum 3) math:product ?c .
} => { <a> <b> ?c } .
-----------------
There is dataflow analysis on expression order, but the built-ins don't run 
backwards:
-----------------
@prefix math: <http://www.w3.org/2000/10/swap/math#>.

{
   ((2 1).math:sum ?c) math:product 9 .
} => { <a> <b> ?c } .
-----------------
does not give ?c = 3
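
A Python sketch of what "don't run backwards" means in the two rules above.  This is not how cwm is implemented; forward_builtin and the binding dictionaries are hypothetical names for the example.  A built-in fires only when all of its arguments are bound, so it can produce or check its result but never solve for an argument.

```python
import math


def resolve(term, bindings):
    """Look up a variable (a string such as '?c'); numbers stand for themselves."""
    return bindings.get(term, term) if isinstance(term, str) else term


def is_unbound(value):
    """After resolve(), anything still a string is an unbound variable."""
    return isinstance(value, str)


def forward_builtin(func, args, out, bindings):
    """Apply func only when every argument is bound.

    Returns the (possibly extended) bindings, or None when the built-in
    cannot fire or its result disagrees with an already bound output.
    """
    values = [resolve(arg, bindings) for arg in args]
    if any(is_unbound(value) for value in values):
        return None  # would require running the built-in backwards
    result = func(values)
    target = resolve(out, bindings)
    if is_unbound(target):
        return {**bindings, target: result}
    return bindings if target == result else None


# (2 1).math:sum is computed first in both rules.
step = forward_builtin(sum, [2, 1], "?s", {})

# First rule: (?s 3) math:product ?c binds ?c.
first = forward_builtin(math.prod, ["?s", 3], "?c", step)

# Second rule: (?s ?c) math:product 9 has an unbound argument, so no result.
second = forward_builtin(math.prod, ["?s", "?c"], 9, step)

print(first)   # prints {'?s': 3, '?c': 9}
print(second)  # prints None
```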

	Andy

> 
> In summary, if we remove the arithmetic part of the grammar, add 
> anonymous containers/collections to the grammar, and define a standard 
> graph representing arithmetic on XML Schema datatypes, we retain all the 
> existing functionality.  We additionally benefit from a simpler grammar, 
> a simpler query model, a well-defined extension mechanism, easier use of 
> containers/collections, and we minimize the coupling between SPARQL and 
> XML Schema datatypes.  The disadvantage is that arithmetic no longer has 
> syntax optimized just for it.
> 
> Tying this back to the original ACTION, Test 3c above would be my 
> preferred alternative to sparql-query-example-3.  However, as I would 
> prefer to see the arithmetic capabilities modularized out to an entirely 
> separate document, I'd ideally like to see this test removed from the 
> SPARQL tests entirely and become a test for the hypothetical xsd: 
> standard graph specification instead.
Received on Wednesday, 15 December 2004 10:23:05 UTC