Re: [OK?] Re: Last Call for comments on "SPARQL Query Language for RDF" from Axel Polleres on 2007-05-10 (public-rdf-dawg-comments@w3.org from May 2007)

From: Axel Polleres <axel.polleres@deri.org>
Date: Thu, 10 May 2007 13:59:58 -0600
To: Lee Feigenbaum <feigenbl@us.ibm.com>
CC: public-rdf-dawg-comments@w3.org
Message-ID: <464379BE.9050007@deri.org>
All,

I got the chance to discuss the comments directly with Eric here at WWW.
I see that the WG wants to close this, and I will try not to cause more 
trouble ... :-)


I hope the comments were still helpful, and let me emphasize again what I 
already said to Eric: if there's a follow-up after the first Rec on 
issues like aggregates, etc., I'd be very glad to join the WG
(at the moment I guess I'm busy enough with RIF)!

best,
Axel

Lee Feigenbaum wrote:
> Axel Polleres <axel.polleres@deri.org> wrote on 05/03/2007 07:55:51 PM:
> 
> 
>>Where asked back, replies inline.
> 
> 
> Hello Axel,
> 
> Thanks for a timely reply. We've responded to the points that remain 
> unaddressed below inline (and have cut the remaining text for brevity). 
> Please let us know if this response satisfies you. If it does, you can 
> help our comment tracking by replying to this message and adding [CLOSED] 
> in the subject line. Again, in the interests of our schedule, we'd like to 
> ask that you get back to us as soon as possible, and if we do not hear 
> from you in 10 days we will consider these comments closed.
> 
> Lee
> 
> 
>>Lee Feigenbaum wrote:
>>
>>>Axel Polleres wrote on 04/17/2007 12:53:55 PM:
>>>
>>>
>>>>Dear all,
>>>>
>>>>below my review on the current SPARQL draft from
>>>>
>>>>http://www.w3.org/TR/rdf-sparql-query/
>>>>
>>>>on behalf of W3C member organization DERI Galway.
> 
> 
> ...
> 
> 
>>>>Section 5
>>>>
>>>>Section 5 is called Graph patterns and has only subsections
>>>>5.1 and 5.2 for basic and group patterns, whereas the other types are
>>>>given separate top-level sections. This structuring seems a bit
>>>>illogical.
>>>
>>>
>>>In the interest of keeping the document numbering as is, we've decided to 
>>>keep this section as is. If you have a better suggestion for the name of 
>>>the section, we'd be glad to hear it. ("Basic Graph Patterns and Group 
>>>Graph Patterns" does not seem particularly helpful to a reader.)
>>
>>
>>I'd suggest having two separate top-level sections.
>>"In the interest of keeping the document numbering as is"
>>is not an argument which seems very logical to me, to be honest.
> 
> 
> We've added a line in the introduction explaining the relationship of the 
> two topics covered in Section 5:
> 
> """
> In this section we describe the two forms that combine patterns by 
> conjunction: basic graph patterns, which combine triples patterns, and 
> group graph patterns, which combine all other graph patterns.
> """
> 
> There is a close-but-different relationship between the two types of graph 
> patterns and we feel that keeping one section helps keep this clear. The 
> editors are not motivated to split Section 5 into two top-level sections. 
> (Do note, however, that editorial changes are permitted during CR, and 
> anyone submitting proposed changes will of course be given due 
> consideration.)
> 
> ...
> 
>>>>Another one about FILTERs: What about this one, ie. a FILTER which
>>>>refers to the outside scope:
>>>>
>>>>?x p o OPTIONAL { FILTER (?x != s) }
>>>>
>>>>concrete example:
>>>>
>>>>SELECT ?n ?m
>>>>{ ?x a foaf:Person .  ?x foaf:name ?n .
>>>>  OPTIONAL { ?x foaf:mbox ?m FILTER (?n != "John Doe") }  }
> 
> 
> Call this query [X].
> 
> 
>>>>This suppresses the email address for John Doe in the output!
>>>>Note: This one is interesting, since the OPTIONAL part may NOT be
>>>>evaluated separately, but carries over a binding from the
>>>>super-pattern! 
>>>
>>>
>>>A filter in the optional part of an OPTIONAL construct applies to the
>>>solutions from the required part as (possibly) extended by the optional
>>>part. In the algebra, the example above becomes:
>>>
>>>LeftJoin(
>>>  BGP(?x a foaf:Person .  ?x foaf:name ?n),
>>>  BGP(?x foaf:mbox ?m),
>>>  (?n != "John Doe")
>>>)
>>
>>Hmmm, does this mean that the query would simply be the same as writing
>>
>>SELECT ?n ?m
>>{ ?x a foaf:Person .  ?x foaf:name ?n . FILTER (?n != "John Doe")
>>   OPTIONAL { ?x foaf:mbox ?m }  }
> 
> 
> Call this query [Y].
> 
> 
>>in this case?
> 
> 
> No. This latter query, [Y], is:
> 
> Filter(?n != "John Doe",
>   LeftJoin(
>     BGP(?x a foaf:Person .  ?x foaf:name ?n),
>     BGP(?x foaf:mbox ?m),
>     true
>   )
> )
> 
> 
>>(How) would it be possible then to encode my intended meaning of the 
>>query, i.e. that I want to give all names, but suppress the email address 
>>of John Doe?
> 
> 
> The original query, [X], has these semantics. The second query, [Y], does 
> not.
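
To make the difference between [X] and [Y] concrete, here is a minimal Python sketch of the two algebra expressions. It is an illustration only, not a SPARQL engine: solution mappings are plain dicts, the sample data and the helper names (compatible, left_join, project) are assumptions made for this example, and filter error cases are ignored.

```python
# Minimal sketch of the two algebra expressions; not a SPARQL engine.
# Solution mappings are plain dicts from variable name to value.

def compatible(mu1, mu2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys())

def left_join(omega1, omega2, cond):
    """LeftJoin(omega1, omega2, cond): extend each left mapping where possible."""
    result = []
    for mu1 in omega1:
        extended = [{**mu1, **mu2} for mu2 in omega2 if compatible(mu1, mu2)]
        kept = [mu for mu in extended if cond(mu)]
        # If no extension survives the condition, the left mapping is kept as is.
        result.extend(kept if kept else [mu1])
    return result

# Hypothetical solutions of the two basic graph patterns.
people = [{"x": "_:a", "n": "John Doe"}, {"x": "_:b", "n": "Jane Roe"}]
mboxes = [{"x": "_:a", "m": "mailto:john@example.org"},
          {"x": "_:b", "m": "mailto:jane@example.org"}]

def not_john(mu):
    return mu["n"] != "John Doe"

# [X]: the filter is the LeftJoin condition.
query_x = left_join(people, mboxes, not_john)
# [Y]: the filter is applied to the result of an unconditional LeftJoin.
query_y = [mu for mu in left_join(people, mboxes, lambda mu: True) if not_john(mu)]

def project(solutions):
    """SELECT ?n ?m, with None standing in for an unbound variable."""
    return [(mu["n"], mu.get("m")) for mu in solutions]

print(project(query_x))
print(project(query_y))
```

Under these assumptions, [X] still returns a row for John Doe with ?m unbound, whereas [Y] drops that row entirely.
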
>  
> ... 
> 
>>>>Would it make sense to add some non-well-defined OPTIONAL patterns,
>>>>following [Perez et al. 2006] in the document? As mentioned before, I
>>>>didn't yet check section 12, maybe these corner case examples are
>>>>there.. 
>>
>> > We're not motivated to add these
>> > examples to the document.
>>
>>Why? I would object here, but not being part of the WG, I have to leave 
>>this decision to you of course.
> 
> 
> The editors do not believe that such an example would add to the quality 
> of the specification.
> 
> ...
> 
>>>>Section 9:
>>>>
>>>>What is "reduced" good for? I personally would tend to make reduced
>>>>the default, and instead put a modifier "STRICT" or "WITHDUPLICATES"
>>>>which enforces that ALL non-unique solutions are displayed.
>>>
>>>REDUCED can be used to permit certain optimizations by the SPARQL query 
>>>engine. The WG discussed various design options in this space including 
>>>the design you are suggesting, and decided to add the REDUCED keyword and 
>>>mark the feature at-risk. More information:
>>>
>>>http://lists.w3.org/Archives/Public/public-rdf-dawg/2007JanMar/att-0194/20-dawg-minutes.html#item02
>>>
>>>http://lists.w3.org/Archives/Public/public-rdf-dawg/2007JanMar/0162.html
>>>
>>>http://lists.w3.org/Archives/Public/public-rdf-dawg/2007JanMar/0128.html
>>>
>>>...and surrounding.
>>
>>By making REDUCED the exception and all tuples with duplicates the 
>>default, you somewhat implicitly single out implementations which use 
>>per-set rather than per-tuple strategy with that, in my opinion. I find 
>>this limiting.
> 
> 
> This was an intentional choice by the WG, considering all of the 
> information you have mentioned. I do not see any new information here at 
> this time to ask the WG to reconsider this decision.
>  
> ...
> 
>>>>Section 9.1
>>>>
>>>>The ORDER BY construct allows arbitrary constraints/expressions as
>>>>parameter...ie. you could give an arbitrary constraint condition here,
>>>>right? What is the order of that? TRUE > FALSE? Would be good to add
>>>>a remark on that. 
>>>
>>>
>>>This is for generality because semantic web data is not as structured (and 
>>>typed) as a database.  It allows the query to proceed without an error 
>>>condition so it generates some defined outcome.
>>>
>>>SPARQL doesn't provide total ordering, but the example you asked about is 
>>>specified.
>>>
>>>[[
>>>The "<" operator (see the Operator Mapping) defines the relative order of 
>>>pairs of numerics, simple literals, xsd:strings, xsd:booleans and 
>>>xsd:dateTimes.
>>>]]
>>>
>>>"<" operator in the Operator Mapping table has an entry for
>>>  A < B  xsd:boolean  xsd:boolean  op:boolean-less-than(A, B)
>>>op:boolean-less-than is defined in XPath Functions and Operators
>>>  http://www.w3.org/TR/xpath-functions/#func-boolean-less-than
>>>
>>>[[
>>>Summary: Returns true if $arg1 is false and $arg2 is true. Otherwise, 
>>>returns false.
>>>]]
>>>
>>>I think the LC document specifies all the orderings intended by the DAWG, 
>>>but am certainly open to counter-example.
>>
>>What I meant to say was that a short clarifying remark would keep 
>>readers from having to look through the separate spec.
> 
> 
> I believe that the intuitive interpretation of < is enough to give people 
> a good understanding. As SPARQL specifically uses XPath functions and 
> operators (for user familiarity, library re-use, and to leverage reviewed 
> specifications), we can't replace them with our own definitions. 
> Bulk-including those sections of the XPath spec would be costly and 
> confusing.
> 
> As frustrating as it may appear, I think this is optimized.
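
For readers who do not want to follow the link, the boolean ordering quoted above can be sketched in a few lines of Python. This is only an illustration of the quoted XPath summary; the function names boolean_less_than and compare are assumptions made for this example and are not part of any SPARQL or XPath library.

```python
from functools import cmp_to_key

def boolean_less_than(arg1, arg2):
    """True if arg1 is false and arg2 is true; otherwise false."""
    return (not arg1) and arg2

def compare(a, b):
    """Three-way comparison derived from the less-than operator."""
    if boolean_less_than(a, b):
        return -1
    if boolean_less_than(b, a):
        return 1
    return 0

# Ascending order as ORDER BY would see it for xsd:boolean values.
values = [True, False, True, False]
ascending = sorted(values, key=cmp_to_key(compare))
print(ascending)
```

In other words, false sorts before true in ascending order.
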
> 
> ...
> 
>>>ASK doesn't permit a SolutionModifier. Adding ASK there could imply that 
>>>it was allowed and even had some effect (other than syntax error).
>>
>>wouldn't that be worth a footnote then, maybe?
> 
> 
> I reluctantly added a sentence to the end:
> [[
> Using ORDER BY on a solution sequence for a CONSTRUCT or DESCRIBE query 
> has no direct effect because only SELECT returns a sequence of results. 
> Used in combination with LIMIT and OFFSET, ORDER BY can be used to return 
> results generated from a different slice of the solution sequence. An ASK 
> query does not include ORDER BY, LIMIT or OFFSET.
> ]]
> ...
> 
> 
>>>>Sec 9.2:
>>>>
>>>>Add somewhere in the prose: "using the SELECT result form"...
>>>>
>>>>It is actually a bit weird that you mix select into the solution
>>>>modifiers, IMO, it would be better to mention SELECT first in section
>>>>9 
>>>>and then introducing the solution modifiers.
>>>
>>>
>>>
>>>SELECT is both an indicator of the query result form and also contains the 
>>>projection.
>>
>>yes, that's my point.
> 
> 
> We see no reason to make a change. This has been part of the SPARQL 
> specification for a long time, and the experience of the community seems 
> to indicate that it is comprehensible.
> 
> ... 
> 
>>>>Sec 9.5/9.6:
>>>>
>>>>OFFSET 0 has no effect, LIMIT 0 obviously makes no sense since the
>>>>answer is always the empty solution set... So why not simply allow
>>>>only positive integers for both? I see no benefit in allowing 0 at
>>>>all. 
>>>
>>>
>>>The WG believes that allowing 0 eases the burden on programmatically 
>>>generated queries.
>>
>>What is the justification for this belief if I may ask?
> 
> 
> The belief arises from implementation experience by various members of the 
> workgroup.
> 
> (For example:
> 
>     query = sprintf("SELECT ... LIMIT %d OFFSET %d", limit, offset);
> is easier than
>     if (offset == 0) {
>         query = sprintf("SELECT ... LIMIT %d", limit);
>     } else {
>         query = sprintf("SELECT ... LIMIT %d OFFSET %d", limit, offset);
>     }
> )
> 
> Google, for another instance, serves from offset 0:
>   http://www.google.com/search?q=search&hl=en&start=0&sa=N
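
The same point can be sketched as a small paging helper in Python. This is a hypothetical illustration: the function name page_query, the base query, and the page size are assumptions made for this example. Because OFFSET 0 is allowed, the first page needs no special case.

```python
def page_query(base, limit, offset):
    """Append LIMIT and OFFSET to a query string; 0 is accepted for both."""
    if limit < 0 or offset < 0:
        raise ValueError("limit and offset must be non-negative")
    return f"{base} LIMIT {limit} OFFSET {offset}"

# Hypothetical base query; ORDER BY keeps the slices stable across pages.
BASE = "SELECT ?n WHERE { ?x foaf:name ?n } ORDER BY ?n"
PAGE_SIZE = 10

# The first three pages, generated by the same code path, offset 0 included.
pages = [page_query(BASE, PAGE_SIZE, page * PAGE_SIZE) for page in range(3)]
for query in pages:
    print(query)
```
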
> 
> 
>>>>Section 10.2
>>>>
>>>>CONSTRUCT combines triples "by set union"?
>>>>So, I need to eliminate duplicate triples if I want to implement
>>>>CONSTRUCT in my SPARQL engine?
>>>>Is this really what you wanted? In case of doubt, I'd suggest to
>>>>remove "by set union", or respectively, analogously to SELECT,
>>>>introduce a DISTINCT (or alternatively a WITHDUPLICATES)
>>>>modifier for CONSTRUCT...
>>>
>>>
>>>A set represented with duplicate triples is identical to a representation 
>>>without any duplicates, 
>>
>>no, it is not identical if viewed as a dataset for another query: if I 
>>apply another (SELECT) query on the output of the CONSTRUCT  - which 
>>again is RDF, so why not? - then there is potentially a difference (see 
>>the distinct/reduced issue)
> 
> 
> See below.
>  
> 
>>>so I believe the text is correct as written.  That 
>>>is, the following are representations of the same graph:
>>>
>>><x> <y> <z> .
>>>
>>>and
>>>
>>><x> <y> <z> .
>>><x> <y> <z> .
>>
>>if I ask a query with solution modifiers on these two graphs, then it is 
>>not the same! Attention!
> 
> 
> No, the above are two representations of the same RDF graph. (A graph with 
> a single triple.) Any SPARQL query against either of these two 
> representations of the same graph will have the same solutions.
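
A minimal Python sketch of this point, assuming triples are modelled as tuples of strings and a graph as a set of such tuples. The helper names to_graph and match are assumptions made for this example; real RDF parsing and full pattern matching are out of scope.

```python
# Two serializations: one lists the triple once, the other lists it twice.
doc_one = [("<x>", "<y>", "<z>")]
doc_two = [("<x>", "<y>", "<z>"), ("<x>", "<y>", "<z>")]

def to_graph(triples):
    """An RDF graph is a set of triples, so repeated statements collapse."""
    return frozenset(triples)

def match(graph, pattern):
    """Solutions of one triple pattern; terms starting with '?' are variables."""
    solutions = []
    for triple in sorted(graph):
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            solutions.append(binding)
    return solutions

graph_one = to_graph(doc_one)
graph_two = to_graph(doc_two)
pattern = ("?s", "<y>", "?o")

print(graph_one == graph_two)
print(match(graph_one, pattern))
print(match(graph_two, pattern))
```

Both documents yield the same one-triple graph, so the pattern has a single solution against either of them.
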
> 
> 
>>>>BTW, I miss the semantics for CONSTRUCT given formally in Section 12.
>>>
>>>
>>>We do not right now intend to include CONSTRUCT in Section 12. CONSTRUCT 
>>>is defined normatively in section 10.2. ( 
>>>http://www.w3.org/TR/rdf-sparql-query/#construct ).
>>
>>I fail to find a definition of the formal semantics of CONSTRUCT there.
>>
>>CONSTRUCT is likely one of the things which people will pick up very 
>>fast...so it would be good to have this more formal, I think.
> 
> 
> The group has decided not to pursue a rigorous treatment of CONSTRUCT in 
> Section 12 at this time. To do so would require a great deal of new work 
> and review, and would put our schedule in serious jeopardy. We believe 
> that the semantics specified in Section 10.2 sufficiently specify 
> CONSTRUCT and will lead to interoperable implementations. 
> 
> ... 
> 
>>>>In the definition of compatible mappings, you might want to change
>>>>
>>>>"every variable v in dom(&mu;1) and in dom(&mu;2)"
>>>>to
>>>>"every variable v &isin;  dom(&mu;1) &cap; dom(&mu;2)"
>>>>
>>>>"Write merge(&mu;1, &mu;2) for &mu;1 set-union &mu;2"
>>>>
>>>>Why not use the symbol &cup; here?
>>>
>>>
>>>As noted above the reliance on some symbols being available is not safe 
>>>across enough browser and locale setups.  We are striking a balance here.
> 
>>and &mu; is safe?
> 
> 
> &mu; is safer.  Correct display of, say, &cup; is less common than that of 
> &mu;. The W3C document style does not set the font family for display.
> 
> ...
> 
>>>>12.5
>>>>
>>>>The operator List(P) is nowhere defined.
>>>>It is still not totally clear to me why you need to introduce the ToList
>>>>operator. 
>>>
>>>
>>>Already discussed.
>>
>>Also that "List(P)" is not defined?
> 
> 
> This is already fixed in the editors' working draft.
> 
> ... 
> 
>>>>A general comment:
>>>>
>>>>I miss a section defining the *Semantics of a query* and of different
>>>>result forms. The Evaluation semantics given here rather is a mix of
>>>>functions having partly multisets of solution mappings and sequences
>>>>thereof as result, 
>>>>but all are called "eval()".
>>>> E.g. eval for BGP returns a multiset, whereas eval returns a list
>>>>for ToList, etc. 
>>>>
>>>>The semantics of a *query* is not really clearly defined yet, it
>>>>seems. This needs another revision, I guess.
>>
>>no response here?
> 
> 
> 
> Sec. 12 intro says:
> 
> """
> This section defines the correct behavior for evaluation of graph patterns 
> and solution modifiers, given a query string and an RDF dataset. It does not 
> imply a SPARQL implementation must use the process defined here.
> """
> 
> 
>>>>In the "Notes", item (d):
>>>>
>>>>"the current state of the art in OWL-DL querying focusses on the case
>>>>where answer bindings to blank nodes are prohibited."
>>>>
>>>>It would be helpful to give references here.
>>>
>>>
>>>The notes highlight the working assumptions.  I don't think references 
>>>would change that.  This is a difference between an academic paper and a 
>>>specification.
>>
>>You mean that a specification shouldn't follow general rules of style 
>>which make the reader more comfortable (such as for instance 
>>references)? I disagree, to be honest.
> 
> 
> Technology specifications do not follow the same style rules as academic 
> papers, largely because the two have different goals. The editors do not 
> believe that references to OWL-DL querying work would improve the 
> specification.
>  
> thanks again,
> Lee
> 
> 
> 


-- 
Dr. Axel Polleres
email: axel@polleres.net  url: http://www.polleres.net/
Received on Thursday, 10 May 2007 20:00:14 UTC