Re: [OK?] Re: Last Call for comments on "SPARQL Query Language for RDF" from Lee Feigenbaum on 2007-05-04 (public-rdf-dawg-comments@w3.org from May 2007)

From: Lee Feigenbaum <feigenbl@us.ibm.com>
Date: Fri, 4 May 2007 17:07:35 -0400
To: Axel Polleres <axel.polleres@deri.org>
Cc: public-rdf-dawg-comments@w3.org
Message-ID: <OF42A1B9CF.B378D4B1-ON852572D1.00721EC9-852572D1.00740CDB@us.ibm.com>
Axel Polleres <axel.polleres@deri.org> wrote on 05/03/2007 07:55:51 PM:

> Where asked back, replies inline.

Hello Axel,

Thanks for a timely reply. We've responded to the points that remain 
unaddressed below inline (and have cut the remaining text for brevity). 
Please let us know if this response satisfies you. If it does, you can 
help our comment tracking by replying to this message and adding [CLOSED] 
in the subject line. Again, in the interests of our schedule, we'd like to 
ask that you get back to us as soon as possible, and if we do not hear 
from you in 10 days we will consider these comments closed.

Lee

> Lee Feigenbaum wrote:
> > Axel Polleres wrote on 04/17/2007 12:53:55 PM:
> > 
> >>Dear all,
> >>
> >>below my review on the current SPARQL draft from
> >>
> >>http://www.w3.org/TR/rdf-sparql-query/
> >>
> >>on behalf of W3C member organization DERI Galway.

...

> >>Section 5
> >>
> >>Section 5 is called Graph patterns and has only subsections
> >>5.1 and 5.2 for basic and group patterns, whereas the other types are
> >>given separate top-level sections. This structuring seems a bit
> >>illogical.
> > 
> > 
> > In the interest of keeping the document numbering as is, we've decided
> > to keep this section as is. If you have a better suggestion for the
> > name of the section, we'd be glad to hear it. ("Basic Graph Patterns
> > and Group Graph Patterns" does not seem particularly helpful to a
> > reader.)
> 
> 
> I'd suggest to have two separate top level sections.
> "In the interest of keeping the document numbering as is"
> is not an argument which seems very logical to me, to be honest.

We've added a line in the introduction explaining the relationship of the 
two topics covered in Section 5:

"""
In this section we describe the two forms that combine patterns by 
conjunction: basic graph patterns, which combine triples patterns, and 
group graph patterns, which combine all other graph patterns.
"""

There is a close-but-different relationship between the two types of graph 
patterns and we feel that keeping one section helps keep this clear. The 
editors are not motivated to split Section 5 into two top-level sections. 
(Do note, however, that editorial changes are permitted during CR, and 
anyone submitting proposed changes will of course be given due 
consideration.)

...
> >>Another one about FILTERs: What about this one, ie. a FILTER which
> >>refers to the outside scope:
> >>
> >>?x p o OPTIONAL { FILTER (?x != s) }
> >>
> >>concrete example:
> >>
> >>SELECT ?n ?m
> >>{ ?x a foaf:Person .  ?x foaf:name ?n .
> >>   OPTIONAL { ?x foaf:mbox ?m FILTER (?n != "John Doe") }  }

Call this query [X].

> >>
> >>Suppresses the email address for John Doe in the output!
> >>Note: This one is interesting, since the OPTIONAL part may NOT be
> >>evaluated separately!, but carries over a binding from the
> >>super-pattern! 
> > 
> > 
> > A filter in the optional part of an OPTIONAL construct applies to the
> > solutions from the required part as (possibly) extended by the
> > optional part. In the algebra, the example above becomes:
> > 
> > LeftJoin(
> >   BGP(?x a foaf:Person .  ?x foaf:name ?n),
> >   BGP(?x foaf:mbox ?m),
> >   (?n != "John Doe")
> > )
> 
> Hmmm, does this mean that the query would simply be the same as writing
> 
> SELECT ?n ?m
> { ?x a foaf:Person .  ?x foaf:name ?n . FILTER (?n != "John Doe")
>    OPTIONAL { ?x foaf:mbox ?m }  }

Call this query [Y].

> in this case?

No. This latter query, [Y], is:

Filter(?n != "John Doe",
  LeftJoin(
    BGP(?x a foaf:Person .  ?x foaf:name ?n),
    BGP(?x foaf:mbox ?m),
    true
  )
)

> (How) would it be possible then to encode my intended meaning of the 
> query, ie. that I want to give all names, but suppress the email address 
> of John Doe?

The original query, [X], has these semantics. The second query, [Y], does 
not.
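
To make the difference concrete, here is a rough sketch of the two algebra expressions evaluated over toy data. Everything in it is an assumption for illustration: the solution mappings are plain dicts, the sample data is invented, and filter errors and multiset cardinality are ignored. It is not how any particular engine implements LeftJoin.

```python
# Toy solutions for BGP(?x a foaf:Person . ?x foaf:name ?n)
required = [
    {"x": "a", "n": "John Doe"},
    {"x": "b", "n": "Jane Roe"},
]
# Toy solutions for BGP(?x foaf:mbox ?m)
optional = [
    {"x": "a", "m": "mailto:john@example.org"},
    {"x": "b", "m": "mailto:jane@example.org"},
]


def compatible(mu1, mu2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys())


def left_join(left, right, expr):
    """Extend each left solution where possible and expr holds; else keep it."""
    out = []
    for mu1 in left:
        extended = [
            {**mu1, **mu2}
            for mu2 in right
            if compatible(mu1, mu2) and expr({**mu1, **mu2})
        ]
        out.extend(extended if extended else [mu1])
    return out


def filter_solutions(expr, solutions):
    """Keep only the solutions for which expr holds."""
    return [mu for mu in solutions if expr(mu)]


def not_john(mu):
    return mu["n"] != "John Doe"


def always_true(mu):
    return True


# [X]: the filter sits inside the LeftJoin, so John Doe keeps his name row.
result_x = left_join(required, optional, not_john)
# [Y]: the filter wraps the LeftJoin, so John Doe is dropped entirely.
result_y = filter_solutions(not_john, left_join(required, optional, always_true))
```

In this sketch, result_x contains John Doe without an ?m binding plus Jane Roe with her mailbox, while result_y contains only Jane Roe.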
 
... 
> > 
> >>Would it make sense to add some non-well-defined OPTIONAL patterns,
> >>following [Perez et al. 2006] in the document? As mentioned before, I
> >>didn't yet check section 12, maybe these corner case examples are
> >>there.. 
>  > We're not motivated to add these
>  > examples to the document.
> 
> Why? I would object here, but not being part of the WG, I have to leave 
> this decision to you of course.

The editors do not believe that such an example would add to the quality 
of the specification.

...
> >>Section 9:
> >>
> >>What is "reduced" good for? I personally would tend to make reduced
> >>the default, and instead put a modifier "STRICT" or "WITHDUPLICATES"
> >>which enforces that ALL non-unique solutions are displayed.
> > 
> > REDUCED can be used to permit certain optimizations by the SPARQL
> > query engine. The WG discussed various design options in this space,
> > including the design you are suggesting, and decided to add the
> > REDUCED keyword and mark the feature at-risk. More information:
> > 
> > http://lists.w3.org/Archives/Public/public-rdf-dawg/2007JanMar/att-0194/20-dawg-minutes.html#item02
> > http://lists.w3.org/Archives/Public/public-rdf-dawg/2007JanMar/0162.html
> > http://lists.w3.org/Archives/Public/public-rdf-dawg/2007JanMar/0128.html
> > 
> > ...and surrounding.
> 
> By making REDUCED the exception and all tuples with duplicates the 
> default, you somewhat implicitly single out implementations which use 
> per-set rather than per-tuple strategy with that, in my opinion. I find 
> this limiting.

This was an intentional choice by the WG, considering all of the 
information you have mentioned. I do not see any new information here at 
this time to ask the WG to reconsider this decision.
 
...
> >>Section 9.1
> >>
> >>The ORDER BY construct allows arbitrary constraints/expressions as
> >>parameter...ie. you could give an arbitrary constraint condition here,
> >>right? What is the order of that? TRUE > FALSE? Would be good to add
> >>a remark on that. 
> > 
> > 
> > This is for generality because semantic web data is not as structured
> > (and typed) as a database.  It allows the query to proceed without an
> > error condition so it generates some defined outcome.
> >
> > SPARQL doesn't provide total ordering, but the example you asked about
> > is specified.
> > 
> > [[
> > The "<" operator (see the Operator Mapping) defines the relative order
> > of pairs of numerics, simple literals, xsd:strings, xsd:booleans and
> > xsd:dateTimes.
> > ]]
> > 
> > "<" operator in the Operator Mapping table has an entry for
> >   A < B  xsd:boolean  xsd:boolean  op:boolean-less-than(A, B)
> > op:boolean-less-than is defined in XPath Functions and Operators
> >   http://www.w3.org/TR/xpath-functions/#func-boolean-less-than
> > 
> > [[
> > Summary: Returns true if $arg1 is false and $arg2 is true. Otherwise, 
> > returns false.
> > ]]
> > 
> > I think the LC document specifies all the orderings intended by the
> > DAWG, but am certainly open to counter-example.
> 
> What I meant to say was that a short clarifying remark would keep 
> readers from having to look through the separate spec.

I believe that the intuitive interpretation of < is enough to give people 
a good understanding. As SPARQL specifically uses XPath functions and 
operators (for user familiarity, library re-use, and to leverage reviewed 
specifications), we can't replace them with our own definitions. 
Bulk-including those sections of the XPath spec would be costly and 
confusing.

As frustrating as it may appear, I think this is optimized.
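
As a rough illustration only (this sketch is not part of the specification, and the helper names below are made up for the example), the quoted summary of op:boolean-less-than can be written out directly. Under that definition, sorting boolean values in ascending order places false before true:

```python
import functools


def boolean_less_than(arg1: bool, arg2: bool) -> bool:
    """True only if arg1 is false and arg2 is true, per the quoted summary."""
    return (not arg1) and arg2


def compare(a: bool, b: bool) -> int:
    """Hypothetical three-way comparison derived from boolean_less_than."""
    if boolean_less_than(a, b):
        return -1
    if boolean_less_than(b, a):
        return 1
    return 0


# Ascending order over booleans: every false sorts ahead of every true.
ordered = sorted([True, False, True, False], key=functools.cmp_to_key(compare))
print(ordered)
```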

...
> > ASK doesn't permit a SolutionModifier. Adding ASK there could imply
> > that it was allowed and even had some effect (other than syntax error).
> 
> wouldn't that be worth a footnote then, maybe?

I reluctantly added a sentence to the end:
[[
Using ORDER BY on a solution sequence for a CONSTRUCT or DESCRIBE query 
has no direct effect because only SELECT returns a sequence of results. 
Used in combination with LIMIT and OFFSET, ORDER BY can be used to return 
results generated from a different slice of the solution sequence. An ASK 
query does not include ORDER BY, LIMIT or OFFSET.
]]
...

> >>Sec 9.2:
> >>
> >>Add somewhere in the prose: "using the SELECT result form"...
> >>
> >>It is actually a bit weird that you mix select into the solution
> >>modifiers, IMO, it would be better to mention SELECT first in section
> >>9 
> >>and then introducing the solution modifiers.
> > 
> > 
> > 
> > SELECT is both an indicator of the query result form and also contains
> > the projection.
> 
> yes, that's my point.

We see no reason to make a change. This has been part of the SPARQL 
specification for a long time, and the experience of the community seems 
to indicate that it is comprehensible.

... 
> >>Sec 9.5/9.6:
> >>
> >>OFFSET 0 has no effect, LIMIT 0 obviously makes no sense since the
> >>answer is always the empty solution set... So why for both not simply
> >>only allowing positive integers? I see no benefit in allowing 0 at
> >>all. 
> > 
> > 
> > The WG believes that allowing 0 eases the burden on programmatically 
> > generated queries.
> 
> What is the justification for this belief if I may ask?

The belief arises from implementation experience by various members of the 
workgroup.

(For example:

    query = sprintf("SELECT ... LIMIT %d OFFSET %d", limit, offset);
is easier than
    if (offset == 0) {
        /* OFFSET 0 would not be allowed, so leave the clause out */
        query = sprintf("SELECT ... LIMIT %d", limit);
    } else {
        query = sprintf("SELECT ... LIMIT %d OFFSET %d", limit, offset);
    }
)

Google, for another instance, serves from offset 0:
  http://www.google.com/search?q=search&hl=en&start=0&sa=N
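
A minimal sketch of the same point from the paging side, assuming a toy in-memory solution sequence (the function names and data are hypothetical, not taken from any SPARQL implementation). Because 0 is accepted, the first page needs no special case, and LIMIT 0 simply yields an empty sequence:

```python
def build_page_query(base: str, limit: int, offset: int) -> str:
    """Build the query text uniformly; offset 0 needs no special handling."""
    return f"{base} LIMIT {limit} OFFSET {offset}"


def apply_slice(solutions: list, limit: int, offset: int) -> list:
    """Toy model of OFFSET followed by LIMIT on a solution sequence."""
    return solutions[offset:offset + limit]


solutions = list(range(5))  # stand-in for an ordered solution sequence

# Walk the pages starting from offset 0, two solutions per page.
pages = [apply_slice(solutions, 2, offset) for offset in range(0, 6, 2)]
first_query = build_page_query("SELECT ...", 2, 0)
empty_page = apply_slice(solutions, 0, 0)
```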

> >>Section 10.2
> >>
> >>CONSTRUCT combines triples "by set union"?
> >>So, I need to eliminate duplicate triples if I want to implement
> >>CONSTRUCT in my SPARQL engine?
> >>Is this really what you wanted? In case of doubt, I'd suggest to
> >>remove "by set union", or respectively, analogously to SELECT,
> >>introduce a DISTINCT (or alternatively a WITHDUPLICATES)
> >>modifier for CONSTRUCT...
> > 
> > 
> > A set represented with duplicate triples is identical to a
> > representation without any duplicates, 
> 
> no, it is not identical if viewed as dataset for another query: if I 
> apply another (SELECT) query on the output of the CONSTRUCT  - which 
> again is RDF, so why not? - then there is potentially a difference (see 
> the distinct/reduced issue)

See below.
 
> > so I believe the text is correct as written.  That 
> > is, the following are representations of the same graph:
> > 
> > <x> <y> <z> .
> > 
> > and
> > 
> > <x> <y> <z> .
> > <x> <y> <z> .
> 
> if I ask a query with solution modifiers on these two graphs, then it is 
> not the same! Attention!

No, the above are two representations of the same RDF graph. (A graph with 
a single triple.) Any SPARQL query against either of these two 
representations of the same graph will have the same solutions.
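
To illustrate (a sketch only, using a made-up toy parser rather than a real RDF library): since an RDF graph is a set of triples, reading either representation yields the same one-triple graph, so a query evaluated over it cannot tell them apart.

```python
def parse_triples(text: str) -> set:
    """Parse a toy one-triple-per-line serialization into a set of triples.

    Hypothetical helper for illustration; it only handles lines of the
    form '<s> <p> <o> .' and is not a real N-Triples parser.
    """
    graph = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        s, p, o = line.rstrip(".").split()
        graph.add((s, p, o))  # adding the same triple twice has no effect
    return graph


once = parse_triples("<x> <y> <z> .")
twice = parse_triples("<x> <y> <z> .\n<x> <y> <z> .")
same_graph = once == twice
```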

> >>BTW, I miss the semantics for CONSTRUCT given formally in Section 12.
> > 
> > 
> > We do not right now intend to include CONSTRUCT in Section 12.
> > CONSTRUCT is defined normatively in section 10.2. ( 
> > http://www.w3.org/TR/rdf-sparql-query/#construct ).
> 
> I fail to find a definition of the formal semantics of CONSTRUCT there.
> 
> CONSTRUCT is likely one of the things which people will pick up very 
> fast...so it would be good to have this more formal, I think.

The group has decided not to pursue a rigorous treatment of CONSTRUCT in 
Section 12 at this time. To do so would require a great deal of new work 
and review, and would put our schedule in serious jeopardy. We believe 
that the semantics specified in Section 10.2 sufficiently specify 
CONSTRUCT and will lead to interoperable implementations. 

... 
> >>In the definition of compatible mappings, you might want to change
> >>
> >>"every variable v in dom(&mu;1) and in dom(&mu;2)"
> >>to
> >>"every variable v &isin;  dom(&mu;1) &cap; dom(&mu;2)"
> >>
> >>"Write merge(&mu;1, &mu;2) for &mu;1 set-union &mu;2"
> >>
> >>Why not use the symbol &cup; here?
> > 
> > 
> > As noted above the reliance on some symbols being available is not
> > safe across enough browser and locale setups.  We are striking a
> > balance here.
> 
> and &mu; is safe?

&mu; is safer.  Correct display of, say, &cup; is less common than that of
&mu;. The W3C document style does not set the font family for display.

...
> >>12.5
> >>
> >>The operator List(P) is nowhere defined.
> >>I still don't have totally clear why you need to introduce the ToList
> >>operator. 
> > 
> > 
> > Already discussed.
> 
> Also that "List(P)" is not defined?

This is already fixed in the editors' working draft.

... 
> >>A general comment:
> >>
> >>I miss a section defining the *Semantics of a query* and of different
> >>result forms. The Evaluation semantics given here rather is a mix of
> >>functions having partly multisets of solution mappings and sequences
> >>thereof as result, 
> >>but all are called "eval()".
> >>  E.g. eval for BGP returns a multiset, whereas eval returns a list
> >>for ToList, etc. 
> >>
> >>The semantics of a *query* is not really clearly defined yet, it
> >>seems. This needs another revision, I guess.
> 
> no response here?


Sec. 12 intro says:

"""
This section defines the correct behavior for evaluation of graph patterns
and solution modifiers, given a query string and an RDF dataset. It does
not imply a SPARQL implementation must use the process defined here.
"""

> >>In the "Notes", item (d):
> >>
> >>"the current state of the art in OWL-DL querying focusses on the case
> >>where answer bindings to blank nodes are prohibited."
> >>
> >>It would be helpful to give references here.
> > 
> > 
> > The notes highlight the working assumptions.  I don't think references
> > would change that.  This is a difference between an academic paper and
> > a specification.
> 
> You mean that a specification shouldn't follow general rules of style 
> which make the reader more comfortable (such as for instance 
> references)? disagree, to be honest.

Technology specifications do not follow the same style rules as academic 
papers, largely because the two have different goals. The editors do not 
believe that references to OWL-DL querying work would improve the 
specification.
 
thanks again,
Lee
Received on Friday, 4 May 2007 21:07:47 UTC