Re: Comments on SPARQL Query Language for RDF (draft) from Arjohn Kampman on 2005-03-24 (public-rdf-dawg-comments@w3.org from March 2005)

From: Arjohn Kampman <arjohn.kampman@aduna.biz>
Date: Thu, 24 Mar 2005 15:57:49 +0100
To: andy.seaborne@hp.com
Cc: public-rdf-dawg-comments@w3.org, Jeen Broekstra <jeen@aduna.biz>
Message-ID: <4242D56D.5010009@aduna.biz>

Seaborne, Andy wrote:
> Arjohn, Jeen,
> 
> Thanks for the comments.  I have incorporated the editorial ones: this
> reply only contains discussion points.
> 
> Arjohn Kampman wrote:
[...]
>> General comments (in no specific order)
>> ---------------------------------------
>>
>> - We are not very fond of the SELECT-WHERE-FILTER construction. Considering
>>    that the FROM keyword is no longer used for specifying datasets, how
>>    about adopting the SQL-style SELECT-FROM-WHERE construction instead?
>>    It could prevent confusion for people coming from a database world
>>    who expect the WHERE-clause to contain boolean constraints.
> 
> 
> The protocol will provide some means for specifying the target of a 
> query so the matter has not changed.  I agree that people thinking that
> SPARQL is some strange SQL will cause problems but at the same time,
> the analogy is also helpful.

The protocol draft is next on my reading list, so I wasn't aware of
this. I'll get back to this issue when I've finished reading that spec.

> Unlike SQL, FILTER can appear inside the pattern, and not in a separate
> clause, so app writers can place it next to the thing it affects if they
> wish to.
> 
> By the way, WHERE is actually an optional word.  You can write queries 
> without it if you prefer.

Personally, I have a strong association of WHERE-clauses with boolean
expressions. IMHO, it doesn't feel "right" to put path expressions in a
WHERE-clause, but I might be able to get used to it ;-)

>> - The document suggests that (parts of) queries can only be evaluated on
>>    a specific graph: either the background graph or a named graph. We
>>    would have expected that, when no specific graph label is specified,
>>    the query would be evaluated on the union of all graphs.
> 
> 
> That setup is possible - make the background graph include the RDF
> merge of the named graphs - but it is not the only configuration of an 
> RDF dataset.  The background graph is the knowledge base and includes 
> the things the application is saying is its knowledge - it may not 
> believe what's in some or all of the named graphs automatically.
> 
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JanMar/0070.html

Thanks for the reference, it made things a little clearer. I still have
a number of questions and doubts though:

My understanding of the email is that it's up to the application to
decide whether the background graph includes all named graphs, or that
it is a separate graph (or even some other constellation). If this is
true, then I would be in favor of the former, where the background graph
is the union/merge of all triples.

The way I see it, the graph label should be an ignorable attribute of
triples. With the latter approach, this is no longer true as query
results will depend on whether the graph label is queried or not. This
essentially comes down to redefining RDF from triples to quads.

In order to realize the "ignorability" of graph labels, the triple
pattern "{ ?s ?p ?o }" would have to match all triples, regardless of
the fact that they have zero, one or more than one label. The behaviour
of the pattern "GRAPH ?G { ?s ?p ?o }" is not immediately clear in this
setting. It could query just the triples with one or more labels. Or it
could query all triples, leaving the ?G variable unbound for triples
that have no label.
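
To make the two readings concrete, here is a minimal sketch in Python.
Everything in it is an assumption for illustration: the toy data, the
"ex:" names and the function names are hypothetical, and it models a
store as labelled triples rather than following anything in the draft.

```python
# Hypothetical toy store: (subject, predicate, object, label).
# label is None for a triple that carries no graph label.
QUADS = [
    ("ex:a", "ex:p", "ex:b", None),
    ("ex:a", "ex:p", "ex:c", "ex:g1"),
    ("ex:d", "ex:p", "ex:e", "ex:g1"),
    ("ex:d", "ex:p", "ex:e", "ex:g2"),
]


def match_all(quads):
    """{ ?s ?p ?o } with ignorable labels: every distinct triple matches,
    whether it has zero, one or several labels."""
    return sorted({(s, p, o) for s, p, o, _ in quads})


def graph_labelled_only(quads):
    """GRAPH ?G { ?s ?p ?o }, reading 1: only triples that have a label."""
    return sorted((g, s, p, o) for s, p, o, g in quads if g is not None)


def graph_all_unbound(quads):
    """GRAPH ?G { ?s ?p ?o }, reading 2: all triples; ?G stays unbound
    (None here) for triples without a label."""
    rows = [(g, s, p, o) for s, p, o, g in quads]
    # Sort with None treated as the empty string so rows stay comparable.
    return sorted(rows, key=lambda r: (r[0] or "", r[1:]))
```

With this data, match_all() yields three distinct triples, reading 1
yields three rows, and reading 2 yields four rows, one of which has no
binding for ?G.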

[...]
>> - Named graphs are identified by URIs; bnodes or literals cannot be used
>>    for this purpose. This forces application developers to generate URIs
>>    when a simple string would be sufficient. Supporting literals as graph
>>    names would allow developers to use simple string or datatyped dates
>>    to tag specific sets of statements. Would this be useful?
> 
> 
> Web resources are named by URI - the global uniqueness means that one 
> system can communicate that name to another without confusion.

Sure, but that doesn't answer our question. If one wants to communicate
the label of a named graph, one should use a URI. But if this is of no
concern, would it be useful to support bnodes and/or literals as label?

>> - The definition of DESCRIBE is very loose: maybe too loose to be useful
>>    in practice? An application developer would likely want a guarantee as
>>    to whether the mechanism yields the info that is needed. As it is now,
>>    the mechanism could very well result in the development of several
>>    "DESCRIBE-dialects", which offer this guarantee for specific use
>>    cases. We think a fixed definition like "it returns the bnode closure
>>    for the concerning URIs" would be more useful.
> 
> 
> There have been many definitions of a description and each seems to have 
> some application domain assumptions.  The SPARQL protocol service
> description would be a place to state what a given service offers - the
> point about DESCRIBE is that it is not defined exactly by the client 
> (c.f. CONSTRUCT).
> 
> Even "bnode closure" is tricky - FOAF is all bNodes.
> 
> We may see common descriptions emerging in various domains, such as LSID 
> getMetaData.

If no specific definition of the result for a DESCRIBE query can be
given, then wouldn't it be better to leave this definition to the
developers of these specific protocols and remove it from SPARQL? As the
developer of one of the available "semantic web frameworks", I find it
difficult to decide how to implement this functionality. There simply is
no single decision that will fulfill the needs of all. I think the
DESCRIBE-queries have a huge potential for introducing incompatibilities
between various SW-frameworks, which is not good.

>> - SeRQL offers default bindings for the often used prefixes 'rdf',
>>    'rdfs' and 'xsd'. If not specified in the query itself, these prefixes
>>    map to the standard RDF, RDF Schema and XML Schema namespaces. This
>>    has proved to be very convenient. Is this a feature that should be
>>    added to SPARQL too? We noted that the comment for version 1.244 of
>>    the document mentions: "Removed text for default prefixes for rdf:
>>    rdfs: owl: xsd:", but we were unable to find a reason for this in the
>>    mailing list archives.
> 
> 
> It didn't seem to have sufficient support from within the WG.

Too bad. Of course, default bindings can still be added in a later
version once people start using the language for real ;-)
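
For reference, a rough sketch in Python of what such default bindings
amount to. The function name and the override behaviour are assumptions
for illustration, not a description of how SeRQL implements it; only the
three namespace URIs are the standard ones.

```python
# Standard namespaces for the three prefixes mentioned above.
DEFAULT_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand_qname(qname, declared=None):
    """Expand 'prefix:local' to a full URI (hypothetical helper).

    Prefixes declared in the query itself take precedence; the defaults
    only apply when the query does not declare the prefix.
    """
    prefixes = {**DEFAULT_PREFIXES, **(declared or {})}
    prefix, _, local = qname.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + local
```

A query that never declares 'rdf' could then still use rdf:type, while a
query that declares its own binding for 'rdf' keeps that binding.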

[...]
>> - There is a strong demand from the Sesame community to add ORDER BY and
>>    GROUP BY/COUNT functionality to SeRQL. It's good to see that the
>>    former has already been added to the editor's draft. However, we feel
>>    that the latter is just as important. Having to transmit complete
>>    query results only to be able to count specific rows adds a lot of
>>    unnecessary network traffic and can really hurt performance.
> 
> 
> Could you write this up as a use case?  What is being counted?  
> Individuals or names (URI labels, bNodes etc etc).
> 
> As a use case, even if the issue is not addressed in this round, it can be
> logged as a postponed issue.  In particular, there are strong closed 
> world assumptions about applying aggregate functions so it would be good 
> to understand as much about this requirement as possible.

We'll consider doing this. Also, we're planning to implement this in
SeRQL, which might yield valuable input.

>  From below:
>  > Section A:
>  > * We have a number of remarks concerning the grammar, which is ambiguous
>  >    or at least needs unnecessarily large look-aheads in a number of rules.
>  >    However, we're not sure if the grammar is considered to be final
>  >    enough for this kind of comments. Please let us know if you're
>  >    interested.
> 
> The grammar is getting close.  There is a tradeoff to be had between
> expressing the grammar clearly and introducing extra, artificial states 
> (they don't represent an abstraction the app writer thinks about) for 
> some particular grammar tool. The objective is not to be the grammar a
> particular system can just copy across.
> 
> Globally, the lookahead is 1 - locally, a parser may either wish to use 
> extra states or locally increase lookahead.  What parsing mechanism are
> you using?

Mainly JavaCC.

Some issues with the current grammar that might be worth resolving:
- It allows "DESCRIBE <my:URI> WHERE ..."
- The first rule for PropertyList is both recursive and repetitive.
   Substituting the '*' with a '?' would fix this.
- Same issue as above for ObjectList.
- An equivalent but clearer definition for Collection would be:
   Collection ::= '(' GraphNode* ')'
- The rule for ConditionalXorExpression is both unnecessary and
   confusing. It should probably be removed.
- The Expression argument for functions like STR, LANG, DATATYPE, etc.
   seems to be too generic. It even allows one to apply these functions
   on boolean expressions containing ANDs and ORs. Might it be possible
   to replace these arguments with VarOrTerm?
- RDFLiteral allows the definition of literals with both a language tag
   and a datatype. Should be easy to fix, e.g.:
   RDFLiteral ::= String ( <LANGTAG> | '^^' URI )?
- <LANGTAG> only allows language tags that consist of at most two
   components. However, the following document also seems to use tags
   with three or more components like "zh-min-nan" and "en-GB-oed":
   http://www.iana.org/assignments/language-tags
- The presentation of <QNAME>, <BNODE_LABEL>, <STRING_LITERAL1> and
   <STRING_LITERAL2> suggest that these have two production rules. It
   took me quite some time to find out that these were just single rules
   that were spread over two lines. Placing the full rules on single
   lines will prevent this confusion for other readers.
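
To illustrate the <LANGTAG> point, a small Python sketch. Both regular
expressions are assumptions: the first is only a guess at the shape of a
two-component restriction and is not copied from the draft, and the
second is one possible relaxation.

```python
import re

# Assumed shape of a rule limited to two components: '@' primary ('-' subtag)?
TWO_COMPONENT = re.compile(r"@[a-zA-Z]+(-[a-zA-Z0-9]+)?")

# Possible relaxation: allow any number of subtags.
MULTI_COMPONENT = re.compile(r"@[a-zA-Z]+(-[a-zA-Z0-9]+)*")


def accepts(pattern, tag):
    """Return True if the whole tag (including the leading '@') matches."""
    return pattern.fullmatch(tag) is not None
```

Under these assumptions "@en" and "@en-GB" are accepted by both
patterns, while "@zh-min-nan" and "@en-GB-oed" are accepted only by the
relaxed one.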


>> Editorial comments
>> ------------------
> 
> 
> Noted and fixed where still relevant.
> 
>     Thanks
>     Andy

It seems that you missed one comment:

>> Section 2.1:
>> * The query in "Data descriptions used in this document" is said to be
>>   equivalent to the previous query, which is not true: this query
>>   has a variable as subject, whereas the previous query has a URI.

One new comment: there are two occurrences of "patten" which should be
replaced with "pattern".

--
Arjohn
Received on Thursday, 24 March 2005 14:57:50 UTC