- From: Arjohn Kampman <arjohn.kampman@aduna.biz>
- Date: Thu, 24 Mar 2005 15:57:49 +0100
- To: andy.seaborne@hp.com
- Cc: public-rdf-dawg-comments@w3.org, Jeen Broekstra <jeen@aduna.biz>
Seaborne, Andy wrote:
> Arjohn, Jeen,
>
> Thanks for the comments. I have incorporated the editorial ones: this
> reply only contains discussion points.
>
> Arjohn Kampman wrote:
[...]
>> General comments (in no specific order)
>> ---------------------------------------
>>
>> - We are not very fond of SELECT-WHERE-FILTER construction. Considering
>>   that the FROM keyword is no longer used for specifying datasets; how
>>   about adopting the SQL-style SELECT-FROM-WHERE construction instead?
>>   It could prevent confusion with people coming from a database world
>>   that expect the WHERE-clause to contain boolean constraints.
>
> The protocol will provide some means for specifying the target of a
> query so the matter has not changed. I agree that people thinking that
> SPARQL is some strange SQL will cause problems but at the same time,
> the analogy is also helpful.

The protocol draft is next on my reading list, so I wasn't aware of
this. I'll get back to this issue when I've finished reading that spec.

> Unlike SQL, FILTER can appear inside the pattern, and not in a separate
> clause, so app writers can place it next to the thing it affects if
> they wish to.
>
> By the way, WHERE is actually an optional word. You can write queries
> without it if you prefer.

Personally, I have a strong association of WHERE-clauses with boolean
expressions. IMHO, it doesn't feel "right" to put path expressions in a
WHERE-clause, but I might be able to get used to it ;-)

>> - The document suggests that (parts of) queries can only be evaluated on
>>   a specific graph: either the background graph or a named graph. We
>>   would have expected that, when no specific graph label is specified,
>>   the query would be evaluated on the union of all graphs.
>
> That set up is possible - make the background graph include the RDF
> merge of the named graphs - but it is not the only configuration of an
> RDF dataset.
> The background graph is the knowledge base and includes
> the things the application is saying is its knowledge - it may not
> believe what's in some or all of the named graphs automatically.
>
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JanMar/0070.html

Thanks for the reference, it made things a little clearer. I still have
a number of questions and doubts though:

My understanding of the email is that it's up to the application to
decide whether the background graph includes all named graphs, or
whether it is a separate graph (or even some other constellation). If
this is true, then I would be in favor of the former, where the
background graph is the union/merge of all triples.

The way I see it, the graph label should be an ignorable attribute of
triples. With the latter approach, this is no longer true as query
results will depend on whether the graph label is queried or not. This
essentially comes down to redefining RDF from triples to quads.

In order to realize the "ignorability" of graph labels, the triple
pattern "{ ?s ?p ?o }" would have to match all triples, regardless of
whether they have zero, one or more than one label. The behaviour of
the pattern "GRAPH ?G { ?s ?p ?o }" is not immediately clear in this
setting. It could query just the triples with one or more labels. Or it
could query all triples, leaving the ?G variable unbound for triples
that have no label.

[...]
>> - Named graphs are identified by URIs; bnodes or literals cannot be used
>>   for this purpose. This forces application developers to generate URIs
>>   when a simple string would be sufficient. Supporting literals as graph
>>   names would allow developers to use simple string or datatyped dates
>>   to tag specific sets of statements. Would this be useful?
>
> Web resources are named by URI - the global uniqueness means that one
> system can communicate that name to another without confusion.

Sure, but that doesn't answer our question.
If one wants to communicate
the label of a named graph, one should use a URI. But if this is of no
concern, would it be useful to support bnodes and/or literals as labels?

>> - The definition of DESCRIBE is very loose: maybe too loose to be useful
>>   in practice? An application developer would like to have a guarantee
>>   as to whether the mechanism yields the info that is needed. As it is
>>   now, the mechanism could very well result in the development of
>>   several "DESCRIBE-dialects", which offer this guarantee for specific
>>   use cases. We think a fixed definition like "it returns the bnode
>>   closure for the concerning URIs" would be more useful.
>
> There have been many definitions of a description and each seems to have
> some application domain assumptions. The SPARQL protocol service
> description would be a place to state what a given service offers - the
> point about DESCRIBE is that it is not defined exactly by the client
> (c.f. CONSTRUCT).
>
> Even "bnode closure" is tricky - FOAF is all bNodes.
>
> We may see common descriptions emerging in various domains, such as LSID
> getMetaData.

If no specific definition of the result for a DESCRIBE query can be
given, then wouldn't it be better to leave this definition to the
developers of these specific protocols and remove it from SPARQL?

As the developer of one of the available "semantic web frameworks", I
find it difficult to decide how to implement this functionality. There
simply is no single decision that will fulfill the needs of all. I think
the DESCRIBE-queries have a huge potential for introducing
incompatibilities between various SW-frameworks, which is not good.

>> - SeRQL offers default bindings for the often used prefixes 'rdf',
>>   'rdfs' and 'xsd'. If not specified in the query itself, these prefixes
>>   map to the standard RDF, RDF Schema and XML Schema namespaces. This
>>   has proved to be very convenient. Is this a feature that should be
>>   added to SPARQL too?
>>   We noted that the comment for version 1.244 of
>>   the document mentions: "Removed text for default prefixes for rdf:
>>   rdfs: owl: xsd:", but we were unable to find a reason for this in the
>>   mailing list archives.
>
> It didn't seem to have sufficient support from within the WG.

Too bad. Of course, default bindings can still be added in a later
version once people start using the language for real ;-)

[...]
>> - There is a strong demand from the Sesame community to add ORDER BY and
>>   GROUP BY/COUNT functionality to SeRQL. It's good to see that the
>>   former has already been added to the editor's draft. However, we feel
>>   that the latter is just as important. Having to transmit complete
>>   query results only to be able to count specific rows adds a lot of
>>   unnecessary network traffic and can really hurt performance.
>
> Could you write this up as a use case? What is being counted?
> Individuals or names (URI labels, bNodes etc etc).
>
> As a use case, even if the issue is not addressed in this round, it can
> be logged as a postponed issue. In particular, there are strong closed
> world assumptions about applying aggregate functions so it would be good
> to understand as much about this requirement as possible.

We'll consider doing this. Also, we're planning to implement this in
SeRQL, which might yield valuable input.

> From below:
> > Section A:
> > * We have a number of remarks concerning the grammar, which is ambiguous
> >   or at least needs unnecessarily large look-aheads in a number of rules.
> >   However, we're not sure if the grammar is considered to be final
> >   enough for this kind of comments. Please let us know if you're
> >   interested.
>
> The grammar is getting close. There is a tradeoff to be had between
> expressing the grammar clearly and introducing extra, artificial states
> (they don't represent an abstraction the app writer thinks about) for
> some particular grammar tool.
> The objective is not to be the grammar a
> particular system can just copy across.
>
> Globally, the lookahead is 1 - locally, a parser may either wish to use
> extra states or locally increase lookahead. What parsing mechanism are
> you using?

Mainly JavaCC. Some issues with the current grammar that might be worth
resolving:

- It allows "DESCRIBE <my:URI> WHERE ..."

- The first rule for PropertyList is both recursive and repetitive.
  Substituting the '*' with a '?' would fix this.

- Same issue as above for ObjectList.

- An equivalent but clearer definition for Collection would be:
  Collection ::= '(' GraphNode* ')'

- The rule for ConditionalXorExpression is both unnecessary and
  confusing. It should probably be removed.

- The Expression argument for functions like STR, LANG, DATATYPE, etc.
  seems to be too generic. It even allows one to apply these functions
  to boolean expressions containing ANDs and ORs. Might it be possible
  to replace these arguments with VarOrTerm?

- RDFLiteral allows the definition of literals with both a language tag
  and a datatype. Should be easy to fix, e.g.:
  RDFLiteral ::= String ( <LANGTAG> | '^^' URI )?

- <LANGTAG> only allows language tags that consist of at most two
  components. However, the following document also seems to use tags
  with three or more components like "zh-min-nan" and "en-GB-oed":
  http://www.iana.org/assignments/language-tags

- The presentation of <QNAME>, <BNODE_LABEL>, <STRING_LITERAL1> and
  <STRING_LITERAL2> suggests that these have two production rules. It
  took me quite some time to find out that these were just single rules
  that were spread over two lines. Placing the full rules on single
  lines will prevent this confusion for other readers.

>> Editorial comments
>> ------------------
>
> Noted and fixed where still relevant.
>
> Thanks
> Andy

It seems that you missed one comment:

>> Section 2.1:
>> * The query in "Data descriptions used in this document" is said to be
>>   equivalent to the previous query, which is not true: this query
>>   has a variable as subject, whereas the previous query has a URI.

One new comment: there are two occurrences of "patten" which should be
replaced with "pattern".

--
Arjohn
Received on Thursday, 24 March 2005 14:57:50 UTC
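
Illustration of the <LANGTAG> remark in the message above: a minimal Python sketch, not taken from the draft grammar. The two regular expressions are assumptions chosen only to contrast a rule that permits at most one subtag with a rule that permits any number of subtags; the names TWO_COMPONENT, MULTI_COMPONENT and accepts are hypothetical.

```python
import re

# Assumed stand-in for a rule limited to two components:
# a primary tag plus at most one subtag.
TWO_COMPONENT = re.compile(r"@[a-zA-Z]+(-[a-zA-Z0-9]+)?")

# Relaxed variant: a primary tag followed by any number of subtags.
MULTI_COMPONENT = re.compile(r"@[a-zA-Z]+(-[a-zA-Z0-9]+)*")


def accepts(pattern, tag):
    """Return True if the entire tag token matches the pattern."""
    return pattern.fullmatch(tag) is not None


if __name__ == "__main__":
    # The last two tags are the examples quoted in the message.
    for tag in ["@en", "@en-GB", "@zh-min-nan", "@en-GB-oed"]:
        print(tag, accepts(TWO_COMPONENT, tag), accepts(MULTI_COMPONENT, tag))
```

Under these assumptions the restricted pattern rejects the three-component tags cited in the message, while the relaxed pattern accepts them.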