Re: rq25 (1.18) review (part I) from Seaborne, Andy on 2007-03-01 (public-rdf-dawg@w3.org from January to March 2007)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Thu, 01 Mar 2007 10:37:41 +0000
To: dawg mailing list <public-rdf-dawg@w3.org>
Message-ID: <45E6ACF5.7060408@hp.com>
Addressing editorial comments in this pass up to end sec 4.

-------- Original Message --------
 > From: Kendall Clark <>
 > Date: 26 February 2007 23:21
 >
 > (The formatting got badly broken on the one Lee forwarded. This should
 > be easier to read.)

It appears to be the use of quoted-printable.  Your email looks just fine until 
it passes through intermediate systems.  And Thunderbird seems to have 
problems quoting replies which are quoted-printable, compounding the effect. 
This reply was initially generated using Outlook reformatting, which does seem 
to get it right, then put into Thunderbird for the threading.

...

 > Abstract

Only intermediate changes to the abstract and section 1 in this pass ...

 >
 > 2nd paragraph: The first sentence is very awkward ("the query language
 > part..."); way too colloquial and chatty for a spec, IMO.
 > I'd strike "for easy access to data". I'd strike the entire next
 > sentence starting "The SPARQL query language consists of..." in favor
 > of an actual *definition* -- ah, which there is in 1 Introduction,
 > making this redundant and unnecessary. Strike it.

Changed, for now, to:
"""
The SPARQL query language consists of the syntax and semantics for asking and 
answering queries against RDF graphs. It meets the requirements and design 
objectives described in RDF Data Access Use Cases and Requirements [UCNR]. 
SPARQL contains capabilities for querying by triple patterns, conjunctions, 
disjunctions, and optional patterns. It also supports constraining queries by 
source RDF graph and extensible value testing. Results of SPARQL queries can 
be ordered and presented in different result forms.
"""

Abstracts get quoted and copied around as brief descriptions, so the abstract 
has to stand on its own; that may mean some duplication with the introduction.

 >
 > In fact, I'd strike the entire paragraph. But "report forms" in the
 > last sentence should be "result forms", surely.

Corrected.

We'll need to revise this again so I've left in the note:
@@Revise when rest finished.

 >
 > s/Status of this Document/Status of This Document/

It was "Status of This document".

Changed to "Status of this Document" which seems to be the W3C norm.

 >
 > 1 Introduction
 >
 > I would strike the first 3 paragraphs.

Done.  Just left some bare bullet points.

 > This section should begin with
 > "SPARQL consists of three documents". Though, actually, that's weird,
 > right? Why does the query language spec define SPARQL in toto?

It places the document in context and reminds the reader that they need to 
look elsewhere for the other, important, pieces.

 > The
 > protocol document offers a definition of *the protocol* (see http://
 > www.w3.org/TR/rdf-sparql-protocol/#ap). Shouldn't the query language
 > spec define *the query language*?
 > Is there a definition of the query language at all? (There is a
 > definition of a "SPARQL Query String", but that's different.)
 >
 > Strike the odd adjective "companion" that's used in front of
 > "protocol". Makes no sense.
 >
 > 1.1 Document Outline
 >
 > I would strike this entirely, especially as the distinction "informal"
 > v. "formal" is very problematic. Are they synonyms of the standard spec
 > terms "informative" and "normative"? If not, why not?

The wording is not good - a lot more of the document is necessary for the 
formal description.  It seems the word "informal" causes confusion - removed.

There is WG discussion to be had here.

 >
 > What *is* normative in this document? I can't tell. That's a serious
 > problem IMO. Given the "informality" of nearly all of it -- a tone
 > which I continue to object to -- how are we to resolve conflicts
 > between the "informal" and "formal" parts?

The convention (by scanning some W3C recs) appears to be that all sections are 
normative unless stated otherwise; appendices are not normative unless marked 
normative.

I've marked the grammar as normative.

 >
 > 1.2.2 Data Descriptions
 >
 > Strike "used to show each triple explicitly". A spec is *not* a meta-
 > commentary upon itself. There is, IMO, far too much of this kind of
 > self-referential commentary. A specification *specifies*; it does not
 > discuss, converse, comment, or muse.

The data description is Turtle and outside the spec.  It is worth noting that 
we are using Turtle in a way that shows every triple explicitly, in contrast 
to RDF/XML.

 >
 > 1.2.3 Results Descriptions
 >
 > "used as a descriptive term" -- huh? Is the idea here to define
 > "binding"? If so, I'd think the text might read something like "A
 > 'binding' is a pair (variable, RDF term)".

Done

 >
 > The last sentence of this section is grammatically incorrect (it's a
 > run-on sentence), and I would simply strike it. (All things equal, a
 > shorter spec is a better spec.)

Kept """Variables are not required to be bound in a solution.""" and removed 
the remainder.  It's a feature of SPARQL that unbound variables occur.
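
As an illustration only (not spec text), here is a minimal Python sketch of a binding as a (variable, RDF term) pair and of a solution in which a variable is left unbound. The string representation of terms, the `lookup` helper and the sample data are all assumptions made for the example.

```python
from typing import Dict, Optional

# Assumption for this sketch: RDF terms are written as plain strings.
# A solution is a set of bindings, i.e. (variable, RDF term) pairs,
# modelled here as a dict from variable name to term.
Solution = Dict[str, str]


def lookup(solution: Solution, variable: str) -> Optional[str]:
    """Return the term bound to `variable`, or None if it is unbound."""
    return solution.get(variable)


# Two hypothetical solutions for the variables ?name and ?mbox.
# The second one has no binding for ?mbox, which is allowed.
solutions = [
    {"name": '"Alice"', "mbox": "<mailto:alice@example.org>"},
    {"name": '"Bob"'},
]
```

Here `lookup(solutions[1], "mbox")` returns `None`, which is how this sketch represents an unbound variable.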

 >
 > 1.2.4 Terminology
 >
 > What are "RDF URI References"?

An "RDF URI Reference" is a term defined by RDF-core.

In RFC 2396, "URI" does not include the fragment part; a URI reference does. 
But URI references need not be absolute, whereas URIs are.  So RDF-core defined 
"RDF URI Reference" for absolute URIs with a possible fragment.

RFC 3986/3987 changes this; an IRI is absolute and may include a fragment.

In addition, RDF-core anticipated the character set issues of IRIs except the 
final IRI RFC went a different way on spaces.  This is noted in rq25.

 >Is that a special term that we import
 > from somewhere else? If so, can't it be hyperlinked or defined?

The relevant link to RDF Concepts points to the RDF URI Reference section.

 > Generally it's accepted best practice in writing specs to define terms
 > either all at once in a glossary or at their first occurrence or both.
 >
 > "The following terms are used from RDF Concepts..." -- this is an
 > awkward wording. How about, instead, "The following terms are defined
 > in RDF Concepts..."?

Changed to:

"""
The following terms are defined in RDF Concepts and Abstract Syntax [CONCEPTS] 
and used in SPARQL:
"""

 >
 > That sentence is also a run-on by virtue of having no colon at the end
 > of it...
 >
 > IMO, we should not define terms used from another spec and then
 > *rename* them in this spec. This is
 > just confusing. "IRI" -> "RDF URI reference"; and "datatype IRI" ->
 > "datatype URI". If we're going to do this -- and I'd prefer we didn't
 > -- it should be more explicitly marked as such. Putting this into two
 > parenthetical phrases -- which suggests that that content is
 > secondarily important -- is likely to cause confusion.

This is the "Document Conventions" section and it is importing terminology 
from elsewhere.

 >
 > "SPARQL implementations may issue warnings..." -- how? Which ones?
 > There's a lot of talk about
 > warnings and errors, but no warnings or errors defined. Why not?

The local API is free to do what it wants.  The protocol would also be a way 
to issue warnings.

 >
 > 2 Making Simple Queries
 >
 > Last sentence: what does "fulfill a pattern" mean? Is that different
 > than or the same as "match" a pattern?

s/ to fulfill a pattern//

 >
 > 2.2 Multiple Matches
 >
 > "The results of a query are a sequence of solutions"; better: "The
 > result of a query is a sequence of solutions" or even just "a solution
 > sequence" -- which gives you a nice, crisp term that could be *defined*
 > or linked to its formal definition in the semantics section.
 >
 > Last sentence: "This is a basic graph pattern match..." is a run-on.

Fixed.

 >
 > 2.3 Matching RDF Literals
 >
 > Last sentence: "This RDF data..." contains a hyphen to separate a
 > range; but the standard orthography for ranges is an en dash, available
 > as "&ndash;" in HTML. This problem occurs throughout the doc:
 > http://en.wikipedia.org/wiki/Dash seems trustworthy on point.

Some tools don't work with &ndash; properly (possibly this is charset 
problems).  We've had document corruption problems before :-(

Fixed (for now).

 >
 > 2.3.1 Matching Language Tags
 >
 > "Language tags in SPARQL are expressed the same way as in Turtle." --
 > huh? Does that mean the same grammar production is used in each
 > language?  Why is this even relevant here?

"""Language tags in SPARQL are expressed using @ and the language tag."""

(In my case "yes", it's the same grammar production  :-))


 > 2.3.2 Matching Numeric Types
 >
 > This first sentence should be struck. Either specify the integer
 > datatype and then give an example; this sentence does both,
 > simultaneously, and confuses me on both counts. Also, it's "e.g.", not
 > "eg".

Done.

"""
Integers in a SPARQL query indicate an RDF typed literal with the datatype 
xsd:integer. For example: 42 is a shortened form of 
"42"^^<http://www.w3.org/2001/XMLSchema#integer>.
"""
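
A rough Python sketch of that expansion, for illustration only. The function name `expand_integer` is hypothetical, and accepting only unsigned digit strings is a simplifying assumption rather than a statement about the grammar.

```python
import re

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def expand_integer(token: str) -> str:
    """Rewrite a bare integer token as the full typed-literal form.

    Assumption: only unsigned runs of digits are treated as integers here.
    """
    if not re.fullmatch(r"[0-9]+", token):
        raise ValueError(f"not an integer token: {token!r}")
    # Wrap the lexical form in quotes and attach the datatype IRI.
    return f'"{token}"^^<{XSD_INTEGER}>'
```

With this sketch, `expand_integer("42")` gives the long form quoted above, and a token such as `4.2` is rejected.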

 >
 > 2.3.3 Matching Arbitrary Datatypes
 >
 > Last sentence is a run-on. And I don't understand it: "the literal is
 > known to match" -- known to whom? Huh?

"""
The following query has a solution with variable v bound to :y. The query 
processor does not have to have any understanding of the values in the space 
of the datatype because the lexical form and datatype IRI both match so the 
literal matches.
"""

The point being made is that same lexical form, same datatype is enough for a 
match, even if the datatype value space is not fully understood.
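
To make that concrete, a minimal Python sketch follows. The `Literal` class, the `literal_matches` function and the example datatype IRI are hypothetical, and language tags are left out to keep it short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype: Optional[str] = None  # datatype IRI, None for a plain literal


def literal_matches(pattern: Literal, data: Literal) -> bool:
    """Match on lexical form and datatype IRI only.

    No knowledge of the datatype's value space is used, so an
    unrecognised datatype can still match.
    """
    return (pattern.lexical_form == data.lexical_form
            and pattern.datatype == data.datatype)


# Hypothetical datatype that the processor knows nothing about.
SPECIAL = "http://example.org/datatype#specialDatatype"
in_query = Literal("abc", SPECIAL)
in_data = Literal("abc", SPECIAL)
```

`literal_matches(in_query, in_data)` is true even though nothing here interprets the datatype; changing either the lexical form or the datatype IRI makes it false.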

 > 2.4 Blank Node Labels in Query Results
 >
 > This entire section should be redrafted. It's confusing, disjointed,
 > and vague. What does "local to a result set" *mean*? I have no idea.
 > And "...should not expect blank node labels in a query to refer to a
 > particular blank node" -- What is a 'particular blank node' here?
 > Are we entirely comfortable
 > talking about what computer processes should not "expect"?

 > Surely this
 > should just talk about *matches* instead of all this "refer" and
 > "reference" talk. Is that defined explicitly anywhere? If so, can we
 > get a link there? What about "co-occurrences of blank nodes" -- what
 > does that mean?

This issue is serialization in the presence of blank nodes.

Does

"""
Blank node labels are scoped to a result set (as defined in "SPARQL Query 
Results XML Format") or the graph for the CONSTRUCT query form. An application 
writer should not expect blank node labels in a query to refer to a particular 
blank node in the data. Use of the same label within a result set or graph 
indicates the same blank node.
"""
work better for you?

 >
 > Last sentence: "There need not be any relation..." -- I don't know what
 > this means. There "need not be", but there is anyway? There is, but
 > only contingently? And what kind of "relation" is being ruled out here?
 > Not a lexical identity relation, surely.
 >
 > 3 RDF Term Constraints
 >
 > "A constraint may lead to an error condition..." -- two issues here:
 > first, this is another error
 > thingie that could happen but it's not specified, so it's not clear how
 > to distinguish it from something else. Second, is this the 'may' of
 > specification or colloquial speech? Why doesn't rq25 use terms like
 > "may", "must", "must not", etc in their standard specification sense?

s/may//

 >
 > At the very least, if it's not going to use them in that way, it should
 > *say* that it's not going to use them in that way so that readers don't
 > interpret them in that way by mistake.
 >
 > But, really, shouldn't there be some really solid, domain-specific
 > reason why we aren't *specifying* using "may", "must", etc?
 >
 > I'm all for flouting convention and throwing over best practices, but
 > surely you need *good* reasons to do so? What are our good reasons?

Generally:

The use of "may"/"should"/"must" etc in the sense of RFC2119 is indicated by 
their use in some emphasized form (bold or capitals usually, depending on the 
medium).  They refer to obligations on the implementations.  We are discussing 
what SPARQL is, not what an implementation may/must/should do.  That resides 
with the protocol document - it's the request that is executed that has the 
main conformance obligation.

 >
 > Last sentence: drop the parens.

Done.

 > 3.2 Restricting Numeric Values
 >
 > The second sentence is a total non sequitur as written.

"""It is also possible to restrict the values of literals that have numeric 
types."""

 >
 > 3.3 Other Term Constraints
 >
 > There are an alarmingly high number of "@@" in this doc; this section
 > is but one example. Lots of @@ in grammar rules, it seems. This does
 > not seem, to me, a sign of stability...
 >
 > 4.1.1 Syntax for IRIs
 >
 > Most of the first paragraph is redundant. Why is this being repeated?
 > Repetition like this is
 > analogous to cut-and-paste chunks of code; it's brittle and introduces
 > errors.

Where is it repeated?  This is the only text that I can find that discusses 
the issues of IRI references and IRIs.  There is text in A.5 but that is about 
the grammar rule.

The syntax form <> is an IRI reference; SPARQL is defined over IRIs, moving on 
from where RDF-core was because of RFC 3986.

 >
 > "It is mapped to an IRI by concatenating *the* IRI..." -- add "the"
 > to 2nd-to-last sentence.

Done.

 >
 > Last sentence: "may be the empty string" -- huh?

"may be empty" ??

:x and x: are legal.
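
A small Python sketch of the mapping, for illustration only: the prefixed name is split at the first colon and the IRI for the prefix is concatenated with the local part. The function name and the namespace IRIs are made up for the example.

```python
from typing import Dict


def expand_prefixed_name(name: str, prefixes: Dict[str, str]) -> str:
    """Map a prefixed name to an IRI by concatenation.

    Either the prefix or the local part may be empty.
    """
    prefix, colon, local = name.partition(":")
    if not colon:
        raise ValueError(f"not a prefixed name: {name!r}")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + local


# Hypothetical declarations: one for the empty prefix, one for "x".
prefixes = {
    "": "http://example.org/ns#",
    "x": "http://example.org/x/",
}
```

Under these assumed declarations, `:x` maps to `http://example.org/ns#x` and `x:` maps to `http://example.org/x/`.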

 >
 > 4.1.2 Syntax for Literals
 >
 > What's a "general syntax"? Is it different than a "syntax"?

There are several syntax forms for literals - the only one capable of 
expressing the general case is the ""^^ / ""@ form.

 >
 > 4.1.3 Syntax for Query Variables
 >
 > "...does not form part of the variable name" -- better: "...is not part
 > of the variable name..."

Done

 >
 > 4.1.4 Syntax for Blank Nodes
 >
 > "The same blank node labels may not be used in two separate basic graph
 > patterns." -- Surely, even in informal, commentary style pseudo-
 > spec'ese, this should be "must", not "may". And shouldn't it read "two
 > or more"? *May* one use the same blank node label in *three* separate
 > basic graph patterns? In 5?

s/may/must/
"""The same blank node labels can not be used in two different basic graph 
patterns in the same query."""

And if it's used in 3, it must also have been used in 2.

 >
 > 4.2.3 RDF Collections
 >
 > I find this entire section very confusing. I can't tell what's
 > "allocated": blank nodes or triple
 > patterns or both.

 >
 > "These allocated blank nodes allocated do not occur elsewhere in the
 > query." -- "allocated" should be dropped,

second one dropped.

 > but I'm not sure which one...
 > And "These allocated blank nodes..." is vague. Which ones?

the ones for the list

[4.2.4]
 > "...is short for:" -- does that mean "is equivalent to" or something
 > else?

changed to "short form" like previously.

It may be equivalent abstractly but strictly it's not equivalent because 
different grammar productions are involved.  "short form" (sometimes 
"syntactic sugar") is the term I know to use for this.

 >
 > 4.3 Syntax for Constraints
 >
 > Is this ready for LC?

This is not an LC draft.   It is text for the WG to review.

- - - - - - - - -
Changes so far checked in as version 1.24.

	Andy
Received on Thursday, 1 March 2007 10:38:01 UTC