Re: feedback on "SPARQL Query Language for RDF", v1.139 from Kevin Wilkinson on 2004-12-01 (public-rdf-dawg@w3.org from October to December 2004)

From: Kevin Wilkinson <wilkinson@hpl.hp.com>
Date: Wed, 01 Dec 2004 15:05:49 -0800
To: andy.seaborne@hp.com, public-rdf-dawg@w3.org
Message-ID: <41AE4E4D.FFF67174@hpl.hp.com>

andy,
   thanks for the quick turn-around on my comments.
attached are my comments on your changes.

kevin

comments on your changes (now referring to version 1.142
of the SPARQL draft).

"Seaborne, Andy" wrote:
> 
> Kevin,
> 
> Thank you for such a detailed set of comments, and thank you marking up the text.
> 
> Changes logged below.
> 
> Kevin Wilkinson wrote:
> > attached are my comments on v1.139 of the SPARQL spec.
>
...
> 
> Changes in v1.141 until noted otherwise.
> 
> >
> >     1 Introduction
> >
> > An RDF graph is a set of triples, each consisting of a /+subject+/, an
> > +object+, and +/predicate/ that specifies+ a property relationship
> > between them+,+ as defined in RDF Concepts and Abstract syntax
> > <http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-Datatypes-intro>.
> 
> A/ Can't see the change in subject and object.

the suggested change was to italicize subject, predicate and object.
your convention in other places in the document is to italicize
terms when they are first introduced (or for emphasis). i
thought it was appropriate to italicize subj, pred, obj. 

as for predicate vs. property, consistency is important. but,
it might be good to imply here that predicate and property are
synonyms because property crops up in other places, e.g.,
rdf:Property, InverseFunctionalProperty, etc.


> >     2 Making Simple Queries
...
> Leave this to Eric.

re: graph1, graphPattern1, don't neglect to change the headers
in the result table, i.e., referrer, reference, author are all
wrong. i forgot to mention this in my previous message.

> >       2.1 Writing a Simple Query
...
> > The terms delimited by "<>" are URI References [13] <#ref13> (URIRefs);
> > URIRefs can also abbreviated with an XML QName-like form [14] <#ref14>;
> > this is syntactic assistance and is translated to the full URIRef.
> > -Other RDF terms-+The terms delimited by double quotes+ are literals
> > which, following N-Triples syntax [7] <#ref7>, are a string and
> > -optional language tag (introduced with '@') and datatype URIRef
> > (introduced by '^^')-+optionally, either a language tag (indicated by
> > '@') or a datatype URIRef (indicated by '^^')+.
> 
> Changed to:
> """
> The RDF terms delimited by double quotes ("") are literals which, following
> N-Triples syntax [7], are a string, in quotes, an optional language tag,
> introduced with '@', and optional datatype URIRef, introduced by '^^'.
> """

but your phrasing admits the possibility of a literal having
a language tag and a datatype. that's why i prefer my original
wording, "...in quotes, optionally followed by either a
language tag ... or a datatype ...".
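
to make the "either ... or" reading concrete, here is a minimal python
sketch. the Literal class and its field names are hypothetical (they are
not from the draft), and string escaping is deliberately ignored. it only
illustrates a literal that carries a language tag or a datatype, never both.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """Hypothetical model of a literal: a string plus at most one qualifier."""
    lexical: str
    lang: Optional[str] = None       # introduced with '@'
    datatype: Optional[str] = None   # introduced with '^^'

    def __post_init__(self):
        # Reject the combination that the "either ... or" wording rules out.
        if self.lang is not None and self.datatype is not None:
            raise ValueError("a literal takes a language tag or a datatype, not both")

    def to_ntriples(self) -> str:
        # Assumption: no escaping of quotes or backslashes in this sketch.
        text = '"' + self.lexical + '"'
        if self.lang is not None:
            return text + "@" + self.lang
        if self.datatype is not None:
            return text + "^^<" + self.datatype + ">"
        return text
```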


> 
> >       2.2 Triple Patterns
...
> > -In SPARQL, a triple pattern is an RDF triple but with the addition that
> > components can be a query variable instead.-
> >
> > +In SPARQL, a triple pattern is an RDF triple in which any component can
> > be a query variable.+
> 
> A triple pattern is not an RDF triple if it has different contents.  Text left
> as is.

but your phrasing also makes it sound like a triple pattern is an
RDF triple.  how about "... triple pattern is +similar to+ an RDF
triple but with the addition ..."

> > *Definition:* Substitution
> >
> > A substitution S is a partial functional relation from variables to RDF
> > terms or variables. We write S[v] for the RDF term that S pairs with the
> > variable v and define S[v] to be v where there is no such pairing.
> >
> >
> > *Definition:* Triple Pattern Matching
> >
> > For +substitution+ S and Triple Pattern T, S(T) is -the-+a+ triple
> > pattern +formed+ by replacing any variable v in T with S[v]. (KW
> > Comment: there may be more than one such triple pattern, correct?)
> 
> No - a substitution is a function and is well-defined.  Applied to a triple
> pattern there is only one triple pattern produced.

i found the notation S(T) confusing since S is a function
from variables; i don't know what S(T) is. a different
function? can't a variable be bound to multiple terms? that's
why i thought S(T) would produce multiple triples.
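
one possible reading of the quoted definitions, as a minimal python
sketch. all names here are hypothetical, and modelling variables as
strings starting with '?' is an assumption of the sketch. S is a map with
at most one term per variable, and S(T) applies that map to each component
of T, so a single substitution yields a single triple pattern.

```python
def is_var(term):
    """Variables are modelled as strings starting with '?' (an assumption)."""
    return isinstance(term, str) and term.startswith("?")


def apply_substitution(S, T):
    """Return S(T): replace each variable v in triple pattern T with S[v].

    S is a dict, so each variable maps to at most one term. Variables
    missing from S are left unchanged, mirroring "S[v] is v where there
    is no such pairing".
    """
    return tuple(S.get(term, term) if is_var(term) else term for term in T)


def matches(T, G, S):
    """T matches graph G with substitution S if S(T) is a triple of G."""
    return apply_substitution(S, T) in G
```

under this reading, several matches for one pattern correspond to several
different substitutions, not to one substitution that binds a variable
to more than one term.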

> > Triple Pattern T matches RDF graph G with substitution S, if S(T) is a
> > triple of G.
> >
> > (KW Comment: the above definition (of Triple Pattern T match G) is a
> > second definition of triple pattern matching. Previously, at the start
> > of section 2.2, you say that a pattern matches all triples with
> > "identical" RDF terms. Is it obvious that these two definitions are
> > identical? Maybe prefix the first definition by saying it is an informal
> > definition.)
> 
> I hope the use of the boxes does that informal/formal.  Will consider - there is
> also a comment outstanding from Yoshio about putting all definitions before the
> narrative text.  Probably not possible at the very start of the doc.

i agree with yoshio. i prefer that definitions precede the examples.

> > For example, the query:
> >
> > SELECT * WHERE ( ?x ?x ?v )

perhaps "SELECT ?x, ?y, ?z" is better than "SELECT *"
since the '*' form of Select is not defined until section 10.


> >       2.3 Graph Patterns
> >
> > -The keyword WHERE is followed by a /Graph Pattern/ which is made of one
> > or more /Triple Patterns/. These Triple Patterns are "and"ed together.
> > More formally, the Graph Pattern is the conjunction of the Triple
> > Patterns. In each query solution, all the triple patterns must be
> > satisfied with the same binding of variables to values.-
> >
> > +A /Graph Pattern/ is one or more /Triple Patterns /"and"ed together,
> > i.e., a conjunction of Triple Patterns. In a match, all the triple
> > patterns must be satisfied with the same binding of variables to values.+
> 
> Trying to, informally, explain the syntax at this point, hence the keyword
> WHERE.  Avoid solution though.

i still prefer my wording since the objective of 2.3 is to explain
graph patterns. graph patterns do not need to have "WHERE" in front
of them. if you want to mention the WHERE word, i suggest doing that
in section 2.1 where you introduce querying.
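
to illustrate "a conjunction of triple patterns satisfied with the same
binding", here is a minimal python sketch. the function names are
hypothetical, variables are modelled as strings starting with '?', and
bnode labels such as "_:a" are used as plain stand-in strings, which is
a simplification and not how the draft treats bnodes.

```python
def is_var(term):
    """Variables are modelled as strings starting with '?' (an assumption)."""
    return isinstance(term, str) and term.startswith("?")


def match_triple(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check it against its existing value.
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def match_graph_pattern(patterns, graph, binding=None):
    """Yield every binding that satisfies all triple patterns at once."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        extended = match_triple(first, triple, binding)
        if extended is not None:
            # The remaining patterns must agree with the bindings made so far.
            yield from match_graph_pattern(rest, graph, extended)
```

note that nothing in the sketch depends on a WHERE keyword. the graph
pattern is just the list of triple patterns.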


> > Data:
> >
> > @prefix foaf:    <http://xmlns.com/foaf/0.1/> .
> >
> > _:a  foaf:name   "Johnny Lee Outlaw" .
> > _:a  foaf:mbox   <mailto:jlow@example.com> .
> >
> """
> There is a bNode [12] in this dataset, identified by _:a. The label is only used
> with the file for encoding purposes. The label information is not in the RDF
> graph. No query will be able to identify that bNode by the label used in the
> serialization.
> """

i would suggest dropping the above paragraph. there is no need
to introduce the concept of bnode labels at this point. it's
distracting from the discussion of graph patterns. bnode
labels/serialization are covered just fine in 2.5.


> > *Definition:* Graph Pattern (Partial Definition) - Conjunction
> >
> The definition of "matching" is being built up through the document.  Each
> definition has a qualifier - here the definition is "Graph Pattern Matching".

except that the definition of triple matching skirts the
issue by defining match in terms of "contains". perhaps
that's intentional. but, issues like plain literals being
equivalent to xsd:string-typed literals and issues with
bnodes are not mentioned.


> And what's more, on reflection, I don't think "simply entails" is necessary.
> Subgraph would be clearer and *at this point* the definitions don't rely on
> entailment.  The binding really does have the bNode as its value.  It's later,
> on encoding results, that this is broken.  It must be the same bNode to match
> again later.

good. i agree that subgraph would be clearer.

> >       2.4 Multiple Matches
> >
...
> >  _:a foaf:name  "Johnny Lee Outlaw" .
> >  _:a foaf:box   <mailto:jlow@example.com> .
> >
> >  _:b foaf:name  "Peter Goodguy" .
> >  _:b foaf:box   <mailto:peter@example.org> .
> >
> > (KW Comment: I don't like the above example because it illustrates two
> > concepts. First, it shows that a query may have multiple solutions.
> > That's fine. But, it also illustrates the results can be a projection of
> > the query variables. This raises additional questions, specifically, how
> > are duplicates handled. I'd feel better if this example included
> > variable 'x' in the result list.).
> 
> Point taken.  However, (1) this isn't the first time projection has happened and
> (2) avoiding bNodes in results is desirable for clarity.  The only option would
> be to not use FOAF but then we wouldn't have data that is familiar to at least some
> people.  Having synthetic data is rather dry.  In this example, there aren't
> duplicates.

would it be too bizarre to have URIRef's in place of the bnodes, e.g.,
     ex:a  foaf:name  "Johnny Lee Outlaw" . ...


> > *Definition:* Query Solution
> >
> > A Query Solution is a Pattern Solution where the pattern is the whole
> > pattern of the query.
> >
> > *Definition:* Query Results
> >
> > The Query Results, for a given graph pattern GP on G, is written
> > R(GP,G), and is the set of all query solutions such that GP matches G.
> >
> > R(GP, G) may be the empty set.
> >
> >
> >       2.5 Blank Nodes
> >
> >
> >         Blank Nodes and Queries
> >
> 
> """
> BNodes can't appear in a SPARQL query. There is no standard representation of
> bNodes in RDF and the syntax of SPARQL queries does not allow them.
> 
> They do take part in the pattern matching process.

perhaps add "take part in the pattern matching process +by being
bound to variables in triple patterns+".

> """
> Committed version 1.141


i also had feedback on sections 3-6 and 10. i assume
you got that feedback in my original message and are
still processing those changes. if not, please let
me know and i can resend them.

kevin

Received on Wednesday, 1 December 2004 23:02:00 UTC