- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Mon, 11 Oct 2004 12:05:24 +0100
- To: Kevin Wilkinson <wilkinson@hpl.hp.com>
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Kevin,
Thank you for the comments. The audit trail is below and the changes, where
made, go into v1.108, v1.109, v1.110
Kevin Wilkinson wrote:
> here are my comments on draft 1.104 of the SPARQL Query Lang.
> document. so far, i've only reviewed sections 1-5. i'll send
> comments, if any, on the remaining sections later this week.
>
> in general, great job by eric and andy. many of my comments
> are word-smithing. ignore or incorporate as you see fit.
>
> i used the following notation in my comments.
> '-' delimits things to remove.
> '+' delimits things to add.
> *NOTE: blah, blah, blah * delimits my comments.
> undelimited text is used to provide context.
>
> kevin
>
>
> ------------------------------------------------------------------------
>
>
> Comments on SPARQL draft 1.104 (2004/10/08) - Kevin Wilkinson
>
> 1 Introduction
>
> An RDF graph is +encoded as+ a set of triples,
> -each consisting of -
> +each comprising+ a subject, an object-,- and a
> property relationship between them [12].
Nack. The RDF graph *is* a set of triples.
> +A triple is also referred to as a statement.
I have been avoiding "statement" and using "triple" throughout. Query works on
the set of triples; a statement is the logical concept represented by a triple.
"Statement" is the term used by the first RDF spec; the recent WG prefers "triple"
in this context, I believe.
> The
> RDF terms in a triple are either URIs, blank nodes
> (bNodes), plain literals and typed literals (defined in RDF
> Concepts and Abstract syntax).+
I have moved the link to "RDF Concepts" to this point.
> ... it may be a graph that is partly calculated on
> demand +(e.g., by giving the inference closure)+,
> or it may be an RDF representation of a legacy database.
>
> SPARQL is a query language for accessing such RDF graphs.
> It provides facilities to:
> * -select- +extract+ information +, i.e. extract
> subjects, properties and/or objects, from queried graphs+
"extract" is better: it now says
"""
* extract information in the form of URIs, bNodes, plain and typed literals.
"""
> * extract RDF subgraphs +of queried graphs+
I think this is implied by "sub"
> * construct new RDF graphs based on information -from
> the target of the query- +in the queried graphs+.
Done.
>
> As a data access language, it is suitable for -both local
> and remote use- +querying graphs that are either local to
> or remote from the client (host machine).+
Hmm - need also to consider the case of same machine, different process. I'll
leave it as the vaguer "local and remote".
>
>
> 2 Making Simple Queries
>
> -Queries match graph patterns against the target graph of
> the query. Patterns are like graphs but may named variables
> in place of some of the nodes or predicates; the simplest graph
> patterns are single triple patterns. The RDF terms are URIs,
> blank nodes (bNodes), plain literals and typed literals (defined
> in RDF Concepts and Abstract syntax). Graph patterns can be
> combined using various operators into more complicated graph
> patterns.-
> +Queries match graph patterns against the target graph(s) of
> the query. The simplest graph pattern is a single triple
> pattern. This is a triple comprising RDF terms or named
> variables and it matches all triples in a graph whose
> corresponding subject, object or property are equal to the
> correspond RDF term in the pattern. The named variables in
> the pattern, if any, are then bound to their corresponding
> subject, object or property in the matched triples. More
> complicated graph patterns can be constructed from single
> triple patterns and various operators.+
I leave this to Eric but I don't think we have to make the change in order to
publish. This part needs to be reworked based on the rest of section 2 and
especially 2.1. As such, one set of approximate words will do.
>
> A binding is a mapping from the variables in a query to terms.
> A result mapping is a binding which, when applied to the
> variables in the query,
> -produces a subgraph of the target graph-
> +produces a set of terms from the queried graph+; a result
> is a set of result mappings. If there are no result mappings,
> the result set is empty.
Some rewording. Needs to be brought into line with the rest of the document.
>
> Pictorially, suppose we have a graph with two triples and
> +apply+ the given triple pattern:
>
> -with- +we get the+ result:
Will leave pending revision of this section.
>
> *NOTE: I suggest using the graph0 and query0 rather than
> triple1-2 and triplePattern1. Multiple triples form a graph
> and a triple pattern IS a query applied to a graph. So, the
> picture is a bit confusing.*
>
> -RDF graphs are constructed from one or more triples, ex. graph1.-
> +A more complicated query may combine bindings from multiple
> triple patterns. Consider query1 applied to graph1.+
>
> *NOTE: the figure for query1 has a typo: change ?addrm to ?addr.*
Leave for Eric - I can't edit the pictures.
>
> 2.1 Writing a Simple Query
>
> +SPARQL uses an SQL-like syntax for expressing queries.+
I was hoping not to have to justify a claim of "SQL-like" as it means different
things to different people.
> The example below ... and WHERE clause -gives- +contains just+
> one triple pattern.
Done by s/gives/has/
>
> The terms -quoted- +delimited+ by "<>" are URI References.
Done
>
> -Variables are indicated by '??'; the '?' does not form part
> of the variables' name.-
> +Variable names are prefixed by '??'; the '?' is not part
> of the variable?'s name.+
There has been some processing error here - entities keep getting corrupted (any
idea why, Eric? tidy?).
>
> Because URIRefs can -become- +be+ long,
Done
>
> Prefixes are syntactic: the +prefix+ name -chosen- does not
> -effect- +affect+ the query,
Done
> -nor does it have to be the same as the data-
> +nor do prefix names in queries need to be the same prefixes
> used for data+.
Done
>
> *NOTE: just wondering if, in the context here of typed literals,
> the document should mention that plain literals will match typed
> literals with the type xsd:string. Also, would a plain literal
> match a literal with a lang tag? Or would an int-typed literal
> match a float? etc. At some point, the doc should point out some
> of the nuances with typed and lang-tag literal matching.*
Good points, but can we delay this until section 12? I don't want to get the
reader sidetracked by plain literals matching xsd:strings or issues about
xsd:integer comparing to xsd:float/xsd:double just at this point.
Delayed until after publication.
>
> 2.2 Triple Patterns
>
> The building blocks of queries are triple patterns. Syntactically,
> a SPARQL triple pattern is a subject, predicate and object
> -enclosed in '()'s- +delimited by parentheses+.
Done
> The previous example +query+ shows a triple pattern with a
> -variable subject (book), a predicate of dcore:title and a
> variable object (title).-
> +a predicate of dcore:title and variables for subject and object.+
Have attempted rewording here.
>
> -A triple pattern is matched against the graph by finding values
> for values for variables so that the triple pattern, with values
> substituted for variables, is a triple in the graph being queried.-
> +A triple pattern applied to a graph matches all triples with
> identical RDF terms for the corresponding subject, predicate
> and object.The variables in the triple pattern, if any, are then
> bound to the corresponding RDF terms in the matching triples.+
Done.
>
> *NOTE: "RDF URI Reference" is frequently used. Why not just say
> URI? Is an RDF URI somehow different from a URI? Is a URI Reference
> different from a URI?*
URI (currently - RFC2396) does not include the #frag part. URIRef includes the
#frag part. I understand that this is to change and "URI" will cover URIRefs as
well in a revised 2396.
>
> A query variable is a name -, used to define queries as graph patterns-.
> *NOTE: I have no idea what that last phrase means. Delete or
> rephrase it.*
It's trying to informally scope the variables.
>
> *NOTE: this section introduces the term 'query variable'. Is this
> different from 'variable'? I think not. So, why not just stick with
> 'variable'? Another inconsistency in this section is that 'Triple
> Pattern' is capitalized whereas previously it was lower-case.
> It's unclear why. Is it a mistake?*
Yes - a mistake.
Tried to fix up.
>
> -We show- +In this document, we illustrate+ bindings in results
> in tabular form -, for example:- +with one header row containing
> all variable names and a value row for each mapping of the
> result variables. For example:+
Done.
>
> +Note that literal values are quoted, except for integers. URI's
> are delimited by angle brackets except occasionally QNames will
> be used.+
> *NOTE: I added the above because the examples are NOT consistent
> with respect to formatting of the result bindings. You may want
> to change the examples to be consistent (e.g., all literals are
> quoted, all URIs delimited). If not, you should definitely
> add the above sentence.*
>
> -Not every binding needs to exist in every row of the table.-
> *NOTE: I am not sure what is meant by the above. Please rephrase
> it. Do you mean that, due to optionals, that some variables will
> not be bound in a result row?*
They should be consistent. If you find any that aren't, please let me know.
>
> *NOTE: in the Definition of Triple Pattern Matching, I'm having
> trouble making the leap from B, a binding of one variable, to SB,
> a set of bindings for multiple variables. I'm really confused how
> the individual bindings, B, are combined, e.g. cross-product,
> concatenated, what? I know it's neither but that's how I read it.*
A binding is a single pair (var, RDF Term)
A set of bindings is a set of pairs.
{ (var1, term1) , (var2, term2) , ... }
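To illustrate, here is a rough Python sketch of matching a single triple pattern so that each match yields one set of (var, RDF Term) pairs. It is not the draft's formal definition: the tuple representation, the "?" prefix test and the name match_triple_pattern are all assumptions made for this example.

```python
# Assumed representation: a triple or pattern is a 3-tuple of strings,
# and any string starting with "?" is a variable.
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match_triple_pattern(pattern, graph):
    """Return one set of (var, term) pairs per triple that matches."""
    solutions = []
    for triple in graph:
        bindings = {}
        for p, t in zip(pattern, triple):
            if is_var(p):
                # A variable used twice must take the same term both times.
                if bindings.setdefault(p, t) != t:
                    break
            elif p != t:
                # A concrete term in the pattern must be identical.
                break
        else:
            solutions.append(frozenset(bindings.items()))
    return solutions

graph = [
    ("_:a", "foaf:name", "Alice"),
    ("_:b", "foaf:name", "Bob"),
    ("_:a", "foaf:knows", "_:a"),
    ("_:a", "foaf:knows", "_:b"),
]
result = match_triple_pattern(("?x", "foaf:name", "?name"), graph)
same = match_triple_pattern(("?x", "foaf:knows", "?x"), graph)
```

Each element of result is a set of pairs for one match; the individual bindings are not combined across matches.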
>
> If the same variable name is used more than once in a pattern
> then, within each *solution* to the query, the variable has the
> same value.
> *NOTE: 'solution' is undefined in the above sentence.
> Did you mean to say 'substitution'? If not, you need to define
> 'solution'.*
It's a forward reference. Will leave for now. I'd like to use the right
terminology if it reads OK.
>
> 2.3 Graph Patterns
>
> The keyword WHERE is followed by a Graph Pattern which is
> -made of one or more Triple Patterns. These Triple Patterns are
> "and"ed together. More formally, the Graph Pattern is the conjunction
> of the Triple Patterns.-
> +a Triple Pattern or a conjunction of Triple Patterns.+
> In each query *solution*, each triple pattern must be satisfied
> with the same binding of variables to values.
> *NOTE: again, 'solution' is undefined. I'm not sure I understand
> the above sentence.*
Minor wording change: s/each triple pattern/all the triple patterns/
>
> There is a bNode [12] in this dataset. Just within the file, for
> encoding purposes, the bNode is identified by _:a but the
> information about the bNode label is not in the RDF graph. No query
> will be able to identify that bNode by name.
> *NOTE: I'm not sure I understand the last sentence. It implies that
> a bNode CANNOT be a value in a triple pattern, since that would be
> identifying the bNode by name. I don't think that is the intention
> but that is how it reads to me.*
It can be a value - it can't be written in a query.
Is this better?
"""
No query will be able to identify that bNode by the label used in the serialization.
"""
>
> *NOTE: in the Definition of Graph Pattern (Partial Definition), it
> states that a set of triple patterns is a graph pattern. However,
> the sentence above this definition states that a graph pattern is
> *two* or more triple patterns. This is not consistent. A set can
> have one member.*
Fixed.
"""
A conjunctive graph pattern is a set of triple patterns
"""
>
> 2.4 Multiple Matches
>
> The results of a query are all the ways a query can match the graph
> being queried.
> *NOTE: I'm confused here. Does 'result' refer only
> to the result variables or to the complete set of bindings for the
> graph pattern? If the former, then, since the result variable list
> may not include ALL variables in the query, it seems like it could
> exclude some ways in which the query matches the graph (especially
> if duplicates are eliminated). So, please be specific if you're
> referring to the result variables or all variables.*
We haven't defined SELECT variables yet - it's the complete set of bindings. It's
all variables.
>
> *NOTE: aha, here's the definition I was looking for. Unfortunately,
> I don't understand it. But, I'll keep trying. One thing I'm concerned
> about is what happens if the query variables are not 'connected'?
The only meaning I understand for "connected" is in graph terms. Graph patterns
do not define connected pattern graphs.
> Does the definition still make sense? For example, consider the query
> "Select ?name, ?mbox Where (?x foaf:name ?name) (?y foaf:box ?mbox)".
> There are no linking variables in this query. We need to ensure
> that these queries are well-defined.*
A legal query. It may not be intended :-)
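A small Python sketch of why it is well-defined but probably unintended: with no shared variable, every match of the first pattern pairs with every match of the second. This is an illustration under assumptions, not the draft's semantics: the tuple representation and the helper names match_one and solve are invented here, and I use foaf:mbox in the sample data.

```python
# Assumed representation: triples and patterns are 3-tuples of strings;
# a string starting with "?" is a variable.
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match_one(pattern, triple, bound):
    """Extend the bindings in `bound` so the pattern equals the triple, else None."""
    result = dict(bound)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result

def solve(patterns, graph):
    """All sets of bindings that satisfy every triple pattern together."""
    solutions = [{}]
    for pattern in patterns:
        extended_solutions = []
        for sol in solutions:
            for triple in graph:
                extended = match_one(pattern, triple, sol)
                if extended is not None:
                    extended_solutions.append(extended)
        solutions = extended_solutions
    return solutions

graph = [
    ("_:a", "foaf:name", "Alice"),
    ("_:a", "foaf:mbox", "mailto:alice@example.org"),
    ("_:b", "foaf:name", "Bob"),
    ("_:b", "foaf:mbox", "mailto:bob@example.org"),
]
# No linking variable: every name is paired with every mbox.
disconnected = solve([("?x", "foaf:name", "?name"),
                      ("?y", "foaf:mbox", "?mbox")], graph)
# Shared ?x: each name is paired only with that node's own mbox.
connected = solve([("?x", "foaf:name", "?name"),
                   ("?x", "foaf:mbox", "?mbox")], graph)
```

On this sample data the disconnected form gives the full cross product of names and mailboxes, while the connected form gives one row per person.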
>
>
> 4 Including Optional Values
>
> *NOTE: rename section to simply 'Optional Values'.*
Nack - out of style.
>
> For every solution of the query, every variable has
> -an RDF Term- +a value+.
Pat recommends avoiding "value" due to confusion with the value space of typed
literals.
> -But RDF data is semi-structured data;-
> Sometimes useful, additional information about some item of interest
> in the graph can be found but, for another item, the information is
> not present.
Done
> -The application writer would such additional information but does
> not want the query to not match just because the some information
> is missing.-
> +If the application writer wants that additional information, the
> query should not fail just because the some information is missing.+
Done
>
> In the example, only a single triple +pattern+ is given in the
Done
> optional match part of the query but in general it is a graph
> pattern.
>
> Optional blocks can also be nested +as described in Section .xxx+.
I will add pointers around the document when section structure is stable.
>
> *NOTE: I assume that the optional block need NOT be connected to the
> rest of the query. For example,
> "Select ?name, ?time Where (?x foaf:name ?name)
> [ ex:timezone/#ECT ex:datetime ?time ] " *
>
> -If a variable was introduced in one optional block and mentioned
> in another, it would be used to constrain the second. Reversing the
> order of the optional blocks would reverse the blocks in which the
> variable was was introduced and was used to constraint.-
> +If a variable was bound in one optional block and referenced in
> another, it would constrain the second. Reversing the order of
> the optional blocks would reverse the blocks in which the
> variable was bound and constrained.+
Nack - but needs rethinking anyway as it is too procedural.
>
>
> 4.3 Optional Matching - Formal Definition
>
> *NOTE: in the definition, GP = (GP1 union GP2). Shouldn't this be
> (GP1 and GP2)? Or, maybe I'm not understanding this correctly.*
It is union - the query formed by GP1 merged with GP2. "And" would be: satisfy
GP1, satisfy GP2 independently. We want a big pattern, see if it works, if it
does, use it, else use the small one GP1.
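As a procedural sketch of that idea in Python (one possible reading, applied per solution of GP1; it is not the formal definition in 4.3, and the tuple representation and the names match_one, solve and solve_optional are assumptions for this example):

```python
# Assumed representation: triples and patterns are 3-tuples of strings;
# a string starting with "?" is a variable.
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match_one(pattern, triple, bound):
    """Extend the bindings in `bound` so the pattern equals the triple, else None."""
    result = dict(bound)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result

def solve(patterns, graph, start=None):
    """All sets of bindings satisfying every pattern, starting from `start`."""
    solutions = [dict(start or {})]
    for pattern in patterns:
        extended_solutions = []
        for sol in solutions:
            for triple in graph:
                extended = match_one(pattern, triple, sol)
                if extended is not None:
                    extended_solutions.append(extended)
        solutions = extended_solutions
    return solutions

def solve_optional(gp1, gp2, graph):
    """Try the big pattern (GP1 merged with GP2); fall back to GP1 alone."""
    results = []
    for sol in solve(gp1, graph):
        bigger = solve(gp2, graph, start=sol)
        # Use the big pattern if it works, else keep the small one.
        results.extend(bigger if bigger else [sol])
    return results

graph = [
    ("_:a", "foaf:name", "Alice"),
    ("_:a", "foaf:mbox", "mailto:alice@example.org"),
    ("_:b", "foaf:name", "Bob"),
]
rows = solve_optional([("?x", "foaf:name", "?name")],
                      [("?x", "foaf:mbox", "?mbox")], graph)
```

Here Alice's row carries ?mbox because the merged pattern matches, while Bob's row is kept with ?mbox unbound instead of being dropped.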
>
> The outer optional block must match for a nested one-s- to apply.
> That is, the outer graph pattern pattern-s- is fixed for the
> purposes of any nested optional block.
Done.
>
>
> 5.0 Nested Patterns
>
> *NOTE: the second nesting example might be more interesting
> if the nested clause had optional data, e.g., some might have
> optional middle names. For example:
>
> SELECT ?foafName ?mbox ?fname ?gname
> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>
> WHERE ( ?x foaf:name ?foafname )
> [ (?x foaf:mbox ?mbox) ]
> [ (?x vcard:N ?vc) (?vc vcard:Family ?fname) (?vc vcard:Given ?gname)
> [ (?vc vcard:Given ?gname) ]
> ]
Left as is.
Andy
Received on Monday, 11 October 2004 11:06:12 UTC