- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Mon, 11 Oct 2004 12:05:24 +0100
- To: Kevin Wilkinson <wilkinson@hpl.hp.com>
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Kevin,

Thank you for the comments. The audit trail is below and the changes, where made, go into v1.108, v1.109 and v1.110.

Kevin Wilkinson wrote:
> here are my comments on draft 1.104 of the SPARQL Query Lang.
> document. so far, i've only reviewed sections 1-5. i'll send
> comments, if any, on the remaining sections later this week.
>
> in general, great job by eric and andy. many of my comments
> are word-smithing. ignore or incorporate as you see fit.
>
> i used the following notation in my comments.
> '-' delimits things to remove.
> '+' delimits things to add.
> *NOTE: blah, blah, blah * delimits my comments.
> undelimited text is used to provide context.
>
> kevin
>
>
> ------------------------------------------------------------------------
>
>
> Comments on SPARQL draft 1.104 (2004/10/08) - Kevin Wilkinson
>
> 1 Introduction
>
> An RDF graph is +encoded as+ a set of triples,
> -each consisting of -
> +each comprising+ a subject, an object-,- and a
> property relationship between them [12].

Nack. The RDF graph *is* a set of triples.

> +A triple is also referred to as a statement.

I have been avoiding "statement" and using "triple" throughout. Query works on the set of triples; a statement is the logical concept represented by a triple. "Statement" is the term used by the first RDF spec; the recent WG prefers "triple" in this context, I believe.

> The
> RDF terms in a triple are either URIs, blank nodes
> (bNodes), plain literals and typed literals (defined in RDF
> Concepts and Abstract syntax).+

I have moved the link to "RDF Concepts" to this point.

> ... it may be a graph that is partly calculated on
> demand +(e.g., by giving the inference closure)+,
> or it may be an RDF representation of a legacy database.
>
> SPARQL is a query language for accessing such RDF graphs.
> It provides facilities to:
> * -select- +extract+ information +, i.e. extract
> subjects, properties and/or objects, from queried graphs+

"extract" is better: it now says
"""
* extract information in the form of URIs, bNodes, plain and typed literals.
"""

> * extract RDF subgraphs +of queried graphs+

I think this is implied by "sub".

> * construct new RDF graphs based on information -from
> the target of the query- +in the queried graphs+.

Done.

>
> As a data access language, it is suitable for -both local
> and remote use- +querying graphs that are either local to
> or remote from the client (host machine).+

Hmm - we also need to consider the case of same machine, different process. I'll leave it as the vaguer "local and remote".

>
>
> 2 Making Simple Queries
>
> -Queries match graph patterns against the target graph of
> the query. Patterns are like graphs but may named variables
> in place of some of the nodes or predicates; the simplest graph
> patterns are single triple patterns. The RDF terms are URIs,
> blank nodes (bNodes), plain literals and typed literals (defined
> in RDF Concepts and Abstract syntax). Graph patterns can be
> combined using various operators into more complicated graph
> patterns.-
> +Queries match graph patterns against the target graph(s) of
> the query. The simplest graph pattern is a single triple
> pattern. This is a triple comprising RDF terms or named
> variables and it matches all triples in a graph whose
> corresponding subject, object or property are equal to the
> correspond RDF term in the pattern. The named variables in
> the pattern, if any, are then bound to their corresponding
> subject, object or property in the matched triples. More
> complicated graph patterns can be constructed from single
> triple patterns and various operators.+

I leave this to Eric, but I don't think we have to make the change in order to publish. This part needs to be reworked based on the rest of section 2 and especially 2.1. As such, one set of approximate words will do.
>
> A binding is a mapping from the variables in a query to terms.
> A result mapping is a binding which, when applied to the
> variables in the query,
> -produces a subgraph of the target graph-
> +produces a set of terms from the queried graph+; a result
> is a set of result mappings. If there are no result mappings,
> the result set is empty.

Some rewording. Needs to be brought into line with the rest of the document.

>
> Pictorially, suppose we have a graph with two triples and
> +apply+ the given triple pattern:
>
> -with- +we get the+ result:

Will leave pending revision of this section.

>
> *NOTE: I suggest using the graph0 and query0 rather than
> triple1-2 and triplePattern1. Multiple triples form a graph
> and a triple pattern IS a query applied to a graph. So, the
> picture is a bit confusing.*
>
> -RDF graphs are constructed from one or more triples, ex. graph1.-
> +A more complicated query may combine bindings from multiple
> triple patterns. Consider query1 applied to graph1.+
>
> *NOTE: the figure for query1 has a typo: change ?addrm to ?addr.*

Leave for Eric - I can't edit the pictures.

>
> 2.1 Writing a Simple Query
>
> +SPARQL uses an SQL-like syntax for expressing queries.+

I was hoping not to have to justify a claim of "SQL-like" as it means different things to different people.

> The example below ... and WHERE clause -gives- +contains just+
> one triple pattern.

Done by s/gives/has/

>
> The terms -quoted- +delimited+ by "<>" are URI References.

Done

>
> -Variables are indicated by '??'; the '?' does not form part
> of the variables' name.-
> +Variable names are prefixed by '??'; the '?' is not part
> of the variable?'s name.+

There has been some processing error here - entities keep getting corrupted (any idea why, Eric? tidy?).
>
> Because URIRefs can -become- +be+ long,

Done

>
> Prefixes are syntactic: the +prefix+ name -chosen- does not
> -effect- +affect+ the query,

Done

> -nor does it have to be the same as the data-
> +nor do prefix names in queries need to be the same prefixes
> used for data+.

Done

>
> *NOTE: just wondering if, in the context here of typed literals,
> the document should mention that plain literals will match typed
> literals with the type xsd:string. Also, would a plain literal
> match a literal with a lang tag? Or would an int-typed literal
> match a float? etc. At some point, the doc should point out some
> of the nuances with typed and lang-tag literal matching.*

Good points, but can we delay this until section 12? I don't want to get the reader sidetracked by plain literals matching xsd:strings, or by issues about xsd:integer comparing to xsd:float/xsd:double, just at this point.

Delayed until after publication.

>
> 2.2 Triple Patterns
>
> The building blocks of queries are triple patterns. Syntactically,
> a SPARQL triple pattern is a subject, predicate and object
> -enclosed in '()'s- +delimited by parentheses+.

Done

> The previous example +query+ shows a triple pattern with a
> -variable subject (book), a predicate of dcore:title and a
> variable object (title).-
> +a predicate of dcore:title and variables for subject and object.+

Have attempted rewording here.

>
> -A triple pattern is matched against the graph by finding values
> for values for variables so that the triple pattern, with values
> substituted for variables, is a triple in the graph being queried.-
> +A triple pattern applied to a graph matches all triples with
> identical RDF terms for the corresponding subject, predicate
> and object.The variables in the triple pattern, if any, are then
> bound to the corresponding RDF terms in the matching triples.+

Done.

>
> *NOTE: "RDF URI Reference" is frequently used. Why not just say
> URI? Is an RDF URI somehow different from a URI? Is a URI Reference
> different from a URI?*

URI (currently - RFC 2396) does not include the #frag part. URIRef includes the #frag part. I understand that this is to change and "URI" will cover URIRefs as well in a revised 2396.

>
> A query variable is a name -, used to define queries as graph patterns-.
> *NOTE: I have no idea what that last phrase means. Delete or
> rephrase it.*

It's trying to informally scope the variables.

>
> *NOTE: this section introduces the term 'query variable'. Is this
> different from 'variable'? I think not. So, why not just stick with
> 'variable'? Another inconsistency in this section is that 'Triple
> Pattern' is capitalized whereas previously it was lower-case.
> It's unclear why. Is it a mistake?*

Yes - a mistake. Tried to fix up.

>
> -We show- +In this document, we illustrate+ bindings in results
> in tabular form -, for example:- +with one header row containing
> all variable names and a value row for each mapping of the
> result variables. For example:+

Done.

>
> +Note that literal values are quoted, except for integers. URI's
> are delimited by angle brackets except occasionally QNames will
> be used.+
> *NOTE: I added the above because the examples are NOT consistent
> with respect to formatting of the result bindings. You may want
> to change the examples to be consistent (e.g., all literals are
> quoted, all URIs delimited). If not, you should definitely
> add the above sentence.*
>
> -Not every binding needs to exist in every row of the table.-
> *NOTE: I am not sure what is meant by the above. Please rephrase
> it. Do you mean that, due to optionals, that some variables will
> not be bound in a result row?*

They should be consistent. If you find any that aren't, please let me know.

>
> *NOTE: in the Definition of Triple Pattern Matching, I'm having
> trouble making the leap from B, a binding of one variable, to SB,
> a set of bindings for multiple variables. I'm really confused how
> the individual bindings, B, are combined, e.g. cross-product,
> concatenated, what? I know it's neither but that's how I read it.*

A binding is a single pair (var, RDF Term).
A set of bindings is a set of pairs:
{ (var1, term1) , (var2, term2) , ... }

>
> If the same variable name is used more than once in a pattern
> then, within each *solution* to the query, the variable has the
> same value.
> *NOTE: 'solution' is undefined in the above sentence.
> Did you mean to say 'substitution'? If not, you need to define
> 'solution'.*

It's a forward reference. Will leave for now. I'd like to use the right terminology if it reads OK.

>
> 2.3 Graph Patterns
>
> The keyword WHERE is followed by a Graph Pattern which is
> -made of one or more Triple Patterns. These Triple Patterns are
> "and"ed together. More formally, the Graph Pattern is the conjunction
> of the Triple Patterns.-
> +a Triple Pattern or a conjunction of Triple Patterns.+
> In each query *solution*, each triple pattern must be satisfied
> with the same binding of variables to values.
> *NOTE: again, 'solution' is undefined. I'm not sure I understand
> the above sentence.*

Minor wording change: s/each triple pattern/all the triple patterns/

>
> There is a bNode [12] in this dataset. Just within the file, for
> encoding purposes, the bNode is identified by _:a but the
> information about the bNode label is not in the RDF graph. No query
> will be able to identify that bNode by name.
> *NOTE: I'm not sure I understand the last sentence. It implies that
> a bNode CANNOT be a value in a triple pattern, since that would be
> identifying the bNode by name. I don't think that is the intention
> but that is how it reads to me.*

It can be a value - it can't be written in a query. Would this read better?
"""
No query will be able to identify that bNode by the label used in the serialization.
"""

>
> *NOTE: in the Definition of Graph Pattern (Partial Definition), it
> states that a set of triple patterns is a graph pattern. However,
> the sentence above this definition states that a graph pattern is
> *two* or more triple patterns. This is not consistent. A set can
> have one member.*

Fixed.
"""
A conjunctive graph pattern is a set of triple patterns
"""

>
> 2.4 Multiple Matches
>
> The results of a query are all the ways a query can match the graph
> being queried.
> *NOTE: I'm confused here. Does 'result' refer only
> to the result variables or to the complete set of bindings for the
> graph pattern? If the former, then, since the result variable list
> may not include ALL variables in the query, it seems like it could
> exclude some ways in which the query matches the graph (especially
> if duplicates are eliminated). So, please be specific if you're
> referring to the result variables or all variables.*

We haven't defined SELECT variables yet - it's the complete set of bindings. It's all variables.

>
> *NOTE: aha, here's the definition I was looking for. Unfortunately,
> I don't understand it. But, I'll keep trying. One thing I'm concerned
> about is what happens if the query variables are not 'connected'?

The only meaning I understand for "connected" is in graph terms. Graph patterns do not define connected pattern graphs.

> Does the definition still make sense? For example, consider the query
> "Select ?name, ?mbox Where (?x foaf:name ?name) (?y foaf:box ?mbox)".
> There are no linking variables in this query. We need to ensure
> that these queries are well-defined.*

A legal query. It may not be intended :-)

>
>
> 4 Including Optional Values
>
> *NOTE: rename section to simply 'Optional Values'.*

Nack - out of style.

>
> For every solution of the query, every variable has
> -an RDF Term- +a value+.

Pat recommends avoiding "value" due to confusion with the value space of typed literals.
> -But RDF data is semi-structured data;-
> Sometimes useful, additional information about some item of interest
> in the graph can be found but, for another item, the information is
> not present.

Done

> -The application writer would such additional information but does
> not want the query to not match just because the some information
> is missing.-
> +If the application writer wants that additional information, the
> query should not fail just because the some information is missing.+

Done

>
> In the example, only a single triple +pattern+ is given in the

Done

> optional match part of the query but in general it is a graph
> pattern.
>
> Optional blocks can also be nested +as described in Section .xxx+.

I will add pointers around the document when the section structure is stable.

>
> *NOTE: I assume that the optional block need NOT be connected to the
> rest of the query. For example,
> "Select ?name, ?time Where (?x foaf:name ?name)
> [ ex:timezone/#ECT ex:datetime ?time ] " *
>
> -If a variable was introduced in one optional block and mentioned
> in another, it would be used to constrain the second. Reversing the
> order of the optional blocks would reverse the blocks in which the
> variable was was introduced and was used to constraint.-
> +If a variable was bound in one optional block and referenced in
> another, it would constrain the second. Reversing the order of
> the optional blocks would reverse the blocks in which the
> variable was bound and constrained.+

Nack - but needs rethinking anyway as it is too procedural.

>
>
> 4.3 Optional Matching - Formal Definition
>
> *NOTE: in the definition, GP = (GP1 union GP2). Shouldn't this be
> (GP1 and GP2)? Or, maybe I'm not understanding this correctly.*

It is union - the query formed by GP1 merged with GP2. "And" would be: satisfy GP1, satisfy GP2 independently. We want a big pattern: see if it works; if it does, use it, else use the small one, GP1.
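
To make the "big pattern, else GP1" reading concrete, here is a minimal Python sketch. It is an illustration only, not the draft's formal definition: the helper names (match_triple, match_pattern, optional_match), the encoding of triples as tuples of strings, and the sample data are all assumptions made up for this example.

```python
def is_var(term):
    # Assumption for this sketch: variables are strings starting with '?'.
    return term.startswith("?")


def match_triple(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`; return None on mismatch."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # A variable already bound must keep the same term.
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def match_pattern(graph, patterns, binding=None):
    """All sets of bindings that satisfy every triple pattern in `patterns`."""
    solutions = [dict(binding or {})]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            for triple in graph:
                candidate = match_triple(pattern, triple, solution)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions


def optional_match(graph, gp1, gp2):
    """Try the merged pattern GP1 + GP2; fall back to the GP1 solution alone."""
    results = []
    for solution in match_pattern(graph, gp1):
        bigger = match_pattern(graph, gp2, solution)
        results.extend(bigger if bigger else [solution])
    return results


# Hypothetical data: only the first person has a mailbox.
graph = [
    ("_:a", "foaf:name", "Alice"),
    ("_:a", "foaf:mbox", "mailto:alice@example.org"),
    ("_:b", "foaf:name", "Bob"),
]
gp1 = [("?x", "foaf:name", "?name")]
gp2 = [("?x", "foaf:mbox", "?mbox")]

results = optional_match(graph, gp1, gp2)
for row in results:
    print(row)
```

In this sketch the solution for "Alice" carries a ?mbox binding because the merged pattern matches, while the solution for "Bob" is kept with ?mbox unbound because only GP1 matches.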
>
> The outer optional block must match for a nested one-s- to apply.
> That is, the outer graph pattern pattern-s- is fixed for the
> purposes of any nested optional block.

Done.

>
>
> 5.0 Nested Patterns
>
> *NOTE: the second nesting example might be more interesting
> if the nested clause had optional data, e.g., some might have
> optional middle names. For example:
>
> SELECT ?foafName ?mbox ?fname ?gname
> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>
> WHERE ( ?x foaf:name ?foafname )
> [ (?x foaf:mbox ?mbox) ]
> [ (?x vcard:N ?vc) (?vc vcard:Family ?fname) (?vc vcard:Given ?gname)
> [ (?vc vcard:Given ?gname) ]
> ]

Left as is.

Andy
Received on Monday, 11 October 2004 11:06:12 UTC