Re: comments on 17 December 2013 WD of RDF 1.1 Primer from Guus Schreiber on 2014-01-28 (public-rdf-comments@w3.org from January 2014)

From: Guus Schreiber <guus.schreiber@vu.nl>
Date: Tue, 28 Jan 2014 23:12:11 +0100
To: Bob DuCharme <bob@snee.com>, <public-rdf-comments@w3.org>
Message-ID: <52E82B3B.6050304@vu.nl>
Bob,

Thanks again for your comments. Responses inline.

On 31-12-13 00:25, Bob DuCharme wrote:
> There's a lot of good stuff in it, but because it's a Primer, I assume
> that its intended audience is people who are new to RDF, and the
> document often assumes too much about the reader's knowledge of
> technical specification vocabulary.
>
> I've divided up my comments into two lists: comments about substance
> followed by picky copyediting suggestions. Suggestions often show a
> quoted phrase from the Primer followed by a suggested revision. For
> example,
>
>    "cypress-tree" cypress tree
>
> is a suggestion to replace "cypress-tree" with "cypress tree".
>
>
> === substantive (to varying degrees)  ===
>
> Section 1. says "The Resource Description Framework (RDF) is a framework
> for describing info about resources in the World Wide Web." 1.1 says
> that "An IRI identifies a web resource" and then references
> http://www.ietf.org/rfc/rfc3987.txt, but I couldn't find anything in
> that RFC about IRIs being limited to the identification of web
> resources. I know that URLs define web resources, but if I assign an IRI
> to the chair I'm sitting in, couldn't I use RDF to state facts about the
> chair's location, manufacturer, etc., without this having anything to do
> with the web? Or am I misunderstanding something? I always thought that
> we could assign IRIs to absolutely anything and then use RDF to describe
> them; limiting its use to web-based resources really limits its power.

Reformulated as:

[[
     The Resource Description Framework (RDF) is a framework for
     expressing information about <strong>resources</strong>. Resources
     can be anything, including documents, people, physical objects, and
     abstract concepts.
]]

> 3.1 "Resources typically occur in multiple triples, for example Bob and
> the Mona Lisa painting in the examples above." The Mona Lisa resource
> only occurs in one triple above this sentence, not two, unless you want
> the reader to assume case insensitivity in the sample data, which I
> think is a bad idea. I would capitalize <the Mona Lisa> consistently and
> then explicitly point out how the same resource can appear in the
> subject of one triple and the object of another, which is a new idea at
> this point of the Primer. (A wonderful new idea!) After normalizing the
> capitalization, the sentence might be better off like this: "The same
> resource is often referenced in multiple triples. In the example above,
> Bob is the subject of four triples, and the Mona Lisa is the subject of
> one and the object of another. This ability to have the same resource be
> in the subject position of one triple and the object position of another
> makes it possible to find connections between triples, which is an
> important part of RDF's power. We can therefore visualise triples as..."

Changed as suggested. Note that the Mona Lisa occurs in two object 
positions.

> "The example above... an RDF graph" Move that paragraph before the
> "Resources typically" paragraph, i.e. right after the example itself,
> maybe in one of the green "NOTE" blocks.

Changed as suggested.

> In the note that begins
>
>    The RDF Data Model is described in this section in the form of an
> "abstract syntax"
>
> do "encoding" and "concrete RDF syntax" refer to the same thing? If so,
> make that clearer. I think it would be better off to never use the word
> "encoding," which people are more likely to associate with things like
> UTF-8 vs. Latin 1, and instead use the term "concrete syntax"
> consistently. The first time the Primer uses the phrase "concrete
> syntax," a parenthesized phrase after it could say something like "(the
> syntax used to represent triples stored in text files)", because as a
> Primer this should provide more hints about the meaning of highly
> technical phrases. These same issues come up in the paragraph of Section
> 5 beginning "Many different concrete syntaxes..."

Changed as suggested.

> "three types of RDF data that occur in triples" three types of RDF
> resources that occur in triples
>
> "The notion of IRI is a generalization of URI (Uniform Resource
> Identifier)" To assume that someone who doesn't understand RDF (the
> intended audience of the Primer) understands what URIs are and their
> relationship to URLs is a huge, huge assumption. How about adding,
> after the sentence with this, something like "The URLs (Uniform Resource
> Locators) that people use as web addresses are one form of URI, with an
> important difference: URIs are not necessarily locators that provide the
> address of a resource; they are often merely identifiers that provide a
> unique ID for a given resource. IRIs are a generalization of this
> because..."

Changed as suggested, with slightly different wording:

[[
     The URLs (Uniform Resource Locators) that
     people use as Web addresses are one form of IRI. Other forms of IRI
     provide an identifier for a resource without implying its location
     or how to access it.
]]

>
> 3.2 "RDF is agnostic about what the IRI stands for" Unlike section 5.1
> ("in this example foaf:Person stands for
> <http://xmlns.com/foaf/0.1/Person>") I think that "stands for" is not
> appropriate here. (After all, IRI stands for "Internationalized Resource
> Identifier.")  "Represents" or "identifies" would be better.

Changed to "represents".

> 3.4 I don't think algebra variables are a very good analogy here. Those
> are named things that may not have values, and blank nodes are unnamed
> things that do have values.

Hmm, I think the notion of variable actually comes close to the 
intuition about blank nodes. I prefer to leave it in, unless we have a 
better analogy.

> Section 3.4 overall is a little too brief and abstract for an RDF
> neophyte. Blank nodes are a difficult concept for people who are new to
> RDF. Either don't cover them in the Primer or cover them a bit more. For
> example, this section would greatly benefit from a new diagram similar
> to the one in Figure 1 that includes the cypress tree.

I've added this as a potential todo to the document.

> Also: 'Resources such as the unidentified cypress tree are called "blank
> nodes" in RDF.' The resource (the tree, in this case) is not called a
> blank node. How about this: 'Resources without identifiers such as the
> painting's cypress tree can be represented by "blank nodes" in RDF.'

Changed as suggested.

> 3.5 "does not specify a particular semantics" That's normative
> spec-speak, not primer-speak, and should be reworded to be clearer to
> beginners. A bit later, the "i.e." parenthetical remark after "RDF
> provides no way to convey this semantic assumption" provides a good
> model of connecting this high-level talk of semantics to the actual data
> being discussed.

Sentence dropped. Indeed, the remarks later make this point in a
clearer way.

> Section 1 said that "For example retrieving http://www.example.org/bob
> could provide data about Bob," leading me to believe that this URI
> represented the resource Bob. In the section on named graphs, the same
> URI represents a named graph, not a person. I understand that this
> doesn't invalidate the "For example" sentence--if it's the name of a
> graph, retrieving it could still "provide data about Bob"--but I think
> this can still confuse the RDF beginner, and recommend that the examples
> in the section on named graphs use new IRIs that have not appeared in
> the Primer before.

Another reviewer suggested changing the IRI in Sec. 1 to
http://www.example.org/bob/Bob#me. I assume this also addresses your
remark. Changed accordingly.

> "In the example default (unnamed) graph below we see two triples that
> have a graph name as subject:" Insert a sentence before this about why
> someone would want to do this, e.g. "When you can reference a graph with
> an IRI, you can create triples that provide metadata about that graph."

The current text is a compromise, as the RDF group didn't define 
semantics for triples in which graph names occur. Therefore, the 
explanation about what the example triples stand for is at the end of 
the paragraph following the example. I hope you think this is clear enough.

> "subsets of triples" doesn't make sense. "subsets of a dataset [ or
> collection] of triples"?

Changed to "subsets of a collection of triples"

> 4. "For example, one can state that the IRI ex:friendOf can be used as a
> property" the idea of this being an IRI will come as a complete surprise
> to the reader, because the use of prefixes hasn't been discussed at all
> yet. (Is a qname considered an IRI?) The original RDF Primer at
> http://www.w3.org/TR/rdf-primer/ has a good paragraph beginning "The
> full triples notation requires" that introduces this well. However it's
> done, as a Primer this should explain any new syntax, such as the use of
> namespace prefixes, before using that syntax.

Oops, that was unintended. We prefer not to introduce qnames here. 
Changed to: "http://www.example.org/friendOf".

> "domain respectively range restrictions" domain and range restrictions,
> respectively (The sentence with this is another example of assuming a
> pre-existing, strong understanding of the relevant technical vocabulary
> by the reader; the Primer really should have a few more sentences to
> explain the use of rdfs:domain and rdfs:range, which is always a
> difficult point with RDF beginners.)

Included a sentence linking to the earlier friendOf/Person example.

> After Example 2 add something like this, because the idea of (and value
> of!) properties as subjects or objects in triples has not been covered
> at all up to this point and often comes as a surprise to people with an
> object-oriented background: "Note that, while <is a friend of> is a
> property typically used as the predicate of a triple (as it was in
> Example 1), properties like this are themselves resources that can be
> described by triples or provide values in the descriptions of other
> resources. In this example, <is a friend of> is the subject of triples
> that assign type, domain, and range values to it, and it's the object of
> a triple that describes something about the <is a good friend of>
> property."

Added as suggested.

> "RDFa (for HTML embedding)" I always think it's a shame that people
> think that RDFa is only for use with HTML. It can be very useful with
> other kinds of XML as well; see
> http://www.devx.com/semantic/Article/42543 . I would love to see the
> several references to this say "for HTML and XML embedding."

Included.

> Section 5.1 is more like a quick reference of Turtle syntax than a
> Primer, because it covers so much so quickly. Readers who are new to RDF
> (the intended audience of this document) will find it confusing. A brief
> introduction to N-Triples before the Turtle part would make the Turtle
> part much easier to understand, because then the reader will understand
> that the use of angle brackets around full IRIs, quotes around literals,
> and a period after each triple are the most important parts of the
> syntax and that everything else in Turtle is just a syntactical
> convenience.

Added an N-Triples-conformant example to the beginning of the section,
plus text to explain this basic form. Also placed a note right after the 
graph figure to point readers to the N-Triples examples.

> "the predicate-object part of triples with <http://example.org/bob#me>
> as subject"  the predicate-object part of triples that have
> <http://example.org/bob#me> as their subject

Changed.

> "The semicolons at the end of lines 9-11 indicate that the set is not
> yet complete. A period is used to signal the end of a Turtle statement."
> The use of "set" here is confusing. Set of what? I know that it refers
> to predicate-object pairs associated with a common subject, but someone
> new to Turtle might think that it's some specific Turtle construct. I
> think it would be better to say "The semicolons at the end of lines 9-11
> each indicate that the predicate-object pair that follows them is part of
> a new triple that uses the most recent subject shown in the data--in
> this case, <bob#me>."

Changed as suggested.

> 'The term _:x is a blank node. It represents some unnamed tree depicted
> in the Mona Lisa painting and belonging to the "Cypress" class.'  The
> term _:x is a blank node. It represents an unnamed resource depicted in
> the Mona Lisa painting that is an instance of the "Cypress" class. [It's
> safer to say that it represents a resource, not a tree, and the idea of
> "belonging" here is not quite accurate.]

Changed as suggested.


> === copyediting ===
>
> There are several places where "for example" should have a comma after
> it: "For example retrieving", "For example a dataset about paintings",
> "For example 'Léonard de Vinci'",
>
> 3.1 " <subject>  <predicate> <object>" has an extra space after <subject>
>
> - "multiple triples, for example" [em dash not comma]
>
> "allow writing literals" allow writing of literals
>
> "markup webpages": "mark up" should be two words when used as a verb.
> I'd say "web pages" as two words as well.
>
> "Library of Congress published its..." The Library of Congress published
> its
>
> The phrase "Using the Web Ontology Language" would make sense, but
> "Using the OWL" in section 4 does not. Just say "Using OWL."
>
> "a RDF vocabulary" an RDF vocabulary
>
> "the reader is referred to the Turtle document" see the Turtle document
>
> " the reader can find for each RDF syntax corresponding"  the reader can
> find, for each RDF syntax, corresponding
>
> "cypress-tree" cypress tree
>
> "cater for" cater to [although that could just be a British vs. American
> usage thing]

Apparently it is "cater for" if you provide something like food and 
"cater to" if you satisfy some desire. I guess an application is not a 
food, so changed to "cater to" :).

>
> "semantics which is specified in the RDF" semantics which are specified
> in the RDF

"Semantics" is nowadays (and in this sentence) often used as a singular. 
Possibly a form of language pollution, but making it plural wouldn't 
work here.

> "Wikidata, a free, collaborative..." end that bullet point with a period
> like the other bullets in that list.

Thanks, these comments were *extremely* helpful. You can check the 
changes in the new editor's draft [1]. Note that this draft is likely to
change over the next couple of days, as we are also including comments 
from other people.

Regards,
Guus Schreiber

[1]

>
>
> Thanks,
>
> Bob DuCharme
>
Received on Tuesday, 28 January 2014 22:12:36 UTC