Re: RDF 1.1 Primer from Guus Schreiber on 2013-11-27 (public-rdf-wg@w3.org from November 2013)

From: Guus Schreiber <guus.schreiber@vu.nl>
Date: Wed, 27 Nov 2013 16:23:14 +0100
To: Pat Hayes <phayes@ihmc.us>
CC: Yves Raimond <Yves.Raimond@bbc.co.uk>, RDF WG <public-rdf-wg@w3.org>
Message-ID: <52960E62.4090304@vu.nl>
Pat,

Here is the first set of responses to your comments. The responses 
concern your comments on Secs. 1-5. Secs. 6+ to follow.

Guus

 > First pass of major howlers, I will get back with more details and
 > suggestions for replacements later.
 >
 > Pat
 > ----------
 >
 > First para. The examples are very atypical and misleading. RDF does not
 > do times well, and it is not mostly used for annotating Web pages or
 > videos, and 'resources' does not mean just Webbish things.  Might be
 > better to use some DBpedia examples right off the bat, and talk
 > explicitly about *data* rather than annotation.

Hmm. We were intending to include some “data” examples further on in the 
document (in the RDF Data section). But I’m surprised you consider 
annotation to be atypical. Of course, Yves and I are a bit biased (due 
to our RDF work on music, TV, museums, archives, libraries). But there 
are lots of RDF annotations out there. And isn’t most of DBpedia in fact 
annotation? We take the examples of a well-known person and painting to 
get the intuition about RDF across.  I would like to discuss this a bit 
more before changing.

 > 3.1. It is misleading to describe the <subject> as being what the
 > statement is "about". A triple is as much "about" its object as it is
 > about its subject. (BTW, this bad idea was one of the drivers behind the
 > design of RDF/XML, which gives you some idea of what is bad about it :-)

OK, point taken. BTW Turtle does the same thing in the shorthand 
notation. However, I’m not sure this is a subtlety the Primer should 
care about. If we say that a sentence has a “subject” we as humans mean 
that the sentence is “about” the subject, don’t we? Of course, it also 
says things about the object (and about the predicate). It may confuse 
people if we don’t use “subject” in the usual way.

 > Also, this may be rather pedantic, but the S P O terminology refers to
 > the parts of the triple, ie it is RDF 'grammar', rather than what these
 > IRIs refer to. So the predicate (is an IRI which) refers to the property
 > (which is a real thing in the world that relates other things to one
 > another.) I don't mean to suggest putting this into the primer, but it
 > might be good to keep it in mind and use the terminology consistently
 > throughout. The usage of 'sentences' versus 'facts' might be useful
 > here.(?)

Fully agree. Actually, I think we’ve tried to make this distinction 
throughout the rest of the document, but you’re right it is not here. We 
maybe should say:

   RDF Fact: resource property resource
   RDF Sentence: subject predicate object

but it is actually traditional for introductory RDF texts to say only the 
latter. Hmm, I suggest we discuss this a bit more before changing.

 > The terminology of "feature" for the property is not standard and not
 > particularly helpful to the reader.

Right. I suggest to simply say “property”.

 > Why do you say that the subject IS what the triple is about, whereas the
 > object REPRESENTS the value of the feature? This looks like a use/mention
 > confusion. I would suggest avoiding the word "value" altogether, as it
 > seems to generate confusion wherever it appears.

Now rephrased as:
[[
The subject represents the resource we would like to make a statement about. 
The predicate represents a property of the subject. The object 
represents the value of the property for this subject. Because RDF 
statements consist of three elements they are called triples.
]]

I take your point about “value”, but I am at a loss for another term. I 
don’t think that “value” creates much confusion here. The term “property 
value” is very common.

 > The example <This video document> is misleading: RDF has nothing like the
 > "this document" construction. In fact, all the examples in example 1 are
 > misleading as they suggest that RDF uses English prose fragments rather
 > than IRIs.

Well, the purpose here was to use English prose. I wouldn’t like to get 
rid of that. But the wording of the video example is misleading. Suggest 
to rephrase as Video xyz, or maybe better, BBC program xyz.

 > The last sentence about the "three basic constructs" is therefore
 > puzzling as it comes without any explanation or introduction.

Indeed. Suggest to delete this paragraph. In the sections themselves it 
becomes clear enough. As an aside: would it be useful to include a 
summary like “subject => IRI or blank node;  predicate => IRI;  object 
=> all three” somewhere?
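
For what it is worth, such a summary could be sketched in a few lines of 
Python. This is only an illustration: the class names IRI, BlankNode and 
Literal and the helper is_valid_triple are invented here and do not come 
from any RDF library, and "all three" is read as IRI, blank node or 
literal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str


# Which kinds of term may appear in which position of a triple.
ALLOWED = {
    "subject": (IRI, BlankNode),
    "predicate": (IRI,),
    "object": (IRI, BlankNode, Literal),
}


def is_valid_triple(s, p, o):
    """Check only the kind of term in each position, nothing else."""
    return (
        isinstance(s, ALLOWED["subject"])
        and isinstance(p, ALLOWED["predicate"])
        and isinstance(o, ALLOWED["object"])
    )
```

With this, a literal in the object position is accepted, while a literal 
as subject or a blank node as predicate is rejected.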

 > It is misleading to use the phrase "anonymous resources" when talking
 > about blank nodes. This phrasing suggests that IRIs and bnodes denote
 > different *kinds* of resource, which is misleading. (It is like saying
 > that the pronoun "someone" refers to a nameless person, a distinct
 > category from people with names.)

Right, although I’m not sure many people will be misled by the term 
“anonymous”.
Suggested rephrasing:

[[
In addition, it is sometimes handy to be able to talk about resources 
which have no identifier. For example, we might want to state that the 
Mona Lisa painting has in its background an unidentified tree which we 
know to be a cypress tree. Resources such as the unidentified cypress 
tree are called "blank nodes" in RDF.
]]

 > Your Mona Lisa example is strange as there is an obvious name to use
 > there, and (not surprisingly) a dbpedia IRI:
 > http://dbpedia.org/resource/Leonardo_da_Vinci. A more plausible example
 > of bnode use might be saying that the Mona Lisa has in its background an
 > X  and X is a Cypress tree. That is the kind of information that makes it
 > genuinely implausible to assume that there is an IRI for the value, and
 > it is the kind of thing that one might well want to record in for example
 > a museum guide:
 >
 > http://dbpedia.org/resource/Mona_Lisa  http://purl.org/net/lio#shows  _:x .
 > _:x http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/resource/Cypress .

See rephrasing above. We should consider including the Cypress tree 
example in the overall example in the Syntax section.
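
As a rough sketch of what that could look like, here are Pat's two 
triples held as plain Python tuples and written out in an N-Triples-like 
form. The helper names term and to_ntriples are made up for this 
illustration; the IRIs are the ones from Pat's example, with _:x as the 
blank node label.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def term(t):
    # Blank nodes are written "_:label"; everything else in this sketch
    # is assumed to be an IRI and gets angle brackets.
    return t if t.startswith("_:") else f"<{t}>"


def to_ntriples(triples):
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


cypress_graph = [
    ("http://dbpedia.org/resource/Mona_Lisa", "http://purl.org/net/lio#shows", "_:x"),
    ("_:x", RDF_TYPE, "http://dbpedia.org/resource/Cypress"),
]

print(to_ntriples(cypress_graph))
```

The blank node appears once as an object and once as a subject, which is 
what ties the two statements together without needing an IRI for the tree.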

 > "It should be noted that many RDF users in practice don't use blank
 > nodes." No, it should not be noted. A recent scan found that over half of
 > published RDF graphs use blank nodes, most (all?) OWL/RDF contains blank
 > nodes, etc.. RDF could not function without blank nodes, and it is time
 > to forget the brain-dead doctrine which says that their use should be
 > avoided. And it is just silly to say in a primer that blank nodes make
 > RDF "look complicated". What could be simpler than a blank node in a
 > graph?

Well, blank nodes definitely make the normative specifications much 
harder to read. That is what I wanted to get across, admittedly very 
poorly.  Suggest to delete for now.

 > "We can then make statements about these two graphs, for example adding
 > license and provenance information:
 >
 >        <http://example.com/bob> <is published by> <http://example.org>.
 >        <http://example.com/bob> <has license>
 >            <http://creativecommons.org/licenses/by/3.0/>."
 >
 > <hair tearing> AAAARRRRGGGH</hair tearing> NO WE CAN'T. Or at least, this
 > use is NOT SUPPORTED BY RDF with the specs in their current state. That
 > 'metadata' use works ONLY when we know that the "identifying" graph IRIs
 > denote their graphs, and WE HAVE EXPLICITLY SAID THAT RDF DOES NOT
 > ASSUME THIS. A conforming RDF engine would be perfectly conforming if it
 > refused to treat those subject IRIs as denoting the graph in these
 > triples.  There is NOTHING in the RDF specs that say that a general IRI
 > must be taken to denote what it conventionally identifies.  We do this
 > only for datatype IRIs, and even getting that much into the specs was an
 > uphill struggle; and in the case of graph labels in a dataset, we
 > explicitly warn people to not expect this to be true (because it often
 > isn't.) I know the Primer has to be simple, but please let us not put
 > actual lies into it.

I don’t think the Primer says any of this (or certainly doesn’t want 
to). We can write down these statements, no problem. Would a rephrase 
like this help:

[[
We can then write down triples that include the graph names, for example:
   <example>
These two triples could be interpreted as license and provenance 
information of the graph <xyz>. (And then a note about lack of RDF 
semantics for this).
]]
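
To make the distinction concrete, here is a minimal sketch of a dataset 
as a Python dict, with triples in the default graph that happen to use a 
graph name as their subject. Everything here is an assumption for 
illustration: the dict layout, the function name, the invented content 
of the bob graph, and my choice of dcterms:publisher and dcterms:license 
to stand in for <is published by> and <has license>.

```python
DCT = "http://purl.org/dc/terms/"
BOB_GRAPH = "http://example.com/bob"

# A dataset sketched as a dict: None keys the default graph,
# IRIs key the named graphs.
dataset = {
    None: {
        (BOB_GRAPH, DCT + "publisher", "http://example.org"),
        (BOB_GRAPH, DCT + "license", "http://creativecommons.org/licenses/by/3.0/"),
    },
    BOB_GRAPH: {
        # Invented content for the named graph.
        ("http://example.org/bob#me",
         "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
         "http://xmlns.com/foaf/0.1/Person"),
    },
}


def triples_with_graph_name_as_subject(ds):
    """Return default-graph triples whose subject IRI is also a graph name.

    This only matches strings. Whether such an IRI denotes the graph is
    left to the application; nothing in this sketch decides that.
    """
    names = {name for name in ds if name is not None}
    return {t for t in ds[None] if t[0] in names}
```

Writing the triples down is unproblematic; reading them as statements 
about the graph is the extra step that the note would have to flag.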

 > "The original data model assumed that all triples are part of the same
 > (large) graph." The RDF data model still assumes this. It is very
 > misleading to suggest that datasets are a modification to the basic RDF
 > data model, or even that this data model has changed. We considered such
 > changes and rejected them.

I deleted this sentence.

 > "The RDF data model provides a way to make statements about (Web)
 > resources."
 >
 > RDF makes statements about resources, i.e. about anything at all. The
 > implied qualification in "(Web)" is false and misleading.

Right. Deleted.

 > "As we mentioned, this data model does not make any assumptions about
 > what these resources stand for."
 >
 > Resources don't (usually) stand for anything. Did you mean, what IRIs
 > stand for? (Another use/mention confusion.)

Oops, right. Corrected.

 > ".. a vocabulary description language called RDF-Schema " ?? In what
 > sense is RDFS a 'vocabulary description' language? If you must say this,
 > at least explain what it is supposed to mean.

It is actually the title of the document (“RDF Vocabulary Description 
Language 1.0: RDF Schema”). But I agree with your point. Rephrased now as 
“To support the definition of vocabularies RDF provides the RDF-Schema 
language.”.

It raises another point: should we rename the RDF Schema document?

 > The introduction of classes is very awkward, and not really correct. I
 > would suggest saying that they are categories which can be used to
 > classify things: Bill rdf:type Human, Mona Lisa rdf:type Artwork, etc..
 > and avoid "group" (and "set") altogether. Then you don't need to
 > immediately say that what you just said is false, which is not very
 > reassuring for the reader. And you should mention rdf:type in the same
 > breath.

Right. I was struggling with this. New phrasing in document.

 > Need to clean up the terminology. It is very confusing to be told in
 > quick succession:
 >
 > ==RDF Schema is a vocabulary description language
 > ==FOAF is a *vocabulary* which is a *schema* which was one of the first
 > *RDF Schemas* (Is RDF Schema one of the RDF Schemas?)
 > ==DC is a vocabulary which is a *metadata element set* (Why isn't it an
 > RDF Schema, like FOAF?)
 > ==*schema*.org is a *vocabulary* (Given what it is called, why isn't it a
 > schema?)
 > ==SKOS is a *vocabulary* for publishing *schemes* (not schemas?) such as
 > terminologies and thesauri. (Isn't a terminology a kind of vocabulary? So
 > are schemes and vocabularies the same thing? Or was it schemas that were
 > like vocabularies, and schemes are something different? And does SKOS
 > describe them or is it an example of one of them? Or maybe both at the
 > same time??)
 >
 > Personally, I would never want to see the word "schema" ever again.

In principle agreed. But the term “schema” is used in the outside world 
in ambiguous ways; nothing we can do about that. But I deleted the term 
“schema” from our own text, which is indeed better. Two specific things:
- Saying that schema.org is a vocabulary is, I think, correct.
- SKOS is a meta-vocabulary for specifying classification schemes/…

 > Why don't we just say that anyone can publish an RDF vocabulary - a set
 > of IRIs, typically from a single namespace - and specify what it is
 > supposed to mean, and then everyone can use that vocabulary to write RDF
 > data. It is good practice to have the root namespace IRI link to
 > something that defines the meaning of the vocabulary, and to re-use IRIs
 > from existing vocabularies where you can, to make it easier to share
 > meanings. And it is gold standard to publish an RDF graph which specifies
 > at least part of your intended meanings for your vocabulary in a
 > machine-readable way, if you can, using vocabularies intended for the
 > purpose, for example RDFS or OWL or SKOS, because then they can be used
 > by others in entailment rules. Examples are given below....

OK, will use this for rephrasing parts of this section.

 > and then after you have talked about entailments in the semantics
 > section, you might for example show how dbpedia uses rdfs:subClassOf to
 > create category hierarchies, or how FOAF uses
 > owl:InverseFunctionalProperty to imitate database keys.

Good suggestion. Will include this.
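
To give an idea of what the rdfs:subClassOf part could look like, here is 
a minimal sketch of two RDFS entailment patterns (type propagation along 
rdfs:subClassOf, and transitivity of rdfs:subClassOf) over plain Python 
tuples. It is not a full RDFS reasoner, the ex: names are invented, and 
the prefixed names are simply used as strings.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def subclass_closure(triples):
    """Apply two RDFS patterns until no new triple is found.

    - C subClassOf D, D subClassOf E  =>  C subClassOf E
    - x type C, C subClassOf D        =>  x type D
    """
    inferred = set(triples)
    changed = True
    while changed:
        new = set()
        for s, p, o in inferred:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in inferred:
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
        changed = not new <= inferred
        inferred |= new
    return inferred


data = {
    ("ex:MonaLisa", RDF_TYPE, "ex:Painting"),
    ("ex:Painting", SUBCLASS, "ex:Artwork"),
    ("ex:Artwork", SUBCLASS, "ex:CreativeWork"),
}
closure = subclass_closure(data)
```

Here the closure contains the triple stating that ex:MonaLisa has type 
ex:CreativeWork, although nobody wrote that down explicitly.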

 > "RDF Schema provides basic facilities for modeling semantics of RDF data.
 > For a specification of these semantics the reader is referred to the RDF
 > Semantics document [RDF11-MT]. For more comprehensive semantic modeling
 > of RDF data the W3C recommends using the Web Ontology Language OWL
 > [OWL2-OVERVIEW]."
 >
 > ?? I don't even know what this is supposed to mean. RDFS and OWL "model
 > semantics of RDF data" ?? That is either meaningless or false, I'm not
 > sure which. Maybe both. Also, this reads as though the W3C recommends
 > using OWL over RDFS, which if true is news to me (and not likely to lead
 > to a rapid take-up of RDF, if users have to read the OWL specs first.)

I wanted to say something nice about OWL! :). Seriously, suggest to 
rephrase as:

[[
For a formal specification of the semantics of the RDF Schema constructs 
the reader is referred to the RDF Semantics document [[RDF11-MT]]. Users 
interested in more comprehensive semantic modeling of RDF data might 
consider using the Web Ontology Language OWL [[OWL2-OVERVIEW]].
]]

 > Section 5.  The idea that all these different syntaxes are all ways of
 > describing the same RDF graph structures is not immediately obvious, and
 > I think is a major barrier to comprehension.  Need to talk a little about
 > concrete vs. abstract syntax, maybe not in those terms, but to get across
 > the idea of the graph syntax being a level of abstraction higher than the
 > particular notation used to describe it.

Good point. Included the following sentence in the first paragraph:
[[
However, different encodings of the same graph lead to exactly the same 
triples.
]]
I also suggest to include a graph diagram of the current example, and 
clarify the point about the abstract graph (added as a todo issue to the 
document).
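
A small sketch of the point, under heavy assumptions: a toy parser for 
IRI-only N-Triples lines, and an ad-hoc nested dict (subject, then 
predicate, then list of objects) standing in for a grouped notation. 
Neither is a real RDF syntax implementation, and the example IRIs are 
only illustrative. Both encodings end up as the same set of triples.

```python
def parse_simple_ntriples(text):
    """Toy parser: one 'subject predicate object .' per line, IRIs only."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        s, p, o = line.rstrip(" .").split()
        triples.add((s.strip("<>"), p.strip("<>"), o.strip("<>")))
    return triples


def flatten_grouped(grouped):
    """Turn {subject: {predicate: [objects]}} into a set of triples."""
    return {
        (s, p, o)
        for s, preds in grouped.items()
        for p, objs in preds.items()
        for o in objs
    }


BOB = "http://example.org/bob#me"
FOAF = "http://xmlns.com/foaf/0.1/"

# The first line is repeated on purpose: a graph is a set of triples.
NT_DOC = """\
<http://example.org/bob#me> <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice#me> .
<http://example.org/bob#me> <http://xmlns.com/foaf/0.1/topic_interest> <http://dbpedia.org/resource/Mona_Lisa> .
<http://example.org/bob#me> <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice#me> .
"""

GROUPED = {
    BOB: {
        FOAF + "topic_interest": ["http://dbpedia.org/resource/Mona_Lisa"],
        FOAF + "knows": ["http://example.org/alice#me"],
    }
}

same_graph = parse_simple_ntriples(NT_DOC) == flatten_grouped(GROUPED)
```

Order, grouping and repetition in the notation do not matter; only the 
resulting set of triples does.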

 > Having one simple but not entirely trivial example graph (with at least
 > one bnode, at least two triples sharing a common object and one node used
 > as both a subject and an object) written out in all the different
 > notations would be a very useful thing to see. It would also hammer home
 > the point about abstract graph syntax, especially if you also provided a
 > graph diagram for it.

The current example has all these features, except for the bnode. I’ll 
add an issue about including a (separate?) example with bnodes.

 > " therefore bringing the benefits of RDF to the JSON world. " Omit. Could
 > be read as condescending. I am sure there are many who would say, it
 > brings JSON sanity to the RDF world.

Deleted/rephrased.
Received on Wednesday, 27 November 2013 15:23:39 UTC