Re: RDF 1.1 Primer from Pat Hayes on 2013-11-22 (public-rdf-wg@w3.org from November 2013)

From: Pat Hayes <phayes@ihmc.us>
Date: Fri, 22 Nov 2013 13:51:00 -0600
To: Guus Schreiber <guus.schreiber@vu.nl>, yves.igetenoughspam.raimond@gmail.com
Cc: RDF WG <public-rdf-wg@w3.org>
Message-Id: <E3046661-F8E2-4803-8AA2-BF1176E2D583@ihmc.us>
First pass of major howlers. I will get back with more details and suggestions for replacements later.

Pat
----------

First para. The examples are very atypical and misleading. RDF does not do times well, and it is not mostly used for annotating Web pages or videos, and 'resources' does not mean just Webbish things.  Might be better to use some DBpedia examples right off the bat, and talk explicitly about *data* rather than annotation. 

3.1. It is misleading to describe the <subject> as being what the statement is "about". A triple is as much "about" its object as it is about its subject. (BTW, this bad idea was one of the drivers behind the design of RDF/XML, which gives you some idea of what is bad about it :-)

Also, this may be rather pedantic, but the S P O terminology refers to the parts of the triple, i.e. it is RDF 'grammar', rather than what these IRIs refer to. So the predicate (is an IRI which) refers to the property (which is a real thing in the world that relates other things to one another). I don't mean to suggest putting this into the primer, but it might be good to keep it in mind and use the terminology consistently throughout. The usage of 'sentences' versus 'facts' might be useful here (?).

The terminology of "feature" for the property is not standard and not particularly helpful to the reader.

Why do you say that the subject IS what the triple is about, whereas the object REPRESENTS the value of the feature? This looks like a use/mention confusion. I would suggest avoiding the word "value" altogether, as it seems to generate confusion wherever it appears. 

The example <This video document> is misleading: RDF has nothing like the "this document" construction. In fact, all the examples in example 1 are misleading as they suggest that RDF uses English prose fragments rather than IRIs. The last sentence about the "three basic constructs" is therefore puzzling as it comes without any explanation or introduction. 

It is misleading to use the phrase "anonymous resources" when talking about blank nodes. This phrasing suggests that IRIs and bnodes denote different *kinds* of resource, which is misleading. (It is like saying that the pronoun "someone" refers to a nameless person, a distinct category from people with names.)

Your Mona Lisa example is strange as there is an obvious name to use there, and (not surprisingly) a DBpedia IRI: http://dbpedia.org/resource/Leonardo_da_Vinci. A more plausible example of bnode use might be saying that the Mona Lisa has in its background an X and X is a Cypress tree. That is the kind of information that makes it genuinely implausible to assume that there is an IRI for the value, and it is the kind of thing that one might well want to record in, for example, a museum guide:
  
http://dbpedia.org/resource/Mona_Lisa  http://purl.org/net/lio#shows  _:x .
_:x http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/resource/Cypress .
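
To illustrate, here is a minimal Python sketch of these two triples, using plain tuples rather than any RDF library. The representation (strings starting with `_:` standing for blank nodes) and the helper `shown_things_of_type` are assumptions made for this sketch only. The point is that the query succeeds without the tree ever needing a name.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
MONA_LISA = "http://dbpedia.org/resource/Mona_Lisa"
SHOWS = "http://purl.org/net/lio#shows"
CYPRESS = "http://dbpedia.org/resource/Cypress"

# A graph is treated here as a set of (subject, predicate, object) tuples.
# "_:x" is a blank node: it says that something exists, without naming it.
graph = {
    (MONA_LISA, SHOWS, "_:x"),
    ("_:x", RDF_TYPE, CYPRESS),
}


def is_blank(term):
    """True for blank node labels, under this sketch's '_:' convention."""
    return term.startswith("_:")


def shown_things_of_type(graph, work, cls):
    """Return the nodes that `work` shows and that are typed as `cls`."""
    shown = {o for (s, p, o) in graph if s == work and p == SHOWS}
    return {n for n in shown if (n, RDF_TYPE, cls) in graph}


hits = shown_things_of_type(graph, MONA_LISA, CYPRESS)
print(len(hits) == 1 and all(is_blank(n) for n in hits))
```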

"It should be noted that many RDF users in practice don't use blank nodes." No, it should not be noted. A recent scan found that over half of published RDF graphs use blank nodes, most (all?) OWL/RDF contains blank nodes, etc. RDF could not function without blank nodes, and it is time to forget the brain-dead doctrine which says that their use should be avoided. And it is just silly to say in a primer that blank nodes make RDF "look complicated". What could be simpler than a blank node in a graph?

"We can then make statements about these two graphs, for example adding license and provenance information:

        <http://example.com/bob> <is published by> <http://example.org>.
        <http://example.com/bob> <has license> <http://creativecommons.org/licenses/by/3.0/>."

<hair tearing> AAAARRRRGGGH</hair tearing> NO WE CAN'T. Or at least, this use is NOT SUPPORTED BY RDF with the specs in their current state. That 'metadata' use works ONLY when we know that the "identifying" graph IRIs denote their graphs, and WE HAVE EXPLICITLY SAID THAT RDF DOES NOT ASSUME THIS. A conforming RDF engine would be perfectly conforming if it refused to treat those subject IRIs as denoting the graph in these triples. There is NOTHING in the RDF specs that says that a general IRI must be taken to denote what it conventionally identifies. We do this only for datatype IRIs, and even getting that much into the specs was an uphill struggle; and in the case of graph labels in a dataset, we explicitly warn people to not expect this to be true (because it often isn't). I know the Primer has to be simple, but please let us not put actual lies into it.

"The original data model assumed that all triples are part of the same (large) graph." The RDF data model still assumes this. It is very misleading to suggest that datasets are a modification to the basic RDF data model, or even that this data model has changed. We considered such changes and rejected them. 

"The RDF data model provides a way to make statements about (Web) resources."

RDF makes statements about resources, i.e. about anything at all. The implied qualification in "(Web)" is false and misleading.

"As we mentioned, this data model does not make any assumptions about what these resources stand for."  

Resources don't (usually) stand for anything. Did you mean, what IRIs stand for? (Another use/mention confusion.)

".. a vocabulary description language called RDF-Schema " ?? In what sense is RDFS a 'vocabulary description' language? If you must say this, at least explain what it is supposed to mean.

The introduction of classes is very awkward, and not really correct. I would suggest saying that they are categories which can be used to classify things: Bill rdf:type Human, Mona Lisa rdf:type Artwork, etc., and avoid "group" (and "set") altogether. Then you don't need to immediately say that what you just said is false, which is not very reassuring for the reader. And you should mention rdf:type in the same breath.

Need to clean up the terminology. It is very confusing to be told in quick succession:

==RDF Schema is a vocabulary description language
==FOAF is a *vocabulary* which is a *schema* which was one of the first *RDF Schemas* (Is RDF Schema one of the RDF Schemas?)
==DC is a vocabulary which is a *metadata element set* (Why isn't it an RDF Schema, like FOAF?)
==*schema*.org is a *vocabulary* (Given what it is called, why isn't it a schema?)
==SKOS is a *vocabulary* for publishing *schemes* (not schemas?) such as terminologies and thesauri. (Isn't a terminology a kind of vocabulary? So are schemes and vocabularies the same thing? Or was it schemas that were like vocabularies, and schemes are something different? And does SKOS describe them or is it an example of one of them? Or maybe both at the same time??)

Personally, I would never want to see the word "schema" ever again. 

Why don't we just say that anyone can publish an RDF vocabulary - a set of IRIs, typically from a single namespace - and specify what it is supposed to mean, and then everyone can use that vocabulary to write RDF data. It is good practice to have the root namespace IRI link to something that defines the meaning of the vocabulary, and to re-use IRIs from existing vocabularies where you can, to make it easier to share meanings. And it is gold standard to publish an RDF graph which specifies at least part of your intended meanings for your vocabulary in a machine-readable way, if you can, using vocabularies intended for the purpose, for example RDFS or OWL or SKOS, because then they can be used by others in entailment rules. Examples are given below....
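
A rough Python sketch of that idea follows. The `Namespace` helper and the `http://example.org/museum#` vocabulary are invented for illustration (this is not any particular library's API); the RDF, RDFS and FOAF namespace IRIs are the real ones. It shows a vocabulary as nothing more than IRIs minted from one namespace, re-use of an existing vocabulary, and a small machine-readable description of the new term.

```python
class Namespace:
    """Hypothetical helper: mints IRIs by appending a local name to a base."""

    def __init__(self, base):
        self.base = base

    def __getitem__(self, local_name):
        return self.base + local_name

    def __contains__(self, iri):
        return iri.startswith(self.base)


# Invented vocabulary, used only in this sketch.
EX = Namespace("http://example.org/museum#")
# Existing vocabularies, re-used rather than re-invented.
RDF = Namespace("http://www.w3.org/1999/02/22-rdf-syntax-ns#")
RDFS = Namespace("http://www.w3.org/2000/01/rdf-schema#")
FOAF = Namespace("http://xmlns.com/foaf/0.1/")

BILL = "http://example.org/people#bill"

# RDF data written with terms from two vocabularies.
data = {
    (BILL, RDF["type"], FOAF["Person"]),
    (BILL, EX["curates"], "http://dbpedia.org/resource/Mona_Lisa"),
}

# Part of the intended meaning of ex:curates, stated in RDF itself.
vocabulary_description = {
    (EX["curates"], RDFS["domain"], FOAF["Person"]),
}


def predicates_from(graph, namespace):
    """Return the predicates in `graph` that belong to `namespace`."""
    return {p for (_, p, _) in graph if p in namespace}


print(sorted(predicates_from(data, EX)))
```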

and then after you have talked about entailments in the semantics section, you might for example show how DBpedia uses rdfs:subClassOf to create category hierarchies, or how FOAF uses owl:InverseFunctionalProperty to imitate database keys.

"RDF Schema provides basic facilities for modeling semantics of RDF data. For a specification of these semantics the reader is referred to the RDF Semantics document [RDF11-MT]. For more comprehensive semantic modeling of RDF data the W3C recommends using the Web Ontology Language OWL [OWL2-OVERVIEW]."  

?? I don't even know what this is supposed to mean. RDFS and OWL "model semantics of RDF data" ?? That is either meaningless or false, I'm not sure which. Maybe both. Also, this reads as though the W3C recommends using OWL over RDFS, which if true is news to me (and not likely to lead to a rapid take-up of RDF, if users have to read the OWL specs first.) 

Section 5.  The idea that all these different syntaxes are all ways of describing the same RDF graph structures is not immediately obvious, and I think is a major barrier to comprehension.  Need to talk a little about concrete vs. abstract syntax, maybe not in those terms, but to get across the idea of the graph syntax being a level of abstraction higher than the particular notation used to describe it. 

Having one simple but not entirely trivial example graph (with at least one bnode, at least two triples sharing a common object and one node used as both a subject and an object) written out in all the different notations would be a very useful thing to see. It would also hammer home the point about abstract graph syntax, especially if you also provided a graph diagram for it. 
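
As a sketch of what such an example could look like, the Python below holds one small abstract graph as a set of tuples and writes it out in two notations: one triple per line (N-Triples style) and grouped by subject with ';' (a small subset of Turtle). The `http://example.org/` IRIs, the function names, and the toy `parse` routine are assumptions of this sketch; `parse` only understands the output produced here, not the full languages. Reading both texts back gives the same set of triples, which is the abstract-syntax point.

```python
from collections import defaultdict

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for this sketch

# One bnode (_:x), a shared object (alice), and alice as subject and object.
graph = {
    (EX + "bob", FOAF + "knows", EX + "alice"),
    (EX + "carol", FOAF + "knows", EX + "alice"),
    (EX + "alice", RDF_TYPE, FOAF + "Person"),
    (EX + "alice", FOAF + "knows", "_:x"),
    ("_:x", RDF_TYPE, FOAF + "Person"),
}


def term(t):
    """Write a node: blank nodes as-is, IRIs in angle brackets."""
    return t if t.startswith("_:") else "<" + t + ">"


def to_ntriples(graph):
    """One complete triple per line."""
    return "\n".join(
        " ".join(term(t) for t in triple) + " ." for triple in sorted(graph)
    )


def to_turtle(graph):
    """Group triples by subject; predicate-object pairs separated by ';'."""
    by_subject = defaultdict(list)
    for s, p, o in sorted(graph):
        by_subject[s].append(term(p) + " " + term(o))
    return "\n".join(
        term(s) + " " + " ;\n    ".join(pairs) + " ."
        for s, pairs in by_subject.items()
    )


def parse(text):
    """Read back either notation (only the tiny subset emitted above)."""
    triples, subject, pending = set(), None, []
    for token in text.split():
        if token in (".", ";"):
            predicate, obj = pending
            triples.add((subject, predicate, obj))
            pending = []
            if token == ".":
                subject = None  # '.' ends the statement, ';' keeps the subject
        else:
            node = token[1:-1] if token.startswith("<") else token
            if subject is None:
                subject = node
            else:
                pending.append(node)
    return triples


print(to_ntriples(graph))
print()
print(to_turtle(graph))
print(parse(to_ntriples(graph)) == parse(to_turtle(graph)) == graph)
```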

" therefore bringing the benefits of RDF to the JSON world. " Omit. Could be read as condescending. I am sure there are many who would say, it brings JSON sanity to the RDF world. 

Semantics. 

"This document takes a logical stance on RDF graphs". Remove this comment. It suggests that the 'stance' is somehow local to the Semantics document. The 'stance' is not taken by the document, but by the entire RDF spec.

I think it would be best in the primer just to say that RDF and RDFS (and OWL, and maybe other) vocabularies have a defined semantics which supports a range of entailment patterns (rules) that can be used to derive new information by deduction. And then just talk about those patterns, as you do with the Alice/range/person example, without mentioning semantics again. (After reading some recent email threads: you might want to emphasise that these patterns can be used without needing to dereference the IRIs in the triples. Apparently some newbies find this very hard to understand.)
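
For instance, two such patterns (range and subclass) can be sketched in a few lines of Python. This is an illustration only, not a complete RDFS reasoner: the function name `entail`, the tuple representation, and the `http://example.org/` IRIs are assumptions of the sketch. Note that nothing is fetched from the network; the rules work purely on the triples in hand.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"  # hypothetical namespace for this sketch

graph = {
    (EX + "bob", FOAF + "knows", EX + "alice"),
    (FOAF + "knows", RDFS_RANGE, FOAF + "Person"),
    (FOAF + "Person", RDFS_SUBCLASS, FOAF + "Agent"),
}


def entail(graph):
    """Apply the range and subclass patterns until nothing new is derived."""
    closure = set(graph)
    while True:
        new = set()
        for (subj, pred, obj) in closure:
            if pred == RDFS_RANGE:
                # subj is a property with range obj:
                # the object of any triple using it has type obj.
                new |= {(o, RDF_TYPE, obj)
                        for (_, p, o) in closure if p == subj}
            elif pred == RDFS_SUBCLASS:
                # subj is a subclass of obj:
                # anything typed as subj is also typed as obj.
                new |= {(x, RDF_TYPE, obj)
                        for (x, p, c) in closure
                        if p == RDF_TYPE and c == subj}
        if new <= closure:
            return closure
        closure |= new


derived = entail(graph) - graph
for triple in sorted(derived):
    print(triple)
```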

Issue 7: No. If you want to mention combining graphs, do it earlier. It is not a particularly semantic issue. I think that all that needs be said is that RDF allows you to combine triples from any source into a graph and process it as legal RDF.

Issue 8: It is not Semantics that views graphs this way, all of RDF does. So if you want to mention it, say it earlier and don't link it explicitly to semantics. 

On Nov 19, 2013, at 5:21 PM, Guus Schreiber <guus.schreiber@vu.nl> wrote:

> All,
> 
> Yves, and I think that the Primer is ready for a first round of review by the WG. The Editor's Draft is here:
> 
>  https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-primer/index.html#
> 
> We are still working on Sec. 7 (RDF Data), but would welcome comments on the rest.
> 
> Guus
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 home
40 South Alcaniz St.            (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile (preferred)
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Friday, 22 November 2013 19:51:40 UTC