Re: RDF 1.1 Primer

On Nov 27, 2013, at 9:23 AM, Guus Schreiber <guus.schreiber@vu.nl> wrote:

> Pat,
> 
> Here is the first set of responses to your comments. The responses concern your comments on Secs. 1-5. Secs. 6+ to follow.
> 
> Guus
> 
> > First pass of major howlers, I will get back with more details and suggestions for
> > replacements later.
> >
> > Pat
> > ----------
> >
> > First para. The examples are very atypical and misleading. RDF does not do
> > times well, and it is not mostly used for annotating Web pages or videos, and
> > 'resources' does not mean just Webbish things.  Might be better to use some
> > DBpedia examples right off the bat, and talk explicitly about *data* rather than
> > annotation.
> 
> Hmm. We were intending to include some “data” examples further-on in the document (in the RDF Data section). But I’m surprised you consider annotation to be atypical.

Well, I don't think the primer should give the impression that annotation is the primary or major use. Sure it is one, but yes I do see it as slightly atypical. What proportion of all RDF triples have a webpage or document as their subject? 

> Of course, Yves and I are a bit biased (due to our RDF work on music, TV, museums, archives, libraries). But there are lots of RDF annotations out there. And isn’t most of DBpedia in fact annotation?

Um..no. I was going to cite DBpedia as a counterexample, in fact.  Every DBpedia URI with /resource/ in it denotes a thing, not a web page, and the /page/ pages are all reached by httpRange-14 compliant HTTP redirects from the /resource/ URI. I don't see any annotations there at all. Maybe you and I mean something different by "annotation"? 

> We take the examples of a well-known person and painting to get the intuition about RDF across.  I would like to discuss this a bit more before changing.

Fine with a well-known example :-)
> 
> > 3.1. It is misleading to describe the <subject> as being what the statement is
> > "about". A triple is as much "about" its object as it is about its subject. (BTW, this
> > bad idea was one of the drivers behind the design of RDF/XML, which gives you
> > some idea of what is bad about it :-)
> 
> OK, point taken. BTW Turtle does the same thing in the shorthand notation. However, I’m not sure this is a subtlety the Primer should care about. If we say that a sentence has a “subject” we as humans mean that the sentence is “about” the subject, don’t we?

No. Well, I don't. A sentence is about all the things it mentions. "Subject" here is a grammatical convention, not like the "subject" of a portrait or a biography. "Pat is Simon's father" says exactly the same as "Simon is Pat's son", so how can one of them be about Pat and the other about Simon? 
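
A minimal sketch of this point, with triples modelled as plain Python tuples. The example.org IRIs and the fatherOf/sonOf property names are invented for illustration and do not come from any real vocabulary.

```python
# Hypothetical IRIs, for illustration only.
EX = "http://example.org/"

# "Pat is Simon's father" and "Simon is Pat's son" as two triples.
t1 = (EX + "Pat", EX + "fatherOf", EX + "Simon")
t2 = (EX + "Simon", EX + "sonOf", EX + "Pat")


def things_mentioned(triple):
    """Return the resources named in subject or object position."""
    s, _p, o = triple
    return {s, o}


# Both triples mention the same two things; only the position differs.
same_things = things_mentioned(t1) == things_mentioned(t2)
```

Which of the two ends up in subject position is a choice made when writing the triple, not a fact about who the triple concerns.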

> Of course, it says also things about the object (and about the predicate). It may confuse people if we don’t use “subject” in the usual way.

But you aren't using it in the usual way, that is what I am talking about :-)

> 
> > Also, this may be rather pedantic, but the S P O terminology refers to the parts of
> > the triple, ie it is RDF 'grammar', rather than what these IRIs refer to. So the
> > predicate (is an IRI which) refers to the property (which is a real thing in the
> > world that relates other things to one another.) I don't mean to suggest putting
> > this into the primer, but it might be good to keep it in mind and use the
> > terminology consistently throughout. The usage of 'sentences' versus 'facts'
> > might be useful here.(?)
> 
> Fully agree. Actually, I think we’ve tried to make this distinction throughout the rest of the document, but you’re right it is not here. We maybe should say:
> 
>  RDF Fact: resource property resource
>  RDF Sentence: subject predicate object
> 
> but it is actually traditional for introductory RDF texts to say only the latter. Hmm, I suggest we discuss this a bit more before changing.

I don't think the primer should introduce special terminology for all this, just be super careful about how it phrases things to be consistent with the distinctions. 
> 
> > The terminology of "feature" for the property is not standard and not particularly
> > helpful to the reader.
> 
> Right. I suggest to simply say “property”.

Yup :-)
> 
> > Why do you say that the subject IS what the triple is about, whereas the object
> > REPRESENTS the value of the feature? This looks like a use/mention confusion.
> > I would suggest avoiding the word "value" altogether, as it seems to generate
> > confusion wherever it appears.
> 
> Now rephrased as:
> [[
> The subject represents the resource we like to make a statement about. The predicate represents a property of the subject. The object represents the value of the property for this subject. Because RDF statements consist of three elements they are called triples.
> ]]

Better, but read on
> 
> I take your point about “value”, but I am at a loss for another term. I don’t think that “value” creates much confusion here. The term “property value” is very common.

But still misleading. For example, the way you phrase it ("the value") seems to indicate that there is a single value for a property and a subject, ie that all properties are functional. So you ought to say "represents **a** value of the property for the subject" which however is going to puzzle some readers who think that properties are single-valued. 

Also, calling the predicate a "property of the subject" sounds wrong to me. Some of my properties are being male, being retired, being a father. But none of these have values. The SPO things are relations to other things: being the father OF Simon, being the husband OF Jackie, being the owner OF my home, etc. These aren't properties of me, they are relations between me and other things. The RDF predicate actually indicates a binary relation, not a property in the sense that word is usually used. Maybe this is why "property value" seems like such a very odd usage to me in this context. (Outside RDF and logic, "property value" means how much your real estate is worth, by the way, using a very different sense of "property".)
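
The two points above (no single "the value", and predicates as binary relations) can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the example.org IRIs, the property names, and in particular the second `owns` triple, which is invented just to show a property with two values.

```python
# Hypothetical IRIs; a graph is modelled as a set of (s, p, o) tuples.
EX = "http://example.org/"
graph = {
    (EX + "Pat", EX + "fatherOf", EX + "Simon"),
    (EX + "Pat", EX + "husbandOf", EX + "Jackie"),
    (EX + "Pat", EX + "owns", EX + "PatsHome"),
    (EX + "Pat", EX + "owns", EX + "PatsCar"),  # invented second value
}


def objects(graph, subject, predicate):
    """All o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def relation(graph, predicate):
    """The binary relation indicated by a predicate, as a set of pairs."""
    return {(s, o) for s, p, o in graph if p == predicate}


owned = objects(graph, EX + "Pat", EX + "owns")
father_pairs = relation(graph, EX + "fatherOf")
```

Nothing in the data model stops `objects` from returning more than one element, which is why "a value" is safer wording than "the value".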


> 
> > The example <This video document> is misleading: RDF has nothing like the
> > "this document" construction. In fact, all the examples in example 1 are
> >  misleading as they suggest that RDF uses English prose fragments rather than
> >  IRIs.
> 
> Well, the purpose here was to use English prose. I wouldn’t like to get rid of that. But the wording of the video example is misleading. Suggest to rephrase as Video xyz, or maybe better, BBC program xyz.

Much better, yes. I still don't like using the prose in this way, but maybe we will just have to agree to differ on that. 

> 
> > The last sentence about the "three basic constructs" is therefore puzzling
> >  as it comes without any explanation or introduction.
> 
> Indeed. Suggest to delete this paragraph. In the sections itself it becomes clear enough.

No, I think the transition from short prose fragments to triples with IRIs and literals is still quite opaque and rather disconcerting. In what sense can three IRIs be seen as anything like a sentence, even a short sentence? 

> As an aside: would it be useful to include a summary like “subject => IRI or blank node;  predicate => IRI;  object => all three” somewhere?

Yes, but not in the first section :-)

> 
> > It is misleading to use the phrase "anonymous resources" when talking about
> > blank nodes. This phrasing suggests that IRIs and bnodes denote different
> > *kinds* of resource, which is misleading. (It is like saying that the pronoun
> > "someone" refers to a nameless person, a distinct category from people with
> > names.)
> 
> Right, although I’m not sure many people will be misled by the term “anonymous”.
> Suggested rephrasing:
> 
> [[
> In addition, it is sometimes handy to be able to talk about resources which have no identifier. For example, we might want to state that the Mona Lisa painting has in its background an unidentified tree which we know to be a cypress tree. Resources such as the unidentified cypress tree are called "blank nodes" in RDF.
> ]]
> 

Sorry to be so picky, but this has the same problem. The point is that the thing denoted by the blank node might very well have an identifier, for all you know: that is irrelevant to the bnode usage. You use a bnode when *you don't care to identify it*, whether it has an identifier or not. It has nothing to do with the thing itself, whether it has an identifier or not (they all do, in fact, in a sense, because if it doesn't, you can invent one yourself and use it, that is what skolemization does); it is entirely to do with the pragmatics of writing your description, about whether or not you want to be bothered to use an identifier at this point. 

Also the last sentence has a use/mention confusion in it.

so, reword: "It is sometimes handy to be able to talk about resources without bothering to use an identifier.  For example, we might want to state that the Mona Lisa painting has in its background an unidentified tree which we know to be a cypress tree. To do this in RDF, we use a blank node to indicate the un-named thing. Blank nodes are like simple variables in algebra; they represent some thing without saying what their value is."

(Last sentence is optional, of course. Might cause more heat than light.)
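
A minimal sketch of the skolemization remark, using the cypress tree triples from this thread. Terms are modelled as plain strings, any term starting with `_:` is treated as a blank node, and the example.org base IRI is an assumption (the `/.well-known/genid/` path follows the convention suggested in RDF 1.1 Concepts). This is an illustration, not a conforming implementation.

```python
import uuid

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SHOWS = "http://purl.org/net/lio#shows"

graph = {
    ("http://dbpedia.org/resource/Mona_Lisa", SHOWS, "_:x"),
    ("_:x", RDF_TYPE, "http://dbpedia.org/resource/Cypress"),
}


def skolemize(triples, base="http://example.org/.well-known/genid/"):
    """Replace each blank node label with a freshly invented IRI."""
    mapping = {}

    def fresh(term):
        if term.startswith("_:"):
            # The same label always maps to the same new IRI.
            if term not in mapping:
                mapping[term] = base + uuid.uuid4().hex
            return mapping[term]
        return term

    return {(fresh(s), p, fresh(o)) for s, p, o in triples}


skolemized = skolemize(graph)
```

The tree itself is unchanged by this; the only difference is whether the description bothers to use an identifier for it.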


> > Your Mona Lisa example is strange as there is an obvious name to use there,
> > and (not surprisingly) a dbpedia IRI:
> > http://dbpedia.org/resource/Leonardo_da_Vinci. A more plausible example of
> > bnode use might be saying that the Mona Lisa has in its background an X  and X
> > is a Cypress tree. That is the kind of information that makes it genuinely
> > implausible to assume that there is an IRI for the value, and it is the kind of thing
> > that one might well want to record in for example a museum guide:
> >
> > http://dbpedia.org/resource/Mona_Lisa  http://purl.org/net/lio#shows  _:x .
> > _:x http://www.w3.org/1999/02/22-rdf-syntax-ns#type
> > http://dbpedia.org/resource/Cypress .
> 
> See rephrasing above. We should consider including the Cypress tree example in the overall example in the Syntax section.
> 
> > "It should be noted that many RDF users in practice don't use blank nodes." No,
> > it should not be noted. A recent scan found that over half of published RDF
> > graphs use blank nodes, most (all?) OWL/RDF contains blank nodes, etc.. RDF
> > could not function without blank nodes, and it is time to forget the brain-dead
> > doctrine which says that their use should be avoided. And it is just silly to say in
> > a primer that blank nodes make RDF "look complicated". What could be simpler
> > than a blank node in a graph?
> 
> Well, blank nodes definitely make the normative specifications much harder to read. That is what I wanted, admittedly very poorly, to bring across.  Suggest to delete for now.

Agreed. 

> 
> > "We can then make statements about these two graphs, for example adding
> >  license and provenance information:
> >
> >        <http://example.com/bob> <is published by> <http://example.org>.
> >        <http://example.com/bob> <has license>
> > <http://creativecommons.org/licenses/by/3.0/>."
> >
> > <hair tearing> AAAARRRRGGGH</hair tearing> NO WE CAN'T. Or at least, this
> > use is NOT SUPPORTED BY RDF with the specs in their current state. That
> > 'metadata' use works ONLY when we know that the "identifying" graph IRIs
> > denote their graphs, and WE HAVE EXPLICITLY SAID THAT RDF DOES NOT
> > ASSUME THIS. A conforming RDF engine would be perfectly conforming if it
> > refused to treat those subject IRIs as denoting the graph in these triples.  There
> > is NOTHING in the RDF specs that say that a general IRI must be taken to
> > denote what it conventionally identifies.  We do this only for datatype IRIs, and
> > even getting that much into the specs was an uphill struggle; and in the case of
> > graph labels in a dataset, we explicitly warn people to not expect this to be true
> > (because it often isn't.) I know the Primer has to be simple, but please let us not
> >  put actual lies into it.
> 
> I don’t think the Primer says any of this (or certainly doesn’t want to).

It implies it. The "We can then..." strongly implies that this usage is somehow a consequence of the dataset labelling, but this is incorrect. That labelling has no bearing at all on what that "meta" RDF means. You could have written it down without the dataset existing and it would have meant exactly as much (or as little) as it does in your example.

> We can write down these statements, no problem.

In the sense that they are not a syntax error, yes you can. In the sense that they will mean what you want them to mean, no you can't.

> Would a rephrase like this help:
> 
> [[
> We can then write down triples that include the graph names, for example:
>  <example>
> These two triples could be interpreted as license and provenance information of the graph <xyz>. (And then a note about lack of RDF semantics for this).
> ]]

Better, but the second sentence is still misleading: they *could* be interpreted that way, but they could also be interpreted differently, and they could have been interpreted this way even if the dataset did not exist. The presence of the dataset makes absolutely no difference at all to how they can be interpreted, so why do you say this here, implying that is somehow relevant?

The fact is, we had an opportunity to make this kind of usage standard, and we botched it. Thoroughly and completely and catastrophically botched it. I think that it is better to just quietly draw a curtain over this, than to force readers of a primer to get their heads around the fact that the most obvious thing one might want to do with graph names is not in fact supported by the RDF specs. I would strongly suggest just not mentioning anything at all about using graph names in RDF. Simply mention that they can be used to direct SPARQL queries to a graph in a dataset, and leave it at that. That is strictly correct, useful, and uncontroversial. 
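
A rough sketch of the uncontroversial use, with a dataset modelled as a Python dict from graph name to a set of triples. The graph contents and the `query_graph` helper are hypothetical; the helper only imitates, very loosely, what a SPARQL `GRAPH <name> { ... }` pattern does, namely pick out one graph by its name and match against it.

```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical dataset: graph name (an IRI string) -> set of triples.
# The key None stands for the default graph.
dataset = {
    None: set(),
    "http://example.com/bob": {
        ("http://example.com/bob#me", FOAF_NAME, '"Bob"'),
    },
}


def query_graph(dataset, graph_name, pattern):
    """Match an (s, p, o) pattern against one named graph.

    None in the pattern acts as a wildcard.
    """
    triples = dataset.get(graph_name, set())
    return {
        t
        for t in triples
        if all(want is None or want == got for want, got in zip(pattern, t))
    }


names = query_graph(dataset, "http://example.com/bob", (None, FOAF_NAME, None))
```

Here the graph name is only a lookup key. Nothing in the sketch, or in the specs, makes a triple with that IRI as its subject a statement about the graph.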

> 
> > "The original data model assumed that all triples are part of the same (large)
> > graph." The RDF data model still assumes this. It is very misleading to suggest
> > that datasets are a modification to the basic RDF data model, or even that this
> > data model has changed. We considered such changes and rejected them.
> 
> I deleted this sentence.
> 
> > "The RDF data model provides a way to make statements about (Web)
> > resources."
> >
> > RDF makes statements about resources, i.e. about anything at all. The implied
> > qualification in "(Web)" is false and misleading.
> 
> Right. Deleted.
> 
> > "As we mentioned, this data model does not make any assumptions about what
> > these resources stand for."
> >
> > Resources don't (usually) stand for anything. Did you mean, what IRIs stand for?
> > (Another use/mention confusion.)
> 
> Oops, right. Corrected.
> 
> > ".. a vocabulary description language called RDF-Schema " ?? In what sense is
> > RDFS a 'vocabulary description' language? If you must say this, at least explain
> > what it is supposed to mean.
> 
> It is actually the title of the document (“RDF Vocabulary Description Language 1.0: RDF Schema”). But I agree with your point. Rephrased now as “To support the definition of vocabularies RDF provides the RDF-Schema language.”.

I forgot it was there in the title :-) 
 
> It raises another point: should we rename the RDF Schema document?

This has been covered now, right?

> > The introduction of classes is very awkward, and not really correct. I would
> > suggest saying that they are categories which can be used to classify things: Bill
> > rdf:type Human, Mona Lisa rdf:type Artwork, etc.. and avoid "group" (and "set")
> > altogether. Then you don't need to immediately say that what you just said is
> > false, which is not very reassuring for the reader. And you should mention
> > rdf:type in the same breath.
> 
> Right. I was struggling with this. New phrasing in document.
> 
> > Need to clean up the terminology. It is very confusing to be told in quick
> > succession:
> >
> > ==RDF Schema is a vocabulary description language
> > ==FOAF is a *vocabulary* which is a *schema* which was one of the first *RDF
> > Schemas* (Is RDF Schema one of the RDF Schemas?)
> >
> > ==DC is a vocabulary which is a *metadata element set* (Why isn't it an RDF
> > Schema, like FOAF?)
> > ==*schema*.org is a *vocabulary* (Given what it is called, why isn't it a schema?)
> > ==SKOS is a *vocabulary* for publishing *schemes* (not schemas?) such as
> > terminologies and thesauri. (Isn't a terminology a kind of vocabulary? So are
> > schemes and vocabularies the same thing? Or was it schemas that were like
> > vocabularies, and schemes are something different? And does SKOS describe
> > them or is it an example of one of them? Or maybe both at the same time??)
> >
> > Personally, I would never want to see the word "schema" ever again.
> 
> In principle agreed. But the term “schema” is used in the outside world in ambiguous ways; nothing we can do about that.

True.

> But I deleted the term “schema” from our own text, that is indeed better. Two specific things:
> - Saying that schema.org is a vocabulary is, I think, correct.

Yes.

> - SKOS is a meta-vocabulary for specifying classification schemes/…

Why not just, a vocabulary? The fewer unexplained distinctions, the better, and 'vocabulary' is accurate in any case. 

> 
> > Why don't we just say that anyone can publish an RDF vocabulary - a set of IRIs,
> > typically from a single namespace - and specify what it is supposed to mean,
> > and then everyone can use that vocabulary to write RDF data. It is good
> > practice to have the root namespace IRI link to something that defines the
> > meaning of the vocabulary, and to re-use IRIs from existing vocabularies where
> > you can, to make it easier to share meanings. And it is gold standard to publish
> > an RDF graph which specifies at least part of your intended meanings for your
> > vocabulary in a machine-readable way, if you can, using vocabularies intended
> > for the purpose, for example RDFS or OWL or SKOS, because then they can be
> > used by others in entailment rules. Examples are given below....
> 
> OK, will use this for rephrasing parts of this section.
> 
> > and then after you have talked about entailments in the semantics section, you
> > might for example show how dbpedia uses rdfs:subClassOf to create category
> > hierarchies, or how FOAF uses owl:InverseFunctionalProperty to imitate
> > database keys.
> 
> Good suggestion. Will include this.
> 
> > "RDF Schema provides basic facilities for modeling semantics of RDF data. For
> > a specification of these semantics the reader is referred to the RDF Semantics
> > document [RDF11-MT]. For more comprehensive semantic modeling of RDF
> > data the W3C recommends using the Web Ontology Language OWL
> >  [OWL2-OVERVIEW]."
> >
> > ?? I don't even know what this is supposed to mean. RDFS and OWL "model
> > semantics of RDF data" ?? That is either meaningless or false, I'm not sure
> > which. Maybe both. Also, this reads as though the W3C recommends using OWL
> > over RDFS, which if true is news to me (and not likely to lead to a rapid take-up
> > of RDF, if users have to read the OWL specs first.)
> 
> I wanted to say something nice about OWL! :). Seriously, suggest to rephrase as:
> 
> [[
> For a formal specification of the semantics of the RDF Schema
> constructs the reader is referred to
> the RDF Semantics document [[RDF11-MT]]. Users interested in more comprehensive
> semantic modeling of RDF data might consider using the Web Ontology
> Language OWL [[OWL2-OVERVIEW]]
> ]]

Better, but I am still puzzled by the idea that OWL does semantic modelling **of RDF data**. To me that sounds like OWL redefines or modifies the RDF semantics (?). Perhaps I am just being semantically pedantic about the semantics of "semantics". 

More seriously, it might be worth saying that all of these can be treated as RDF vocabularies and used in RDF data freely, even mixed together.  This might be a fairly new idea to some readers, and calling OWL a different "Language" suggests that it is an alternative to RDF rather than just one vocabulary among many. 
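
A small sketch of what "mixed together" looks like in practice: one graph whose triples draw on RDF, RDFS, OWL and FOAF terms side by side. The example.org namespace, the `Painter` class and the `leonardo` resource are hypothetical; the other IRIs are the standard namespace IRIs of those vocabularies.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"  # hypothetical namespace

# One graph, four vocabularies, all just triples.
graph = {
    (EX + "Painter", RDFS + "subClassOf", FOAF + "Person"),
    (FOAF + "mbox", RDF + "type", OWL + "InverseFunctionalProperty"),
    (EX + "leonardo", RDF + "type", EX + "Painter"),
}

KNOWN = {"rdf": RDF, "rdfs": RDFS, "owl": OWL, "foaf": FOAF, "ex": EX}


def namespaces_used(graph, known):
    """Names of the known namespaces whose terms occur anywhere in the graph."""
    return {
        name
        for name, ns in known.items()
        for triple in graph
        for term in triple
        if term.startswith(ns)
    }


used = namespaces_used(graph, KNOWN)
```

From the point of view of the data model, OWL terms are IRIs like any others, so no separate "language" boundary shows up in the graph.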

> > Section 5.  The idea that all these different syntaxes are all ways of describing
> > the same RDF graph structures is not immediately obvious, and I think is a major
> > barrier to comprehension.  Need to talk a little about concrete vs. abstract
> > syntax, maybe not in those terms, but to get across the idea of the graph syntax
> > being a level of abstraction higher than the particular notation used to describe it.
> 
> Good point. Included the following sentence in the first paragraph:
> [[
> However, different encodings of the same graph lead to exactly the same triples.
> ]]
> I also suggest to include a graph diagram of the current example, and clarify the point about the abstract graph (added as a todo issue to the document).

If you are on a Mac, OmniGraffle generates beautiful diagrams. 
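
On the point that different notations describe the same abstract graph, here is a minimal sketch. It parses an IRI-only subset of N-Triples and a toy nested JSON layout (invented for this example, and not JSON-LD) and checks that both yield the same set of triples. The example.org IRIs are hypothetical.

```python
import json

ntriples = """
<http://example.org/a> <http://example.org/p> <http://example.org/b> .
<http://example.org/b> <http://example.org/p> <http://example.org/c> .
"""

# Toy layout {subject: {predicate: [objects]}}, deliberately in another order.
as_json = """
{"http://example.org/b": {"http://example.org/p": ["http://example.org/c"]},
 "http://example.org/a": {"http://example.org/p": ["http://example.org/b"]}}
"""


def from_ntriples(text):
    """Parse the IRI-only subset of N-Triples: '<s> <p> <o> .' per line."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        s, p, o, dot = line.split()
        if dot != ".":
            raise ValueError("expected terminating '.': " + line)
        triples.add((s.strip("<>"), p.strip("<>"), o.strip("<>")))
    return triples


def from_nested_json(text):
    """Parse the toy {subject: {predicate: [objects]}} JSON layout."""
    data = json.loads(text)
    return {
        (s, p, o)
        for s, by_predicate in data.items()
        for p, objs in by_predicate.items()
        for o in objs
    }


same_graph = from_ntriples(ntriples) == from_nested_json(as_json)
```

The comparison is on sets of triples, so statement order and surface syntax drop out, which is the sense in which the graph sits one level above any particular notation.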

Pat


> 
> > Having one simple but not entirely trivial example graph (with at least one bnode,
> > at least two triples sharing a common object and one node used as both a
> > subject and an object) written out in all the different notations would be a very
> > useful thing to see. It would also hammer home the point about abstract graph
> > syntax, especially if you also provided a graph diagram for it.
> 
> The current example has all these features, except for the bnode. I’ll add an issue about including a (separate?) example with bnodes.
> 
> > " therefore bringing the benefits of RDF to the JSON world. " Omit. Could be
> > read as condescending. I am sure there are many who would say, it brings JSON
> > sanity to the RDF world.
> 
> Deleted/rephrased.
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 home
40 South Alcaniz St.            (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile (preferred)
phayes@ihmc.us       http://www.ihmc.us/users/phayes

Received on Thursday, 28 November 2013 07:35:45 UTC