Re: RDF 1.1 Primer

On 28-11-13 08:35, Pat Hayes wrote:
>
> On Nov 27, 2013, at 9:23 AM, Guus Schreiber <guus.schreiber@vu.nl>
> wrote:
>
>> Pat,
>>
>> Here is the first set of responses to your comments. The responses
>> concern your comments on Secs. 1-5. Secs. 6+ to follow.
>>
>> Guus
>>
>>> First pass of major howlers, I will get back with more details
>>> and suggestions for replacements later.
>>>
>>> Pat ----------
>>>
>>> First para. The examples are very atypical and misleading. RDF
>>> does not do times well, and it is not mostly used for annotating
>>> Web pages or videos, and 'resources' does not mean just Webbish
>>> things.  Might be better to use some DBpedia examples right off
>>> the bat, and talk explicitly about *data* rather than
>>> annotation.
>>
>> Hmm. We were intending to include some “data” examples further on
>> in the document (in the RDF Data section). But I’m surprised you
>> consider annotation to be atypical.
>
> Well, I don't think the primer should give the impression that
> annotation is the primary or major use. Sure it is one, but yes I do
> see it as slightly atypical. What proportion of all RDF triples have
> a webpage or document as their subject?
>
>> Of course, Yves and I are a bit biased (due to our RDF work on
>> music, TV, museums, archives, libraries). But there are lots of RDF
>> annotations out there. And isn’t most of DBpedia in fact
>> annotation?
>
> Um... no. I was going to cite DBpedia as a counterexample, in fact.
> Every DBpedia URI with /resource/ in it denotes a thing, not a web
> page, and the /info/ pages are all http-range-14 compliant HTTP
> redirects from the /resource/ URI. I don't see any annotations there
> at all. Maybe you and I mean something different by "annotation"?

Yves and I discussed this at some length. Of the 6 triples in the first 
example, 2 might be construed as being "annotation" (a term we 
don't use in the primer BTW). One of these is the fact that Leonardo 
created the Mona Lisa, which could just as easily have come from 
DBpedia. The second one is about the subject of a video, which we think 
is a natural enough example.

We suggest replacing the VIAF URI for Leonardo in Sec. 3.2 with a 
DBpedia URI.

>
>> We take the examples of a well-known person and painting to get the
>> intuition about RDF across.  I would like to discuss this a bit
>> more before changing.
>
> Fine with a well-known example :-)
>>
>>> 3.1. It is misleading to describe the <subject> as being what the
>>> statement is "about". A triple is as much "about" its object as
>>> it is about its subject. (BTW, this bad idea was one of the
>>> drivers behind the design of RDF/XML, which gives you some idea
>>> of what is bad about it :-)
>>
>> OK, point taken. BTW Turtle does the same thing in the shorthand
>> notation. However, I’m not sure this is a subtlety the Primer
>> should care about. If we say that a sentence has a “subject” we as
>> humans mean that the sentence is “about” the subject, don’t we?
>
> No. Well, I don't. A sentence is about all the things it mentions.
> "Subject" here is a grammatical convention, not like the "subject" of
> a portrait or a biography. "Pat is Simon's father" says exactly the
> same as "Simon is Pat's son", so how can one of them be about Pat and
> the other about Simon?
>
>> Of course, it says also things about the object (and about the
>> predicate). It may confuse people if we don’t use “subject” in the
>> usual way.
>
> But you aren't using it in the usual way, that is what I am talking
> about :-)
>
>>> Also, this may be rather pedantic, but the S P O terminology
>>> refers to the parts of the triple, ie it is RDF 'grammar', rather
>>> than what these IRIs refer to. So the predicate (is an IRI which)
>>> refers to the property (which is a real thing in the world that
>>> relates other things to one another.) I don't mean to suggest
>>> putting this into the primer, but it might be good to keep it in
>>> mind and use the terminology consistently throughout. The usage
>>> of 'sentences' versus 'facts' might be useful here.(?)
>>
>> Fully agree. Actually, I think we’ve tried to make this distinction
>> throughout the rest of the document, but you’re right it is not
>> here. We maybe should say:
>>
>>     RDF Fact:     resource  property   resource
>>     RDF Sentence: subject   predicate  object
>>
>> but it is actually traditional for introductory RDF texts to say only
>> the latter. Hmm, I suggest we discuss this a bit more before
>> changing.
>
> I don't think the primer should introduce special terminology for all
> this, just be super careful about how it phrases things to be
> consistent with the distinctions.
>>
>>> The terminology of "feature" for the property is not standard and
>>> not particularly helpful to the reader.
>>
>> Right. I suggest to simply say “property”.
>
> Yup :-)
>>
>>> Why do you say that the subject IS what the triple is about,
>>> whereas the object REPRESENTS the value of the feature? This
>>> looks like a use/mention confusion. I would suggest avoiding the
>>> word "value" altogether, as it seems to generate confusion
>>> wherever it appears.
>>
>> Now rephrased as: [[ The subject represents the resource we like to
>> make a statement about. The predicate represents a property of the
>> subject. The object represents the value of the property for this
>> subject. Because RDF statements consist of three elements they are
>> called triples. ]]
>
> Better, but read on
>>
>> I take your point about “value”, but I am at a loss for another
>> term. I don’t think that “value” creates much confusion here. The
>> term “property value” is very common.
>
> But still misleading. For example, the way you phrase it ("the
> value") seems to indicate that there is a single value for a property
> and a subject, ie that all properties are functional. So you ought to
> say "represents **a** value of the property for the subject" which
> however is going to puzzle some readers who think that values are
> single-valued.
>
> Also, calling the predicate a "property of the subject" sounds wrong
> to me. Some of my properties are being male, being retired, being a
> father. But none of these have values. The SPO things are relations
> to other things: being the father OF Simon, being the husband OF
> Jackie. being the owner OF my home, etc. These aren't properties of
> me, they are relations between me and other things. The RDF predicate
> actually indicates a binary relation, not a property in the sense
> that word is usually used. Maybe this is why "property value" seems
> like such a very odd usage to me in this context. (Outside RDF and
> logic, "property value" means how much your real estate is worth, by
> the way, using a very different sense of "property".)

I still think that to most people "subject" will mean "what the statement 
is about", cf. the subject of a book. Also, RDF statements are read from 
left to right, so the relationship is not neutral (in RDF we write 
"X friendOf Y", not "friends X Y"). Another reason for not using 
"relationship" is that it leads to using three terms for "the thing in 
the middle": predicate, relation, property.

However, I see your point as well (especially to get rid of "value"). 
Here is an alternative text:

[[
An RDF statement represents a relationship between two resources.  The 
subject and the object represent the two resources being related; the 
predicate represents the nature of their relationship. The relationship 
is phrased in a directional way (from subject to object) and is called 
in RDF a <strong>property</strong>.
]]
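For readers who think in code, here is a minimal Python sketch of the 
proposed wording. It assumes triples are plain tuples of strings and uses 
hypothetical "ex:" names instead of real IRIs. It shows the directional 
reading (subject to object), that one subject/predicate pair may have 
several objects (so there is no single "value"), and that a triple is 
found as readily from its object as from its subject.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: a directed edge from subject to object, labelled by the predicate."""
    subject: str
    predicate: str
    object: str


# Hypothetical IRIs abbreviated with an "ex:" prefix; not a real vocabulary.
graph = {
    Triple("ex:Leonardo", "ex:created", "ex:MonaLisa"),
    Triple("ex:Leonardo", "ex:created", "ex:LastSupper"),
    Triple("ex:Video123", "ex:about", "ex:MonaLisa"),
}


def objects(g, s, p):
    """Every object that subject s is related to by predicate p; there may be several."""
    return {t.object for t in g if t.subject == s and t.predicate == p}


def mentioning(g, node):
    """Every triple in which node appears, as subject or as object."""
    return {t for t in g if node in (t.subject, t.object)}


def inverted(g, p, inverse_p):
    """Rewrite each p-triple as an inverse_p-triple with subject and object swapped."""
    return {Triple(t.object, inverse_p, t.subject) for t in g if t.predicate == p}


print(sorted(objects(graph, "ex:Leonardo", "ex:created")))  # ['ex:LastSupper', 'ex:MonaLisa']
print(len(mentioning(graph, "ex:MonaLisa")))  # 2
```

The inverted() helper mirrors the "Pat is Simon's father" / "Simon is 
Pat's son" point: the same information, with only the direction of the 
hypothetical predicate changed.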

>
>
>>
>>> The example <This video document> is misleading: RDF has nothing
>>> like the "this document" construction. In fact, all the examples
>>> in example 1 are misleading as they suggest that RDF uses English
>>> prose fragments rather than IRIs.
>>
>> Well, the purpose here was to use English prose. I wouldn’t like to
>> get rid of that. But the wording of the video example is
>> misleading. Suggest to rephrase as Video xyz, or maybe better, BBC
>> program xyz.
>
> Much better, yes. I still don't like using the prose in this way, but
> maybe we will just have to agree to differ on that.

Added a sentence just after the diagram:

[[
The example above does not constitute actual RDF syntax; it is just 
intended to provide a very informal view of the notion of an RDF graph.
]]

I also included a note at this point in the text about the notion of 
abstract syntax.

>
>>
>>> The last sentence about the "three basic constructs" is therefore
>>> puzzling as it comes without any explanation or introduction.
>>
>> Indeed. Suggest to delete this paragraph. In the sections themselves it
>> becomes clear enough.
>
> No, I think the transition from short prose fragments to triples with
> IRIs and literals is still quite opaque and rather disconcerting. In
> what sense can three IRIs be seen as anything like a sentence, even a
> short sentence?

Now rephrased as follows:

[[
In the next three subsections we discuss the three types of RDF data 
that occur in triples: IRIs, literals and blank nodes.
]]
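The three kinds of term, and the positions each may occupy in a triple 
(subject: IRI or blank node; predicate: IRI; object: any of the three), 
can be sketched in Python as follows. The class names are hypothetical, 
and the sketch leaves out language-tagged literals; a literal without an 
explicit datatype gets xsd:string, as in RDF 1.1.

```python
from dataclasses import dataclass
from typing import Union

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = XSD_STRING  # RDF 1.1: a plain string literal has datatype xsd:string


@dataclass(frozen=True)
class BlankNode:
    label: str  # only meaningful within one document or graph


Term = Union[IRI, Literal, BlankNode]


def is_valid_triple(s: Term, p: Term, o: Term) -> bool:
    """Subject: IRI or blank node. Predicate: IRI. Object: IRI, literal or blank node."""
    return (
        isinstance(s, (IRI, BlankNode))
        and isinstance(p, IRI)
        and isinstance(o, (IRI, Literal, BlankNode))
    )


mona = IRI("http://dbpedia.org/resource/Mona_Lisa")
shows = IRI("http://purl.org/net/lio#shows")
tree = BlankNode("x")

print(is_valid_triple(mona, shows, tree))                  # True
print(is_valid_triple(Literal("Mona Lisa"), shows, tree))  # False: literal as subject
print(is_valid_triple(mona, tree, tree))                   # False: blank node as predicate
```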

>
>> As an aside: would it be useful to include a summary like “subject
>> => IRI or blank node;  predicate => IRI;  object => all three”
>> somewhere?
>
> Yes, but not in the first section :-)
>
>>
>>> It is misleading to use the phrase "anonymous resources" when
>>> talking about blank nodes. This phrasing suggests that IRIs and
>>> bnodes denote different *kinds* of resource, which is misleading.
>>> (It is like saying that the pronoun "someone" refers to a
>>> nameless person, a distinct category from people with names.)
>>
>> Right, although I’m not sure many people will be misled by the term
>> “anonymous”. Suggested rephrasing:
>>
>> [[ In addition, it is sometimes handy to be able to talk about
>> resources which have no identifier. For example, we might want to
>> state that the Mona Lisa painting has in its background an
>> unidentified tree which we know to be a cypress tree. Resources
>> such as the unidentified cypress tree are called "blank nodes" in
>> RDF. ]]
>>
>
> Sorry to be so picky, but this has the same problem. The point is
> that the thing denoted by the blank node might very well have an
> identifier, for all you know: that is irrelevant to the bnode usage.
> You use a bnode when *you don't care to identify it*, whether it has
> an identifier or not. It has nothing to do with the thing itself,
> whether it has an identifier or not (they all do, in fact, in a
> sense, because if it doesn't, you can invent one yourself and use it,
> that is what skolemization does); it is entirely to do with the
> pragmatics of writing your description, about whether or not you want
> to be bothered to use an identifier at this point.
>
> Also the last sentence has a use/mention confusion in it.
>
> so, reword: " it is sometimes handy to be able to talk about
> resources without bothering to use an identifier.  For example, we
> might want to state that the Mona Lisa painting has in its background
> an unidentified tree which we know to be a cypress tree. To do this
> in RDF, we use a blank node to indicate the un-named thing. Blank
> nodes are like simple variables in algebra; they represent some thing
> without saying what their value is.

Adopted, including the last sentence :).
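The "variables in algebra" reading of blank nodes can be made concrete 
with a small matcher: a graph containing _:x is satisfied by another 
graph if some choice of term for _:x turns every one of its triples into 
a triple the other graph contains. This is a Python sketch, assuming 
terms are plain strings and that blank nodes are exactly the strings 
starting with "_:"; the museum graph and its tree IRI are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
MONA = "http://dbpedia.org/resource/Mona_Lisa"
SHOWS = "http://purl.org/net/lio#shows"
CYPRESS = "http://dbpedia.org/resource/Cypress"


def is_bnode(term):
    # Assumption of this sketch: blank nodes are the strings beginning with "_:".
    return term.startswith("_:")


def instance_of(pattern, graph, binding=None):
    """Find a mapping from the blank nodes in `pattern` (a list of triples) to terms
    such that every substituted pattern triple is in `graph`; None if there is none."""
    binding = dict(binding or {})
    if not pattern:
        return binding
    first, rest = pattern[0], pattern[1:]
    for candidate in graph:
        trial = dict(binding)
        matched = True
        for p_term, g_term in zip(first, candidate):
            if is_bnode(p_term):
                # A blank node acts like a variable: bind it, or check its existing binding.
                if trial.setdefault(p_term, g_term) != g_term:
                    matched = False
                    break
            elif p_term != g_term:
                matched = False
                break
        if matched:
            result = instance_of(rest, graph, trial)
            if result is not None:
                return result
    return None  # no candidate triple leads to a consistent assignment


# "The Mona Lisa shows something, and that something is a cypress."
pattern = [(MONA, SHOWS, "_:x"), ("_:x", RDF_TYPE, CYPRESS)]

# A hypothetical museum graph that happens to give the tree an IRI.
museum = {
    (MONA, SHOWS, "http://example.org/tree/42"),
    ("http://example.org/tree/42", RDF_TYPE, CYPRESS),
}

print(instance_of(pattern, museum))  # {'_:x': 'http://example.org/tree/42'}
print(instance_of(pattern, {(MONA, SHOWS, "http://example.org/tree/42")}))  # None
```

Whether or not the tree "has" an IRI somewhere is irrelevant to the 
pattern; the blank node only says that something fills that position.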

>
> (Last sentence is optional, of course. Might cause more heat than
> light.)
>
>
>>> Your Mona Lisa example is strange as there is an obvious name to
>>> use there, and (not surprisingly) a dbpedia IRI:
>>> http://dbpedia.org/resource/Leonardo_da_Vinci. A more plausible
>>> example of bnode use might be saying that the Mona Lisa has in
>>> its background an X  and X is a Cypress tree. That is the kind of
>>> information that makes it genuinely implausible to assume that
>>> there is an IRI for the value, and it is the kind of thing that
>>> one might well want to record in for example a museum guide:
>>>
>>> <http://dbpedia.org/resource/Mona_Lisa> <http://purl.org/net/lio#shows> _:x .
>>> _:x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/resource/Cypress> .
>>
>> See rephrasing above. We should consider including the Cypress tree
>> example in the overall example in the Syntax section.
>>
>>> "It should be noted that many RDF users in practice don't use
>>> blank nodes." No, it should not be noted. A recent scan found
>>> that over half of published RDF graphs use blank nodes, most
>>> (all?) OWL/RDF contains blank nodes, etc.. RDF could not function
>>> without blank nodes, and it is time to forget the brain-dead
>>> doctrine which says that their use should be avoided. And it is
>>> just silly to say in a primer that blank nodes make RDF "look
>>> complicated". What could be simpler than a blank node in a
>>> graph?
>>
>> Well, blank nodes definitely make the normative specifications much
>> harder to read. That is what I wanted, admittedly very poorly, to
>> bring across.  Suggest to delete for now.
>
> Agreed.
>
>>
>>> "We can then make statements about these two graphs, for example
>>> adding license and provenance information:
>>>
>>> <http://example.com/bob> <is published by> <http://example.org>.
>>> <http://example.com/bob> <has license>
>>> <http://creativecommons.org/licenses/by/3.0/>."
>>>
>>> <hair tearing> AAAARRRRGGGH</hair tearing> NO WE CAN'T. Or at
>>> least, this use is NOT SUPPORTED BY RDF with the specs in their
>>> current state. That 'metadata' use works ONLY when we know that
>>> the "identifying" graph IRIs denote their graphs, and WE HAVE
>>> EXPLICITLY SAID THAT RDF DOES NOT ASSUME THIS. A conforming RDF
>>> engine would be perfectly conforming if it refused to treat those
>>> subject IRIs as denoting the graph in these triples.  There is
>>> NOTHING in the RDF specs that say that a general IRI must be
>>> taken to denote what it conventionally identifies.  We do this
>>> only for datatype IRIs, and even getting that much into the specs
>>> was an uphill struggle; and in the case of graph labels in a
>>> dataset, we explicitly warn people to not expect this to be true
>>> (because it often isn't.) I know the Primer has to be simple, but
>>> please let us not put actual lies into it.
>>
>> I don’t think the Primer says any of this (or certainly doesn’t
>> want to).
>
> It implies it. The "We can then..." strongly implies that this usage
> is somehow a consequence of the dataset labelling, but this is
> incorrect. That labelling has no bearing at all on what that "meta"
> RDF means. You could have written it down without the dataset
> existing and it would have meant exactly as much (or as little) as it
> does in your example.
>
>> We can write down these statements, no problem.
>
> In the sense that they are not a syntax error, yes you can. In the
> sense that they will mean what you want them to mean, no you can't.
>
>> Would a rephrase like this help:
>>
>> [[ We can then write down triples that include the graph names, for
>> example: <example> These two triples could be interpreted as
>> license and provenance information of the graph <xyz>. (And then a
>> note about lack of RDF semantics for this). ]]
>
> Better, but the second sentence is still misleading: they *could* be
> interpreted that way, but they could also be interpreted differently,
> and they could have been interpreted this way even if the dataset did
> not exist. The presence of the dataset makes absolutely no difference
> at all to how they can be interpreted, so why do you say this here,
> implying that is somehow relevant?
>
> The fact is, we had an opportunity to make this kind of usage
> standard, and we botched it. Thoroughly and completely and
> catastrophically botched it. I think that it is better to just
> quietly draw a curtain over this, than to force readers of a primer
> to get their heads around the fact that the most obvious thing one
> might want to do with graph names is not in fact supported by the RDF
> specs. I would strongly suggest just not mentioning anything at all
> about using graph names in RDF. Simply mention that they can be used
> to direct SPARQL queries to a graph in a dataset, and leave it at
> that. That is strictly correct, useful, and uncontroversial.

There was an explicit request for the Primer to have metadata/provenance 
examples. The Primer is a Note, not a Rec. I would like to have a bit 
more discussion in the WG before deleting such examples. To be continued.
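A minimal Python sketch of the uncontroversial use mentioned above: a 
dataset as one default graph plus named graphs, with the graph name used 
only to direct a query at one graph, roughly as SPARQL's GRAPH keyword 
does. The Dataset class and its match() method are hypothetical. Note 
that the license triple about http://example.com/bob is an ordinary 
triple in the default graph; nothing in the sketch (or in RDF) ties its 
subject to the named graph with the same IRI.

```python
from dataclasses import dataclass, field

BOB = "http://example.com/bob"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
DCT_LICENSE = "http://purl.org/dc/terms/license"
CC_BY = "http://creativecommons.org/licenses/by/3.0/"


@dataclass
class Dataset:
    """A default graph plus named graphs. Graph names are used only as keys for
    selecting a graph; nothing here says what a name IRI denotes."""
    default: set = field(default_factory=set)
    named: dict = field(default_factory=dict)

    def match(self, s=None, p=None, o=None, graph=None):
        """Triples matching the pattern (None is a wildcard), taken from the default
        graph, or only from the named graph `graph` when one is given."""
        source = self.default if graph is None else self.named.get(graph, set())
        return {
            t for t in source
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        }


ds = Dataset()
# Hypothetical contents of the graph named http://example.com/bob.
ds.named[BOB] = {(BOB + "#me", FOAF_NAME, '"Bob"')}
# An ordinary default-graph triple whose subject happens to be the same IRI.
ds.default.add((BOB, DCT_LICENSE, CC_BY))

print(len(ds.match(graph=BOB)))    # 1: the triple inside the named graph
print(len(ds.match(s=BOB)))        # 1: the license triple, in the default graph
print(ds.match(s=BOB, graph=BOB))  # set(): the dataset does not link the two
```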

>
>>
>>> "The original data model assumed that all triples are part of the
>>> same (large) graph." The RDF data model still assumes this. It is
>>> very misleading to suggest that datasets are a modification to
>>> the basic RDF data model, or even that this data model has
>>> changed. We considered such changes and rejected them.
>>
>> I deleted this sentence.
>>
>>> "The RDF data model provides a way to make statements about
>>> (Web) resources."
>>>
>>> RDF makes statements about resources, i.e. about anything at all.
>>> The implied qualification in "(Web)" is false and misleading.
>>
>> Right. Deleted.
>>
>>> "As we mentioned, this data model does not make any assumptions
>>> about what these resources stand for."
>>>
>>> Resources don't (usually) stand for anything. Did you mean, what
>>> IRIs stand for? (Another use/mention confusion.)
>>
>> Oops, right. Corrected.
>>
>>> ".. a vocabulary description language called RDF-Schema " ?? In
>>> what sense is RDFS a 'vocabulary description' language? If you
>>> must say this, at least explain what it is supposed to mean.
>>
>> It is actually the title of the document (“RDF Vocabulary
>> Description Language 1.0 RDF Schema”). But I agree with your point.
>> Rephrased now as “To support the definition of vocabularies RDF
>> provides the RDF-Schema language.”.
>
> I forgot it was there in the title :-)
>
>> It raises another point: should we rename the RDF Schema document?
>
> This has been covered now, right?
>
>>> The introduction of classes is very awkward, and not really
>>> correct. I would suggest saying that they are categories which
>>> can be used to classify things: Bill rdf:type Human, Mona Lisa
>>> rdf:type Artwork, etc.. and avoid "group" (and "set") altogether.
>>> Then you don't need to immediately say that what you just said
>>> is false, which is not very reassuring for the reader. And you
>>> should mention rdf:type in the same breath.
>>
>> Right. I was struggling with this. New phrasing in document.
>>
>>> Need to clean up the terminology. It is very confusing to be told
>>> in quick succession:
>>>
>>> ==RDF Schema is a vocabulary description language
>>> ==FOAF is a *vocabulary* which is a *schema* which was one of the
>>>   first *RDF Schemas* (Is RDF Schema one of the RDF Schemas?)
>>>
>>> ==DC is a vocabulary which is a *metadata element set* (Why isn't
>>>   it an RDF Schema, like FOAF?)
>>> ==*schema*.org is a *vocabulary* (Given what it is called, why
>>>   isn't it a schema?)
>>> ==SKOS is a *vocabulary* for publishing *schemes* (not schemas?)
>>>   such as terminologies and thesauri. (Isn't a terminology a kind of
>>>   vocabulary? So are schemes and vocabularies the same thing? Or
>>>   was it schemas that were like vocabularies, and schemes are
>>>   something different? And does SKOS describe them or is it an
>>>   example of one of them? Or maybe both at the same time??)
>>>
>>> Personally, I would never want to see the word "schema" ever
>>> again.
>>
>> In principle agreed. But the term “schema” is used in the outside
>> world in ambiguous ways; nothing we can do about that.
>
> True.
>
>> But I deleted the term “schema” from our own text; that is indeed
>> better. Two specific things:
>> - Saying that schema.org is a vocabulary is, I think, correct.
>
> Yes.
>
>> - SKOS is a meta-vocabulary for specifying classification
>> schemes/…
>
> Why not just a vocabulary? The fewer unexplained distinctions, the
> better, and 'vocabulary' is accurate in any case.

Fine, that was our original phrasing.

>
>>
>>> Why don't we just say that anyone can publish an RDF vocabulary -
>>> a set of IRIs, typically from a single namespace - and specify
>>> what it is supposed to mean, and then everyone can use that
>>> vocabulary to write RDF data. It is good practice to have the
>>> root namespace IRI link to something that defines the meaning of
>>> the vocabulary, and to re-use IRIs from existing vocabularies
>>> where you can, to make it easier to share meanings. And it is
>>> gold standard to publish an RDF graph which specifies at least
>>> part of your intended meanings for your vocabulary in a
>>> machine-readable way, if you can, using vocabularies intended for
>>> the purpose, for example RDFS or OWL or SKOS, because then they
>>> can be used by others in entailment rules. Examples are given
>>> below....
>>
>> OK, will use this for rephrasing parts of this section.
>>
>>> and then after you have talked about entailments in the semantics
>>> section, you might for example show how dbpedia uses
>>> rdfs:subClassOf to create category hierarchies, or how FOAF uses
>>> owl:InverseFunctionalProperty to imitate database keys.
>>
>> Good suggestion. Will include this.
>>
>>> "RDF Schema provides basic facilities for modeling semantics of
>>> RDF data. For a specification of these semantics the reader is
>>> referred to the RDF Semantics document [RDF11-MT]. For more
>>> comprehensive semantic modeling of RDF data the W3C recommends
>>> using the Web Ontology Language OWL [OWL2-OVERVIEW]."
>>>
>>> ?? I don't even know what this is supposed to mean. RDFS and OWL
>>> "model semantics of RDF data" ?? That is either meaningless or
>>> false, I'm not sure which. Maybe both. Also, this reads as though
>>> the W3C recommends using OWL over RDFS, which if true is news to
>>> me (and not likely to lead to a rapid take-up of RDF, if users
>>> have to read the OWL specs first.)
>>
>> I wanted to say something nice about OWL! :). Seriously, suggest to
>> rephrase as:
>>
>> [[ For a formal specification of the semantics of the RDF Schema
>> constructs the reader is referred to the RDF Semantics document
>> [[RDF11-MT]]. Users interested in more comprehensive semantic
>> modeling of RDF data might consider using the Web Ontology Language
>> OWL [[OWL2-OVERVIEW]] ]]
>
> Better, but I am still puzzled by the idea that OWL does semantic
> modelling **of RDF data**. To me that sounds like OWL redefines or
> modifies the RDF semantics (?). Perhaps I am just being semantically
> pedantic about the semantics of "semantics".
>
> More seriously, it might be worth saying that all of these can be
> treated as RDF vocabularies and used in RDF data freely, even mixed
> together.  This might be a fairly new idea to some readers, and
> calling OWL a different "Language" suggests that it is an alternative
> to RDF rather than just one vocabulary among many.

Added "OWL is an RDF vocabulary, so it can be
used in combination with RDF Schema."
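Because OWL and RDFS are both RDF vocabularies, their terms can sit side 
by side in one graph and be processed by the same rule engine. Below is 
a Python sketch of that, assuming a naive forward-chaining loop and only 
three rules: rdfs11 (rdfs:subClassOf is transitive) and rdfs9 (instances 
inherit superclasses) from RDF Semantics, and prp-ifp from the OWL 2 RL 
rules (two subjects sharing a value of an owl:InverseFunctionalProperty 
are owl:sameAs), which is the database-key trick mentioned earlier for 
FOAF. The ex: IRIs and the mailbox are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
IFP = "http://www.w3.org/2002/07/owl#InverseFunctionalProperty"
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"  # hypothetical namespace


def closure(graph):
    """Apply rdfs11, rdfs9 and prp-ifp repeatedly until no new triple is derived."""
    g = set(graph)
    while True:
        new = set()
        sub = {(s, o) for s, p, o in g if p == SUBCLASS}
        # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, SUBCLASS, d))
        # rdfs9: x type A, A subClassOf B  =>  x type B
        for x, p, cls in g:
            if p == RDF_TYPE:
                for a, b in sub:
                    if a == cls:
                        new.add((x, RDF_TYPE, b))
        # prp-ifp: p is inverse-functional, x p v, y p v  =>  x sameAs y
        # (the trivial "x sameAs x" conclusions are skipped here)
        ifps = {s for s, p, o in g if p == RDF_TYPE and o == IFP}
        for x, p1, v1 in g:
            if p1 not in ifps:
                continue
            for y, p2, v2 in g:
                if p2 == p1 and v2 == v1 and y != x:
                    new.add((x, SAME_AS, y))
        if new <= g:
            return g
        g |= new


data = {
    (EX + "Painter", SUBCLASS, EX + "Artist"),
    (EX + "Artist", SUBCLASS, FOAF + "Person"),
    (EX + "leonardo", RDF_TYPE, EX + "Painter"),
    (FOAF + "mbox", RDF_TYPE, IFP),
    (EX + "a", FOAF + "mbox", "mailto:someone@example.org"),
    (EX + "b", FOAF + "mbox", "mailto:someone@example.org"),
}

inferred = closure(data)
print((EX + "leonardo", RDF_TYPE, FOAF + "Person") in inferred)  # True
print((EX + "a", SAME_AS, EX + "b") in inferred)  # True
```

A real reasoner would index triples instead of rescanning the whole 
graph on every pass; the loop above only illustrates that RDFS and OWL 
triples are handled as ordinary RDF data.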

>
>>> Section 5.  The idea that all these different syntaxes are all
>>> ways of describing the same RDF graph structures is not
>>> immediately obvious, and I think is a major barrier to
>>> comprehension.  Need to talk a little about concrete vs.
>>> abstract syntax, maybe not in those terms, but to get across the
>>> idea of the graph syntax being a level of abstraction higher than
>>> the particular notation used to describe it.
>>
>> Good point. Included the following sentence in the first
>> paragraph: [[ However, different encodings of the same graph lead
>> to exactly the same triples. ]] I also suggest to include a graph
>> diagram of the current example, and clarify the point about the
>> abstract graph (added as a todo issue to the document).
>
> If you are on a Mac, OmniGraffle generates beautiful diagrams.
>
> Pat
>
>
>>
>>> Having one simple but not entirely trivial example graph (with at
>>> least one bnode, at least two triples sharing a common object and
>>> one node used as both a subject and an object) written out in all
>>> the different notations would be a very useful thing to see. It
>>> would also hammer home the point about abstract graph syntax,
>>> especially if you also provided a graph diagram for it.
>>
>> The current example has all these features, except for the bnode.
>> I’ll add an issue about including a (separate?) example with
>> bnodes.

Yves and I discussed this a bit more. We think introducing a bnode in 
the overall example would make the example too complex. Instead we added 
the Cypress example as a separate example to Sec. 5.1.
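The point that a serialization is only one way of writing down the 
abstract graph can be illustrated without any RDF library. The Python 
sketch below parses a deliberately small subset of N-Triples (IRIs, 
blank-node labels, and plain literals without escapes, datatypes or 
language tags) and shows that two texts differing in line order, spacing 
and comments yield the same set of triples. The ex: predicate IRI is 
hypothetical. Comparing graphs that contain blank nodes would need a 
graph-isomorphism check rather than plain set equality, since blank-node 
labels are local to each document.

```python
import re

# A deliberately small subset of N-Triples: IRIs in <...>, blank-node labels,
# and plain "..." literals with no escapes, datatypes or language tags.
TERM = r'(<[^>]*>|_:\S+|"[^"]*")'
LINE = re.compile(r"^" + TERM + r"\s+" + TERM + r"\s+" + TERM + r"\s*\.$")


def parse_ntriples(text):
    """Return the set of (subject, predicate, object) triples written in `text`."""
    triples = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no triples
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"outside the supported subset: {raw!r}")
        triples.add(m.groups())
    return triples


doc_a = """
<http://dbpedia.org/resource/Mona_Lisa> <http://purl.org/dc/terms/title> "Mona Lisa" .
<http://dbpedia.org/resource/Leonardo_da_Vinci> <http://example.org/created> <http://dbpedia.org/resource/Mona_Lisa> .
"""

doc_b = """
# Same statements, in a different order and with different spacing.
<http://dbpedia.org/resource/Leonardo_da_Vinci>   <http://example.org/created>   <http://dbpedia.org/resource/Mona_Lisa> .

<http://dbpedia.org/resource/Mona_Lisa> <http://purl.org/dc/terms/title> "Mona Lisa" .
"""

print(doc_a == doc_b)                                  # False: the texts differ
print(parse_ntriples(doc_a) == parse_ntriples(doc_b))  # True: the graphs are equal
```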

Also, although bnodes occur conceptually in many RDF sources, in 
practice they're typically skolemized in some way (cf. DBpedia). 
Therefore, the actual syntax of bnodes is less important. It might be 
worthwhile to include a note about skolemized URIs (without using the 
term) in the RDF Data section of the Primer.
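Skolemization, in the sense used above, replaces each blank node with a 
fresh IRI minted for the purpose; RDF 1.1 Concepts suggests that such 
IRIs use the /.well-known/genid/ path. A minimal Python sketch, assuming 
terms are plain strings, blank nodes start with "_:", and a hypothetical 
authority http://example.org. It is not a description of what DBpedia 
actually does.

```python
import uuid


def skolemize(triples, authority="http://example.org"):
    """Replace every blank node ("_:label") with a fresh IRI. The same label
    within this graph always maps to the same IRI. Returns (triples, mapping)."""
    mapping = {}

    def fix(term):
        if term.startswith("_:"):
            if term not in mapping:
                mapping[term] = f"{authority}/.well-known/genid/{uuid.uuid4().hex}"
            return mapping[term]
        return term

    return {tuple(fix(t) for t in triple) for triple in triples}, mapping


MONA = "http://dbpedia.org/resource/Mona_Lisa"
SHOWS = "http://purl.org/net/lio#shows"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CYPRESS = "http://dbpedia.org/resource/Cypress"

graph = {(MONA, SHOWS, "_:x"), ("_:x", RDF_TYPE, CYPRESS)}
skolemized, mapping = skolemize(graph)
tree = mapping["_:x"]
print(tree.startswith("http://example.org/.well-known/genid/"))  # True
print((MONA, SHOWS, tree) in skolemized and (tree, RDF_TYPE, CYPRESS) in skolemized)  # True
```

Because uuid4() is random, each run mints different IRIs; a publisher 
wanting stable identifiers across releases would need a deterministic 
naming scheme instead.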

Guus

>>
>>> " therefore bringing the benefits of RDF to the JSON world. "
>>> Omit. Could be read as condescending. I am sure there are many
>>> who would say, it brings JSON sanity to the RDF world.
>>
>> Deleted/rephrased.
>
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 home
> 40 South Alcaniz St.                     (850)202 4416 office
> Pensacola                                (850)202 4440 fax
> FL 32502                                 (850)291 0667 mobile (preferred)
> phayes@ihmc.us       http://www.ihmc.us/users/phayes

Received on Friday, 29 November 2013 12:56:52 UTC