Re: review of RDF Concepts

Hi Peter,

Thanks for this comprehensive review, which raises many excellent points.

See below for details on how I addressed them.

There are a few points where further discussion may be required, but I believe that all of them are marked with issue boxes in the text, and therefore can be handled after publication of this WD. I've therefore gone ahead with the publication request.

On 28 Nov 2012, at 18:45, Peter Patel-Schneider wrote:
> I have a few changes to wording to correct what I see as errors.  
> 
> Something needs to be done to fix the situation with respect to blank nodes,
> but the issue note already warns about this so there is no need to hold up
> publication right now.
> 
> I think that something needs to be done about social meaning.  I have
> suggestions below.  As the sections involved are non-normative, I don't think
> that publication needs to be held up until the problem is resolved.
> 
> 
> 
> Minor grammar changes:
> 
> 0.1/ Be consistent with commas before "being".

Done, by placing commas.

> 0.2/ string, numbers -> strings, numbers

Fixed.

> 0.3/ "semantics" should be reserved for the model-theoretic semantics.  Use
> "meaning" instead in other situations.

The only instances of "semantics" that don't refer to the model-theoretic semantics are in Section 6, which is all about the "semantics of fragment identifiers". That use of the word "semantics" is consistent with many other relevant documents; see for example:

http://tools.ietf.org/html/rfc3986#section-3.5
http://www.w3.org/TR/fragid-best-practices/

Talking about the "meaning of fragment identifiers" would be a net loss in clarity. I don't think we can reasonably reserve the word "semantics" to only refer to the model-theoretic semantics of RDF.

> 0.4/ "should" should be avoided except when it is "SHOULD".

I agree that "should" should be avoided in places where ambiguity between "SHOULD" and "should" could arise. However, this isn't the case anywhere in RDF Concepts, because all instances of "should" are in informative text (notes or introduction, where SHOULD doesn't make sense), and all these informative uses of "should" do indeed represent a "SHOULD" that is normatively stated in some other specification.

That being said, I've removed a handful of instances of "should" where it was easy.

> Significant changes:
> 
> A/ I worry about Section 1.3 and the last bit of Section 6.  It appears to
> me that this is edging back towards social meaning, which was ripped out of
> RDF the last time around.

I don't know anything about this "social meaning" that apparently was ripped out last time.

I realize that Section 1.3 has to tread a fine line: it is trying to be helpful to practitioners while avoiding any tight coupling between the various bits of web architecture involved here.

With Section 6, however, I don't see a problem. I think everything in there was either already present in the 2004 Recommendation, or is backed up by RFC 3986 Section 3.5 or the fragid-best-practices TAG finding. There is nothing new there really.

> Initial suggestion for Section 1.3:
> 
> 1.3 The Referent of an IRI
> 
> The resource denoted by an IRI is also called its referent. What exactly is
> denoted by any given IRI is not defined by this specification. 
> 
> Basic guidelines for determining the referent of an IRI are provided in
> other documents, like Architecture of the World Wide Web, Volume One
> [WEBARCH] and Cool URIs for the Semantic Web [COOLURIS]. A very brief,
> informal and partial account of these guidelines follows:
> - IRIs have global scope: An IRI is assumed to denote the same resource
>   regardless of where the IRI occurs. 

Up till here, you have only done minor rewording, which I have partially adopted.

You have removed the following part:

[[
By social convention, the IRI owner [WEBARCH] gets to say what an IRI denotes. They do this when “minting” a new IRI.
]]

The reference for this statement is [WEBARCH], which says:

[[
URI ownership is a relation between a URI and a social entity, such as a person, organization, or specification. URI ownership gives the relevant social entity certain rights, including […] to associate a resource with an owned URI
]]

I don't see any reason not to include a statement to that effect.

I don't have a good normative reference for the term “minting”, but it's a common term of art in the RDF world and I think its inclusion here makes sense.

> - By social convention, The IRI owner [WEBARCH] provides the 
>   can establish the intended referent by means of a
>   specification or other document that explains what is denoted. 
>   <<RDF-SCHEMA is special here, FOAF would be a much better example>>
>   For example, [RDF-SCHEMA] specifies the referents of various IRIs that
>   start with http://www.w3.org/2000/01/rdf-schema#. 

I agree that the choice of RDFS as example can be confusing. I have replaced the example. It now uses the W3C Organization Ontology, which is a REC-track W3C document:
http://www.w3.org/TR/2012/WD-vocab-org-20121023/

> - A good way of providing the intended referent is to set
>   up the IRI so that it dereferences [WEBARCH] to a document.  Such a
>   document can, in fact, be an RDF document that describes the denoted
>   resource by means of RDF statements.  

You replaced "communicating the intended referent to the world" with "providing the intended referent". This change would make the sentence less accurate, because the referent can be a physical entity or some other thing that is certainly not "provided" by making a document available. I have however removed the unnecessary phrase "to the world".

> Suggestion for the last bit of Section 6:
> 
> It is a good idea to, wherever reasonable, set up fragment identifiers in
> RDF-bearing representations in a way that is consistent with  
> non-RDF representations. For example, if the fragment chapter1 identifies a
> document section in an HTML representation of the primary resource, then the
> IRI <#chapter1> should be taken to denote that same section in all
> RDF-bearing representations of the same primary resource.

Well, this looks acceptable to me, but I don't understand what problem in the current text it is trying to address. 

> B/ Something needs to be done to Section 3.4.
> 
> There should be some wording here to indicate that blank nodes can be shared
> between RDF graphs, but that simply reusing a blank node identifier between
> two unrelated graphs results in different blank nodes.  There is more work
> needed here as well.

I agree, and intend to improve the section further after this working draft is out, as indicated in the issue box.

> Other changes:
> 
> 1/ Change [required]:
> 
> This document defines an abstract syntax (a data model) which serves to link
> all RDF-based languages and specifications, including: 
> - Serialization syntaxes for storing and exchanging RDF (e.g., Turtle
>   [TURTLE-TR] and RDF/XML [RDF-SYNTAX-GRAMMAR]), 
> - the SPARQL Query Language [RDF-SPARQL-QUERY],
> - the RDF Vocabulary Description Language [RDF-SCHEMA],
> - a formal model-theoretic semantics for RDF [RDF-MT].
> 
> to:
> 
> This document defines an abstract syntax (a data model) for RDF.  Concepts
> defined in this document are vital to understanding any aspect of RDF,
> and support
> - serialization syntaxes for storing and exchanging RDF (e.g., Turtle
>   [TURTLE-TR] and RDF/XML [RDF-SYNTAX-GRAMMAR]), 
> - the RDF Vocabulary Description Language (RDFS) [RDF-SCHEMA],
> - the formal model-theoretic semantics for RDF and RDFS [RDF-MT], and
> - the SPARQL Query Language [RDF-SPARQL-QUERY].

I have partially adopted this rephrasing.

I have changed "a semantics" to "the semantics".

I have changed "semantics for RDF" to "semantics for RDF and RDFS".

I have however retained the "serves to link" phrasing, which was already present in RDF 2004, and the order of the four bullet points.

> *********************
> 
> 2/ Change [strongly recommended]:
> 
> The core structure of the abstract syntax is a collection of triples, each
> consisting of a subject, a predicate and an object. A set of such triples is
> called an RDF graph. This can be illustrated by a node and directed-arc
> diagram, in which each triple is represented as a node-arc-node link; hence
> the term “graph”. 
> 
> to:
> 
> The core structure of the abstract syntax for RDF is a set of triples, each
> consisting of a subject, a predicate and an object. A set of such triples is
> called an RDF graph. An RDF graph can be visualized as a node and
> directed-arc diagram, in which each triple is represented as a node-arc-node
> link.

Okay, changed.

> *********************
> 
> 3/ Change [picky wording change]:
> 
> There may be three kinds of nodes in an RDF graph: IRIs, literals, and blank
> nodes. 
> 
> to:
> 
> There can be three kinds of nodes in an RDF graph: IRIs, literals, and blank
> nodes. 

Okay, changed.

> *********************
> 
> 4/ Change [wording change]:
> 
> Any IRI and literal denotes
> 
> to:
> 
> Any IRI or literal denotes

Okay, changed.

> *********************
> 
> 5/ Change [very picky wording change]:
> 
> holds between the resources denoted by the subject and object.
> 
> to:
> 
> holds from the resource denoted by the subject to
> the resource denoted by the object.

I agree that this rephrasing would be more accurate, but I'd rather not make the sentence any more complicated, so I have left the current version.

> *********************
> 
> 6/ Change [strongly recommended - there is no need to have referents to define
> a vocabulary]:
> 
> An RDF vocabulary is a collection of IRIs with clearly established referents
> intended for use in RDF graphs. 
> 
> to:
> 
> An RDF vocabulary is a collection of IRIs intended for use in RDF.

A vocabulary without established referents is useless. The question of whether such a collection of IRIs is *technically* a vocabulary anyway seems rather pointless, so I've left it as is.

> *********************
> 
> 7/ Change [strongly recommended]:
> 
> An RDF dataset is a collection of RDF graphs. All but one are named graphs
> associated with an IRI. The last one is the unnamed default graph, and is
> often used to hold triples that involve the graph names. 
> 
> A common use of RDF datasets is to hold snapshots of multiple RDF sources.
> 
> to:
> 
> An RDF dataset is a collection of RDF graphs.  All but one of these graphs
> have an associated IRI.  These graphs are called named graphs, and the IRI
> is called the name of the named graph.  The remaining graph does not have an
> associated IRI.  It is called the default graph of the dataset.
> 
> There are many possible uses for RDF datasets.  One such use is to hold
> snapshots of multiple RDF sources.   It is common to have the default graph
> contain triples that involve the names of the other graphs in the dataset. 

I've adopted a minimally shortened version of this.

[[
An RDF dataset is a collection of RDF graphs. All but one of these graphs have an associated IRI. They are called named graphs, and the IRI is called the graph name. The remaining graph does not have an associated IRI, and is called the default graph of the RDF dataset.

There are many possible uses for RDF datasets. One such use is to hold snapshots of multiple RDF sources. It is common to have the default graph contain triples that involve the graph names of the other graphs in the dataset.
]]
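
As a rough illustration of the wording above, here is a minimal Python sketch of a dataset as one default graph plus a set of named graphs keyed by graph name. The `Dataset` class, the tuple encoding of triples, and the example.org IRIs are assumptions made up for this sketch; they are not taken from the draft.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical container: one default graph plus zero or more named graphs."""
    # A graph is modelled as a set of (subject, predicate, object) tuples.
    default_graph: set = field(default_factory=set)
    # Maps each graph name (an IRI string) to its graph.
    named_graphs: dict = field(default_factory=dict)

    def add(self, triple, graph_name=None):
        """Add a triple to the named graph, or to the default graph if no name is given."""
        if graph_name is None:
            self.default_graph.add(triple)
        else:
            self.named_graphs.setdefault(graph_name, set()).add(triple)


# Illustrative IRI for a snapshot of one RDF source.
SNAPSHOT = "http://example.org/snapshots/2013-01-14"

ds = Dataset()
# The snapshot itself goes into a named graph.
ds.add(
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", '"Alice"'),
    graph_name=SNAPSHOT,
)
# The default graph holds a triple that involves the graph name.
ds.add((SNAPSHOT, "http://purl.org/dc/terms/source", "http://example.org/people.rdf"))
```

Keeping the default graph as a separate field, rather than as a dictionary entry with a special key, mirrors the statement that it has no associated IRI.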

> *********************
> 
> 8/ Change [required - don't dump on the semantics]:

I don't know how the current phrasing "dumps on the semantics".

> This
> treatment of RDF graphs as logical expressions is normatively defined in the
> RDF Semantics specification [RDF-MT], using the formalism of Model
> Theory. 
> 
> to:
> 
> The logical meaning of RDF graphs is normatively defined in the
> RDF Semantics specification [RDF-MT], using a model-theoretic semantics. 

I disagree. The 2004 documents are a tautological nightmare in this regard, and I've worked quite hard to unravel that. RDF Semantics doesn't "define the meaning" of RDF graphs. RDF Semantics does *exactly* what Section 1.7 says: It treats RDF triples as logical expressions, and thereby gives rise to a number of useful properties and relationships of RDF graphs, most notably consistency, entailment and equivalence. It does so by employing the formalism of Model Theory. It also defines, somewhat implicitly, an extension mechanism that today is usually called "entailment regimes". I think that spelling these things out in Section 1.7 is a *major* improvement over the tautological notion that RDF Semantics defines the "meaning" of RDF graphs. The "meaning" of RDF graphs arises from an ever-changing hodgepodge of technology, logic and social conventions.

I've changed "the formalism of Model Theory" to "a model-theoretic semantics", although I don't see how this is an improvement.

> *********************
> 
> 9/ Change [grammar and clarification]:
> 
> An RDF graph A entail another RDF graph B if every possible arrangement of
> things in the world that makes A true also makes B true. If the truth of A
> is presumed or demonstrated, then the truth of B can be inferred. 
> 
> to
> 
> An RDF graph A entails another RDF graph B if every possible arrangement of
> the world that makes A true also makes B true. When A entails B, if the
> truth of A is presumed or demonstrated then the truth of B is established.

Okay, changed.

> *********************
> 
> 10/ Change [grammar]:
> 
> A concrete RDF syntaxes
> 
> to:
> 
> Concrete RDF syntaxes

Fixed.

> *********************
> 
> 11/ Change [be cleaner about RDF meaning]:
> 
> semantics, which lies exclusively in the encoded graph or dataset.
> 
> to:
> 
> meaning, which is exclusively mediated by the encoded graph or dataset.

The use of "mediated" here is rather obscure... I hope this works for you:

[[
While these aspects can have great effect on the convenience of working with the RDF document, they are not significant for its meaning.
]]

In analogy to significant vs. insignificant white space.

> *********************
> 
> 12/ Change [required, is currently false]:
> 
> This transformation does not change the meaning of an RDF graph, provided
> that the Skolem IRIs do not occur anywhere else. It does however permit the
> possibility of other graphs subsequently using the IRI to also refer to the
> same entity, which was not possible when the node was blank.
> 
> to:
> 
> This transformation does not appreciably change the meaning of an RDF graph,
> provided that the Skolem IRIs do not occur anywhere else. It does however
> permit the possibility of other graphs subsequently using the Skolem IRIs,
> which is not possible for blank nodes.

Okay. This paragraph has been word-smithed many times -- I'll just hope everybody can live with this.
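
For readers who want to see the transformation concretely, the following is a minimal Python sketch of replacing blank nodes with fresh IRIs. It assumes that blank nodes are written as strings starting with `_:`, that triples are plain tuples, and that fresh IRIs are minted under an illustrative `http://example.org/.well-known/genid/` prefix; the `skolemize` function is hypothetical.

```python
import uuid


def is_blank(term):
    # Assumption: blank nodes are encoded as strings starting with "_:".
    return term.startswith("_:")


def skolemize(graph, authority="http://example.org"):
    """Return (new_graph, mapping) with every blank node replaced by a fresh IRI."""
    mapping = {}

    def replace(term):
        if not is_blank(term):
            return term
        if term not in mapping:
            # uuid4 makes it very unlikely that the IRI occurs anywhere else.
            mapping[term] = f"{authority}/.well-known/genid/{uuid.uuid4().hex}"
        return mapping[term]

    # Only subjects and objects can be blank nodes; predicates are always IRIs.
    new_graph = {(replace(s), p, replace(o)) for (s, p, o) in graph}
    return new_graph, mapping


graph = {
    ("_:b1", "http://xmlns.com/foaf/0.1/knows", "_:b2"),
    ("_:b1", "http://xmlns.com/foaf/0.1/name", '"Alice"'),
}
skolemized, mapping = skolemize(graph)
```

Each blank node maps to exactly one new IRI, so the shape of the graph is preserved, and the returned mapping lets other graphs reuse the same IRIs later.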

> *********************
> 
> 13/ Change [garden path grammar]:
> 
> An RDF Dataset is a collection of RDF graphs and comprises:
> 
> to:
> 
> An RDF Dataset is a collection of RDF graphs, and comprises:

Fixed.

> *********************
> 
> 14/ Change [grammar and false?]:
> 
> RDF re-uses the XML Schema built-in datatypes
> 
> to:
> 
> RDF uses many XML Schema datatypes

Not quite sure what the problem is. "Built-in datatypes" is a technical term in XML Schema:
http://www.w3.org/TR/xmlschema11-2/#built-in-datatypes

I'm adding "many of", as indeed not all XML Schema built-ins are appropriate for RDF:

[[
RDF re-uses many of the XML Schema built-in datatypes
]]

> *********************
> 
> 15/ Change [currently different from definition in RDF semantics]:
> 
> A datatype map is an implementation-defined set of <IRI, datatype> pairs
> such that no IRI appears twice in the set and the IRI denotes the
> datatype. It can be seen as a function from IRIs to datatypes.
> 
> to:
> 
> A datatype map is an implementation-defined set of <IRI, datatype> pairs
> such that no IRI appears twice in the set. It can be seen as a function from
> IRIs to datatypes.

The definition of "datatype map" is to be removed from RDF Semantics. This implies no other change to RDF Semantics -- D-Entailment will still make it so that the IRI denotes the datatype, while the other entailment regimes are silent on the issue. The general statement from the Concepts introduction of course still applies: "What exactly is denoted by any given IRI is not defined by this specification. Guidelines for determining the referent of an IRI are provided in other documents..."

How about this:

[[
A datatype map is an implementation-defined set of <IRI, datatype> pairs such that no IRI appears twice in the set. It can be seen as a function from IRIs to datatypes, where the IRIs denote the datatypes.
]]

I'd rather not be completely silent on the relationship between IRIs and datatypes in the definition of datatype maps. The association is not arbitrary. Interoperability requires that it is consistent across implementations. The notion of denotation encourages this consistency, as outlined in subsection 1.3.
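
The "no IRI appears twice" condition is what makes the set of pairs usable as a function. A minimal Python sketch, in which the `Datatype` placeholder class and the `build_datatype_map` helper are hypothetical names introduced only for illustration:

```python
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Datatype:
    """Placeholder standing in for a datatype; only a label is modelled here."""
    label: str


INTEGER = Datatype("integer")
BOOLEAN = Datatype("boolean")


def build_datatype_map(pairs):
    """Turn (IRI, datatype) pairs into a dict, rejecting any IRI that appears twice."""
    result = {}
    for iri, datatype in pairs:
        if iri in result:
            raise ValueError(f"IRI appears twice in datatype map: {iri}")
        result[iri] = datatype
    return result


# With unique IRIs, the pairs behave as a function from IRIs to datatypes.
datatype_map = build_datatype_map([
    (XSD + "integer", INTEGER),
    (XSD + "boolean", BOOLEAN),
])
```

Nothing in the data structure itself forces `XSD + "integer"` to be paired with the integer datatype rather than some other one, which is why the definition also says that the IRIs denote the datatypes.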

> *********************
> 
> 16/ Change [false]:
> 
> Otherwise, the literal is ill-typed, and no literal value can be associated
> with the literal. Such a case, while in error, is not syntactically
> ill-formed.
> 
> to:
> 
> Otherwise, the literal is ill-typed, and no literal value can be associated
> with the literal. Ill-typed literals are not syntactically ill-formed and, while
> ill-typed literals are not normal, just their use does not make an RDF graph
> inconsistent.

That sentence is straight from RDF Concepts 2004. As noted in the document, there's an open issue on this, so this will likely still change in some way.
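
To make the distinction between ill-typed and syntactically ill-formed concrete, here is a small Python sketch. It assumes a one-entry map covering only xsd:integer, approximates that datatype's lexical space with a regular expression, and uses a hypothetical `literal_value` helper.

```python
import re

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def integer_value(lexical_form):
    """Lexical-to-value mapping for xsd:integer; None if outside the lexical space."""
    # Approximation of the lexical space: optional sign followed by decimal digits.
    if re.fullmatch(r"[+-]?[0-9]+", lexical_form):
        return int(lexical_form)
    return None


# Assumed minimal map from datatype IRI to lexical-to-value mapping.
LEXICAL_TO_VALUE = {XSD_INTEGER: integer_value}


def literal_value(lexical_form, datatype_iri):
    """Return (well_typed, value) for a literal whose datatype IRI is in the map."""
    value = LEXICAL_TO_VALUE[datatype_iri](lexical_form)
    return (value is not None, value)


# Both literals can appear in a triple; only the first has a literal value.
well_typed = literal_value("42", XSD_INTEGER)
ill_typed = literal_value("forty-two", XSD_INTEGER)
```

The ill-typed literal is still a perfectly valid (lexical form, datatype IRI) pair as far as the abstract syntax goes; it simply has no value associated with it.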

Thanks again!
Richard

Received on Monday, 14 January 2013 11:49:29 UTC