review comments on rdf-syntax-grammar (version of 25 Mar)

Aside: I'm sending this from a Java SSH applet in a hotel, so apologies
for the lack of detailed URIs into the document version I'm reviewing.

This is mostly terminology/wording. In a few cases I succeed in proposing
specific edits, in others I don't (yet?) have suggested improvement text.

See (6) below for the only new idea and the only major concern about the
document's technical content. I don't consider either to be a showstopper
issue, but busy readers might skim directly to (6) for further discussion.


I am looking at http://ilrt.org/discovery/2001/07/rdf-syntax-grammar/ on
Sunday March 24th. The WD-draft is currently dated 25 March; no CVS
version number obviously visible (Dave, could you add $Id$ to the <h2/>
while drafts are in preparation? (perhaps with a ptr to the ILRT cvsweb
interface so we can see log comments, compare versions etc?).

Comments:
--------

I took a brief look at this in the week, while planning the RDFS
reshuffle. I stick by my initial reaction: very nice work :)

Some detailed comments:

1) Wordsmithing the Abstract (and terminology issues)

In the abstract, we say

 'This W3c Working Draft introduces and defines the XML syntax...'

I would leave document status / maturity info to the 'SOTD' section.
Instead we could say

 'This specification defines an XML syntax for RDF, as amended (etc)'

 - dropping 'introduces' as this contradicts the claim that M+S'99
introduced this syntax

 - s/the XML syntax/an XML syntax/
 (we can't say often enough that RDF can be written in XML in various
ways; referencing the Cambridge Communique, WebData and XLink-harvesting
notes might help drum that point home..). Later in the doc it says 'An',
not 'The', so this is a consistency fix.

 - s/creating RDF models/creating RDF graphs/
 We're going to have to be careful with inter-spec terminology. Your
 language here seems to have been largely obsoleted by the MT, which uses
 'model' in the logicians sense, rather than the two(!) previous senses in
 which  RDF employed the term.
 We *used* to use 'model' as a noun corresponding roughly to 'RDF
description of' (as in the practice of modeling, eg. a lego model).
 We *also* used it to refer to RDF's basic architecture, the 'graph data
 model', ie. triples as an encoding for 'statements' about the properties
 of resources.

In short, various places you say 'model' might be safer as 'graph'

Aside: we have a similar problem with the MT and the Schema specs at
least; they're in conflict about the use of the term 'vocabulary'. I plan
to suggest the MT adopt a less useful word and free up 'vocabulary' for
talking classes, properties etc. per current Schema usage. @@TODO



2) Section (2), An XML syntax...

The 3rd paragraph tells us about rdf:Description. IMHO rdf:Description is
a redundant historical relic, since we can always write rdfs:Resource
there instead and have a typed node instead of mere encoding syntax.
This is perhaps a matter of taste, and not a showstopper for publication.
If we could introduce rdf:Description as a deviation from the general
typed node / striped pattern, things might be a little easier for
implementors. The rdf:Description construct adds nothing to the language
except another syntactic variation. I wonder whether we could more
explicitly encourage the use of typed nodes (minimal case: rdfs:Resource).
Or would this be considered a fwd reference to the Schema spec and hence
problematic? Perhaps not, if both (a) we get them to REC at same time, (b)
we decide -- say in PR period -- that implementors like the new specs and
would much prefer a single consolidated 'rdfcore' namespace for
core classes and properties.

Syntax / striping examples:
Using CSS to style the property elements and the node elements differently
might be useful.

3) in (3) the RDF namespace

...you list the classes, properties and syntactic gizmos from the rdf:
namespace. Would it be worth separating them out, or at least separating
the purely syntactic constructs?

Your 'implementors note' regarding the removal of names from the language
(and namespace) may come across as rather casual.  Presumably implementors
in this sense include content authors and tool authors. If we are telling
them that docs they wrote to the '99 W3C REC are no longer legal RDF, we
should be a little more cautious/humble.

Suggest: 'In this Working Draft, the RDF Core WG propose the removal of
xyz from the RDF namespace. Feedback from tool and content authors is
particularly sought on this point, and on the costs and benefits of
adopting a new namespace URI to reflect this change. In the current
draft, we simply omit these constructs from our account of the RDF
namespace.'.

(hmmm, will we edit the RDF doc that's available at the ns URI?)

(sincere apologies if I am re-visiting old territory here. I am doing my
best  to catch up with RDF Core work, but I have missed fair chunks of
discussion.)


4) in (3.5, Identifiers)

First sentence doesn't quite work. '3 types of identifiers (or labels)',
ie. absolute urirefs, literals, unlabelled/blank nodes.

Suggest:
RDF graphs are structured using three mechanisms for representing the
identity of resources and literal values. These are: <mumble>

OK, I can't think of better wording so withdraw my quibble. However
unlabelled/blank nodes don't seem to be a 'type of identifier or label',
more an 'identification mechanism'. Tricky to find words for this. Maybe
the MT spec has something?


5) the word 'reify' is introduced deep in section 5.5 without explanation.
Some gloss along lines of 'describing using RDF' might work.
I have no concrete suggestions here.

6) Serializing to RDF/XML

Style: second sentence ('if you do...') is very casual, chatty, but
carries a very important technical point.  Q: can we take the notion of
'round trip' as a concept everyone will understand, or is it too
colloquial for a W3C spec? (we might seek unput from I18N folk on this).
Suggested alternate wording: <mumble> we might talk about 'information
loss'.


Substantial concern:

Basic serialization strategy: 'all blank nodes are assigned arbitrary URIs'.

 - on my understanding of our decisions w.r.t. blank nodes, this would
result in information loss. Just as we decided not to autogenerate uriref
node labels for bNodes in the graph, we we need to preserve that (lack of)
information on re-serialization. DanC recently circulated a good brief
explanation of this (@@todo, find msg).

I'm not sure what to suggest as an immediate fix. We might add:

'editorial note: this serialization strategy loses information,
specifically it destroys any trace of the distinction between bNodes and
uriref-labelled nodes in any RDF graph serialized to XML/RDF following
these rules. Future revisions of this specification may provide revised
rules that preserve the bNode / labeled node distinction'.


also in the serialization section....

My one (hopefully) new idea:

I've been thinking about strategies for dealing with 'unserializable'
graphs. These are (AFAIK) all (or nearly all?? would be good to be clear)
to do with having the edges in some RDF graph be labelled with a URIref
that doesn't split conveniently into  namespace name and local name.

(i) (as an aside) we should note that the rdfs:isDefinedBy property can be
used to represent in RDF the relationship between a property and its 'home'
namespace. When available, this information should be used in preference
to  URI-syntax-scraping.

(ii) When a property URIref is not suitably serializable, an alternative
to exception/error throwing that MAY often be useful is for the RDF/XML
parser to describe problem triples in terms of a previously unknown
sub-property of the annoyingly named Property:

Example:

<http://www.w3.org/TR/soap12-part1/>  <http://example.com/annoying-ns!>  "Etc." .

Since our predicate ends in '!', we can't serialize this as-is.

However we could emit something like:
<http://www.w3.org/TR/soap12-part1/>  <uuid:123321123321123321/genprop>   "Etc." .

...accompanied by adjunct triples describing how the property used is a
sub-property of the annoying one:

 <uuid:123321123321123321/ns1>  <http://www.w3.org/2000/01/rdf-schema#subPropertyOf> <http://example.com/annoying-ns!>   .

In RDF/XML this would be

<rdf:Property rdf:about="uuid:123321123321123321/genprop">
<rdfs:subPropertyOf rdf:resource="http://example.com/annoying-ns!"/>
</rdf:Property>
<rdf:Description rdf:about="http://www.w3.org/TR/soap12-part1/">
 <g1:genprop xmlns:g1="uuid:123321123321123321/">Etc.</g1:genprop>
</rdf:Description>

Notes:

 - this is something of a hack, and perhaps belongs in a Developers  'hints
   and tips' FAQ rather than the core Syntax spec.
 - it does suggest that all RDF graphs may be XML-serializable, but at the
   cost that the generated XML may contain triples that weren't in the
   original data
 - it provides no guidance on generation of URIrefs for the generated
   properties
 - it leaves no hint in the output RDF/XML that this augmentation happened
(ie. subsequent parsers might want to reverse the process. Using a
different property than rdfs:subPropertyOf would be one complication to
consider. But I'm not sure complication is what we need right now.)

Suggestion:
 - add this technique to the Primer or RDF FAQ and put a reference in the
syntax spec to give folk some option other than warning/exception
throwing.

Received on Sunday, 24 March 2002 07:11:51 UTC