Re: review of LCC documents as of 26 December 2002 from Frank Manola on 2003-01-19 (www-rdf-comments@w3.org from January to March 2003)

From: Frank Manola <fmanola@mitre.org>
Date: Sun, 19 Jan 2003 17:36:22 -0500
To: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
CC: www-rdf-comments@w3.org
Message-ID: <3E2B2866.1080300@mitre.org>
Peter--

As I said in my earlier message, thanks very much for these comments. 
Detailed responses follow below.

--Frank

Peter F. Patel-Schneider wrote:

> Integrated Review of the RDF Core WG LCC Documents (as of 26 December 2002)
> 
> 
> This review is the result of reading the RDF Core WG LCC Documents as they
> existed on 26 December 2002.  I read the documents in the order they were
> listed on the RDF Core WG page - Primer, Concepts, Syntax, Schema,
> Semantics, and Test.  All except Schema were listed as last call candidate
> versions.
> 
snip
> 
> RDF Primer (Editor's Draft of 17 December 2002)
> 
> Summary:
> 
> A Primer need not be comprehensive, so remove the portions that deal with
> difficult RDF stuff.  A Primer needs to be more than correct, so fix the
> errors and misleading portions, particularly about the use of URI
> references and the meaning of RDF. 
> 
> Section 1:
> 
> The RDF Primer starts with a serious organizational mistake.  It repeats
> much of the abstract verbatim in its first two paragraphs.  This leads one
> to consider just how much care has gone into the rest of the Primer. 


I see your point (actually, the situation used to be worse!).  I'll have 
a crack at fixing this.


> 
> The first section of the Primer is very confused as to what RDF actually
> is.  It states ``RDF is a language'' but then goes on to also state that
> ``RDF also provides an XML-based syntax''.  How can a language provide
> anything? 


I'm not sure I see the problem (at least, not one that's as stark as 
your extracts above suggest).  The Primer states that RDF is a language 
for representing information, then shows how statements are represented 
as graphs.  It goes on to say that RDF provides an XML-based syntax "for 
recording and exchanging these graphs".  Is the problem that the graphs 
aren't also identified as a language?  What do you suggest?


> 
> The Primer states that RDF is based on the idea of representing things
> using URIs, but the very next example uses non-URIs (literals).   The text
> after the example admits the use of non-URIs, but in a clunky and confusing
> fashion. The first example also doesn't mean what the accompanying text
> says because it uses a URI where a blank node is called for. 


The Primer says that RDF is based on the idea of identifying things 
using URIs, *and describing resources in terms of simple properties and 
property values*.  I don't think people are going to read this as saying 
that *only* URIs can be used, and so be put off by the following example 
(in fact, it's the use of URIs to identify things other than Web pages 
that I expect many of the readers of the Primer need to be introduced 
to;  they will naturally expect literals to be used to identify things). 
Where do you think the confusion will arise?

Regarding the use of a URI where a blank node is called for, is the 
problem that the URI is not explicitly mentioned in the accompanying 
text (i.e., the text should say something like "there is someone who is 
identified by the URI...."?)


> 
> The Primer uses the word simple several times, but fails to give any
> support to the use of the word.  


I'm not sure I see why use of this word needs explicit support.


> 
> The role of the Primer is stated in a rather silly fashion.  If the Primer
> is supposed to augment the other documents, then how can it be a Primer?
> It would have been much better to state that the Primer provides a
> non-normative introduction and/or high-level description of  parts of RDF. 


The Primer augments the other documents by providing a non-normative 
introduction to RDF, and by describing some RDF applications, things 
that are not done in those other documents.  Right?  I didn't claim that 
it augmented the actual specification of RDF (which is the purview of 
the normative documents).  However, I'll remove "augment" and simply 
describe what the Primer does.


> 
> Section 2:
> 
> If the Primer is supposed to be an introductory document, then
> constructions like ``simple way to state properties (make assertions
> about)'' are not very helpful.  An introductory document should not have
> such parenthetical remarks.  Similarly an introductory document should have
> simple, and correct, sentence constructions, so turning a complete sentence
> into a parenthetical sentence fragment is bad on two counts.


How about "a simple way to make statements about Web resources"?


> 
> The first extended example in this section also has serious problems.
> First, the text description does not match the boxed sentences.  The text
> description talks about the name of the creator, whereas the first boxed
> sentence does not.  This is a particularly bad mistake to use because of
> the similar problems with the Dublin Core use of  RDF.  This example also
> is confusing because the boxed sentences use URIs for subjects, but
> non-URIs for predicates.  If the boxed sentences are supposed to evoke RDF
> triples, and it sure looks as if they are, then they should at least evoke
> valid RDF triples.  It would be better to not talk about web pages in the
> examples, so this sort of problem would not occur.  


The boxed sentence is supposed to be a sentence in (admittedly stilted) 
English that is similar in form to an RDF triple.  But we haven't got to 
RDF yet.  Is the problem that it isn't sufficiently clear that this is 
not RDF?  The sentences used URIs for subjects because we're introducing 
the idea of identifying things by URIs, and Web pages are things many of 
the readers are already used to seeing identified by URIs.  The 
predicate is an ordinary word because people don't use URIs to identify 
properties in English sentences.  The use of URIs to identify properties 
gets introduced later.


> 
> Not using web pages in the early example would also not give rise to the
> problems related to how to dereference a URI reference in RDF.  This
> problem is pervasive in much of the non-formal documentation on RDF.  The
> general problem is that the relationship between the RDF meaning of a URI
> reference and the WWW meaning of a URI reference is very poorly determined
> for URI references that actually have a well-defined WWW meaning.  For
> example, http://www.w3.org/People/EM/contact#me (probably) references an
> actual portion of an actual document.  How then can RDF treat this URI reference
> as referring to the person Eric Miller?  Because the Primer has examples
> that have this problem, it needs to address it. 


I agree that this is an important issue, and I think the Primer *does* 
address it, both in Section 2.3, and in Appendix A.  The problem isn't 
whether to address the issue, but rather in what order to address this 
issue vs. other things (and whether this particular issue needs to be 
thoroughly dealt with before introducing various other things).  It 
seems to me that the problem would arise anyway (at least when http URIs 
are used for non-Web-pages), because many readers would naturally wonder 
what Web-retrievable resource was referred to by the URI for, say, a 
person.  Earlier versions of the Primer had a longer section dealing 
with URIs.  Much of that material got moved to Appendix A due to 
complaints that it interrupted the flow.  Do you have a suggestion? 
Perhaps a few sentences from Appendix A pointing the problem out, and 
referring to the Appendix for further discussion?


> 
> The Primer is not consistent with the other RDF documents.  It states that
> a resource is anything that is identifiable by a URI reference.  However,
> the model theory for RDF does not make this extraordinarily strong
> assumption, only requiring that URI references denote resources.  The
> Primer also states that URI references can identify ``practically
> anything'', a laughable claim to anyone who understands the difference
> between countable and uncountable sets. 


I'm sorry, I don't understand the problem you're driving at here.  Can 
you clarify?  If I can identify something by a URIref, surely I can 
consider it a resource?  Saying that URI refs can identify "practically 
anything" isn't the same as saying they can identify "anything".  Do you 
think it would help to make the caveat about countable vs uncountable 
sets (possibly among other examples) at this point?  (We can certainly 
identify an uncountable set by a URIref, even if we can't come up with 
URIrefs for all the members, right?).


> 
> In Section 2.2, the problem with referring to web documents resurfaces.
> The Primer states that it is ``introducing'' the URIref of the web
> document.  However, that URIref was already in the statement, and so cannot
> be considered to be introduced here.  


Right.  Would substituting "used" address the problem sufficiently?


> 
> Although the Primer is non-authoritative, it should refrain from making
> false statements, such as that the subject and object of an RDF statement
> are labeled with URIrefs.  Only some subjects and objects are labeled with
> URIrefs.  The Primer goes on to immediately use RDF statements whose
> subjects are literals, in direct violation of the claim it makes just
> above. 


OK, I see the problem.  The paragraph makes a general comment about the 
way RDF does things, but it only literally applies to this specific 
example (Figure 2), and omits the use of literals and blank nodes.  I'll 
fix this.
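
To make the distinction concrete, here is a minimal sketch of which kinds 
of node can appear in which position of a triple, as I understand the 
abstract syntax in the Concepts draft.  The class and function names are 
hypothetical (they come from no RDF toolkit), and the blank node label is 
made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URIRef:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str  # local label only; it carries no global meaning


@dataclass(frozen=True)
class Literal:
    lexical: str


def make_triple(subject, predicate, obj):
    """Build a (subject, predicate, object) tuple, rejecting disallowed positions."""
    if not isinstance(subject, (URIRef, BlankNode)):
        raise TypeError("subject must be a URIref or a blank node")
    if not isinstance(predicate, URIRef):
        raise TypeError("predicate must be a URIref")
    if not isinstance(obj, (URIRef, BlankNode, Literal)):
        raise TypeError("object must be a URIref, a blank node, or a literal")
    return (subject, predicate, obj)


# Illustrative data: a blank-node subject with a literal object.
creator = URIRef("http://purl.org/dc/elements/1.1/creator")
triple = make_triple(BlankNode("b1"), creator, Literal("Eric Miller"))
```

Only some subjects and objects are URIrefs: a blank node may stand in 
either position, and a literal may appear only as an object.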


> 
> The example in Section 2.2, inappropriately glosses over the issue of
> whether to turn the object of a statement into a URIref or a literal.  It
> is inappropriate to gloss over this issue because the example does both,
> without any explanation at all. 


I can see that something should be said about this, but I doubt I'll be 
able to discuss this in any depth at this point in the Primer (it would 
probably take another section or appendix to do justice to it).


> 
> The claim that RDF graphs are labeled directed graphs is not very helpful,
> as labeled directed graphs generally allow only one arc between two nodes.
> If the Primer wants to allude to graphs, it should be more precise. 


Right.  I'll take the "labeled directed" qualification out.  It doesn't 
add anything, and a full technical explanation of what kind of graphs 
they are doesn't belong in the Primer.
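
For readers of this thread (not for the Primer itself), a small sketch of 
the point about arcs.  It assumes a graph is modelled simply as a set of 
triples; the "ex:" names are hypothetical.

```python
# A graph modelled as a set of (subject, predicate, object) triples.
graph = {
    ("ex:page", "ex:creator", "ex:alice"),
    ("ex:page", "ex:publisher", "ex:alice"),
}

# Two distinct arcs connect the same ordered pair of nodes,
# distinguished only by their predicate labels.
arcs = sorted(p for (s, p, o) in graph if s == "ex:page" and o == "ex:alice")
```

Here `arcs` holds both predicates, which a definition of graph allowing 
only one arc between two nodes could not represent.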


> 
> The Primer starts the unfortunate blurring between RDF, a simple formalism,
> and the entirety of human understanding in its talk about knowing the
> ``exactly what is meant by'' http://purl.org/dc/elements/1.1/creator.  It
> would be much better to avoid anything in the Primer that even hints that
> an  RDF processor will be able (or, worse, required) to understand exactly
> what is meant by such things, as their meaning includes a gigantic portion
> that is outside of RDF. 


When I referred to a program "that understands 
http://purl.org/dc/elements/1.1/creator" I was thinking in terms of a 
program *written* to understand that particular term (or written to 
behave according to that term's definition when it encountered it) 
rather than a generic RDF processor that somehow sucked in that 
"understanding".  But I see how the problem you mention can arise.  I'll 
try to make that clearer (I think it's still necessary to mention 
"programs", but I agree that the limitations of what "understanding" RDF 
provides to those programs need to be clarified).


> 
> Similarly, the Primer should stay far away from hinting that RDF can be a
> unifying model for formal logic.  Although all that is actually claimed is
> that RDF is similar to simple statements in formal logic, the overall tone
> of the paragraph could easily give rise to a false expectation. 


I wasn't trying to hint that, but I see what you mean.  How about 
"allowing RDF to be used to integrate data from many sources", 
eliminating the comment about "unifying model"?


> 
> Section 2.3 finally gets into the issue of whether something is a literal
> or not.  This would be fine if the earlier exposition had not set the stage
> so misleadingly.  Similarly, Section 2.3 finally mentions the problem of
> using the wrong URIref to refer to a non-web-accessible resource.  Again,
> however, the previous parts of the Primer have already given the wrong
> impression about this important issue. 


I think the order in which the Primer has tried to introduce things is 
fairly reasonable, and I don't think anyone will be terribly misled. 
However, I also think that if the changes you've suggested already are 
made, both of these issues will be brought up earlier anyway.


> 
> Section 2.4 misses a golden opportunity to clear the air about typed and
> untyped literals.   It should have drawn attention to the fact that the
> untyped example makes ages be strings, not numbers, and that treating these
> strings as numbers is a risky business, depending as it does on an
> outside-of-RDF interpretation of RDF information that undermines the
> utility of RDF as a neutral mechanism for information exchange. 


Good point.  I'll try to amplify on this.
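
A rough sketch of the difference, assuming a literal is modelled as a 
(lexical form, datatype URI) pair with None for "no datatype".  The 
helper name and the age value are hypothetical; only the XML Schema 
integer datatype URI is real.

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Untyped literal: just a string.  Typed literal: string plus datatype URI.
plain_age = ("27", None)
typed_age = ("27", XSD_INTEGER)


def literal_value(literal):
    """Return the value a literal denotes under this simplified model."""
    lexical, datatype = literal
    if datatype == XSD_INTEGER:
        return int(lexical)  # the datatype maps the lexical form to a number
    return lexical  # no datatype: all we have is the string itself


# "027" and "27" denote the same integer when typed, but as untyped
# literals they are simply two different strings.
same_when_typed = literal_value(("027", XSD_INTEGER)) == literal_value(typed_age)
same_when_plain = literal_value(("027", None)) == literal_value(plain_age)
```

An application that treats `plain_age` as a number is relying on an 
agreement made outside of RDF, which is the risk Peter describes.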


> 
> Emphasizing and reemphasizing that datatypes are an extension to RDF begs
> the question of whether RDFS is an optional extension or part of RDF.  If
> the Primer is going to be so picky about datatypes, then it should be
> similarly picky about RDFS. 


I've raised this issue myself.  I agree this should be very clear (the 
Schema section is the place to do that, if it isn't clear enough already).


> 
> Section 3:
> 
> RDF has no notion of the source of terms.  As far as it is concerned a
> URIref is an atomic entity with absolutely no internal structure.  That
> being the case, any statement that even hints at the contrary must be
> ruthlessly expunged from the RDF documents.  In particular, stating that a
> namespace declaration provides the source of a term is not a part of RDF --
> namespace declarations are nothing more than an abbreviation mechanism for
> XML/RDF. 


Agree.  I'll try to make the appropriate changes.
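
As a sketch of why the declaration is only an abbreviation: assuming a 
QName is expanded by concatenating the declared namespace URI with the 
local name, two documents that pick different prefixes still produce the 
identical URIref.  The function and the prefix tables are hypothetical.

```python
def expand(qname, prefixes):
    """Expand 'prefix:local' into a full URIref by plain concatenation."""
    prefix, local = qname.split(":", 1)
    return prefixes[prefix] + local


# Two documents choose different prefixes for the same namespace URI.
doc_a = {"dc": "http://purl.org/dc/elements/1.1/"}
doc_b = {"x": "http://purl.org/dc/elements/1.1/"}

uri_a = expand("dc:creator", doc_a)
uri_b = expand("x:creator", doc_b)
```

Both results are the single string 
http://purl.org/dc/elements/1.1/creator; nothing about the prefix 
survives into the triple.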


> 
> It is somewhat misleading to describe the abbreviated syntax forms as
> abbreviations.  Instead they are really alternative syntax forms that have
> their own direct mapping to triples, they just happen to be shorter than
> the standard syntax form.  If they were abbreviations, then their meaning
> should instead be given by a source-to-source translation. 


They are referred to as abbreviations in both the Syntax spec and in 
M&S.  Also, I doubt that Primer readers will necessarily catch the 
distinction you have in mind even if we don't use "abbreviations".


> 
> One would think that a Primer would provide exemplary examples.
> Unfortunately, this primer does not, instead ignoring opportunities to
> effectively use typed literals.  The Primer even makes this into somewhat
> of a joke, asserting that ``typed literals from appropriate datatypes, such
> as XML Schema datatypes, can always be used instead.'' 


I can certainly try to use more typed literals in the examples.


> 
> The Primer makes an interesting point when it states that ``RDF doesn't
> specify or control how URIrefs are assigned to resources''.  This point is
> a foundation of the RDF model theory.  However, the Primer and many of the
> other RDF documents both undermine this point, by evoking an
> external-to-RDF mapping between URIrefs and resources, and completely
> contradict it, by using a URI to identify the web document that it
> references.  The Primer makes another interesting point when it states that
> ``anyone should be able to say anything they want about existing resources''.
> However, other documents about RDF attempt to limit the ability to make
> statements about existing resources in various ways or limit the ability to
> use certain URIrefs. 


Can you clarify your reference to an "external-to-RDF mapping"?  Is this 
a reference to the overloading of URIs that, e.g., identify both a 
person and a web page?


> 
> Section 3 is supposed to be about XML/RDF.  However, it introduces
> rdf:type.  It would be very much better if this important concept were not
> introduced inside the section on XML/RDF.  (The forward reference
> concerning rdf:type in Section 2 is also a bad idea.) 


Do you believe rdf:type should not be introduced until Section 5?  It's 
introduced earlier because it's part of RDF (rather than RDFS), even 
though RDF doesn't have classes.


> 
> Section 4: 
> 
> A Primer need not be complete.  Being that this is the case, there is no
> place for the difficult-to-understand or insufficiently-specified parts of
> RDF in the Primer.  (I was going to say ``junk'', but nobly refrained.)
> Removing Sections 4.1 and 4.3 would make the Primer much better. 


I don't agree.  The Primer is serving two roles here:  both the usual 
role of a "Primer", and also as a place for non-normative explanations 
of things that mostly aren't covered in the other documents.  I'm not 
necessarily arguing in support of either the container or reification 
syntax.  But given that they are still in RDF (we didn't deprecate 
them), I think that saying nothing about them is worse than discussing 
them and trying to clarify their status.  Saying nothing about them 
would simply leave current users of these facilities believing that they 
are somehow fully-specified (we've received comments from people who 
have read the Primer, and have picked up from these descriptions that 
they don't mean what they thought they meant).  Certainly in the case of 
the container syntax, people are successfully using it to exchange data, 
based on the "conventional" meanings.


> 
> If Section 4.1 is retained, it needs to be completely rewritten.   The
> continued reference to ``intended'' meaning is just silly.  It would be
> much better to come right out and say that containers are just
> uninterpreted conventions that can be given any meaning whatsoever by
> applications, but that there is some vague notion that this meaning should
> be somehow related to bags, sequences, or alternatives.  In particular,
> having examples where the same predicate is used to relate to individuals
> as well as to a bag containing the individuals is more than misleading.
> Presenting such misrepresentations in a Primer is a disservice to the RDF
> community.  Similarly making a committee be a bag is misusing even the weak
> intended meaning of bags.  A committee is not a bag of its members. 


I've tried to convey the idea that the container syntax is based on the 
use of conventions, and that the conventions need to be built into 
software that processes the containers.  That's what the "intended 
meaning" refers to.  I can certainly try to clarify this (but there's a 
fair bit of discussion of this already).  Regarding "examples where the 
same predicate is used to relate to individuals as well as to a bag 
containing the individuals", can you clarify?  Regarding making the 
committee a bag, the example could certainly identify the committee as a 
resource, and give it a "members" property with the bag as an object. 
On the other hand, the committee certainly has members itself, and they 
are unordered.
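
A sketch of that alternative modelling, with the committee as its own 
resource and the bag as the value of a "members" property.  The 
example.org names, the blank node label, and the helper are hypothetical; 
only the rdf: vocabulary URIs are real.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical vocabulary and individuals

committee = EX + "committee"
bag = "_:members"  # blank node standing for the container

triples = {
    (committee, EX + "members", bag),
    (bag, RDF + "type", RDF + "Bag"),
    (bag, RDF + "_1", EX + "fred"),
    (bag, RDF + "_2", EX + "wilma"),
}


def bag_members(triples, bag_node):
    """Collect the objects of rdf:_n properties on the container node.

    Returning a set encodes the convention that a bag is unordered; that
    convention lives in this application code, not in the triples.
    """
    prefix = RDF + "_"
    return {
        o
        for (s, p, o) in triples
        if s == bag_node and p.startswith(prefix) and p[len(prefix):].isdigit()
    }


members = bag_members(triples, bag)
```

Nothing in the triples themselves says the numbering is irrelevant; the 
software reading them has to build that in.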


> 
> Section 4.3 also needs a complete rewrite.  It perpetuates the old RDF myth
> that instances of  rdf:Statement have something to do with RDF statements.
> There is just no way in which an instance of rdf:Statement can be
> considered to be a model of an RDF statement.  There is also no notion of
> reification supported by RDF.  This is only made worse by the attempt to go
> from RDF statements to particular instances of RDF statements, a notion
> that exists nowhere in RDF.   If a notion is not part of RDF, then the
> Primer has no business talking about it. 


The section tries to explain the points you raise, and does so because 
reification (or at least the "reification vocabulary") is still part of 
RDF.  Thus I think some explanation is called for.  This section also 
follows pretty closely the corresponding discussion in the Semantics 
document.
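
To illustrate what the reification vocabulary does and does not give 
you, here is a minimal sketch.  The example.org names, the blank node 
label, and the helper are hypothetical; the four rdf: terms are the 
actual reification vocabulary.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical names

original = (EX + "item", EX + "weight", "2.4")


def reify(triple, node):
    """Return the four triples that describe `triple` using the rdf: vocabulary."""
    s, p, o = triple
    return {
        (node, RDF + "type", RDF + "Statement"),
        (node, RDF + "subject", s),
        (node, RDF + "predicate", p),
        (node, RDF + "object", o),
    }


description = reify(original, "_:stmt")

# The description is just four more triples; the original is not among them.
original_asserted = original in description
```

As I read the Semantics document, a graph holding only the four 
descriptive triples does not thereby assert the original triple, and any 
link between the two is a convention the application supplies.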


> 
> Section 4.4 is also suspect in the same way as Sections 4.1 and 4.3, but to
> a lesser extent.  
> 
> If any of the stuff in Sections 4.1, 4.2, and 4.3 are to be included in the
> Primer it would be much better to relegate them to some dusty appendix,
> perhaps called ``Conventional Ways of Using RDF''.  This appendix could
> also discuss things like using collections of triples to represent
> structured values or n-ary relationships. 


See my comments above.


> 
> Section 5: 
> 
> Can RDFS be used to ``specify'' anything?  Not really.  Instead, RDFS
> allows one to build a sort of a primitive type theory for a domain.  Saying
> that RDFS is a specification language does a grave disservice to true
> specification languages. 


I'm not sure I understand the distinction you're making here.  Would 
replacing "specify" with "describe" be more accurate in your opinion?


> 
> Describing inference as the ability to ``act as if'' is not a particularly
> correct description. 


OK.  What would you suggest?


> 
> Generalization hierarchies are generally written with the more-general
> classes above the less-general classes. 


I know.  How important do you think this change is?


> 
> The general idea of the section about the use of schema-supplied
> information is good, but the emphasis is placed in the wrong place.
> Instead it would be better to state something like applications may choose
> to use schema-supplied information to limit the kinds of data they generate
> outside of RDF so that it explicitly conforms to the schema.  The third
> example here is the worst one because it uses non-datatype inconsistency, a
> notion alien to RDF.   The next paragraph is simply wrong, as there is no
> notion in RDF that allows for invalidity due to the absence or presence of
> properties. 


The use of schema information for constraints is there (and we certainly 
don't say it *can't* be used as constraints).  The emphasis is placed on 
the other uses of schema information because it is these uses that don't 
seem to occur to most readers, based on the comments we've received. 
Regarding the other comments, we've talked about what an *application* 
might do, not about what RDF itself does.  Is the problem that this 
distinction isn't sufficiently clear?
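
A sketch of the distinction I have in mind, using rdfs:domain.  The 
first function shows the kind of inference RDFS licenses; the second is 
one possible application policy layered on top, which RDF itself does 
not define.  All "ex:" names and both function names are hypothetical, 
and the prefixed strings stand in for full URIrefs.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"

graph = {
    ("ex:author", RDFS_DOMAIN, "ex:Book"),
    ("ex:thing1", "ex:author", "ex:fred"),
}


def domain_pairs(graph):
    """Return (property, class) pairs declared with rdfs:domain."""
    return {(s, o) for (s, p, o) in graph if p == RDFS_DOMAIN}


def infer_types(graph):
    """Inference reading: using a property places its subject in the domain class."""
    return {
        (s, RDF_TYPE, cls)
        for (s, p, o) in graph
        for (prop, cls) in domain_pairs(graph)
        if p == prop
    }


def untyped_subjects(graph):
    """Application policy (not RDF): flag subjects with no explicit type triple."""
    return {s for (s, t, cls) in infer_types(graph) if (s, RDF_TYPE, cls) not in graph}


inferred = infer_types(graph)
flagged = untyped_subjects(graph)
```

The same schema triple supports both behaviours: RDFS concludes that 
ex:thing1 is an ex:Book, while a particular application may choose to 
complain that nobody said so explicitly.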


> 
> It would be better to state something like that rdfs:comment and the other
> non-logical RDFS properties are used only  conventionally, and completely
> externally to RDF.  Nothing in RDF requires that they be used for their
> conventional purposes.  The phrase ``can be used'', although it is
> technically correct, is very misleading here. 


This is good text.  It could also be used in the container and 
reification sections.


> 
> Section 7:
> 
> Here again we see wording to the effect that the meaning of an RDF graph
> can be conveyed by any mechanism whatsoever, including in principle the
> inaccessible thoughts of a deceased creator of some web page that contains
> some RDF/XML.  Although it is true that any formal system will have an
> intended meaning that is outside the formal system itself, it is a bad idea
> to give this intended meaning the status that is being given to the
> non-model theory meaning of RDF.   Doing so is an open invitation for
> applications and users to claim that their intended meanings for their RDF
> documents are part of the official meaning provided by RDF.  
> 
> Separating RDF meaning into a formal and a social part will do nothing to
> prevent this.  About the only way to mention intended meanings in the RDF
> documents without legitimizing this meaning-bloat would be to explicitly,
> strongly, and continually mention that the non-formal meanings given to RDF
> documents in applications are not a part of their RDF meaning. 


Here, the Section is attempting to briefly describe what the Concepts 
document says.  I agree that the distinction between the RDF meaning and 
meaning people associate with RDF by other means needs to be made very 
clear.


> 
> References 
> 
> It is very weird that a non-normative document has normative references! 


Perhaps.  I've asked about whether this distinction is appropriate in 
the Primer, so this may become less-weird.


> 
> Appendix A 
> 
> Finally, some discussion of the problems with referring to web pages.  If
> only the advice here was taken in the body of the document. 
> 


I think this is covered in previous comments.


Thanks again.


--Frank



-- 
Frank Manola                   The MITRE Corporation
202 Burlington Road, MS A345   Bedford, MA 01730-1420
mailto:fmanola@mitre.org       voice: 781-271-8147   FAX: 781-271-875
Received on Sunday, 19 January 2003 17:17:45 UTC