[VM] collected notes from reviewers on March 16th Editor's draft

Hi all,

Here, in one place and summarized by section, are the comments on the
March 16th draft of the document.

Thanks,

Elisa
Comments on March 16 Editor's Draft, Vocabulary Management


0. General

Diego: I think the document is reaching maturity and is ready to be
aired. The most important aspects of vocab management are covered, and
five clear recommendations can be found. I would suggest putting more
visual stress on these recommendations, so they catch the eye even when
skimming the document. Please consider putting a colored frame box around
the list at the beginning of section 2.


Mark van Assem:

Abstract

- recommend changing "citation" to "reference"
- this document is not only focused on "re-use" but also on initial "use", so recommend "(re-)use"

- I myself am not familiar with the phrase "principles of good practice". To me "best practice" (as used in the Recipes document) or "principles" seems more natural. (Ignore this if this is a typical comment of a non-native speaker ;)

- the use of the word "user" throughout the document is ambiguous to me, because it might mean "end user". I assume "developer who uses the vocabulary" is meant; if so, e.g. "developer" could be reserved for "user of the vocabulary" and e.g. "publisher" for those who make it available.

- the sentences "Further ... methodologies/approaches" are unclear to me.

- what is the intended readership of this document? Specifically, does it include people well-versed in vocabulary development but not familiar with SW-technology?

Status

- the word "strategies" is used, which seems to refer to "principles of good practice"

- the relationship between "vocabulary" and "ontology" is unclear. Depending on the intended readership it might be a good idea to leave the whole notion of "ontology" out.

1. Introduction

- "vocabulary" seems to be presented as equivalent to *schema* (examples given are SKOS, DC, FOAF, OWL). However, in other communities the word "vocabulary" is associated with the actual concept schemes (e.g. AAT, LCSH, MeSH). If a distinction between the two is to be maintained and the intended readership includes non-SW people, I recommend explaining the difference.


----------------------------------------------------------

1. Sections 1 and 2.1 -- Definition of RDF vocabulary

Ralph:  Introduction, paragraph 2, says '...  the notion of an "RDF
vocabulary" is similar to the notion of a "web ontology"'.
It seems to me that we can and should say something
stronger than "is similar to".  I've generally felt that by
"web ontology" we [should] mean "OWL ontology" and
that the class of RDF vocabularies is a subset of the
class of OWL ontologies.  This may not be sufficiently
precise for some people but I expect you could propose
some words along these lines. 


Tom: 1. I agree with Ralph [2] that the notions of "Web ontology",
"OWL ontology", and "RDF vocabulary" need to be clarified.
In particular, there remains an apparent contradiction between
Section 1, first sentence:

        An RDF vocabulary is a set of resources denoted by URIs. 

and Section 2.1, first sentence:

        An RDF vocabulary consists of a set of URIs.

Attention has been drawn to this apparent contradiction
in various BPD and SWD telecons over the years; it's
still there in the draft; and I must admit that I still
do not entirely grasp its implications.  The 2004 RDF
Recommendation documents say, for example (from Primer):
"Since RDF uses URIrefs instead of words to name things in
statements, RDF refers to a set of URIrefs (particularly
a set intended for a specific purpose) as a vocabulary."

This draft should at any rate explain this group's
understanding of what an "RDF vocabulary" is -- for example,
compared to "_the_ RDF vocabulary" or an "OWL ontology", and
saying whether its URIs (or URIrefs??) "are" or "denote"
the terms of the vocabulary.  If that position contradicts,
or seems to contradict, the position in other current
RDF documents, then the draft should point this out
somehow, citing relevant sources.


+1


Yes, this contradiction must be dealt with before publication, I think.


The RDF semantics [1] is unequivocal, from section 0.3:

"""
A name is a URI reference or a literal. These are the expressions that need
to be assigned a meaning by an interpretation

...

A set of names is referred to as a vocabulary. The vocabulary of a graph is
the set of names which occur as the subject, predicate or object of any
triple in the graph.
"""

Grasping the distinction between the syntax of RDF and its model-theoretic
semantics has, for me, been extremely valuable. I recommend using the word
"vocabulary" strictly in the sense used at [1] (i.e. as meaning a set of
names), because it helps to reinforce the distinction between an RDF graph
(a set of triples) and its interpretation (the set of resources, and the
sets of classes, properties and their extensions).
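To make the distinction concrete, here is a minimal Python sketch of the
definition quoted above from RDF Semantics section 0.3. It assumes a toy
term model: the classes URIRef, Literal and BNode below are hypothetical
stand-ins written for this illustration, not the API of any particular RDF
library, and literal datatypes and language tags are omitted for brevity.
Following the quoted definition, a name is a URI reference or a literal, so
blank nodes are left out of the vocabulary, while the graph itself remains a
set of triples.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union


# Hypothetical, minimal term model (not a real library's classes).
@dataclass(frozen=True)
class URIRef:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical_form: str


@dataclass(frozen=True)
class BNode:
    label: str


Term = Union[URIRef, Literal, BNode]
Triple = Tuple[Term, Term, Term]


def vocabulary(graph: Set[Triple]) -> Set[Term]:
    """Return the names (URI references and literals) occurring as the
    subject, predicate or object of any triple in the graph.
    Blank nodes are not names, so they are excluded."""
    return {term for triple in graph for term in triple
            if not isinstance(term, BNode)}


FOAF = "http://xmlns.com/foaf/0.1/"
alice = URIRef("http://example.org/alice")  # hypothetical resource URI

# The graph: a set of triples (assertions).
graph = {
    (alice, URIRef(FOAF + "name"), Literal("Alice")),
    (alice, URIRef(FOAF + "knows"), BNode("b0")),
}

# Its vocabulary: a set of names (here alice, foaf:name, "Alice", foaf:knows).
names = vocabulary(graph)
```

The two triples yield four names: the repeated subject is counted once, and
the blank node contributes nothing.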

From the OWL semantics [2], section 3.1:

"""
An OWL vocabulary V consists of a set of literals VL and seven sets of URI
references, VC, VD, VI, VDP, VIP, VAP, and VO...
"""

I.e. [2] is again clear that a vocabulary is a set of names. 

However, again from [2], section 2.1:

"""
An OWL ontology in the abstract syntax contains a sequence of annotations,
axioms, and facts.
"""

So the notion of an OWL ontology, strictly speaking, is quite different from
the notion of a vocabulary. The notion of an OWL ontology, as defined at
[2], is something akin to a document, and as such is much closer to the
notion of an RDF graph than to the notion of an RDF vocabulary.

To sum up:
 * "RDF vocabulary" and "OWL vocabulary" mean something very similar (a set
of names)
 * "RDF graph" and "OWL ontology" mean something fairly similar (a set of
assertions, more or less) 

The word "term" is problematic, as illustrated very nicely in Tom's comment
above. You could use "term" in a strict sense to be synonymous with "name",
but then you might as well use "name" instead. I recommend avoiding "term"
altogether.


Tom:
2. The draft says that resources in an RDF vocabulary will
   "usually (but not necessarily) be of type rdf:Property,
   rdfs:Class, owl:Class, or skos:Concept."   Maybe this
   is close enough, but can we define the universe of RDF
   vocabularies more precisely?  Would a vocabulary declaring
   things to be ex:Property, where ex:Property has no declared
   relation to rdf:Property, be an RDF vocabulary?

   Can we say that the relationship to _the_ two main RDF
   and RDFS vocabularies is a defining characteristic of RDF
   vocabularies?  To my way of thinking, an RDF schema has
   the function of declaring an ontological commitment for
   how the resources described therein fit into the RDF/RDFS
   model -- saying, in essence, "this thing here is a Property
   as defined by the RDF specification", and "this thing here
   is a Class".  If the classes of OWL ontologies and SKOS
   concept schemes are subsets of the class of RDF vocabularies
   (Ralph put it the other way around in [2]...?), then all
   RDF vocabularies have the function of relating their terms,
   ultimately, to the RDF/RDFS model.

   Can we agree on this, and should the note say this more
   clearly?


Tom:
3. A few stylistic points: 
   -- suggest avoiding "we" (e.g., Section 2, paragraph 2)
   -- "Web" should always be uppercase here
   -- suggest avoiding constructs such as "owner(s)" and
      "developers/maintainers"


Tom:
4. I suggest dropping the reference to CORES resolution,
   which reports on discussions as of 2002 and is now dated.


Tom:
5. The section Status of this Document cites things that
   seem out of scope for a Status section, such as references 
   to Web applications.


Tom:
6. Introduction, fourth paragraph. It is not clear to me why
   portals are highlighted so prominently in the Introduction.
   The paragraph says that "repositories" supply additional
   metadata that is "almost as important as the vocabulary
   itself".  This seems to emphasize the importance of
   third-party contextual information (in this case from a
   Web application) for using a vocabulary.  I had assumed
   we would want to emphasize a "follow your nose" approach
   as the main take-home message, and this appears to point
   in another direction.


------------------------------------------------------------------------

2.1

Diego: the Recipes document is cited as guidance for choosing URI
namespaces. While the Recipes indeed contain some advice on this topic,
they also make the following remark: [[ This document is intended for
creators and maintainers of existing vocabularies. Proper guidance on
choosing the best URI namespace for any given situation is beyond the
scope of this document. ]]. I would suggest rephrasing the citation so
as not to create too many expectations.


Tom: 
7. Section 2.1, paragraph 7 on "URI schemes".
   W3C uses HTTP URIs (and not, for example, "info:" URIs).
   What is meant here by "URI schemes", I think, is methods
   for constructing URI strings.  However, I think we need
   to be emphasizing the principle of URI opacity [3].
   Does the example encourage vocabulary managers to design
   URIs which embed enough information in their strings to
   "assist potential users in finding various artifacts on
   [a Web] site" (or am I perhaps misunderstanding the point
   of the example)?


Mark:
2.

It would improve readability to divide each subparagraph of Sec.2 into sub-elements (in layout), e.g. "Principle: ...", "Motivation: ...", "Examples: ...", "Notes: ...".


2.1

- This section starts with five bullet points. I recommend addressing each bullet point in turn in the text that follows. Currently the text directly after the list addresses an issue not in the list: dependencies between vocabularies.

- recommend providing examples of "good" and "bad" URIs
- what is a "subordinate URI scheme"? 

------------------------------------------------------------------------

2.2

Ralph: 
2.2. Provide Readable Documentation, paragraph 3, says
"... we recommend publishing both human and machine-readable
documentation ...".  This comes after two paragraphs that describe
the human-readable documentation and so might mislead a
reader to thinking that the human readable documentation might
be more important.  As noted later in section 2.5, the machine-
readable variant is really a MUST in the Semantic Web.  It would
be sufficient, I think, to add to the start of this paragraph "/+The
Semantic Web relies on machine-readable descriptions of
vocabularies.+/ In practice, we recommend ..."


Diego:
* Sect 2.2: I would add some sentences about RDFS annotation properties
(rdfs:label, rdfs:comment), and their importance to provide in-line,
multilingual documentation of the vocabularies.

* Sect 2.2: Related to the previous comment, I would suggest including
something about tools that can generate readable documentation (HTML)
from the annotated vocabularies. One such tool is SpecGen [2], which
is being used to create HTML documentation for the SIOC ontology.

[2] http://forge.morfeo-project.org/wiki_en/index.php/SpecGen
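As a rough illustration of both points, the Python sketch below renders an
HTML definition list from rdfs:label and rdfs:comment annotations, picking
the requested language and falling back to English. It is not how SpecGen
works: the annotation table, the example.org URIs and the function names are
hypothetical, and a real tool would read the annotations from the
vocabulary's RDF rather than from a Python dict.

```python
from html import escape
from typing import Dict, Iterable, Tuple

# Hypothetical annotations: (term URI, annotation property, language tag) -> text.
Annotations = Dict[Tuple[str, str, str], str]

EXAMPLE: Annotations = {
    ("http://example.org/vocab#Person", "rdfs:label", "en"): "Person",
    ("http://example.org/vocab#Person", "rdfs:label", "es"): "Persona",
    ("http://example.org/vocab#Person", "rdfs:comment", "en"): "A human being.",
    ("http://example.org/vocab#Person", "rdfs:comment", "es"): "Un ser humano.",
}


def annotation(annots: Annotations, term: str, prop: str,
               lang: str, fallback: str = "en") -> str:
    """Text in the requested language, else in the fallback language, else ''."""
    return annots.get((term, prop, lang)) or annots.get((term, prop, fallback), "")


def render_html(annots: Annotations, terms: Iterable[str], lang: str = "en") -> str:
    """Render one <dt>/<dd> pair per term; the label defaults to the URI itself."""
    rows = []
    for term in terms:
        label = annotation(annots, term, "rdfs:label", lang) or term
        comment = annotation(annots, term, "rdfs:comment", lang)
        rows.append(f'<dt><a href="{escape(term)}">{escape(label)}</a></dt>'
                    f"<dd>{escape(comment)}</dd>")
    return "<dl>\n" + "\n".join(rows) + "\n</dl>"


# Spanish documentation page for the example term.
page_es = render_html(EXAMPLE, ["http://example.org/vocab#Person"], lang="es")
```

Keeping the documentation in the annotations themselves means one source can
feed both the machine-readable vocabulary and any number of language-specific
HTML pages.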


Ralph:
2.2. Provide Readable Documentation, paragraph 4, "A recent
EU activity ...".  I suggest using absolute temporal coordinates;
i.e. "A 2007 EU activity ..." so that the document will be less
confusing further down the time axis.


Ralph: 
2.2. Provide Readable Documentation, after paragraph 4 I think
it would be good to add a short paragraph citing the Dublin Core
revision history and giving a forward reference to section 2.3.2.


Diego:
* Sect 2.2: There is a citation of the Recipes at the end of this section,
but I would suggest complementing it with some words about the
convenience of serving human-readable documentation and machine-readable
definitions from the same URI using content negotiation.
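For readers less familiar with content negotiation, here is a minimal Python
sketch of the server-side choice: given the client's HTTP Accept header, pick
between an HTML page and an RDF/XML description served at the same URI. It
assumes only these two representations (text/html preferred on ties and when
no Accept header is sent), ignores Accept parameters other than q, and returns
None when nothing is acceptable, where a real server might answer 406 or fall
back to a default. The function names are hypothetical and not taken from the
Recipes.

```python
from typing import List, Optional, Sequence, Tuple

# Representations available at the vocabulary URI, in server preference order.
AVAILABLE = ("text/html", "application/rdf+xml")


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs; q defaults to 1.0."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges


def specificity(media_range: str, media_type: str) -> int:
    """2 = exact match, 1 = type/*, 0 = */*, -1 = no match."""
    if media_range == media_type:
        return 2
    range_type, _, range_sub = media_range.partition("/")
    if range_sub == "*" and range_type == media_type.partition("/")[0]:
        return 1
    if media_range == "*/*":
        return 0
    return -1


def quality(ranges: List[Tuple[str, float]], media_type: str) -> float:
    """q of the most specific range matching media_type (0.0 if none matches)."""
    best_spec, best_q = -1, 0.0
    for media_range, q in ranges:
        spec = specificity(media_range, media_type)
        if spec > best_spec:
            best_spec, best_q = spec, q
    return best_q


def choose_representation(accept: Optional[str],
                          available: Sequence[str] = AVAILABLE) -> Optional[str]:
    """Return the acceptable type with the highest q; earlier entries win ties."""
    if not accept:
        return available[0]  # no Accept header: any representation will do
    ranges = parse_accept(accept)
    best, best_q = None, 0.0
    for media_type in available:
        q = quality(ranges, media_type)
        if q > best_q:
            best, best_q = media_type, q
    return best
```

With this sketch, a typical browser header such as
"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" yields
text/html, while a client sending "Accept: application/rdf+xml" gets the
RDF/XML description from the same URI.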


Tom:
8. Section 2.2, "Provide Readable Documentation", final
   paragraphs.  The paragraph-long descriptions of specific
   tools for change management seem a bit out-of-place here.
   It is good to say that documentation should also cover
   changes, but the focus on specific tool functionalities
   and interfaces distracts from that point.  The general
   point about tracking, documenting, and annotating changes
   is perhaps a better fit to the section on versions.


Mark:
- remove "*" around words and make the words bold/italic
- the WordNet example seems to cite "a plugin for Protege, an RDF version, an ontology and ..." as documentation of WordNet, which seems counterintuitive to me
- the paragraphs about collaborative development are only loosely connected to the previous paragraphs. I recommend adding a subheading to separate them, e.g. "Collaborative Development"

The principle seems to distinguish two kinds of documentation: usage documentation and change documentation; this could be made explicit. 

----------------------------------------------------------------------

2.3

Diego:
* Sect 2.3: There was a recent thread on semantic-web@w3.org regarding
the convenience of publishing early drafts of ontologies [3]. I wonder
whether it is worth mentioning this topic here.

[3] http://lists.w3.org/Archives/Public/semantic-web/2008Mar/0119.html


Tom:
9. Section 2.3, last paragraph.  Section 2.3 is about
   articulating maintenance policies for "RDF vocabularies",
   but the list covers vocabularies "and other artifacts
   that have a published maintenance policy".  The list
   mixes two different types of thing -- the SKOS, DCMI, and
   OWL vocabularies; and artifacts that are not themselves
   vocabularies (the URIs for W3C Namespaces guide and the
   Catalog of OMG Specifications).  Instead of references to
   "vocabularies covered by policies", perhaps the list should
   reference policy documents themselves, such as [4].  This
   however raises the question of what to cite for SKOS; see 
   next point...


Tom:
10. Section 2.3.1, Maintenance policies for SKOS.  This
   section is particularly relevant in light of the proposal
   currently on the table for handling deprecated properties
   [5] and our discussion on Tuesday [6].

   The SKOS Core Vocabulary Specification WD of 10 May
   2005 (not yet actually cited in the draft) has a Policy
   Statement [7] that as yet has no equivalent in the current
   SKOS Reference.  If we want to publish a note advising
   vocabulary maintainers to "Articulate Maintenance Policies",
   then somewhere we need to record our positions on these
   issues with regard to SKOS.  I think we should do so in the
   SKOS Reference itself.  Indeed, if we were not to move this
   forward, we would perhaps need to remove the discussion
   of SKOS from the Vocabulary Management note altogether,
   or else we would be in the position of citing an obsoleted
   document as an example of good practice for maintenance policy.

   Discussion of the Vocabulary Management note in Washington
   will be limited to just one or two hours; however, any
   discussion we have on SKOS policies would be directly
   reusable in this context.


Mark: 
The links to examples of vocabularies "and other artifacts" (?) that have a published maintenance policy do not point to information on maintenance policies, but about the vocabularies themselves.

Secs. 2.3.1 - 2.3.3 have open TODOs.

-----------------------------------------------------------------------

2.4

Mark:
- recommend emphasizing that not using unique version URIs causes URI clashes

- Paragraph 5 distinguishes two approaches: (1) parallel editing and then publishing one final version (2) releasing different versions, with the "same" elements at different URIs. It is not clear to me whether the first approach proposes publication at the same or different URIs. I also don't see why the first approach is "preferable when a vocabulary ... makes significant portions of the older version obsolete", and the second approach "maintains compatibility across the different versions".

- the papers listed at the end could be moved to Sec.4  ("Additional reading")

------------------------------------------------------------------------


2.5

Diego:
* Sect 2.5: s/accept=/Accept: /


Mark:
- The word "Formal" in the section title may be confusing and doesn't seem needed to cover the contents of the section
- the section should explain some of the motivation of publishing content at the vocabulary URIs
- The section is currently relatively short as it refers to the Recipes for further explanation. However, it could provide more of a "preview" of what is in the Recipes.
- provide an example (a clickable URI) that underlines what can be published as a resource description 

-----------------------------------------------------------------------

3

Ralph:
3. Research Topics, mentions C-OWL without providing a reference.
The long (fourth) paragraph describing C-OWL reasoning is pretty
dense and may be too detailed here relative to the rest of the
document.  Perhaps it can be summarized to be more accessible
to a wider readership, but at least for now it's interesting to keep it.


Tom:
11. Section 3, Research Topics.  Section 3 is devoted to
   describing topics of advanced research in reasoning.
   As this section amounts to one fifth of the draft's body,
   I have a concern about the message it sends.  To my way
   of thinking, very high-level principles such as "provide
   documentation" aim at encouraging newcomers to take the
   plunge and publish a vocabulary.  The emphasis here on
   advanced research seems to raise the bar,
   or at any rate it sends the message that RDF vocabularies
   can be intimidatingly complicated.  It is also the section
   of the draft which will most quickly go out of date.
   My preference would be to de-emphasize open research
   issues in favor of the simple principles.


Mark:
I agree with Tom Baker [2] that it is preferable not to emphasize open research. Moreover, the current text primarily gives attention to one issue (automated version reasoning) that is not directly related to any of the principles, and focuses heavily on one possible implementation (C-OWL).

-----------------------------------------------------------------------

4

Mark:
4.

I recommend giving the names of the documents instead of their URIs and explaining/classifying them (e.g. by the principle to which each is relevant).

6

Ralph:
6. References; CoolURIs has been updated and a final Interest Group
Note published.   The preferred URI is http://www.w3.org/TR/cooluris/

Diego: 
* Section 6 (References), cite WEBARCH: s/and and/and/

Received on Wednesday, 7 May 2008 18:02:25 UTC