[VM] Draft of 2004-10-25

SWBPD "Vocabulary Management" 
Draft, 2004-10-27

Abstract

Metadata element sets, taxonomies, subject headings,
thesauri, and ontologies are examples of vocabularies which
are increasingly used in a "Semantic Web" environment.
Managing vocabularies for use in Semantic Web applications
means identifying, documenting, and publishing vocabulary
terms in ways that facilitate their citation and re-use in a
wide range of applications.  This paper examines practices in
the maintenance communities for representative vocabularies
ranging from small and informal to large and complex.
The paper formulates principles of good practice and summarizes
discussion on issues for which good practice has yet to emerge.

1. Introduction

1.1. Vocabularies in the Semantic Web

    The Semantic Web is an open, distributed, loosely-coupled
    environment with lots of languages (metadata element
    sets, controlled vocabularies, taxonomies, thesauri,
    ontologies, etc).  Organizations or even individuals can
    define and publish vocabulary terms in an open, bottom-up,
    and distributed manner.  This paper is addressed to people
    who want to create and maintain such a Vocabulary.

    This paper articulates some basic principles for doing
    so in a Semantic-Web-friendly way.  By this we mean
    vocabularies that can support processes of referencing,
    repurposing, recombining, or merging data from a diversity
    of sources; that are evolvable; that are extensible and
    mixable with other Semantic Web vocabularies; and that
    are declared in a way that is processable by networked
    machines in an emerging "semantic infrastructure".
    [Bernard asks: Which processes are the terms supposed to
    support -- indexing, vocabulary merging, data integration,
    search...?  Do we say something about those processes or
    are we agnostic?]

    TASK: James - One page on "vocabularies in Semantic Web"
        The two placeholder paragraphs above should be expanded
        into one short page providing a general introduction
        to the topic "vocabularies in the Semantic Web" --
        what kinds of vocabularies are we talking about here
        (e.g., the typology in [PIDCOCK]) and what does it mean
        to use them in a "Semantic Web" environment?  Rather
        than elaborate very much in-line, this section should
        point off to further reading about Semantic Web.

1.2. Method of this paper

    In Section 2, this paper will formulate a few principles
    of good practice applicable to Semantic Web vocabularies
    in general.  To illustrate these principles, the paper
    will describe practices used in several vocabularies
    chosen to exemplify a range from small and informal to
    large and complex:

    -- FOAF
       TASK: DanBri and Libby - One paragraph on FOAF
           FOAF serves as an example of a "relatively small"
           vocabulary for "descriptive metadata" about people
           and their interests [FOAF].  Its maintenance
           processes are "somewhat informal".

    -- Dublin Core
       TASK: Tom - One paragraph about Dublin Core
           Dublin Core serves as an example of a "medium-sized"
           vocabulary for "descriptive metadata" about
           information resources [DC].  Its maintenance
           processes are "lightweight but not weightless"
           and increasingly formal as DCMI evolves from a
           workshop-driven movement to a stable maintenance
           community supported by institutional stakeholders.

    -- SKOS
       TASK: Alistair - One paragraph about SKOS
           SKOS serves as an example of a "medium-sized"
           vocabulary for describing "thesauri" and similar
           types of knowledge organization systems.  (Not sure
           about maintenance issues.)  The SWBPD thesaurus
           activity should be cited.

    -- Princeton Wordnet
       TASK: Aldo - One paragraph about wordnet issues
           As a lexical system of synonym sets for the English
           language, Princeton Wordnet can serve as an example
           of a "large-scale" vocabulary.  (Not sure about
           maintenance issues.)  The SWBPD activity should
           be cited [SWBP-WNET].

    -- A major medical or life-sciences vocabulary?
       TASK: Alan or Natasha - An example of a large-scale ontology?
           Do we perhaps need another major example?  It would
           be good to have a "large-scale" vocabulary of the
           "ontology" sort, preferably with some well-defined
           maintenance and versioning policies...

    In addition, this paper cites several prior works on
    good practice in closely related areas:

    -- World Wide Web Architecture and Semantic Web principles
       TASK: DanBri - Bullet point on W3C good-practice documents
             TBL has written about Web architecture, and TAG
             has come out with Architecture of the World
             Wide Web, First Edition [SW-ARCHITECTURE and
             W3C-TAGARCHITECTURE].  A bullet point should put
             these various formal and informal position papers
             into the proper perspective for outsiders to W3C.

    -- OASIS Published Subjects
       TASK: Bernard - Bullet point on OASIS Published Subjects
             The bullet point should provide some context on Topic 
             Maps and Semantic Web and on the PSI Recommendation
             [OASIS-PUBSUBJ].

    The terminology used to talk about vocabularies and
    their underlying linguistic models differs between
    user communities.  Without wishing to imply that these
    differences are trivial, this paper uses a small set of
    words defined with deliberate fuzziness:

    Term                A named concept.
    Vocabulary          A set of terms.
    URI Reference       A globally unique identifier.
    Description         A set of statements about a term or vocabulary.
    Declaration         A machine-processable representation of 
                        a term or vocabulary.
    Vocabulary Owner    The maintainer of a term set.
    Versioning          The identification of changes to a term
                        or vocabulary.

    These words are qualified in the examples which follow
    and in the Glossary.  One potential source of confusion
    should perhaps be acknowledged and discussed up-front:
    the term "namespace", which is used in a number of
    vocabulary communities, W3C in particular, but is (in my
    opinion) difficult to pin down.  If we can agree to use
    "vocabulary" in this paper (noting the usage of "namespace"
    where appropriate), I would like to task someone (DanBri?)
    to explain the W3C use of the term "namespace".

    TASK: DanBri or Libby - Describe W3C usage of the word "namespace"

2. Principles of Good Practice

Short paragraph explaining that in this section, we formulate
and illustrate principles of good practice on which we
generally agree.

2.1. Identify Terms with URI References.

     TASK: DanBri - Define "URI Reference", elaborating in the Glossary
     TASK: DanBri - Sentence or two on FOAF term URIrefs
     TASK: Tom - Sentence or two on DCMI term URIrefs
     TASK: Tom - A sentence on the "CORES Resolution"
     TASK: Alistair - Sentence or two on SKOS term URIrefs
     TASK: Aldo - Sentence or two on Wordnet term URIrefs
     TASK: DanBri - What W3C says about identifying terms
     TASK: Bernard - What PSI says about identifying terms

2.2. Articulate and publish maintenance policies for the Terms 
     and their URI references.

     A Vocabulary Owner should specify and publish any policies
     governing the maintenance of the terms and their URI
     references: e.g. institutional commitments to persistence
     and semantic stability.  This short to medium-length
     section should simply describe a sample of such policies.

     [It would be nice if we could agree on something of
     the substance of those policies, such as stability of
     URI references in the face of "semantically compatible"
     evolution, but this may be difficult to define.]

     TASK: DanBri - Describe maintenance policies for FOAF
     TASK: Tom - Describe maintenance policies for DCMI
     TASK: Alistair - Describe maintenance policies for SKOS
     TASK: Aldo - Describe maintenance policies for Wordnet
     TASK: DanBri - What W3C says about maintenance policies
     TASK: Bernard - What PSI says about maintenance policies
     TASK: Alistair - TAG Versioning on "semantic stability"

2.3. Identify the historical version of a Vocabulary or
     its Terms.

     Building on the previous section, this section should
     look at versioning from the standpoint of identification.
     At what level of granularity does versioning operate? Are
     URI references being assigned to individual terms,
     to sets of terms in the abstract, or to documents or
     schemas of term sets?  Presumably, this section should
     highlight W3C practice in this area (e.g., the method
     of distinguishing a timeless Latest Version from a
     date-stamped This Version and Previous Version).

     TASK: Ralph - Longer paragraph on versioning in W3C
     TASK: DanBri - Short paragraph on versioning in FOAF
     TASK: Tom - Short paragraph on versioning in DCMI
     TASK: Alistair - Short paragraph on versioning in SKOS
     TASK: Aldo - Short paragraph on versioning in Wordnet
     TASK: Bernard - Short paragraph on versioning in PSI
     TASK: Alistair - What TAG says about versioning
     TASK: Alan - "What constitutes a change?"
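
     As a possible illustration of the Latest Version / This
     Version / Previous Version pattern mentioned above, the
     note could include a minimal sketch like the following.
     The base URI http://example.org/vocab and the snapshot
     dates are hypothetical and are not taken from any of the
     example vocabularies; the class and method names are
     likewise invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical base URI, used only for illustration.
BASE = "http://example.org/vocab"


@dataclass
class VersionHistory:
    """Tracks date-stamped snapshots behind one timeless URI."""
    dates: List[str] = field(default_factory=list)  # ISO dates, oldest first

    def publish(self, date: str) -> str:
        """Record a new snapshot and return its date-stamped URI."""
        if self.dates and date <= self.dates[-1]:
            raise ValueError("snapshots must be published in date order")
        self.dates.append(date)
        return f"{BASE}/{date}/"

    def latest_version(self) -> str:
        """The timeless URI: it never changes as snapshots are added."""
        return f"{BASE}/"

    def this_version(self) -> Optional[str]:
        """Date-stamped URI of the newest snapshot, if any."""
        return f"{BASE}/{self.dates[-1]}/" if self.dates else None

    def previous_version(self) -> Optional[str]:
        """Date-stamped URI of the snapshot before the newest one."""
        return f"{BASE}/{self.dates[-2]}/" if len(self.dates) > 1 else None


history = VersionHistory()
history.publish("2004-06-14")
history.publish("2004-10-27")
print(history.latest_version())
print(history.this_version())
print(history.previous_version())
```

     The sketch versions the vocabulary as a whole; whether
     individual terms should also carry date-stamped URI
     references is exactly the granularity question raised
     in this section.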

2.4. Provide documentation about the Terms.  

    The Vocabulary Owner should write and publish a
    human-readable description of the Terms -- typically,
    at a minimum, text definitions on a Web page.  This short
    section should merely say what sort of Web documents are
    made available for the example vocabularies.

     TASK: DanBri - One sentence pointing to FOAF Web documents
     TASK: Tom - One sentence pointing to DCMI Web documents
     TASK: Alistair - One sentence pointing to SKOS Web documents
     TASK: Aldo - One sentence pointing to Wordnet Web documents
     TASK: DanBri - One sentence pointing to W3C Web documents
     TASK: Bernard - One sentence pointing to PSI Web documents

2.5. Declare the Terms using a machine-processable schema
     language.

     This short section should merely say what sorts of
     schemas the example maintenance communities publish.
     Policies for dereferencing and choice of schema language
     will be discussed in more detail in Section 3.

     TASK: DanBri - Two sentences on FOAF schemas.
     TASK: Tom - Two sentences on DCMI schemas.
     TASK: Alistair - Two sentences on SKOS schemas.
     TASK: Aldo - Two sentences on Wordnet schemas.
     TASK: DanBri - Two sentences on W3C schemas.
     TASK: Bernard - Two sentences on PSI schemas.

3. Questions on the Bleeding Edge

Paragraph explaining that Section 3 discusses issues on
which consensus currently seems more elusive.  Our goal is
to describe the range of positions taken.

3.1. What should the identifier of a Vocabulary or Term (i.e.,
     its URI Reference) resolve to when someone "clicks on it"
     in a Web browser?

     We could reword this as the problem of resolving
     ("dereferencing") Term URIs to human-readable descriptions
     or machine-processable declarations.  Several years ago,
     Tim Berners-Lee said that "The namespace document (with
     the namespace URI) is a place for the language publisher
     to keep definitive material about a namespace.  Schema
     languages are ideal for this."  Others have disagreed with
     this and the question was taken up by TAG.  Point 3.1
     should summarize the state of discussion.  If Terms are
     documented in multiple ways, should a Vocabulary Owner
     distinguish between "canonical" versus "derived" sources?

     TASK: Ralph - Paragraph or two on W3C dereferencing policy
     TASK: Bernard - Paragraph on PSI dereferencing policy
     TASK: DanBri - Short paragraph on FOAF dereferencing policy
     TASK: Tom - Short paragraph on DCMI dereferencing policy
     TASK: Alistair - Short paragraph on SKOS dereferencing policy
     TASK: Aldo - Short paragraph on Wordnet dereferencing policy
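
     One way a server could offer both a human-readable
     description and a machine-processable declaration at the
     same URI is HTTP content negotiation.  This is only one
     of the positions to be summarized here, not a policy
     attributed to any of the communities above.  A minimal
     sketch, in which the function names and the labels
     "description" and "declaration" are assumptions made
     for illustration:

```python
def parse_accept(header: str) -> dict:
    """Map each media type in an HTTP Accept header to its q-value."""
    prefs = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs[media_type] = q
    return prefs


def choose_representation(accept_header: str) -> str:
    """Pick 'declaration' (RDF/XML) or 'description' (HTML page).

    Ties and missing headers fall back to the human-readable page.
    """
    prefs = parse_accept(accept_header)
    rdf_q = prefs.get("application/rdf+xml", 0.0)
    html_q = max(prefs.get("text/html", 0.0), prefs.get("*/*", 0.0))
    return "declaration" if rdf_q > html_q else "description"


print(choose_representation("application/rdf+xml"))
print(choose_representation("text/html,application/xhtml+xml,*/*;q=0.8"))
```

     The fallback to the human-readable page is a design
     choice of the sketch; a Vocabulary Owner who regards the
     schema as the "canonical" source might choose the
     opposite default.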

3.2. Which schema language should be used to declare the
     Vocabulary machine-processably?

     Short answer: It depends what you want to say.
     This section should characterize the assertions made
     in schemas published by various communities.

     TASK: DanBri - Short paragraph on what FOAF schemas assert.
     TASK: Tom - Short paragraph on what DCMI schemas assert.
     TASK: Aldo - Short paragraph on what Wordnet schemas assert.
     TASK: DanBri - Short paragraph on what W3C schemas assert.
     TASK: Bernard - Short paragraph on what PSI schemas assert.
     TASK: Alistair - Short paragraph on what SKOS schemas assert.

     In particular, there was a discussion in September on
     the SWBPD list on different approaches to modeling
     thesauri [THESAURUS-MODEL].  For example, one could
     use OWL or RDFS to represent an existing language of
     thesaurus relations and simply translate an existing
     thesaurus into those terms.  Or one could fundamentally
     remodel the thesaurus using native OWL constructs --
     a much more ambitious task (because the semantics of
     class, subclass, etc, are not identical to thesaurus
     terms).  When is it "good enough" to express the fuzzy
     semantics of an existing thesaurus, which can be done
     rather automatically, and what does the extra effort of
     remodeling an ontology buy for applications?  There is
     an overlap here with the PORT task force.

     TASK: Alistair - Discuss alternative ways to model a thesaurus
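
     The difference between the two approaches can be sketched
     with a toy example.  The thesaurus fragment, the property
     name "thes:broader", and the function names below are
     hypothetical; the point is only that a mechanical
     translation of "broader" into rdfs:subClassOf licenses
     inferences that the thesaurus never intended.

```python
# Hypothetical thesaurus fragment: (narrower term, broader term).
BROADER = [
    ("Wheels", "Cars"),      # a part-whole link
    ("Cars", "Vehicles"),    # a kind-of link
]


def as_thesaurus_triples(pairs):
    """Approach 1: keep the thesaurus relation under its own property."""
    return {(n, "thes:broader", b) for n, b in pairs}


def as_subclass_triples(pairs):
    """Approach 2, done naively: map every broader link to rdfs:subClassOf."""
    return {(n, "rdfs:subClassOf", b) for n, b in pairs}


def inferred_classes(cls, triples):
    """Classes that an instance of `cls` belongs to under subclass semantics."""
    result = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in result:
                result.add(o)
                frontier.append(o)
    return result


# Under approach 1 nothing is inferred about class membership.
print(sorted(inferred_classes("Wheels", as_thesaurus_triples(BROADER))))
# Under naive approach 2 every wheel is inferred to be a car and a vehicle.
print(sorted(inferred_classes("Wheels", as_subclass_triples(BROADER))))
```

     A careful remodeling would have to inspect each link and
     decide whether it is really a subclass relation, which is
     the extra effort the paragraph above refers to.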

3.3. What does it mean to "use" Terms from one Vocabulary
     in another?

     This issue has at least two aspects:

     -- The problem of "semantic context".  Terms may be
        embedded in clusters of relations from which they
        may be seen in part to derive their meaning.  It may
        therefore not always be sensible to use those terms out
        of context.  Examples include the terms of thesauri
        or ontologies, as well as XML elements, which may
        be defined with respect to parent elements and may
        therefore not always be reusable as properties in an
        RDF sense without violating their semantic intent.

        TASK: Bernard - Reuse of existing terms in a local context
        TASK: Tom - DCMI on "terms usable as RDF properties"
        TASK: Everyone - Using terms outside of their original contexts

     -- Application profiles.  Many (most?) vocabulary
        maintainers end up with some notion of "profile" to
        designate either a constrained subset of the vocabulary
        and/or a language which mixes multiple vocabularies
        for a particular purpose or application.  The VM note
        could characterize the nature of these constructs.

        TASK: Tom - Describe the DCMI notion of "application profile"
        TASK: Everyone - Describe other notions of "application profile"

3.4. What does it mean to "own" a Vocabulary?

     In this section, we acknowledge that "vocabularies"
     are inherently a human linguistic phenomenon.  As with
     other forms of language, there is inevitably a tension
     between the meaning intended by a speaker and meaning
     as interpreted or imposed by others.

     If this paper is addressed to vocabulary maintainers
     (existing and potential) -- and we have in essence
     articulated some responsibilities for vocabulary
     maintainers (in Section 2 above) -- we should also
     question our underlying assumptions.  The RDF Concepts and
     Abstract Syntax draft of 2003-01-23 said that "The social
     conventions surrounding use of RDF assume that any RDF URI
     reference gains its meaning from some defining individual,
     organization or context...  For important documents, the
     use of third-party vocabulary should be restricted to
     terms defined by trustworthy parties (e.g. recognized
     standards bodies or reputable organizations)...".
     In response to that draft, however, there was animated
     discussion about the "social meaning" versus the "formal
     meaning" of RDF assertions [SW-MEANING].  This debate
     should perhaps be summarized from the standpoint of a
     Vocabulary maintainer.

     TASK: Jeremy? - Summarize discussion of "social meaning"

     Even if we acknowledge the notion of "ownership" to be
     problematic, we should perhaps introduce the notion
     of "trust".  Tom could briefly describe negotiations
     between the DCMI Usage Board and the Library of
     Congress whereby LoC asserts certain MARC Relator terms
     (identified with URI references) to be sub-properties
     of dc:contributor, and DCMI endorses those assertions
     ("assertion etiquette"?).

     TASK: Tom - DCMI endorsing assertions about MARC Relator terms
     TASK: Everyone - Comment on the role of the "vocabulary owner"
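
     To show what is at stake in such an assertion, here is a
     minimal sketch of what an application could infer once it
     trusts a sub-property statement.  The dc:contributor and
     rdfs:subPropertyOf URIs are the published ones; the
     relator URI, the book and person URIs, and the function
     name are hypothetical stand-ins.

```python
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
DC_CONTRIBUTOR = "http://purl.org/dc/elements/1.1/contributor"
# Hypothetical URI standing in for a MARC Relator term.
REL_ILLUSTRATOR = "http://example.org/relators/illustrator"


def infer_superproperties(triples):
    """Apply the RDFS sub-property rule until no new triples appear.

    Rule: if (p subPropertyOf q) and (s p o), then (s q o).
    """
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        sub_of = {(s, o) for (s, p, o) in inferred if p == RDFS_SUBPROPERTY_OF}
        for (s, p, o) in list(inferred):
            for (sub, sup) in sub_of:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


data = {
    # The assertion made by the owner of the relator term.
    (REL_ILLUSTRATOR, RDFS_SUBPROPERTY_OF, DC_CONTRIBUTOR),
    # A description that uses the relator term (hypothetical resources).
    ("http://example.org/book/1", REL_ILLUSTRATOR, "http://example.org/person/7"),
}

for triple in sorted(infer_superproperties(data)):
    print(triple)
```

     The inference changes what counts as a dc:contributor
     statement, which is why it matters who makes the
     sub-property assertion and whether the owner of the
     broader term endorses it.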

3.5. When a term is needed, when should one adapt
     an existing term, declare a new one, or get an established
     vocabulary maintainer to host it?

     It would be good to end the VM note with this question,
     because I suspect that a lot of readers will be asking
     precisely this question.  This is where we can summarize
     our understanding of good practice for maintenance and
     persistence policy.  Andy Powell's sensible advice on
     these issues could be summarized here [DC-IDENTIFIERS],
     along with a general characterization of the "vocabulary
     market" [VOCABULARY-MARKET].  We could introduce the
     notion of a Vocabulary Host, and Tom would be happy
     to describe discussion about this within DCMI from the
     standpoint of long-term maintenance responsibility and
     related institutional models.  Given that one option is
     to coin new URI references, we should at least characterize
     choices with regard to forming the identifier strings:
     "hash or slash" and the implied semantics of words,
     version numbers, or directory hierarchies embedded in
     URI strings.

     TASK: DanBri or Libby - Describe the "vocabulary market"
     TASK: DanBri or Libby - Formation of URI strings ("hash or slash" etc)
     TASK: Tom - DCMI guidelines on coining URI references
     TASK: Tom - DCMI perspective on "namespace hosting"
     TASK: Everyone - When and how to declare new or reuse existing terms
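
     One practical consequence of "hash or slash" can be
     sketched with the Python standard library.  An HTTP client
     does not send the fragment to the server, so all terms in
     a hash namespace are retrieved through one document,
     while each term in a slash namespace has its own
     retrieval URI.  The function name retrieval_uri is an
     assumption; the term URIs are from the published RDFS
     and FOAF vocabularies.

```python
from urllib.parse import urldefrag


def retrieval_uri(term_uri: str) -> str:
    """Return the URI a client would actually request for a term."""
    return urldefrag(term_uri).url


terms = [
    "http://www.w3.org/2000/01/rdf-schema#label",    # hash namespace
    "http://www.w3.org/2000/01/rdf-schema#comment",  # hash namespace
    "http://xmlns.com/foaf/0.1/name",                # slash namespace
    "http://xmlns.com/foaf/0.1/mbox",                # slash namespace
]

for term in terms:
    print(term, "->", retrieval_uri(term))
```

     Which style suits a given vocabulary depends on its size
     and on the dereferencing policy discussed in Section 3.1;
     the sketch does not argue for either.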

Glossary

    This section -- if we need it -- can provide annotations
    for our minimal terminology from the standpoint of other
    vocabulary maintenance communities.  From the standpoint of
    Dublin Core, for example, one might note here that "term"
    corresponds to what DCMI calls an Element or Element
    Refinement (aka Property), or an Encoding Scheme, etc.
    Alan could point out how this use of "term" differs
    from "term" in the medical community (as distinct from
    "concept").

    -- Term: a named concept.
    -- Vocabulary: a set of terms.
    -- URI Reference: a globally unique identifier.
    -- Description: a set of statements about a term or vocabulary.
    -- Declaration: a machine-processable representation of 
       a term or vocabulary
    -- Vocabulary Owner: the maintainer of a term set.
    -- Versioning: the identification of changes to a term
       or vocabulary.

    TASK: DanBri or Libby - Define URI Reference
          According to my notes, the "RFC2396bis redraft
          will, in the Appendix, clearly state why we say
          URIref not just URI" [RFC2396bis].
    TASK: DanBri - Annotate Glossary with FOAF usage where appropriate
    TASK: Tom -  Annotate Glossary with DCMI usage where appropriate
    TASK: Alistair - Annotate Glossary with SKOS usage where appropriate
    TASK: Aldo -  Annotate Glossary with Wordnet usage where appropriate
    TASK: Ralph -  Annotate Glossary with W3C usage where appropriate
    TASK: Bernard -  Annotate Glossary with PSI usage where appropriate

References

[I have started to fill out the references.  The names next
to many of the references are my best guess as to who should
cover a particular resource in the context of the paper.
Note several related articles or resources are sometimes
grouped under one heading - please help decide which of them
is most salient for the purposes of citation.  Please also
let me know if any of the following are no longer needed, and
feel free to help fill in any missing citation information.]

[CORES-RESOLUTION] - Tom
    CORES Resolution on Metadata Element Identifiers,
    http://www.dlib.org/dlib/july03/baker/07baker.html.

[DC] - Tom
    http://dublincore.org/documents/dcmi-terms/
    http://dublincore.org/

[DC-IDENTIFIERS] - Tom
    Powell, A., Guidelines for assigning identifiers to metadata terms,
    [draft], http://www.ukoln.ac.uk/metadata/dcmi/term-identifier-guidelines/.

[DC-NAMESPACE] - Tom
    DCMI Namespace Policy,
    http://dublincore.org/documents/dcmi-namespace/

[DC-PROFILES] - Tom
    Dublin Core Application Profiles, http://www.cenorm.be/isss/cwa14855/.

[FOAF] - DanBri and Libby
    http://xmlns.com/foaf/0.1/
    http://www.w3.org/2001/sw/Europe/events/foaf-galway/
    http://rdfweb.org/topic/FoafGalway
    FOAF Community Process, http://rdfweb.org/topic/FOAFCommunityProcess.

[OASIS-PUBSUBJ] - Bernard
    Pepper, S., ed., Published Subjects: Introduction
    and Basic Requirements, OASIS Published Subjects
    Technical Committee Recommendation, 2003-06-24,
    http://www.oasis-open.org/committees/download.php/3050/pubsubj-pt1-1.02-cs.pdf.
    http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=tm-pubsubj
    http://www.oasis-open.org/committees/tm-pubsubj/docs/recommendations/issues.htm
    Also: OASIS (ISO/TS 15000) ebXMLRegistry Semantic Content.

[PIDCOCK]
    Pidcock, W., Relationships between Metamodels, Ontologies,
    Thesauri, Taxonomies and Controlled Vocabularies,
    http://www.metamodel.com/article.php?story=20030115211223271
    Comments by Mike Uschold:
    http://www.metamodel.com/article.php?story=20030115211223271#comments

[RDF-PRIMER]
    RDF Primer, http://www.w3.org/TR/rdf-primer/.

[RDF-QUERY] - where does this fit?
    Libby and Dan work on RDF query, 
    http://www.ilrt.bris.ac.uk/discovery/2001/06/process/.

[RFC2396bis] - DanBri
    http://www.ietf.org/internet-drafts/draft-fielding-uri-rfc2396bis-07.txt 

[SKOS] - Alistair
    SKOS Core Guide, http://esw.w3.org/topic/SkosCoreGuideToc
    http://www.w3.org/2004/skos/core.rdf
    http://www.w3.org/2001/sw/Europe/reports/thes/1.0/guide/
    http://www.w3c.rl.ac.uk/2003/11/21-skos-mapping

[SWBP-WNET] - Aldo
    Gangemi, A., editor.  Porting Wordnets to the Semantic Web, 
    http://www.w3.org/2001/sw/BestPractices/WNET/Porting.
    http://www.cogsci.princeton.edu/%7Ewn/index.shtml

[SWAD-THESAURUS] - Dan, Bernard and Alistair participated
    SWAD-E Thesaurus - "standard" thesaurus change management 
    guidelines are wanted,
    http://lists.w3.org/Archives/Public/public-esw-thes/2004Apr/

[SW-ARCHITECTURE] - DanBri or Libby?
    Berners-Lee, T. Getting into RDF and Semantic Web using N3, 
    http://www.w3.org/2000/10/swap/Primer.
    Berners-Lee, T. Web Architecture from 50,000 feet, 1999,
    http://www.w3.org/DesignIssues/Architecture#Namespaces

[SWBP-THESAURUS] - Dan and Alistair
    Semantic Web Best Practices: Thesaurus Task Force,
    http://www.w3.org/2004/03/thes-tf/mission

[SW-MEANING] - volunteer needed to summarize!
    RDF Core discussion on issues related to social meaning (Jeremy),
    http://www.w3.org/TR/2003/WD-rdf-concepts-20030123/#section-Meaning had WG 
    consensus, then got trashed:
    http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0366
    then got revised:
    http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0486
    http://www.w3.org/2001/sw/meetings/tech-200303/social-meaning
    Mailing list addressing questions of "namespace ownership":
    http://lists.w3.org/Archives/Public/public-sw-meaning/2004Jun/

[THESAURUS-MODEL]
    VM discussion thread on SWBPD list, e.g.:
    http://lists.w3.org/Archives/Public/public-swbp-wg/2004Sep/0035.html
    http://lists.w3.org/Archives/Public/public-swbp-wg/2004Sep/0036.html
    http://lists.w3.org/Archives/Public/public-swbp-wg/2004Sep/0042.html

[VOCABULARY-MARKET] - DanBri
    Vocabulary Market, http://esw.w3.org/topic/VocabularyMarket
    Image Annotation meeting in Madrid,
    http://rdfig.xmlhack.com/2004/06/07/2004-06-07.html#1086615887.400193
    RDFIG Geo vocab workspace, http://www.w3.org/2003/01/geo/.

[W3C-VERSIONING] - Ralph
    W3C Publication Rules, http://www.w3.org/2004/02/02-pubrules.html
    URIs for W3C Namespaces, http://www.w3.org/1999/10/nsuri 

[W3C-TAGARCHITECTURE] - DanBri?
    Jacobs, I., Walsh, N., Architecture of the World Wide
    Web, First Edition, Technical Architecture Group (TAG),
    http://www.w3.org/TR/2004/WD-webarch-20040816/.

[W3C-TAGISSUES] - DanBri or Libby
    W3C TAG on "What should a 'namespace document' look like?",
    http://www.w3.org/2001/tag/issues.html#namespaceDocument-8.
    TAG "consensus" on namespace documents,
    http://www.w3.org/2003/09/15-tag-summary.html.
    Resource Directory Description Language (RDDL), http://www.tbray.org/tag/rddl4.html.

[W3C-TAG-XMLVERSIONING] - Alistair
    Orchard, D., Walsh, N., eds. Versioning XML Languages,
    Proposed TAG Finding 16 November 2003 [Editorial Draft],
    http://www.w3.org/2001/tag/doc/versioning

[WGS84] - DanBri??
    Walsh, J. An RDF vocabulary for WGS84 geo positioning
    [Informational Internet draft], RDF Interest Group,
    http://space.frot.org/draft-geo-draft.html.


-- 
Dr. Thomas Baker                        Thomas.Baker@izb.fraunhofer.de
Institutszentrum Schloss Birlinghoven         mobile +49-160-9664-2129
Fraunhofer-Gesellschaft                          work +49-30-8109-9027
53754 Sankt Augustin, Germany                    fax +49-2241-144-2352
Personal email: thbaker79@alumni.amherst.edu

Received on Wednesday, 27 October 2004 13:46:28 UTC