[VM,ALL] Revised VM Task Force description

Dear all,

I have revised the description of the Vocabulary Management
Task Force (below).  It ended up turning into something of
an outline:

1. First we define our terms.
2. We articulate our assumptions regarding the scope of 
   "vocabulary use in a Semantic Web context".
3. We formulate principles of good practice for identifying 
   and declaring terms and term sets.
4. We identify and summarize related problems about which 
   good practice is still evolving.

After reviewing recent discussion on the list as well as
materials from Tim Berners-Lee, SKOS, OASIS Published Subjects,
the Proposed TAG Finding on Versioning XML Languages, etc.,
and of course DCMI, I feel some hope that agreement on parts
of the "good practice" section (Section 3) might actually
be achievable...

I picture the deliverable as roughly fifteen pages long,
which means no more than one page each even for
the hairiest of the bullet points in Section 4.  I'm thinking
we could perhaps divide up responsibility for drafting these
points among the Task Force members.

Ideally I would sit on this draft for a few days, but I want
to get this out before the telecon this afternoon.  The next
telecon falls on my first day of vacation (July 8), and I
return in August, which is a hopeless month for group work.

In today's call I would like to establish whether the document
is good enough as a task-force description to turn into a
First Draft and move ahead with in September.

Tom

P.S. Note that I have listed as Members everyone who
indicated even a tentative interest in the TF.

P.P.S. If Ralph can give me CVS Put access, I'd be happy
to move the draft to the CVS space.

-----

SWBPD "Vocabulary Management" Task Force Description
Draft, 2004-06-24

NAME          
    Vocabulary Management

STATUS        
    Considered

COORDINATORS  
    Tom Baker and ?

MEMBERS
    Libby Miller
    Natasha Noy
    Dan Brickley 
    Alistair Miles
    Alan Rector
    James Hendler
    Aldo Gangemi
    Bernard Vatant
    Ralph Swick

OBJECTIVES

1. To establish the terminology for our discussion of the
   declaration, identification, use, and management of
   vocabulary terms in a Semantic Web environment -- something
   roughly along the lines of:

   -- Term
   -- Vocabulary (a set of Terms)
   -- Namespace (hmm...)
   -- Namespace URI (identifies a Namespace)
   -- Namespace Owner (controls a Namespace)
   -- Language (uses and mixes Vocabularies)
   -- Versioning (identification of changes to a Language)
   -- Term Concept (notional)
   -- Term URI (identifies a Term Concept)
   -- Term Annotation (a representation of or gloss on a Term Concept)
   -- Term Version (an identifiable state of a cluster of Term Annotations)
   -- Term Version URI (identifies a Term Version)
   -- Term Declaration (represents a term in a machine-processable schema 
      language)
   -- Namespace Document (definitive material about a Namespace)
   -- Namespace Schema (definitive material about a Namespace in a 
      machine-processable schema language).
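
   As a rough illustration of how these terms might relate to one
   another, here is a minimal Python sketch.  It is an assumption
   for discussion only, not agreed terminology: the class and field
   names are hypothetical, example.org is a placeholder, and for
   brevity the sketch treats a Vocabulary and its Namespace as one
   object.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TermAnnotation:
    """A representation of, or gloss on, a Term Concept."""
    kind: str   # hypothetical field, e.g. "definition" or "label"
    text: str


@dataclass(frozen=True)
class TermVersion:
    """An identifiable state of a cluster of Term Annotations."""
    version_uri: str                        # the Term Version URI
    annotations: Tuple[TermAnnotation, ...]


@dataclass
class Term:
    """A Term Concept, identified by its Term URI."""
    term_uri: str
    versions: List[TermVersion] = field(default_factory=list)

    def current(self) -> Optional[TermVersion]:
        """Most recently added Term Version, or None if there is none yet."""
        return self.versions[-1] if self.versions else None


@dataclass
class Vocabulary:
    """A set of Terms in a Namespace controlled by a Namespace Owner."""
    namespace_uri: str
    owner: str
    terms: Dict[str, Term] = field(default_factory=dict)

    def add_term(self, local_name: str) -> Term:
        """Create the Term for local_name, or return the existing one."""
        uri = self.namespace_uri + local_name
        return self.terms.setdefault(uri, Term(term_uri=uri))


# Illustrative data only; example.org is a placeholder namespace.
vocab = Vocabulary("http://example.org/terms/", owner="Example Owner")
title = vocab.add_term("title")
title.versions.append(
    TermVersion(
        version_uri="http://example.org/terms/2004-06-24/title",
        annotations=(TermAnnotation("definition", "A name given to the resource."),),
    )
)
```

   The point of the sketch is only that the Term URI stays fixed
   while Term Versions accumulate behind it.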

2. To articulate assumptions regarding the use of terms in 
   a Semantic Web environment, including:

   -- Open, loosely-coupled, mixed-language environments
      ("the Web").

   -- Organizations or even individuals defining and publishing
      vocabulary terms in an open, bottom-up, and distributed
      process (as both desirable and de facto).

   -- The need to support processes of referencing,
      repurposing, recombining, and merging data from a diversity
      of sources.

   -- The need to support the inevitable evolution of languages
      ("evolvability").

   -- The Must Ignore Principle: "If you find a language element 
      you don't understand, ignore it" (e.g., IETF practice, 
      Tim Berners-Lee, TAG Finding on Versioning).

   -- The Principle of Free Extension: "Allow extensibility:
      language designers should create extensible languages"
      (TAG Finding on Versioning).  Languages are extensible
      if they can mix Vocabularies.

   -- An emerging infrastructure (keyword "registries") for 
      holding or harvesting Vocabularies for display, search, 
      tool configuration, inferencing, or other such services.  
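
   To make the Must Ignore Principle concrete, here is a minimal
   Python sketch of a consumer that keeps the properties it
   understands and skips the rest rather than failing.  The function
   name, the record layout, and the example.org and example.net URIs
   are all hypothetical.

```python
def apply_must_ignore(record, understood):
    """Split a record into properties we understand and ones we ignore.

    record     -- dict mapping property URIs to values
    understood -- set of property URIs this consumer knows how to process
    Unknown properties are skipped, not treated as errors.
    """
    kept = {uri: value for uri, value in record.items() if uri in understood}
    ignored = sorted(uri for uri in record if uri not in understood)
    return kept, ignored


# Illustrative data only: a record mixing two vocabularies.
understood = {
    "http://example.org/terms/title",
    "http://example.org/terms/creator",
}
record = {
    "http://example.org/terms/title": "Vocabulary Management",
    "http://example.net/other/rating": "5",
}
kept, ignored = apply_must_ignore(record, understood)
```

   A consumer written this way keeps working when a record starts
   mixing in terms from a Vocabulary it has never seen, which is
   what makes Free Extension safe for publishers.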

3. To articulate guidelines of good practice for Namespace
   Owners to identify and declare Terms and Term Sets (Vocabularies)
   for use in a Semantic Web environment.  Something like:

   -- Identify Terms using URIs.

   -- Term URIs should remain stable within the limits of
      "semantically compatible" change and evolution of the
      Terms identified (where "semantically compatible"
      is defined with respect to backwards and forward
      compatibility, as in the TAG Finding on Versioning).

   -- Associate URI-identified Terms with human-interpretable
      Term Annotations -- usually, at a minimum, with text
      defining the Term.

   -- Consider associating the URI-identified Terms with
      machine-processable Term Declarations in Namespace
      Schemas.

   -- Optionally, identify Term Versions using URIs.
      Follow (by analogy) the W3C method of distinguishing
      the timeless "Latest Version" from the date-stamped
      "This Version" and "Previous Version" (is this method
      formally described anywhere?).

   -- The Namespace Owner should publish a description of
      the terms identified by URIs and of the policies
      governing their maintenance, e.g., expectations
      about persistence, institutional commitment, and
      semantic stability.

   -- Only a Namespace Owner should change the meaning of a Term 
      in a namespace (though non-owners may constrain meanings in
      semantically compatible ways for use in specific contexts).

   -- When making assertions about terms belonging to another 
      Namespace Owner, consider seeking their endorsement of 
      those assertions ("assertion etiquette" or "good neighbor" 
      policies).

   -- Version Namespace Documents and Namespace Schemas the way
      W3C versions documents and schemas.
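
   The "Latest Version" / "This Version" / "Previous Version"
   analogy could be sketched in Python as below.  This is an
   assumption for illustration, not a described W3C or DCMI method:
   the URI pattern (a date segment inserted before the local name),
   the class name, and the example.org namespace are all
   hypothetical.

```python
from datetime import date


class TermHistory:
    """Tracks dated Term Versions behind one timeless Term URI."""

    def __init__(self, namespace_uri, local_name):
        self.namespace_uri = namespace_uri
        self.local_name = local_name
        self._dates = []  # publication dates, oldest first

    def _dated_uri(self, day):
        # Hypothetical pattern: <namespace><YYYY-MM-DD>/<local name>
        return f"{self.namespace_uri}{day.isoformat()}/{self.local_name}"

    def links(self):
        """Return the three URIs a version page would carry."""
        this = self._dated_uri(self._dates[-1]) if self._dates else None
        prev = self._dated_uri(self._dates[-2]) if len(self._dates) > 1 else None
        return {
            "latest_version": self.namespace_uri + self.local_name,
            "this_version": this,
            "previous_version": prev,
        }

    def publish(self, day):
        """Record a new Term Version dated `day` and return its links."""
        if self._dates and day <= self._dates[-1]:
            raise ValueError("a new version must be dated after the last one")
        self._dates.append(day)
        return self.links()


history = TermHistory("http://example.org/terms/", "title")
first = history.publish(date(2004, 6, 24))
second = history.publish(date(2004, 9, 1))
```

   The timeless URI never changes, so existing data keeps pointing
   at the Term, while each dated URI pins down one state of its
   annotations.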

4. To point to and briefly summarize the ongoing, evolving
   diversity of practices and approaches to declaring and
   managing vocabularies.  The following problems should each
   be discussed in one page or less:

    -- The problem of resolving (dereferencing) Term URIs.
       URI-identified Terms should be associated with or
       resolve to what sort of human-interpretable Term
       Annotations or machine-processable Term Declarations?
       The VM note should summarize the state of discussion
       about whether a URI resolves to anything at all, and if
       so, whether to a Web page, a machine-processable schema
       (of whatever flavor), or a resource directory, pointing
       to examples in practice.  If Terms are documented in
       multiple ways, should a Namespace Owner distinguish
       between "canonical" versus "derived" sources?

    -- The problem of work-flow and tools for documenting
       Terms.  The VM note should point to tools and methods
       for maintaining multiple documentation forms, such as
       schemas and Web pages.

    -- The problem of finding versus becoming a Namespace
       Owner.  People want to know: "If I want to declare
       a term but lack the institutional context to support
       a persistent namespace policy, how can I do it?
       Should I use an existing term, get a Namespace Owner
       (such as DCMI) to declare one, or declare my own?
       If I were to coin my own URI, where could I put it?"

    -- The problem of describing Terms. What are the properties
       of a Term Annotation or Term Declaration?  Besides
       a Definition, what are some of the properties
       more commonly in use?  How important is it for
       interoperability to use existing properties in Term
       Annotations or Term Declarations?

    -- The schema language of a Term Declaration: The
       VM note should not take a stand on the use of
       a particular flavor of OWL/RDF+S for declaring a
       vocabulary but should simply point to documents
       which focus on this issue.

    -- The formation of URIs.  The issues here include
       "hash or slash", the implied semantics of language
       strings and of implied directory hierarchies in URIs,
       and the use of version numbers in URI strings.

    -- Application profiles.  Most vocabulary initiatives
       end up having some notion of "profile" to designate
       a constrained subset of a vocabulary and/or
       a language which mixes multiple vocabularies for
       a particular purpose or application.  The VM note
       should characterize the nature of these constructs,
       possibly referring to notions such as Term Usage (a
       cluster of Term Annotations about a Term of which one
       is not the Namespace Owner).

    -- The problem of "semantic context".  Terms may be
       embedded in clusters of relations from which they
       may be seen in part to derive their meaning.  It may
       therefore not always be sensible to use those terms out
       of context.  Examples include the terms of thesauri
       or ontologies, as well as XML elements, which may
       be defined with respect to parent elements and may
       therefore not always be reusable as properties in an
       RDF sense without violating their semantic intent.
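
    As a small illustration of what "hash or slash" means for
    tools, here is a Python sketch that splits a Term URI into a
    Namespace URI and a local name under either convention.
    The function name is hypothetical, and the rule it applies
    (split at "#" if present, otherwise at the last "/") is a
    common heuristic that I am assuming here, not a
    recommendation of the Task Force.

```python
def split_term_uri(term_uri):
    """Split a Term URI into (namespace URI, local name).

    Hash style:  the namespace ends with '#'.
    Slash style: the namespace ends with the last '/'.
    """
    if "#" in term_uri:
        namespace, _, local_name = term_uri.partition("#")
        return namespace + "#", local_name
    if "/" not in term_uri:
        raise ValueError("no '#' or '/' to split on: " + term_uri)
    namespace, _, local_name = term_uri.rpartition("/")
    return namespace + "/", local_name


# Slash style, as in the FOAF namespace listed under DEPENDENCIES.
slash_parts = split_term_uri("http://xmlns.com/foaf/0.1/name")

# Hash style, using a placeholder example.org namespace.
hash_parts = split_term_uri("http://example.org/vocab#Concept")
```

    Note that the heuristic says nothing about what either URI
    resolves to, which is the separate dereferencing problem
    listed above.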

APPROACH
    The issues above have been discussed and documented in
    various vocabulary maintenance communities.  The Task
    Force deliverable will provide an overview of the issues
    and principles involved in declaring and maintaining
    a vocabulary, pointing to available examples of good
    practice.  In order to do this, it must first define
    a common terminology for describing the diversity of
    practices in a comparable manner.

SCOPE
    Guidelines and principles for the identification,
    declaration, and management of Terms in Vocabularies
    (Metadata Element Sets, Thesauri, Ontologies, Published
    Subjects, and the like).

DELIVERABLE
    A relatively concise (fifteen-page?) technical note
    summarizing principles of good practice, with pointers to
    examples, about the identification of terms and term sets
    with URIs, related policies and etiquette, and expectations
    regarding documentation.

TARGET AUDIENCE
    -- Maintainers of terms and term sets (vocabularies)
       for use in a Semantic Web environment.
    -- Anyone else wishing to declare terms reusably.

DEPENDENCIES (in the broadest sense)
    -- THES - SWBP Thesaurus Task Force
       http://www.w3.org/2004/03/thes-tf/mission
    -- FOAF
       http://xmlns.com/foaf/0.1/
       http://www.w3.org/2001/sw/Europe/events/foaf-galway/
    -- Dublin Core - DCMI, for example:
       http://dublincore.org/documents/dcmi-namespace/
       http://dublincore.org/documents/dcmi-terms/
    -- Dublin Core - CEN MMI-DC Working Group
       http://www.bi.fhg.de/People/Thomas.Baker/Versioning-20040611.txt
       http://www.cenorm.be/isss/cwa14855/
    -- Proposed TAG Finding on Versioning XML Languages
       http://www.w3.org/2001/tag/doc/versioning/
    -- SKOS - SWAD Europe
       http://www.w3.org/2001/sw/Europe/reports/thes/1.0/guide/
       http://www.w3.org/2004/skos/core.rdf
       http://www.w3c.rl.ac.uk/2003/11/21-skos-mapping
    -- W3C TAG on "What should a 'namespace document' look like?"
       http://www.w3.org/2001/tag/issues.html#namespaceDocument-8
    -- SWAD-E Thesaurus (wants "standard" thesaurus change management guidelines)
       http://lists.w3.org/Archives/Public/public-esw-thes/2004Apr/
    -- Image Annotation meeting in Madrid
       http://rdfig.xmlhack.com/2004/06/07/2004-06-07.html#1086615887.400193
    -- Tim Berners-Lee on Evolvability
       http://www.w3.org/DesignIssues/Evolution.html
    -- OASIS Published Subjects Technical Committee
       http://www.oasis-open.org/committees/download.php/3050/pubsubj-pt1-1.02-cs.pdf
       http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=tm-pubsubj
       http://www.oasis-open.org/committees/tm-pubsubj/docs/recommendations/issues.htm
    -- OASIS (ISO/TS 15000) ebXMLRegistry Semantic Content (Carl Mattocks)
    -- Libby and Dan: work on RDF query
       http://www.ilrt.bris.ac.uk/discovery/2001/06/process/
    -- Sandro's work on a vocabulary directory (reference needed)
    -- Alan: experience in medical contexts with large vocabularies
    -- Alistair: recommendations for change management
    -- CORES Resolution on Metadata Element Identifiers
       http://www.dlib.org/dlib/july03/baker/07baker.html


-- 
Dr. Thomas Baker                        Thomas.Baker@izb.fraunhofer.de
Institutszentrum Schloss Birlinghoven         mobile +49-160-9664-2129
Fraunhofer-Gesellschaft                          work +49-30-8109-9027
53754 Sankt Augustin, Germany                    fax +49-2241-144-2352
Personal email: thbaker79@alumni.amherst.edu

Received on Thursday, 24 June 2004 07:19:26 UTC