- From: Thomas Baker <thomas.baker@izb.fraunhofer.de>
- Date: Thu, 24 Jun 2004 13:20:51 +0200
- To: SW Best Practices <public-swbp-wg@w3.org>
Dear all,

I have revised the description of the Vocabulary Management Task Force (below). It ended up turning into something of an outline:

1. First we define our terms.
2. We articulate our assumptions regarding the scope of "vocabulary use in a Semantic Web context".
3. We formulate principles of good practice for identifying and declaring terms and term sets.
4. We identify and summarize related problems about which good practice is still evolving.

After reviewing recent discussion on the list as well as materials from Tim Berners-Lee, SKOS, OASIS Published Subjects, the Proposed TAG Finding on Versioning XML Languages, etc., and of course DCMI, I feel some hope that agreement on parts of the "good practice" section (Section 3) might actually be achievable...

I picture the deliverable as roughly fifteen pages long, which means no more than one page each, even for the hairiest of the bullet points in Section 4. I'm thinking we could perhaps divide up responsibility for drafting these points among the Task Force members.

Ideally I would sit on this draft for a few days, but I want to get it out before the telecon this afternoon. The next telecon falls on my first day of vacation (July 8), and I return in August, which is a hopeless month for group work. In today's call I would like to establish whether the document is good enough as a task-force description to turn into a First Draft and move ahead with in September.

Tom

P.S. Note that I have listed as Members everyone who indicated even a tentative interest in the TF.

P.P.S. If Ralph can give me CVS Put access, I'd be happy to move the draft to the CVS space.

-----

SWBPD "Vocabulary Management" Task Force Description
Draft, 2004-06-24

NAME
Vocabulary Management

STATUS
Considered

COORDINATORS
Tom Baker and ?

MEMBERS
Libby Miller
Natasha Noy
Dan Brickley
Alistair Miles
Alan Rector
James Hendler
Aldo Gangemi
Bernard Vatant
Ralph Swick

OBJECTIVES

1. To establish the terminology for our discussion of the declaration, identification, use, and management of vocabulary terms in a Semantic Web environment, roughly along the lines of:
   -- Term
   -- Vocabulary (a set of Terms)
   -- Namespace (hmm...)
   -- Namespace URI (identifies a Namespace)
   -- Namespace Owner (controls a Namespace)
   -- Language (uses and mixes Vocabularies)
   -- Versioning (identification of changes to a Language)
   -- Term Concept (notional)
   -- Term URI (identifies a Term Concept)
   -- Term Annotation (a representation of or gloss on a Term Concept)
   -- Term Version (an identifiable state of a cluster of Term Annotations)
   -- Term Version URI (identifies a Term Version)
   -- Term Declaration (represents a term in a machine-processable schema language)
   -- Namespace Document (definitive material about a Namespace)
   -- Namespace Schema (definitive material about a Namespace in a machine-processable schema language)

2. To articulate assumptions regarding the use of terms in a Semantic Web environment, including:
   -- Open, loosely coupled, mixed-language environments ("the Web").
   -- Organizations or even individuals defining and publishing vocabulary terms in an open, bottom-up, and distributed process (as both desirable and de facto).
   -- The need to support processes of referencing, repurposing, recombining, and merging data from a diversity of sources.
   -- The need to support the inevitable evolution of languages ("evolvability").
   -- The Must Ignore Principle: "If you find a language element you don't understand, ignore it" (e.g., IETF practice, Tim Berners-Lee, TAG Finding on Versioning).
   -- The Principle of Free Extension: "Allow extensibility: language designers should create extensible languages" (TAG Finding on Versioning). Languages are extensible if they can mix Vocabularies.
   -- An emerging infrastructure (keyword "registries") for holding or harvesting Vocabularies for display, search, tool configuration, inferencing, or other such services.

3. To articulate guidelines of good practice for Namespace Owners to identify and declare Terms and Term Sets (Vocabularies) for use in a Semantic Web environment. Something like:
   -- Identify Terms using URIs.
   -- Term URIs should remain stable within the limits of "semantically compatible" change and evolution of the Terms identified (where "semantically compatible" is defined with respect to backward and forward compatibility, as in the TAG Finding on Versioning).
   -- Associate URI-identified Terms with human-interpretable Term Annotations: usually, at a minimum, with text defining the Term.
   -- Consider associating URI-identified Terms with machine-processable Term Declarations in Namespace Schemas.
   -- Optionally, identify Term Versions using URIs. Follow (by analogy) the W3C method of distinguishing the timeless "Latest Version" from the date-stamped "This Version" and "Previous Version" (is this method formally described anywhere?).
   -- The Namespace Owner should publish a description of the Terms identified by URIs and of the policies governing their maintenance, e.g., expectations about persistence, institutional commitment, and semantic stability.
   -- Only a Namespace Owner should change the meaning of a Term in a Namespace (though non-owners may constrain meanings in semantically compatible ways for use in specific contexts).
   -- When making assertions about Terms belonging to another Namespace Owner, consider seeking that owner's endorsement of those assertions ("assertion etiquette" or "good neighbor" policies).
   -- Version Namespace Documents and Namespace Schemas the way W3C versions documents and schemas.

4. To point to and briefly summarize the ongoing, evolving diversity of practices and approaches to declaring and managing vocabularies. The following problems should each be discussed in one page or less:
   -- The problem of resolving (dereferencing) Term URIs. What sort of human-interpretable Term Annotations or machine-processable Term Declarations should URI-identified Terms be associated with, or resolve to? The VM note should summarize the state of discussion about whether a URI resolves to anything at all and, if so, whether to a Web page, a machine-processable schema (of whatever flavor), or a resource directory, pointing to examples in practice. If Terms are documented in multiple ways, should a Namespace Owner distinguish between "canonical" and "derived" sources?
   -- The problem of workflow and tools for documenting Terms. The VM note should point to tools and methods for maintaining multiple documentation forms, such as schemas and Web pages.
   -- The problem of finding versus becoming a Namespace Owner. People want to know: "If we want to declare a term but lack the institutional context to support a persistent namespace policy, how can we do it? Should I use an existing term, get a Namespace Owner (such as DCMI) to declare one, or declare my own? If I were to coin my own URI, where could I put it?"
   -- The problem of describing Terms. What are the properties of a Term Annotation or Term Declaration? Besides a Definition, which properties are most commonly in use? How important is it for interoperability to use existing properties in Term Annotations or Term Declarations?
   -- The schema language of a Term Declaration. The VM note should not take a stand on the use of a particular flavor of OWL/RDF+S for declaring a vocabulary but should simply point to documents that focus on this issue.
   -- The formation of URIs. The issues here include "hash or slash", the implied semantics of language strings and of implied directory hierarchies in URIs, and the use of version numbers in URI strings.
   -- Application profiles. Most vocabulary initiatives end up having some notion of "profile" to designate a constrained subset of a vocabulary and/or a language that mixes multiple vocabularies for a particular purpose or application. The VM note should characterize the nature of these constructs, possibly referring to notions such as Term Usage (a cluster of Term Annotations about a Term of which one is not the Namespace Owner).
   -- The problem of "semantic context". Terms may be embedded in clusters of relations from which they may be seen to derive part of their meaning. It may therefore not always be sensible to use those terms out of context. Examples include the terms of thesauri or ontologies, as well as XML elements, which may be defined with respect to parent elements and therefore may not always be reusable as properties in the RDF sense without violating their semantic intent.

APPROACH
The issues above have been discussed and documented in various vocabulary maintenance communities. The Task Force deliverable will provide an overview of the issues and principles involved in declaring and maintaining a vocabulary, pointing to available examples of good practice. To do this, it must first define a common terminology for describing the diversity of practices in a comparable manner.

SCOPE
Guidelines and principles for the identification, declaration, and management of Terms in Vocabularies (Metadata Element Sets, Thesauri, Ontologies, Published Subjects, and the like).

DELIVERABLE
A relatively concise (fifteen-page?) technical note summarizing principles of good practice, with pointers to examples, about the identification of terms and term sets with URIs, related policies and etiquette, and expectations regarding documentation.

TARGET AUDIENCE
-- Maintainers of terms and term sets (vocabularies) for use in a Semantic Web environment.
-- Anyone else wishing to declare terms reusably.
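To make the distinction between Term URIs and Term Version URIs in Objective 3 a bit more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the namespace http://example.org/vocab/, the term "title", and the year/month/day path layout for dated versions are illustrative assumptions, loosely modeled on the W3C "Latest Version" / "This Version" / "Previous Version" convention, not an established or recommended scheme. It builds a timeless Term URI in either "hash" or "slash" style and derives date-stamped URIs for the two most recent Term Versions.

```python
from datetime import date

# Hypothetical namespace, used only for illustration.
NAMESPACE = "http://example.org/vocab/"


def term_uri(namespace: str, local_name: str, style: str = "slash") -> str:
    """Build a timeless Term URI in "hash" (ns#term) or "slash" (ns/term) style."""
    base = namespace.rstrip("/#")
    if style == "hash":
        return f"{base}#{local_name}"
    if style == "slash":
        return f"{base}/{local_name}"
    raise ValueError(f"unknown URI style: {style!r}")


def term_version_uri(namespace: str, local_name: str, issued: date) -> str:
    """Build a date-stamped Term Version URI (assumed layout: ns/YYYY/MM/DD/term)."""
    base = namespace.rstrip("/#")
    return f"{base}/{issued:%Y/%m/%d}/{local_name}"


def version_links(namespace: str, local_name: str, issued_dates: list) -> dict:
    """Return "latest", "this" and (if any) "previous" URIs for a Term.

    "latest" is the timeless Term URI; "this" and "previous" point at the
    two most recent dated Term Versions.
    """
    dated = sorted(issued_dates)
    if not dated:
        raise ValueError("at least one issued version is required")
    links = {
        "latest": term_uri(namespace, local_name),
        "this": term_version_uri(namespace, local_name, dated[-1]),
    }
    if len(dated) > 1:
        links["previous"] = term_version_uri(namespace, local_name, dated[-2])
    return links


if __name__ == "__main__":
    # Two hypothetical issue dates for the hypothetical term "title".
    issued = [date(2004, 6, 24), date(2003, 11, 21)]
    for label, uri in version_links(NAMESPACE, "title", issued).items():
        print(f"{label:8} {uri}")
```

Which layout to use (hash or slash, dates or version numbers in the URI string) is exactly one of the open questions listed under Objective 4; the sketch simply fixes one choice so that the timeless/dated distinction is visible.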
DEPENDENCIES (in the broadest sense)
-- THES - SWBP Thesaurus Task Force
   http://www.w3.org/2004/03/thes-tf/mission
-- FOAF
   http://xmlns.com/foaf/0.1/
   http://www.w3.org/2001/sw/Europe/events/foaf-galway/
-- Dublin Core - DCMI, for example:
   http://dublincore.org/documents/dcmi-namespace/
   http://dublincore.org/documents/dcmi-terms/
-- Dublin Core - CEN MMI-DC Working Group
   http://www.bi.fhg.de/People/Thomas.Baker/Versioning-20040611.txt
   http://www.cenorm.be/isss/cwa14855/
-- Proposed TAG Finding on Versioning XML Languages
   http://www.w3.org/2001/tag/doc/versioning/
-- SKOS - SWAD Europe
   http://www.w3.org/2001/sw/Europe/reports/thes/1.0/guide/
   http://www.w3.org/2004/skos/core.rdf
   http://www.w3c.rl.ac.uk/2003/11/21-skos-mapping
-- W3C TAG on "What should a 'namespace document' look like?"
   http://www.w3.org/2001/tag/issues.html#namespaceDocument-8
-- SWAD-E Thesaurus (wants "standard" thesaurus change management guidelines)
   http://lists.w3.org/Archives/Public/public-esw-thes/2004Apr/
-- Image Annotation meeting in Madrid
   http://rdfig.xmlhack.com/2004/06/07/2004-06-07.html#1086615887.400193
-- Tim Berners-Lee on Evolvability
   http://www.w3.org/DesignIssues/Evolution.html
-- OASIS Published Subjects Technical Committee
   http://www.oasis-open.org/committees/download.php/3050/pubsubj-pt1-1.02-cs.pdf
   http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=tm-pubsubj
   http://www.oasis-open.org/committees/tm-pubsubj/docs/recommendations/issues.htm
-- OASIS (ISO/TS 15000) ebXMLRegistry Semantic Content (Carl Mattocks)
-- Libby and Dan's work on RDF query
   http://www.ilrt.bris.ac.uk/discovery/2001/06/process/
-- Sandro's work on a vocabulary directory (reference needed)
-- Alan: experience in medical contexts with large vocabularies
-- Alistair: recommendations for change management
-- CORES Resolution on Metadata Element Identifiers
   http://www.dlib.org/dlib/july03/baker/07baker.html

-- 
Dr. Thomas Baker                           Thomas.Baker@izb.fraunhofer.de
Institutszentrum Schloss Birlinghoven            mobile +49-160-9664-2129
Fraunhofer-Gesellschaft                             work +49-30-8109-9027
53754 Sankt Augustin, Germany                        fax +49-2241-144-2352
Personal email: thbaker79@alumni.amherst.edu
Received on Thursday, 24 June 2004 07:19:26 UTC