- From: Thomas Baker <thomas.baker@izb.fraunhofer.de>
- Date: Thu, 2 Sep 2004 16:33:44 +0200
- To: SW Best Practices <public-swbp-wg@w3.org>
Dear Members of the VM Task Force (listed below),
I would appreciate it if each of you could - before the next
conference call - respond to the questions I pose below
with regard to the scope and objectives of the planned VM note.
I will then summarize the results and we can use that as the
basis for planning further steps.
Tom
----
SWBPD "Vocabulary Management"
Draft, 2004-09-02
NAME
Vocabulary Management - Scoping Draft
STATUS
Considered
COORDINATORS
Tom Baker (Fraunhofer Society)
MEMBERS
Libby Miller (University of Bristol)
Natasha Noy (Stanford University)
Dan Brickley (W3C)
Alistair Miles (CCL)
Alan Rector (University of Manchester)
James Hendler (University of Maryland)
Aldo Gangemi (CNR)
Bernard Vatant (Mondeca)
Ralph Swick (W3C)
QUESTIONS TO TASK FORCE MEMBERS
1. About the overall scope, goal, and audience:
Do the following scope and objectives seem reasonable?
A paper addressing issues of this magnitude could be as
long as we would want to make it, but I personally think
it could be helpful to set out with an upper limit of
pages in mind (not counting the bibliography :-). Do you
agree? Does 15 to 20 pages seem like a reasonable target?
SCOPE
Guidelines and principles for the identification,
declaration, and management of Terms in Vocabularies
(Metadata Element Sets, Thesauri, Ontologies, Published
Subjects, and the like).
DELIVERABLE
A relatively concise (15-20 pages) technical note
summarizing principles of good practice, with pointers to
examples, about the identification of terms and term sets
with URIs, related policies and etiquette, and expectations
regarding documentation.
TARGET AUDIENCE
-- Maintainers of terms and term sets (vocabularies)
for use in a Semantic Web environment.
-- Anyone else wishing to declare terms reusably.
2. About the specific objectives -- the outline -- of the
VM note: Does the following seem reasonable as an overall
outline for the VM note? Questions about the individual
sections will follow below.
Section 1. Terminology
I assume that we would need to agree -- at least for
the purposes of the VM note -- on the meaning of some
basic terms. Section 1, then, would define a list of a
dozen or so basic terms such as "term" and "vocabulary".
Section 2. Vocabularies in the Semantic Web
I assume we need to characterize what it is we are
talking about in terms of standard buzzwords that
people will have heard, such as Metadata Element Sets,
Controlled Vocabularies, Taxonomies, and Ontologies.
I also assume we should not neglect to articulate
some really basic assumptions about the Semantic Web,
such as data merging and repurposing.
Section 3. Principles of Good Practice
I assume we will be able to agree on some really basic
principles, such as "Identify Terms with URIs (or
URIrefs)" or "Articulate any policies or assumptions
underlying the assignment of URIs". Beyond that, we
should see how far we can go. Personally, I believe
that if we could articulate half a dozen or so simple
principles and elaborate on each principle in two or
three paragraphs, with pointers to actual practice,
these principles could form the core contribution of
the VM note.
Section 4. Evolving issues
On many issues we will not be able to agree, whether
because the issues are controversial (e.g., the idea of
"ownership" of a namespace which surfaced on this list
in response to an earlier draft) or because they are
the object of ongoing discussion and experimentation.
We should try to distill these issues down to a
"manageable" number -- a dozen or so -- and discuss
each issue in one or two paragraphs which describe
the issue and characterize the main viewpoints, areas
of development, or controversies, with pointers to
the literature. In my opinion, a "manageable" number
is important not just to aid the reader, but also to
allow us to divide ownership of the issues among Task
Force members.
3. About Section 1 - Terminology: Is it reasonable to think
we could agree on a terminology (for the paper) roughly
of the following scope? Does it obviously go much too
far -- or not far enough? Should we specifically link
the terminology section to the issues about which we feel
prepared to articulate good-practice guidelines in Section 3
(e.g., Identity, Ownership, Versioning...)?
-- Term
-- Vocabulary or Term Set (a set of Terms)
-- Namespace (hmm...)
-- Namespace URI (identifies a Namespace)
-- Namespace Owner (controls a Namespace)
-- Language (uses and mixes Vocabularies)?
-- Versioning (identification of changes to a Term or Term Set)
-- Term Concept (notional)
-- Term URI (identifies a Term Concept)
-- Term Annotation (a representation of or gloss on a Term Concept)
-- Term Version (an identifiable state of a cluster of Term Annotations)
-- Term Version URI (identifies a Term Version)
-- Term Declaration (represents a term in a machine-processable schema
language)
-- Namespace Document (definitive material about a Namespace)
-- Namespace Schema (definitive material about a Namespace in a
machine-processable schema language).
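To make the relationships among these terms a bit more concrete, here is a minimal Python sketch of one possible data model. The class and field names (Namespace, Term, TermVersion, latest) are hypothetical illustrations of the draft terminology, not agreed definitions. It also assumes, for simplicity, that a Term URI is formed by appending a local name to the Namespace URI.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical model of the draft terminology; names are illustrative only.

@dataclass
class Namespace:
    uri: str    # Namespace URI (identifies the Namespace)
    owner: str  # Namespace Owner (controls the Namespace)

@dataclass
class TermVersion:
    version_uri: str  # Term Version URI (identifies this Term Version)
    # Term Annotations for this version, e.g. {"definition": "..."}
    annotations: dict[str, str] = field(default_factory=dict)

@dataclass
class Term:
    namespace: Namespace
    local_name: str
    versions: list[TermVersion] = field(default_factory=list)

    @property
    def uri(self) -> str:
        # Assumed convention: Term URI = Namespace URI + local name
        return self.namespace.uri + self.local_name

    def latest(self) -> TermVersion:
        # The most recently appended Term Version
        return self.versions[-1]
```

In this sketch the Term URI stays fixed while new TermVersion objects, each with its own Term Version URI, accumulate underneath it. That matches the draft's distinction between a Term URI and a Term Version URI.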
4. About Section 2 - Vocabularies in the Semantic Web: Here
is a very rough list of assumptions and principles that
come to mind when one thinks of the Semantic Web. Does it
seem like we need a section articulating assumptions on
this very basic level (this depends of course on our
target audience)? I'm thinking two or three pages -
does that seem about right?
-- Open, loosely-coupled, mixed-language environments
("the Web").
-- Organizations or even individuals defining and publishing
vocabulary terms in an open, bottom-up, and distributed
process (as both desirable and de facto).
-- The need to support processes of referencing,
repurposing, recombining, merging data from a diversity
of sources.
-- The need to support the inevitable evolution of languages
("evolvability").
-- The Must Ignore Principle: "If you find a language element
you don't understand, ignore it" (e.g., IETF practice,
Tim Berners-Lee, TAG Finding on Versioning).
-- The Principle of Free Extension: "Allow extensibility:
language designers should create extensible languages"
(TAG Finding on Versioning). Languages are extensible
if they can mix Vocabularies.
-- An emerging infrastructure (keyword "registries") for
holding or harvesting Vocabularies for display, search,
tool configuration, inferencing, or other such services.
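As a rough illustration of the Must Ignore Principle in a mixed-vocabulary setting, here is a minimal Python sketch of a consumer that keeps the statements whose properties it recognizes and silently skips the rest. The set of "known" terms and the sample record are assumptions for illustration. The Dublin Core and FOAF namespaces are simply used as familiar examples.

```python
from __future__ import annotations

# Terms this (hypothetical) consumer understands.
KNOWN_TERMS = {
    "http://purl.org/dc/elements/1.1/title",
    "http://purl.org/dc/elements/1.1/creator",
}

def process(record: dict[str, str]) -> dict[str, str]:
    """Keep statements with understood properties; ignore the others.

    Unknown properties (e.g. from a newer version of a vocabulary or from
    another vocabulary mixed into the same record) are dropped rather than
    treated as errors.
    """
    return {prop: value for prop, value in record.items() if prop in KNOWN_TERMS}

record = {
    "http://purl.org/dc/elements/1.1/title": "VM note",
    "http://xmlns.com/foaf/0.1/mbox": "mailto:someone@example.org",
}
print(process(record))
```

Because the record mixes DC and FOAF terms, the consumer can still use the title even though it knows nothing about FOAF.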
5. About Section 3 - Principles of Good Practice: This is the
part which (I would hope) could form the core contribution
of the VM note, and here is a strawman attempt at
articulating a few things resembling the sort of principles
I have in mind. Are these the sorts of things about
which we should try to get agreement? Do you agree that
not all of the principles need to be "Must" principles --
that some could be "May" or "Also Consider" principles?
In particular -- since it has already come up on the list
-- do you think "Namespace Ownership" could belong here
or does it belong in Section 4?
-- Identify Terms using URIs.
-- Term URIs should remain stable within the limits of
"semantically compatible" change and evolution of the
Terms identified (where "semantically compatible"
is defined with respect to backwards and forward
compatibility, as in the TAG Finding on Versioning).
-- Associate URI-identified Terms with human-interpretable
Term Annotations -- usually, at a minimum, with text
defining the Term.
-- Consider associating the URI-identified Terms with
machine-processable Term Declarations in Namespace
Schemas.
-- Optionally, identify Term Versions using URIs.
Follow (by analogy) the W3C method of distinguishing
the timeless "Latest Version" from the date-stamped
"This Version" and "Previous Version" (is this method
formally described anywhere?).
-- Version Namespace Documents and Namespace Schemas the way
W3C versions documents and schemas.
-- The Namespace Owner should publish a description of the
terms identified by URIs and of the policies governing
their maintenance, e.g.: expectations about persistence,
institutional commitment, and semantic stability.
-- Only a Namespace Owner should change the meaning of a Term
in a namespace (though non-owners may constrain meanings in
semantically compatible ways for use in specific contexts).
-- When making assertions about terms belonging to another
Namespace Owner, consider seeking their endorsement of
those assertions ("assertion etiquette" or "good neighbor"
policies).
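To illustrate the "Latest Version" / "This Version" / "Previous Version" distinction, here is a minimal Python sketch that derives the three URIs for a Namespace Document or Namespace Schema. The layout is only loosely modeled on W3C practice, where a timeless URI points at the latest version and dated URIs (such as http://www.w3.org/TR/2003/WD-rdf-concepts-20030123/) identify specific versions. The base URI and path pattern here are assumptions for illustration.

```python
from __future__ import annotations
from datetime import date
from typing import Optional

def version_uris(base: str, name: str, this: date,
                 previous: Optional[date] = None) -> dict[str, str]:
    """Return hypothetical latest/this/previous version URIs.

    latest   -> timeless URI that always points at the current version
    this     -> date-stamped URI for the version being published
    previous -> date-stamped URI for the version it replaces (if any)
    """
    def dated(d: date) -> str:
        # e.g. <base>2004/vocab-20040902/
        return f"{base}{d.year}/{name}-{d:%Y%m%d}/"

    uris = {"latest": f"{base}{name}/", "this": dated(this)}
    if previous is not None:
        uris["previous"] = dated(previous)
    return uris

print(version_uris("http://example.org/", "vocab", date(2004, 9, 2), date(2004, 6, 11)))
```

The design choice worth noting is that only the dated URIs are meant to be frozen. The timeless URI is repointed at each release, so it stays stable for people who always want the current version.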
6. About Section 4 - Evolving issues: Squinting very hard,
does this seem like a reasonable start at a list of more
experimental or controversial issues or of issues that
should be mentioned but for whatever reason should be out
of scope for Section 3? Might you want to take ownership of
any of these issues? Do any of the issues clearly overlap
with other SWBPD Task Forces?
-- The problem of resolving (dereferencing) Term URIs.
URI-identified Terms should be associated with or
resolve to what sort of human-interpretable Term
Annotations or machine-processable Term Declarations?
The VM note could summarize the state of discussion
about whether a URI resolves to anything at all, and if
so, whether to a Web page, a machine-processable schema
(of whatever flavor), or a resource directory, pointing
to examples in practice. If Terms are documented in
multiple ways, should a Namespace Owner distinguish
between "canonical" versus "derived" sources?
-- The problem of work-flow and tools for documenting
Terms. The VM note could point to tools and methods
for maintaining multiple documentation forms, such as
schemas and Web pages.
-- The problem of finding versus becoming a Namespace
Owner. People want to know: "If we want to declare
a term but lack the institutional context to support
a persistent namespace policy, how can we do it?
Should I use an existing term, get a Namespace Owner
(such as DCMI) to declare one, or declare my own?
If I were to coin my own URI, where could I put it?"
-- The problem of describing Terms. What are the properties
of a Term Annotation or Term Declaration? Besides
a Definition, which properties are
most commonly in use? How important is it for
interoperability to use existing properties in Term
Annotations or Term Declarations?
-- The schema language of a Term Declaration: The
VM note should perhaps not take a stand on the use
of a particular flavor of OWL/RDF+S for declaring a
vocabulary but should simply point to documents which
focus on this issue.
-- The formation of URIs. The issues here include
"hash or slash", the implied semantics of language
strings and of implied directory hierarchies in URIs,
and the use of version numbers in URI strings.
-- Application profiles. Most vocabulary initiatives
end up having some notion of "profile" to designate
either a constrained subset of a vocabulary and/or
a language which mixes multiple vocabularies for
a particular purpose or application. The VM note
should characterize the nature of these constructs,
possibly referring to notions such as Term Usage (a
cluster of Term Annotations about a Term of which one
is not the Namespace Owner).
-- The problem of "semantic context". Terms may be
embedded in clusters of relations from which they
may be seen in part to derive their meaning. It may
therefore not always be sensible to use those terms out
of context. Examples include the terms of thesauri
or ontologies, as well as XML elements, which may
be defined with respect to parent elements and may
therefore not always be reusable as properties in an
RDF sense without violating their semantic intent.
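For the "hash or slash" and dereferencing issues, here is a minimal Python sketch of the practical difference. With a hash URI the fragment is never sent to the server, so dereferencing any Term retrieves the whole namespace document. With a slash URI the full Term URI is requested, and the server decides what to return. The example.org URIs are hypothetical, and FOAF is used only as an example of a vocabulary with slash-style Term URIs.

```python
from __future__ import annotations
from urllib.parse import urldefrag

def split_term_uri(uri: str) -> tuple[str, str]:
    """Split a Term URI into (namespace URI, local name)."""
    if "#" in uri:
        doc, frag = urldefrag(uri)  # "hash" style
        return doc + "#", frag
    ns, _, local = uri.rpartition("/")  # "slash" style
    return ns + "/", local

def requested_uri(uri: str) -> str:
    """The URI an HTTP client actually requests when dereferencing."""
    return urldefrag(uri)[0]  # the fragment stays on the client side

for term in ("http://example.org/vocab#Person", "http://xmlns.com/foaf/0.1/Person"):
    print(split_term_uri(term), "->", requested_uri(term))
```

Neither style is presented here as the better one. The sketch only shows why the choice affects what a Namespace Owner must serve at each URI.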
7. Do you see any obvious gaps in the proto-bibliography below?
Which of the following do you already know well or are you
particularly interested in learning more about?
-- THES - SWBP Thesaurus Task Force
http://www.w3.org/2004/03/thes-tf/mission
-- FOAF
http://xmlns.com/foaf/0.1/
http://www.w3.org/2001/sw/Europe/events/foaf-galway/
-- Dublin Core - DCMI, for example:
http://dublincore.org/documents/dcmi-namespace/
http://dublincore.org/documents/dcmi-terms/
http://www.ukoln.ac.uk/metadata/dcmi/term-identifier-guidelines/ (first draft)
-- Dublin Core - CEN MMI-DC Working Group
http://www.bi.fhg.de/People/Thomas.Baker/Versioning-20040611.txt
http://www.cenorm.be/isss/cwa14855/
-- Proposed TAG Finding on Versioning XML Languages
http://www.w3.org/2001/tag/doc/versioning/
-- SKOS - SWAD Europe
http://www.w3.org/2001/sw/Europe/reports/thes/1.0/guide/
http://www.w3.org/2004/skos/core.rdf
http://www.w3c.rl.ac.uk/2003/11/21-skos-mapping
-- W3C TAG on "What should a 'namespace document' look like?"
http://www.w3.org/2001/tag/issues.html#namespaceDocument-8
http://www.w3.org/2003/09/15-tag-summary.html - TAG "consensus" on namespace documents
-- SWAD-E Thesaurus (wants "standard" thesaurus change management guidelines)
http://lists.w3.org/Archives/Public/public-esw-thes/2004Apr/
-- Image Annotation meeting in Madrid
http://rdfig.xmlhack.com/2004/06/07/2004-06-07.html#1086615887.400193
-- Tim Berners-Lee on Evolvability
http://www.w3.org/DesignIssues/Evolution.html
-- OASIS Published Subjects Technical Committee
http://www.oasis-open.org/committees/download.php/3050/pubsubj-pt1-1.02-cs.pdf
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=tm-pubsubj
http://www.oasis-open.org/committees/tm-pubsubj/docs/recommendations/issues.htm
-- OASIS (ISO/TS 15000) ebXMLRegistry Semantic Content (Carl Mattocks)
-- Libby and Dan work on RDF query
http://www.ilrt.bris.ac.uk/discovery/2001/06/process/
-- Sandro's work on a vocabulary directory (reference needed)
-- Alan: experience in medical contexts with large vocabularies (reference needed)
-- Alistair: recommendations for change management (reference needed)
-- CORES Resolution on Metadata Element Identifiers
http://www.dlib.org/dlib/july03/baker/07baker.html
-- Mailing list addressing questions of "namespace ownership" (Jeremy)
http://lists.w3.org/Archives/Public/public-sw-meaning/2004Jun/
-- RDF Core discussion on issues related to social meaning (Jeremy)
http://www.w3.org/TR/2003/WD-rdf-concepts-20030123/#section-Meaning - had WG consensus
http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0366 - got trashed
http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0486 - and revised
http://www.w3.org/2001/sw/meetings/tech-200303/social-meaning
--
Dr. Thomas Baker Thomas.Baker@izb.fraunhofer.de
Institutszentrum Schloss Birlinghoven mobile +49-160-9664-2129
Fraunhofer-Gesellschaft work +49-30-8109-9027
53754 Sankt Augustin, Germany fax +49-2241-144-2352
Personal email: thbaker79@alumni.amherst.edu
Received on Thursday, 2 September 2004 14:30:20 UTC