RE: [VM] Scoping Draft with questions to TF members

See inline comments labelled by [MFU]

-----Original Message-----
From: Thomas Baker [mailto:thomas.baker@izb.fraunhofer.de] 
Sent: Thursday, September 02, 2004 7:34 AM
To: SW Best Practices
Subject: [VM] Scoping Draft with questions to TF members


Dear Members of the VM Task Force (listed below),

I would appreciate it if each of you could - before the next
conference call - respond to the six questions I pose below
with regard to the scope and objectives of the planned VM note.

I will then summarize the results and we can use that as the
basis for planning further steps.

Tom

----

SWBPD "Vocabulary Management" 
Draft, 2004-09-02

NAME          
    Vocabulary Management - Scoping Draft

STATUS        
    Considered

COORDINATORS  
    Tom Baker (Fraunhofer Society)

MEMBERS
    Libby Miller (University of Bristol)
    Natasha Noy (Stanford University)
    Dan Brickley (W3C)
    Alistair Miles (CCL)
    Alan Rector (University of Manchester)
    James Hendler (University of Maryland)
    Aldo Gangemi (CNR)
    Bernard Vatant (Mondeca)
    Ralph Swick (W3C)

QUESTIONS TO TASK FORCE MEMBERS

1. About the overall scope, goal, and audience:
   Do the following scope and objectives seem reasonable?
   A paper addressing issues of this magnitude could be as
   long as we would want to make it, but I personally think
   it could be helpful to set out with an upper limit of
   pages in mind (not counting the bibliography :-).  Do you
   agree? Does 15 to 20 pages seem like a reasonable target?

    SCOPE
        Guidelines and principles for the identification,
        declaration, and management of Terms in Vocabularies
        (Metadata Element Sets, Thesauri, Ontologies, Published
        Subjects, and the like).

    DELIVERABLE
        A relatively concise (15-20 pages) technical note
        summarizing principles of good practice, with pointers to
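        examples, about the identification of terms and term sets
        with URIs, related policies and etiquette, and expectations
        regarding documentation.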

[MFU] The phrase "the identification of terms and term sets with URIs,
related policies and etiquette, and expectations regarding
documentation" might be extended to include representing things using
OWL.  Perhaps that is implicit and obvious and need not be said?  Here
is one way to use OWL to represent a thesaurus: create properties for
the standard thesaurus relations (narrower-than, related-to, etc.).
This effectively encodes the thesaurus language in OWL rather than
using OWL directly.  This is a very important issue that should be
addressed.  I have seen this kind of thing done.  It has the advantage
of simplicity, but it also seems like a hacky way to do something, and
it tends to make minimal use of any of the built-in features of OWL.
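
A minimal sketch of the property-based encoding described above, written
as plain Python triples so it runs without any RDF library. The
namespace http://example.org/thesaurus#, the property names, and the
sample terms are all hypothetical and chosen only for illustration.

```python
# Hypothetical namespace and terms, for illustration only.
THES = "http://example.org/thesaurus#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_OBJECT_PROPERTY = "http://www.w3.org/2002/07/owl#ObjectProperty"

triples = [
    # The thesaurus relations are declared as ordinary OWL properties.
    (THES + "narrowerThan", RDF_TYPE, OWL_OBJECT_PROPERTY),
    (THES + "relatedTo", RDF_TYPE, OWL_OBJECT_PROPERTY),
    # Thesaurus entries are then linked using those properties.
    (THES + "Poodle", THES + "narrowerThan", THES + "Dog"),
    (THES + "Dog", THES + "relatedTo", THES + "Kennel"),
]


def narrower_terms(broader, triples):
    """Return terms directly asserted to be narrower than `broader`."""
    return sorted(
        s for (s, p, o) in triples
        if p == THES + "narrowerThan" and o == broader
    )


def to_ntriples(triples):
    """Serialize the triples, one N-Triples statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for (s, p, o) in triples)


print(to_ntriples(triples))
print(narrower_terms(THES + "Dog", triples))
```

Note that nothing in this sketch uses OWL's own class machinery: the
hierarchy lives entirely in the user-defined narrowerThan property,
which is the "minimal use of built-in features" trade-off mentioned
above.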

    TARGET AUDIENCE
        -- Maintainers of terms and term sets (vocabularies)
           for use in a Semantic Web environment.
        -- Anyone else wishing to declare terms reusably.

2. About the specific objectives -- the outline -- of the
   VM note:  Does the following seem reasonable as an overall
   outline for the VM note?  Questions about the individual
   sections will follow below.

    Section 1. Terminology

       I assume that we would need to agree -- at least for
       the purposes of the VM note -- on the meaning of some
       basic terms.  Section 1, then, would define a list of a
       dozen or so basic terms such as "term" and "vocabulary".
[MFU] Good idea, but it may be more challenging than you think.

    Section 2. Vocabularies in the Semantic Web

       I assume we need to characterize what it is we are
       talking about in terms of standard buzzwords that
       people will have heard, such as Metadata Element Sets,
       Controlled Vocabularies, Taxonomies, and Ontologies.
       I also assume we should not neglect to articulate
       some really basic assumptions about the Semantic Web,
       such as data merging and repurposing.
[MFU] See following link for some good input on this.
http://www.metamodel.com/article.php?story=20030115211223271&mode=print 

    Section 3. Principles of Good Practice

       I assume we will be able to agree on some really basic
       principles, such as "Identify Terms with URIs (or
       URIrefs)" or "Articulate any policies or assumptions
       underlying the assignment of URIs".  Beyond that, we
       should see how far we can go.  Personally, I believe
       that if we could articulate half a dozen or so simple
       principles and elaborate on each principle in two or
       three paragraphs, with pointers to actual practice,
       these principles could form the core contribution of
       the VM note.
[MFU] If possible, find a good example to drive this.

    Section 4. Evolving issues

       On many issues we will not be able to agree, whether
       because the issues are controversial (e.g., the idea of
       "ownership" of a namespace which surfaced on this list
       in response to an earlier draft) or because they are
       the object of ongoing discussion and experimentation.
       We should try to distill these issues down to a
       "manageable" number -- a dozen or so -- and discuss
       each issue in one or two paragraphs which describe
       the issue and characterize the main viewpoints, areas
       of development, or controversies, with pointers to
       the literature.  In my opinion, a "manageable" number
       is important not just to aid the reader, but also to
       allow us to divide ownership of the issues among Task
       Force members.
[MFU] It is important to keep scope within reason. In the event that too
many things emerge to be addressed, some priorities will have to be set,
along with criteria for setting them.  In this event, it would be useful
to at least MENTION that many other issues arose, name and discuss each
issue in a sentence or two, and comment on whether and when it might
need to be addressed in the future: how important is it, and why or why
not?

3. About Section 1 - Terminology: Is it reasonable to think
   we could agree on a terminology (for the paper) roughly
   of the following scope?  Does it obviously go much too
   far -- or not far enough?  Should we specifically link
   the terminology section to the issues about which we feel
   prepared to articulate good-practice guidelines in Section 3
   (e.g., Identity, Ownership, Versioning...)?

[MFU] I think it makes sense to see if you can get agreement on at least
a handful of key terms, if only for the purpose of writing the note.
What is most likely to occur is that a given term will in fact refer to
a handful of distinct (though related) concepts/notions.  Focussing on
defining TERMS can lead to endless round-in-circles discussions. I find
it more productive to FIRST identify the important NOTIONS/CONCEPTS that
you need to talk about, choose the ones that you want terms for, and
then try to find a term for each that everyone agrees to.  See the END
OF THIS MESSAGE for an example of this technique, used to resolve a
dispute on how to define "role". You will never get agreement on ONE
definition of such a term; rather, it refers to various distinct
notions.

[MFU] You also will need to note somewhere that the terms have other
meanings that are commonly used, but that you are using them in ONE
PARTICULAR WAY.

   -- Term
   -- Vocabulary or Term Set (a set of Terms)
   -- Namespace (hmm...)
   -- Namespace URI (identifies a Namespace)
   -- Namespace Owner (controls a Namespace)
   -- Language (uses and mixes Vocabularies)?
   -- Versioning (identification of changes to a Term or Term Set)
   -- Term Concept (notional)
   -- Term URI (identifies a Term Concept)
   -- Term Annotation (a representation of or gloss on a Term Concept)
   -- Term Version (an identifiable state of a cluster of Term
      Annotations)
   -- Term Version URI (identifies a Term Version)
   -- Term Declaration (represents a term in a machine-processable
      schema language)
   -- Namespace Document (definitive material about a Namespace)
   -- Namespace Schema (definitive material about a Namespace in a 
      machine-processable schema language).

4. About Section 2 - Vocabularies in the Semantic Web:  Here
   is a very rough list of assumptions and principles that
   come to mind when one thinks of the Semantic Web.  Does it
   seem like we need a section articulating assumptions on
   this very basic level (this depends of course on our
   target audience)?  I'm thinking two or three pages -
   does that seem about right?

   -- Open, loosely-coupled, mixed-language environments
      ("the Web").

   -- Organizations or even individuals defining and publishing
      vocabulary terms in an open, bottom-up, and distributed
      process (as both desirable and de-facto).

   -- The need to support processes of referencing,
      repurposing, recombining, merging data from a diversity
      of sources.

   -- The need to support the inevitable evolution of languages
      ("evolvability").

   -- The Must Ignore Principle: "If you find a language element 
      you don't understand, ignore it" (e.g., IETF practice, 
      Tim Berners-Lee, TAG Finding on Versioning).

   -- The Principle of Free Extension: "Allow extensibility:
      language designers should create extensible languages"
      (TAG Finding on Versioning).  Languages are extensible
      if they can mix Vocabularies.

   -- An emerging infrastructure (keyword "registries") for 
      holding or harvesting Vocabularies for display, search, 
      tool configuration, inferencing, or other such services.  

[MFU] Many of these are not vocabulary-specific, and pertain to other
areas.  Beware of scope getting too big - and/or keep the discussion of
the general issue specific to how it impacts on vocabulary management.

4. About Section 3 - Principles of Good Practice:  This is the
   part which (I would hope) could form the core contribution
   of the VM note, and here is a strawman attempt at
   articulating a few things resembling the sort of principles
   I have in mind.  Are these the sorts of things about
   which we should try to get agreement?  Do you agree that
   not all of the principles need to be "Must" principles --
   that some could be "May" or "Also Consider" principles?
[MFU] This seems like a good idea, modulo the long discussions we had
about not dictating what people should do, but rather saying if you do
this or that, these are the consequences of those decisions.

   In particular -- since it has already come up on the list
   -- do you think "Namespace Ownership" could belong here
   or does it belong in Section 4?

   -- Identify Terms using URIs.

   -- Term URIs should remain stable within the limits of
      "semantically compatible" change and evolution of the
      Terms identified (where "semantically compatible"
      is defined with respect to backwards and forward
      compatibility, as in the TAG Finding on Versioning).

   -- Associate URI-identified Terms with human-interpretable
      Term Annotations -- usually, at a minimum, with text
      defining the Term.

   -- Consider associating the URI-identified Terms with
      machine-processable Term Declarations in Namespace
      Schemas.

   -- Optionally, identify Term Versions using URIs.
      Follow (by analogy) the W3C method of distinguishing
      the timeless "Latest Version" from the date-stamped
      "This Version" and "Previous Version" (is this method
      formally described anywhere?).

   -- Version Namespace Documents and Namespace Schemas the way
      W3C versions documents and schemas.

   -- The Namespace Owner should describe and publish a
      description of the terms identified by URIs and of
      policies governing their maintenance, e.g.: expectations
      about persistence, institutional commitment, and
      semantic stability.

   -- Only a Namespace Owner should change the meaning of a Term 
      in a namespace (though non-owners may constrain meanings in
      semantically compatible ways for use in specific contexts).

   -- When making assertions about terms belonging to another 
      Namespace Owner, consider seeking their endorsement of 
      those assertions ("assertion etiquette" or "good neighbor" 
      policies).

[MFU] Again, some of these things are more general issues that pertain
outside vocabulary management. It would be helpful to identify when this
is the case, and feed that information back to the WG which could be
input to a more general note about things that apply across TFs.
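
As a rough illustration of the "Latest Version" / "This Version" /
"Previous Version" idea in the list above, here is a small sketch. The
namespace http://example.org/terms/ and the layout of the date-stamped
URIs are assumptions made for this example; the draft does not
prescribe any particular URI pattern.

```python
from datetime import date

# Hypothetical namespace; the draft does not specify one.
NAMESPACE = "http://example.org/terms/"


def term_uri(name):
    """Timeless URI for the term ("Latest Version")."""
    return NAMESPACE + name


def term_version_uri(name, issued):
    """Date-stamped URI for one identifiable state of the term.

    The date-in-path layout is an assumption for illustration.
    """
    return f"{NAMESPACE}{issued.isoformat()}/{name}"


def version_record(name, history):
    """Build latest/this/previous links from a list of issue dates."""
    versions = [term_version_uri(name, d) for d in sorted(history)]
    return {
        "latest": term_uri(name),
        "this": versions[-1],
        "previous": versions[-2] if len(versions) > 1 else None,
    }


record = version_record("title", [date(2004, 6, 11), date(2004, 9, 2)])
for label, uri in record.items():
    print(label, uri)
```

The point of the split is that references to the timeless URI keep
working as the term evolves, while anyone who needs a fixed state can
cite a date-stamped one.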

5. About Section 4 - Evolving issues: Squinting very hard,
   does this seem like a reasonable start at a list of more
   experimental or controversial issues or of issues that
   should be mentioned but for whatever reason should be out
   of scope for Section 3?  Might you want to take ownership of
   any of these issues?  Do any of the issues clearly overlap
   with other SWBPD Task Forces?

    -- The problem of resolving (dereferencing) Term URIs.
       URI-identified Terms should be associated with or
       resolve to what sort of human-interpretable Term
       Annotations or machine-processable Term Declarations?
       The VM note could summarize the state of discussion
       about whether a URI resolves to anything at all, and if
       so, whether to a Web page, a machine-processable schema
       (of whatever flavor), or a resource directory, pointing
       to examples in practice.  If Terms are documented in
       multiple ways, should a Namespace Owner distinguish
       between "canonical" versus "derived" sources?

    -- The problem of work-flow and tools for documenting
       Terms.  The VM note could point to tools and methods
       for maintaining multiple documentation forms, such as
       schemas and Web pages.

    -- The problem of finding versus becoming a Namespace
       Owner.  People want to know: "If we want to declare
       a term but lack the institutional context to support
       a persistent namespace policy, how can we do it?
       Should I use an existing term, get a Namespace Owner
       (such as DCMI) to declare one, or declare my own?
       If I were to coin my own URI, where could I put it?"

    -- The problem of describing Terms. What are the properties
       of a Term Annotation or Term Declaration?  Besides
       a Definition, what are some of the properties
       more commonly in use?  How important is it for
       interoperability to use existing properties in Term
       Annotations or Term Declarations?

    -- The schema language of a Term Declaration: The
       VM note should perhaps not take a stand on the use
       of a particular flavor of OWL/RDF+S for declaring a
       vocabulary but should simply point to documents which
       focus on this issue.

    -- The formation of URIs.  The issues here include
       "hash or slash", the implied semantics of language
       strings and of implied directory hierarchies in URIs,
       and the use of version numbers in URI strings.

    -- Application profiles.  Most vocabulary initiatives
       end up having some notion of "profile" to designate
       either a constrained subset of a vocabulary and/or
       a language which mixes multiple vocabularies for
       a particular purpose or application.  The VM note
       should characterize the nature of these constructs,
       possibly referring to notions such as Term Usage (a
       cluster of Term Annotations about a Term of which one
       is not the Namespace Owner).

    -- The problem of "semantic context".  Terms may be
       embedded in clusters of relations from which they
       may be seen in part to derive their meaning.  It may
       therefore not always be sensible to use those terms out
       of context.  Examples include the terms of thesauri
       or ontologies, as well as XML elements, which may
       be defined with respect to parent elements and may
       therefore not always be reusable as properties in an
       RDF sense without violating their semantic intent.
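
To make the "hash or slash" item above concrete, here is a minimal
sketch that splits a Term URI into a namespace part and a local name
under either convention. The helper name split_term_uri is hypothetical;
the two sample URIs are the well-known Dublin Core "title" and RDF
Schema "label" terms.

```python
from urllib.parse import urldefrag


def split_term_uri(uri):
    """Split a Term URI into (namespace URI, local name).

    Hash style: everything up to and including '#' is the namespace.
    Slash style: everything up to and including the last '/' is.
    """
    base, fragment = urldefrag(uri)
    if fragment:
        return base + "#", fragment
    namespace, _, local = uri.rpartition("/")
    return namespace + "/", local


slash_style = split_term_uri("http://purl.org/dc/elements/1.1/title")
hash_style = split_term_uri("http://www.w3.org/2000/01/rdf-schema#label")
print(slash_style)
print(hash_style)
```

One practical difference: a fragment is not sent to the server in an
HTTP request, so dereferencing a hash-style Term URI fetches the whole
namespace document, whereas a slash-style Term URI can be resolved
term by term.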

6. Do you see any obvious gaps in the proto-bibliography below?
   Which of the following do you already know well or are you 
   particularly interested in learning more about?

    -- THES - SWBP Thesaurus Task Force
       http://www.w3.org/2004/03/thes-tf/mission
    -- FOAF
       http://xmlns.com/foaf/0.1/
       http://www.w3.org/2001/sw/Europe/events/foaf-galway/
    -- Dublin Core - DCMI, for example:
       http://dublincore.org/documents/dcmi-namespace/
       http://dublincore.org/documents/dcmi-terms/
       http://www.ukoln.ac.uk/metadata/dcmi/term-identifier-guidelines/
       (first draft)
    -- Dublin Core - CEN MMI-DC Working Group
       http://www.bi.fhg.de/People/Thomas.Baker/Versioning-20040611.txt
       http://www.cenorm.be/isss/cwa14855/
    -- Proposed TAG Finding on Versioning XML Languages
       http://www.w3.org/2001/tag/doc/versioning/
    -- SKOS - SWAD Europe
       http://www.w3.org/2001/sw/Europe/reports/thes/1.0/guide/
       http://www.w3.org/2004/skos/core.rdf
       http://www.w3c.rl.ac.uk/2003/11/21-skos-mapping
    -- W3C TAG on "What should a 'namespace document' look like?"
       http://www.w3.org/2001/tag/issues.html#namespaceDocument-8
       http://www.w3.org/2003/09/15-tag-summary.html
       (TAG "consensus" on namespace documents)
    -- SWAD-E Thesaurus (wants "standard" thesaurus change management
       guidelines)
       http://lists.w3.org/Archives/Public/public-esw-thes/2004Apr/
    -- Image Annotation meeting in Madrid
       http://rdfig.xmlhack.com/2004/06/07/2004-06-07.html#1086615887.400193
    -- Tim Berners-Lee on Evolvability
       http://www.w3.org/DesignIssues/Evolution.html
    -- OASIS Published Subjects Technical Committee
       http://www.oasis-open.org/committees/download.php/3050/pubsubj-pt1-1.02-cs.pdf
       http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=tm-pubsubj
       http://www.oasis-open.org/committees/tm-pubsubj/docs/recommendations/issues.htm
    -- OASIS (ISO/TS 15000) ebXMLRegistry Semantic Content (Carl
       Mattocks)
    -- Libby and Dan's work on RDF query
       http://www.ilrt.bris.ac.uk/discovery/2001/06/process/
    -- Sandro's work on a vocabulary directory (reference needed)
    -- Alan: experience in medical contexts with large vocabularies
       (reference needed)
    -- Alistair: recommendations for change management (reference
       needed)
    -- CORES Resolution on Metadata Element Identifiers
       http://www.dlib.org/dlib/july03/baker/07baker.html
    -- Mailing list addressing questions of "namespace ownership"
       (Jeremy)
       http://lists.w3.org/Archives/Public/public-sw-meaning/2004Jun/
    -- RDF Core discussion on issues related to social meaning (Jeremy)
       http://www.w3.org/TR/2003/WD-rdf-concepts-20030123/#section-Meaning
       (had WG consensus)
       http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0366
       (got trashed)
       http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0486
       (and revised)
       http://www.w3.org/2001/sw/meetings/tech-200303/social-meaning



-- 
Dr. Thomas Baker                        Thomas.Baker@izb.fraunhofer.de
Institutszentrum Schloss Birlinghoven         mobile +49-160-9664-2129
Fraunhofer-Gesellschaft                          work +49-30-8109-9027
53754 Sankt Augustin, Germany                    fax +49-2241-144-2352
Personal email: thbaker79@alumni.amherst.edu

[MFU] ====================================================
APPENDIX: NOTIONS ABOUT ROLES

NOTIONS ABOUT ROLES: Not allowed to use the term 'role' since it means
too many different things. Using the term GETS IN THE WAY of
understanding. Hence we invent meaningless identifiers; here we use
foo1, foo2, etc.
NB: this was the first brainstorming, not a tidy summary of distinct
notions.

Foo1: a set of things defined by being in a (possibly unary)
relationship with something in a certain way, e.g. teacher = {x |
there exists y such that teaching(x,y)}.  In general, a Foo1 is a class
of objects defined roughly like this: {x | R(x,y)}.
Place-in-relationship-defined class
*	a kind of class  (meta-class)
*	we're not concerned with this level of granularity, we are only
	concerned with

For the purpose of this exercise, we are not concerned with non-active
roles.

Foo2: a special kind of Foo1 such that the relationship R is one to do
with active doing requiring a capability.  E.g. R can be "teaching", but
not "to-the-left-of". We use Ra to denote this kind of R.
e.g. an agent that is teaching by virtue of being in the teaching
relationship.
Place-in-active-relationship-defined class
*	a kind of class  (meta-class)

Foo2I: a Foo2 where R is an 'interactive' relationship. This is
problematic. Not sure what an interactive relationship is. Selling? A
sale? The agreement to exchange goods for money, or the back and forth
leading up to the sale itself?
Request-for-proposal:
Nub of the problem: process vs. predicate
Let R be "in-a-conversation"
Place-in-active-interactive-relationship-defined class
*	a kind of class  (meta-class)

Foo2Ib: ('potential' / 'inactive' / capable) the class of agents that
has the capability to be an instance of Foo2I, but may not actually be
one now.  This could be modeled as a Foo1 where R is something like
can-teach. So we might have R1=teaching, and R2=can_teach.
e.g.  an agent that may or may not be teaching (i.e. in the teaching
relationship), but has the required capabilities to do so. For example:
R-primary = teaching(x,y)
R-secondary = can-teach(x)
Foo2I(teaching): teacher = {x| ...} set of all teachers currently
teaching
Foo1(can-teach): {x | can-teach(x)} set of all entities that can
participate as a teacher in the teaching relationship.
In this case, a Foo2Ib is a Foo1.
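
The set definitions above can be sketched directly with Python set
comprehensions. The agents and facts below (alice, bob, carol, dave)
are invented for illustration; only the shape of the definitions comes
from the notes.

```python
# Hypothetical facts, for illustration only.
teaching = {("alice", "bob"), ("alice", "carol")}  # teaching(x, y)
can_teach = {"alice", "dave"}                      # can-teach(x), unary


def foo1(relation):
    """{x | there exists y such that R(x, y)} for a binary relation R."""
    return {x for (x, _y) in relation}


# Foo2I(teaching): everyone currently in the teaching relationship.
teachers_now = foo1(teaching)

# Foo1(can-teach): everyone with the capability, teaching or not.
potential_teachers = set(can_teach)

# Capable agents who are not teaching at the moment.
idle_but_capable = potential_teachers - teachers_now

print(teachers_now, potential_teachers, idle_but_capable)
```

Keeping the two relations separate is what lets a Foo2Ib be modelled as
a plain Foo1: membership depends only on can-teach, not on whether the
teaching relationship currently holds.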

Issue: is the knowledge that an agent has a capability explicitly
advertised, or is it hardwired in the 'calling' program that knows to
call that agent to do the thing?

Foo2Ic:  (empowered) the class of agents that has been empowered to
enter into an active relationship (and thus be a Foo2) but may not
actually be doing so now. 
e.g. an agent that is certified to teach but may or may not be teaching
(i.e. in the teaching relationship), and may or may not have the
required capabilities to do so (e.g. our president).

[optional] Foo2Id: both Foo2Ib and Foo2Ic (with the same R), i.e. is
both capable and certified.


Foo3: The capacity to perform a required set of (one or more) actions
involved in a Foo2I, e.g. can teach, can sell.
Capability

Foo3': The capacity to perform a required set of (zero or more) actions
involved in a foo1.  e.g.  can teach, can sell, can be to the left of 
For the purpose of this exercise, we are not concerned with non-active
roles.


Foo4: the specific position an agent takes in an interaction protocol
(notion of role in interaction diagram)

Foo5: has a set of related actions (in a Foo3) that are bunched together
and (maybe) are bound by a single Ra. An agent is said to play a Foo5.
e.g. hostess is a t
NB: there may be multiple levels, a foo5 might also be a member of a
higher level foo5.
Issue: the higher level foo5's won't be actions any more.
Issue: granularity of these things. Also, is every member action
required, or can some be left out? Can of worms.
[MFU] ====================================================

Received on Thursday, 2 September 2004 15:49:02 UTC