RE: [VM] Scoping Draft with questions to TF members

Tom

Quite late, but hopefully better than never, some (well, quite a few) in-line comments on
the Scoping Draft

> QUESTIONS TO TASK FORCE MEMBERS

> 1. About the overall scope, goal, and audience:
>    Do the following scope and objectives seem reasonable?
>    A paper addressing issues of this magnitude could be as
>    long as we would want to make it, but I personally think
>    it could be helpful to set out with an upper limit of
>    pages in mind (not counting the bibliography :-).  Do you
>    agree? Does 15 to 20 pages seem like a reasonable target?
>
>     SCOPE
>         Guidelines and principles for the identification,
>         declaration, and management of Terms in Vocabularies
>         (Metadata Element Sets, Thesauri, Ontologies, Published
>         Subjects, and the like).

BV : OK. As said before, the trickiest thing to achieve seems to be sorting out generic
guidelines relevant to any kind of vocabulary, which IMO should boil down to very few
(at least that was the conclusion of the Published Subjects TC, though its scope seemed to be a
little wider than Vocabularies), and specific ones for the main use cases you
quote. And how deep do we want to go in the latter?

>     DELIVERABLE
>         A relatively concise (15-20 pages) technical note
>         summarizing principles of good practice, with pointers to
>         examples, about the identification of terms and term sets
>         with URIs, related policies and etiquette, and expectations
>         regarding documentation.

BV : Do we focus on the publisher's side only, or do we venture into best practices for
Vocabulary users (such as: How do I integrate several vocabularies? How do I deal with
already published terms vs my domain terms? etc.)?

>     TARGET AUDIENCE
>         -- Maintainers of terms and term sets (vocabularies)
>            for use in a Semantic Web environment.
>         -- Anyone else wishing to declare terms reusably.

BV : See above. Do we add end-users, in the sense of applications using the published
vocabularies for indexing, search, software and data integration ...?

> 2. About the specific objectives -- the outline -- of the
>    VM note:  Does the following seem reasonable as an overall
>    outline for the VM note?  Questions about the individual
>    sections will follow below.
>
>     Section 1. Terminology
>
>        I assume that we would need to agree -- at least for
>        the purposes of the VM note -- on the meaning of some
>        basic terms.  Section 1, then, would define a list of a
>        dozen or so basic terms such as "term" and "vocabulary".

BV : Eating our own food, and to be nicely recursive, this section should be organized
following our own guidelines :)

>     Section 2. Vocabularies in the Semantic Web
>
>        I assume we need to characterize what it is we are
>        talking about in terms of standard buzzwords that
>        people will have heard, such as Metadata Element Sets,
>        Controlled Vocabularies, Taxonomies, and Ontologies.
>        I also assume we should not neglect to articulate
>        some really basic assumptions about the Semantic Web,
>        such as data merging and repurposing.

BV : Hmmm ... see below "About Section 2"

>     Section 3. Principles of Good Practice
>
>        I assume we will be able to agree on some really basic
>        principles, such as "Identify Terms with URIs (or
>        URIrefs)" or "Articulate any policies or assumptions
>        underlying the assignment of URIs".  Beyond that, we
>        should see how far we can go.  Personally, I believe
>        that if we could articulate half a dozen or so simple
>        principles and elaborate on each principle in two or
>        three paragraphs, with pointers to actual practice,
>        these principles could form the core contribution of
>        the VM note.

BV: There again, maybe draw the line between generic guidelines and those specific to the more
common types of vocabularies.

>     Section 4. Evolving issues
>
>        On many issues we will not be able to agree, whether
>        because the issues are controversial (e.g., the idea of
>        "ownership" of a namespace which surfaced on this list
>        in response to an earlier draft) or because they are
>        the object of ongoing discussion and experimentation.
>        We should try to distill these issues down to a
>        "manageable" number -- a dozen or so -- and discuss
>        each issue in one or two paragraphs which describe
>        the issue and characterize the main viewpoints, areas
>        of development, or controversies, with pointers to
>        the literature.  In my opinion, a "manageable" number
>        is important not just to aid the reader, but also to
>        allow us to divide ownership of the issues among Task
>        Force members.

BV: Maybe an extension of this section could be a wiki where those issues keep being
discussed?

> 3. About Section 1 - Terminology: Is it reasonable to think
>    we could agree on a terminology (for the paper) roughly
>    of the following scope?  Does it obviously go much too
>    far -- or not far enough?  Should we specifically link
>    the terminology section to the issues about which we feel
>    prepared to articulate good-practice guidelines in Section 2
>    (e.g., Identity, Ownership, Versioning...)?
>
>    -- Term
>    -- Vocabulary or Term Set (a set of Terms)
>    -- Namespace (hmm...)
>    -- Namespace URI (identifies a Namespace)
>    -- Namespace Owner (controls a Namespace)
>    -- Language (uses and mixes Vocabularies)?
>    -- Versioning (identification of changes to a Term or Term Set)
>    -- Term Concept (notional)
>    -- Term URI (identifies a Term Concept)
>    -- Term Annotation (a representation of or gloss on a Term Concept)
>    -- Term Version (an identifiable state of a cluster of Term Annotations)
>    -- Term Version URI (identifies a Term Version)
>    -- Term Declaration (represents a term in a machine-processable schema
>       language)
>    -- Namespace Document (definitive material about a Namespace)
>    -- Namespace Schema (definitive material about a Namespace in a
>       machine-processable schema language).

BV: Agreement on what should or should not go there, and then on the definitions, could
take a lot of energy, so everyone involved should be ready to be consensual :))
I think this list is a good basis. I don't think it goes too far.

> 4. About Section 2 - Vocabularies in the Semantic Web:  Here
>    is a very rough list of assumptions and principles that
>    come to mind when one thinks of the Semantic Web.  Does it
>    seem like we need a section articulating assumptions on
>    this very basic level (this depends of course on our
>    target audience)?  I'm thinking two or three pages -
>    does that seem about right?

BV: Do we really need that kind of prose? It depends indeed on whether the document is intended for
evangelization, or to provide technical clues to those already more or less convinced,
knowing the Whys and Wheres, but asking for the Hows. IMO we should rather target the
latter, but maybe I'm overly optimistic :)
In any case, the shorter the better here.

>    -- Open, loosely-coupled, mixed-language environments
>       ("the Web").
>
>    -- Organizations or even individuals defining and publishing
>       vocabulary terms in an open, bottom-up, and distributed
>       process (as both desirable and de-facto).
>
>    -- The need to support processes of referencing,
>       repurposing, recombining, merging data from a diversity
>       of sources.
>
>    -- The need to support the inevitable evolution of languages
>       ("evolvability").
>
>    -- The Must Ignore Principle: "If you find a language element
>       you don't understand, ignore it" (e.g., IETF practice,
>       Tim Berners-Lee, TAG Finding on Versioning).
>
>    -- The Principle of Free Extension: "Allow extensibility:
>       language designers should create extensible languages"
>       (TAG Finding on Versioning).  Languages are extensible
>       if they can mix Vocabularies.
>
>    -- An emerging infrastructure (keyword "registries") for
>       holding or harvesting Vocabularies for display, search,
>       tool configuration, inferencing, or other such services.
>
> 4. About Section 3 - Principles of Good Practice:  This is the
>    part which (I would hope) could form the core contribution
>    of the VM note, and here is a strawman attempt at
>    articulating a few things resembling the sort of principles
>    I have in mind.  Are these the sorts of things about
>    which we should try to get agreement?  Do you agree that
>    not all of the principles need to be "Must" principles --
>    that some could be "May" or "Also Consider" principles?
>    In particular -- since it has already come up on the list
>    -- do you think "Namespace Ownership" could belong here
>    or does it belong in Section 4?
>
>    -- Identify Terms using URIs.

BV: This is a MUST ... assuming we agree on what "identify" means here. And, AFAIK, this
is *THE* issue.
Does a URI identify a Term as a linguistic resource? Or as a concept? Or anything else?
In which context is this identification to be used? Which process is it supposed to
support: indexing, vocabulary merging, data integration, search ...? Do we say something
about those processes, or are we agnostic about them? etc.

>    -- Term URIs should remain stable within the limits of
>       "semantically compatible" change and evolution of the
>       Terms identified (where "semantically compatible"
>       is defined with respect to backwards and forward
>       compatibility, as in the TAG Finding on Versioning).

BV: This can only be a should, since the quoted limits are notoriously difficult to define.

>    -- Associate URI-identified Terms with human-interpretable
>       Term Annotations -- usually, at a minimum, with text
>       defining the Term.

BV: Which kind of "association" do we expect here? Should this annotation be retrievable
using the URI itself (as in PSI recommendation)?

>    -- Consider associating the URI-identified Terms with
>       machine-processable Term Declarations in Namespace
>       Schemas.

BV: Consider, indeed. Beware that any example given will be taken as a model by syntax
hackers ...

>    -- Optionally, identify Term Versions using URIs.
>       Follow (by analogy) the W3C method of distinguishing
>       the timeless "Latest Version" from the date-stamped
>       "This Version" and "Previous Version" (is this method
>       formally described anywhere?).

BV: This does not seem really practicable at the level of granularity of a Term. I can
imagine identifying vocabulary versions by different namespaces, but with the same
fragment identifier for each persistent term:
	http://www.example.org/myvocabulary/latest#widget
	http://www.example.org/myvocabulary/2004-10-05#widget
But I have a hard time imagining different versions of the same term in a persistent
namespace.
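
A minimal sketch of that scheme, in Python and purely illustrative: the helper name
term_uri and the use of a "latest" path segment for the timeless namespace are
assumptions read off the two example URIs above, not an agreed practice.

```python
from datetime import date

# Base taken from the example URIs above; everything else is hypothetical.
BASE = "http://www.example.org/myvocabulary"


def term_uri(fragment, version=None):
    """Build a term URI whose namespace carries the vocabulary version.

    With no version, the timeless "latest" namespace is used; otherwise the
    namespace is date-stamped. The fragment identifier stays the same.
    """
    segment = version.isoformat() if version is not None else "latest"
    return f"{BASE}/{segment}#{fragment}"


latest = term_uri("widget")
dated = term_uri("widget", date(2004, 10, 5))
print(latest)
print(dated)
```

The version lives entirely in the namespace part, so the term itself keeps one
persistent fragment identifier across vocabulary versions.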

>    -- Version Namespace Documents and Namespace Schemas the way
>       W3C versions documents and schemas.
>
>    -- The Namespace Owner should describe and publish a
>       description of the terms identified by URIs and of
>       policies governing their maintenance, e.g.: expectations
>       about persistence, institutional commitment, and
>       semantic stability.

BV: I would go as far as a MUST here :))

>    -- Only a Namespace Owner should change the meaning of a Term
>       in a namespace (though non-owners may constrain meanings in
>       semantically compatible ways for use in specific contexts).

BV: I agree with that in principle, but after following some threads on the sw-meaning
list, I'm not sure I understand any more what it means :))

>    -- When making assertions about terms belonging to another
>       Namespace Owner, consider seeking their endorsement of
>       those assertions ("assertion etiquette" or "good neighbor"
>       policies).

BV: Dream on ...

> 5. About Section 4 - Evolving issues: Squinting very hard,
>    does this seem like a reasonable start at a list of more
>    experimental or controversial issues or of issues that
>    should be mentioned but for whatever reason should be out
>    of scope for Section 3?  Might you want to take ownership of
>    any of these issues?  Do any of the issues clearly overlap
>    with other SWBPD Task Forces?
>
>     -- The problem of resolving (dereferencing) Term URIs.
>        URI-identified Terms should be associated with or
>        resolve to what sort of human-interpretable Term
>        Annotations or machine-processable Term Declarations?
>        The VM note could summarize the state of discussion
>        about whether a URI resolves to anything at all, and if
>        so, whether to a Web page, a machine-processable schema
>        (of whatever flavor), or a resource directory, pointing
>        to examples in practice.  If Terms are documented in
>        multiple ways, should a Namespace Owner distinguish
>        between "canonical" versus "derived" sources?

BV: This is exactly the kind of recommendation the OASIS PubSubj TC was trying to achieve
... and it took about two years to come to a point where it did not seem that any
consensus was possible on general principles. But for Vocabularies, maybe the scope is
specific enough to answer those issues.

>     -- The problem of work-flow and tools for documenting
>        Terms.  The VM note could point to tools and methods
>        for maintaining multiple documentation forms, such as
>        schemas and Web pages.
>
>     -- The problem of finding versus becoming a Namespace
>        Owner.  People want to know: "If we want to declare
>        a term but lack the institutional context to support
>        a persistent namespace policy, how can we do it?
>        Should I use an existing term, get a Namespace Owner
>        (such as DCMI) to declare one, or declare my own?
>        If I were to coin my own URI, where could I put it?"

BV: This is very interesting. Do we go as far as introducing a notion of "vocabulary
hosting", or something like that?

>     -- The problem of describing Terms. What are the properties
>        of a Term Annotation or Term Declaration?  Besides
>        a Definition, what are some of the properties
>        more commonly in use?  How important is it for
>        interoperability to use existing properties in Term
>        Annotations or Term Declarations?

BV: This is also a very difficult issue on which we never achieved agreement in the PubSubj TC.
It seems that the identity of a Term somehow includes its semantic context. IOW, suppose I use
the URI of a Term A in a namespace X to identify the Term A' in my local namespace X',
through something like:

	X':A'  owl:sameAs  X:A

Whatever I declare about A' in X' should be "somehow consistent" with what is declared
about A in X, in other words with the context of A in X.
There is certainly no general answer to that; answers could depend on the type of
vocabulary declared in X and X'.

(Actually, those remarks rather address the last question below.)
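
To make the "somehow consistent" worry concrete, here is a rough Python sketch. It
assumes one deliberately naive reading of consistency (the two terms must not carry
different values for the same property), which is only one of many possible readings.
The property names ex:broader and ex:definition and the function name conflicts are
hypothetical, chosen for illustration.

```python
SAME_AS = "owl:sameAs"


def conflicts(triples, local, remote):
    """Return the properties on which two identified terms disagree.

    Naive assumption: if local owl:sameAs remote, any property present on
    both terms with different value sets counts as a conflict.
    """
    if (local, SAME_AS, remote) not in triples:
        return {}

    def described(subject):
        # Collect property -> set of values for one subject.
        props = {}
        for s, p, o in triples:
            if s == subject and p != SAME_AS:
                props.setdefault(p, set()).add(o)
        return props

    a, b = described(local), described(remote)
    return {p: (a[p], b[p]) for p in a.keys() & b.keys() if a[p] != b[p]}


triples = {
    ("X':A'", SAME_AS, "X:A"),
    ("X:A", "ex:broader", "X:B"),
    ("X':A'", "ex:broader", "X':C'"),
    ("X:A", "ex:definition", "a small device"),
    ("X':A'", "ex:definition", "a small device"),
}
print(conflicts(triples, "X':A'", "X:A"))
```

Here the two definitions agree but the broader terms differ, so only ex:broader is
reported. Whether such a difference is a real inconsistency depends on the type of
vocabulary, which is exactly the open question.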

>     -- The schema language of a Term Declaration: The
>        VM note should perhaps not take a stand on the use
>        of a particular flavor of OWL/RDF+S for declaring a
>        vocabulary but should simply point to documents which
>        focus on this issue.

BV: Certainly. But we should be consistent with whatever PORT TF will recommend (e.g.
recommendations about SKOS for Thesauri, etc ...)

>     -- The formation of URIs.  The issues here include
>        "hash or slash", the implied semantics of language
>        strings and of implied directory hierarchies in URIs,
>        and the use of version numbers in URI strings.

BV: Can of worms :))

>     -- Application profiles.  Most vocabulary initiatives
>        end up having some notion of "profile" to designate
>        either a constrained subset of a vocabulary and/or
>        a language which mixes multiple vocabularies for
>        a particular purpose or application.  The VM note
>        should characterize the nature of these constructs,
>        possibly referring to notions such as Term Usage (a
>        cluster of Term Annotations about a Term of which one
>        is not the Namespace Owner).

>     -- The problem of "semantic context".  Terms may be
>        embedded in clusters of relations from which they
>        may be seen in part to derive their meaning.  It may
>        therefore not always be sensible to use those terms out
>        of context.  Examples include the terms of thesauri
>        or ontologies, as well as XML elements, which may
>        be defined with respect to parent elements and may
>        therefore not always be reusable as properties in an
>        RDF sense without violating their semantic intent.

BV: See above the remarks about identification and context.

That's all for today ...


Bernard Vatant
Senior Consultant
Knowledge Engineering
Mondeca - www.mondeca.com
bernard.vatant@mondeca.com

Received on Wednesday, 6 October 2004 07:54:46 UTC