Re: Definition of "facet"

In message <AAEKLFPLCPPCFCOACDKIGEKMCJAA.aida@acorweb.net> on Mon, 1 Mar
2004, Aida Slavic <aida@acorweb.net> wrote
>
>Leonard,
>
>My concern with any definition we would accept is related to the functionality
>this may imply

Aida

Yes, I agree. We should probably practice what we preach, and define
these concepts in terms of their functionality. If in doing so we find
that we are talking about more than one distinct concept, then we need
to find distinct names for them.

>I have in mind the following:
>
>1) facets in expressing semantic (logical hierarchy and poly-hierarchy)
>here is where the issue of facet/array and inheritance comes
>
>- a little digression may be relevant here:
>Svenonius suggested that for m2m handling of vocabulary there should be
>provision for indicating the difference between hierarchy types: logical
>hierarchy (one concept-one hierarchy) and perspective hierarchy (one
>concept more than one hierarchy) first hierarchy type is good for broadening
>and narrowing search in IR when vocabulary covers single subject area (e.g.
>thesauri) second hierarchy type is paramount for disambiguation (when
>vocabulary covers universal knowledge area e.g. classifications)

I'm not sure whether any new distinction is being made here or whether
she is just making the distinction between mono- and poly-hierarchy,
i.e. whether or not a term can have more than one broader term.

>the question is: if we accept to declare that something is a facet in
>SKOS/OWL does this mean that only logical hierarchies are allowed... and
>that the same concept will not occur in other hierarchies within the same
>KOS irrespective whether the concepts are naturally context free and
>irrespective the coverage of KOS (thinking here of polysems (culture,
>organization, democracy) and other vague concepts such as water, marble,
>cell etc. and the way they may be treated in special and general KOS)

It seems to me that the only case where mono-hierarchies are required is
something like a formal biological taxonomy, where membership of a
parent concept is an essential part of the definition of a concept.
"Whales" are "mammals" _by definition_ and they cannot therefore have
any other parent concept such as "fishes" or "insects".

In any other kind of hierarchy a term can in principle have more than
one broader term, so that "whales" can be a narrower term of "mammals"
as well as being a narrower term of "aquatic creatures", where it may
have "fish" and "plankton" as sibling terms. This polyhierarchical
structure allows broadening of searches for "all mammals" or "all
aquatic creatures", so I'm puzzled by the suggestion you quote above
that a monohierarchy is desirable for this purpose.

The only restriction is that the parent concepts must belong to the same
fundamental category (which I call a facet). "Whales" is in the facet of
"organisms" or "living things" and cannot have a parent concept in the
facet of "disciplines", or "actions", or "places", for example.
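
As a rough illustration only, the restriction can be sketched in a few
lines of Python. The class and function names below are hypothetical,
invented for this example, and not taken from any existing thesaurus
software; the sketch assumes each concept carries a single facet label.

```python
class Concept:
    """A concept in a controlled vocabulary, assigned to exactly one facet."""

    def __init__(self, label, facet):
        self.label = label
        self.facet = facet
        self.broader = []    # polyhierarchy: any number of broader concepts
        self.narrower = []

    def add_broader(self, parent):
        # The only restriction: a parent concept must be in the same facet.
        if parent.facet != self.facet:
            raise ValueError(
                f"{self.label!r} ({self.facet}) cannot have a parent "
                f"in facet {parent.facet!r}"
            )
        self.broader.append(parent)
        parent.narrower.append(self)


def all_narrower(concept):
    """Collect the labels of every concept below `concept`."""
    found = set()
    stack = list(concept.narrower)
    while stack:
        current = stack.pop()
        if current.label not in found:
            found.add(current.label)
            stack.extend(current.narrower)
    return found


mammals = Concept("mammals", "organisms")
aquatic = Concept("aquatic creatures", "organisms")
whales = Concept("whales", "organisms")
fish = Concept("fish", "organisms")
plankton = Concept("plankton", "organisms")
zoology = Concept("zoology", "disciplines")

whales.add_broader(mammals)   # first broader term
whales.add_broader(aquatic)   # second broader term, same facet
fish.add_broader(aquatic)
plankton.add_broader(aquatic)
```

Here all_narrower(mammals) and all_narrower(aquatic) both include
"whales", so either search can be broadened, while
whales.add_broader(zoology) would be rejected because "zoology" is in a
different facet.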

Some thesauri are restricted to being mono-hierarchical because of
limitations in the software used to construct them, but that is not
something that we should accept as a general principle.

I don't think that the issue of polysemes is relevant here, because we
are talking about the relationship between _concepts_ rather than words.
If a word can represent more than one concept within a controlled
vocabulary then it is not a good descriptor and needs to be qualified to
show which concept it represents. If it represents only one concept
within the vocabulary, though it can represent other concepts elsewhere,
then its scope note needs to show clearly that its meaning is
restricted.

>2) facets in expressing syntax/structure
>
>There is no agreement on the semantic of fundamental facets so pinning
>down the semantic can hardly be the ONLY reason for stating the facet.
>Thesauri usually declare facets for vocabulary building/control/management
>while classification systems, apart from this, exploit facets also for
>precision in indexing (i.e. building complex expressions). Hence, the first
>ones have only facets and the second one have both facets and roles
>attached to them

Yes, I think that this is the core of the problem. The rules for
combining descriptors into a compound string that represents a
combination of concepts are often called rules for the "citation order
of facets", but as I said in my last message this meaning of "facet" is
different from the "fundamental category" meaning. We are talking about
two different concepts, and I think we should give them different names.

When we build a string using a rule such as

>> >Thing/kind/part/property/material/process/operation/patient/product/by-
>> >product/agent/space/time

I would say that "we are combining concepts (or the terms which
represent concepts) according to their roles", with no mention of
facets.

In the strings

        boys kissing girls
and
        girls kissing boys

"boys" and "girls" both belong to the same facet of "people". The
citation order is determined by roles and not by facets.
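
A minimal sketch of this, assuming a hypothetical three-role order
(agent, operation, patient) chosen only so that the output reads like
the English strings above; it is not the full citation order quoted
earlier, and the names FACET_OF, ROLE_ORDER and build_string are
invented for the example.

```python
# Facet membership of each term: "boys" and "girls" share a facet.
FACET_OF = {"boys": "people", "girls": "people", "kissing": "actions"}

# Hypothetical role order, used for illustration only.
ROLE_ORDER = ["agent", "operation", "patient"]


def build_string(terms_by_role, role_order=ROLE_ORDER):
    """Arrange terms by the role each one plays; facets are never consulted."""
    return " ".join(
        terms_by_role[role] for role in role_order if role in terms_by_role
    )


first = build_string(
    {"agent": "boys", "operation": "kissing", "patient": "girls"}
)
second = build_string(
    {"agent": "girls", "operation": "kissing", "patient": "boys"}
)
```

The two calls differ only in which term is given which role, and they
yield the two different strings; the facet table is identical in both
cases and plays no part in the ordering.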

>Outside traditional KOS and in the spectrum of different so called 'faceted'
>vocabularies created to support browsing on portals the reasons for
>encoding facets is the same. These vocabularies do not attach any
>'fundamental' meaning to the facets and yet they exploit them to achieve
>certain functionality in managing terminology and creating
>browsing/searching interface

Yes, this is another meaning again. When an interface allows you to
search for wine first by origin, then by colour, then by sweetness, it
is allowing you to apply successive characteristics of division in order
to reduce the number of entries in the arrays at the lowest level. This
is a fundamental feature of "faceted classification", but neither the
"characteristics of division" (origin, colour, sweetness) nor the
resulting arrays are "facets" in the sense of "fundamental categories",
and I think it misleading to call them that.
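
To make the mechanism concrete, here is a small sketch of successive
division. The wine records are invented placeholder data, and the
function names divide and drill_down are hypothetical.

```python
# Invented sample entries, for illustration only.
WINES = [
    {"name": "wine 1", "origin": "France", "colour": "red", "sweetness": "dry"},
    {"name": "wine 2", "origin": "France", "colour": "white", "sweetness": "dry"},
    {"name": "wine 3", "origin": "France", "colour": "white", "sweetness": "sweet"},
    {"name": "wine 4", "origin": "Italy", "colour": "red", "sweetness": "dry"},
]


def divide(entries, characteristic):
    """Split entries into arrays according to one characteristic of division."""
    arrays = {}
    for entry in entries:
        arrays.setdefault(entry[characteristic], []).append(entry)
    return arrays


def drill_down(entries, choices):
    """Apply characteristics of division one after another."""
    for characteristic, value in choices:
        entries = divide(entries, characteristic).get(value, [])
    return entries


result = drill_down(
    WINES,
    [("origin", "France"), ("colour", "white"), ("sweetness", "sweet")],
)
```

Each step leaves a smaller array to look through, yet every entry
remains a wine throughout: origin, colour and sweetness are attributes
used for division, not fundamental categories.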

>In order to have roles  one has to have data structure to which to
>attach these roles (and later on the rules for processing the roles).

We don't need a structure other than well-defined concepts to attach
roles to. The fact that boys and girls are in the same facet in the
example above doesn't help in determining their roles.

>But the very fact that classification facets have their roles

I don't believe that they do, unless you are defining "classification
facets" to _mean_ roles. Doing that seems to introduce unnecessary
confusion.

> is *exactly* the reason why I would want to encode them for machine
>processing: I need to handle and automate syntax. For processing pre-
>coordinate vocabularies it is very important to know that one concept belong
>to a certain facet as this context will determine its place in a string, its role

Its role, not the facet to which it belongs, will determine its place in
a string.

> and its meaning in this particular facet as opposed to its meaning when it
>occurs in some other facet...

Meaning of concepts should be defined by scope notes. A single concept
should not occur in more than one facet, though it may occur in more
than one hierarchy within a single facet (see above).

>My understanding is that thesauri may as well 'pretend' that facets are
>fundamental categories of mutually exclusive terms and fix each term to
>occur only in one facet Thesauri have less need for disambiguation -
>because they are zooming down on the narrow subject area where one
>concept has only one broader concept and often only one role. Such is the
>case of materials in AAT... where stone or glass or leather is not discussed
>outside their role in the Art and Architecture.
>
>a) AAT, for instance,  does not have to accommodate 'marble deposits in
>geology' where the same concept may not be treated as 'material' .

I don't see the problem here. The concept of "marble" refers to "a
granular crystalline limestone", and this is always true (_pace_ any
pedantic geologists). It may be put into an array under the node label
<rock by composition> and into another array under the broader term
"materials for sculpture" (if that is its only use within the scope of
the thesaurus). It may also be combined into an indexing string with the
discipline of "geology" and the form "deposits". None of these affects
the nature of the concept or its membership of a "materials" facet.

>b) thesauri do not need to use facets to exercise the roles as they are used
>for single term indexing (post-coordinate indexing) They don't combine
>terms together in a complex expression. [having said that: if one chose to
>produce composite terms with thesaurus, one would need to attach role to
>the facets]
>
>Any analytico-synthetic classifications and other pre-coordinated indexing
>languages have to exploit facet analysis for more than one purpose. This
>does not mean that facet in classification (Processes or Materials, Place) in
>the context of a given discipline are not classes in which essential properties
>are exhibited by all its members.
>
>(We can, for the purpose of this discussion, think of classification such as
>Bliss 2 to be a collection of thesauri for instance)
>
Yes, I am becoming more and more convinced that thesauri and
classification schemes are just alternative ways of arranging and
presenting lists and groups of concepts. I therefore am very keen to
help arrive at a single set of unambiguous terms which we can use to
discuss these things, rather than having to qualify statements by saying
that we are talking "in a thesaurus context" or "in a classification
context".

This is an interesting discussion - I wonder whether other people have
views on whether what we are saying makes sense.  Are we making any
progress towards a consensus of opinion?

Leonard
-- 
Willpower Information       (Partners: Dr Leonard D Will, Sheena E Will)
Information Management Consultants              Tel: +44 (0)20 8372 0092
27 Calshot Way, Enfield, Middlesex EN2 7BQ, UK. Fax: +44 (0)870 051 7276
L.Will@Willpowerinfo.co.uk               Sheena.Will@Willpowerinfo.co.uk
---------------- <URL:http://www.willpowerinfo.co.uk/> -----------------

Received on Monday, 1 March 2004 12:29:59 UTC