RE: rude categorisation

>Not sure I understand where this conversation is leading. I absolutely
>agree with all Leonard's points about using different metadata elements
>to describe the "subject" and the "type" of a document. Other important

The issue of redundancy in subject metadata is a bit out of place here,
although I personally find it very relevant. To my mind this is not a
question of standards but rather of specific IR system implementation
and of indexing and cataloguing guidelines. Standards should remain
generous and open to redundancy to a reasonable extent. In distributed
resource discovery (cross-collection/cross-language) redundancy may be
seen as a 'safety margin' and may help reduce information loss,
especially given the very fuzzy understanding of the semantics of
subject elements in DCMES/LOM and other metadata schemes around.

The reason MARC21 has subject data scattered across several fields is
bad data modelling and a bad tradition in library formats of not
making proper use of subject data.
Functional Requirements for Bibliographic Records (FRBR) makes the
following distinction between the different entities of an information
resource (document):

WORK (intellectual creation) is realised through EXPRESSION
(intellectual realization), is embodied in MANIFESTATION (physical
embodiment), is exemplified by ITEM (a single exemplar of
manifestation)

WORK (M-M relationships)
hasSubject - concept, event, object, place
hasSubject - person, corporate body
hasSubject - work, expression, manifestation, item

Worth considering for SKOS is that what is only a form of Work1 may
become the subject of study for Work2, in the same way that the author
of Work1 may become the subject of Work2.
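
To make the point concrete, here is a rough sketch of the entity chain
and the many-to-many hasSubject relationship. The class and field names
are my own illustration, not FRBR or SKOS terms, and the sample values
are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """Stands in for concept, event, object or place (hypothetical name)."""
    label: str


@dataclass
class Agent:
    """Stands in for person or corporate body (hypothetical name)."""
    name: str


@dataclass
class Item:
    identifier: str


@dataclass
class Manifestation:
    carrier: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    form: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    creator: Optional[Agent] = None
    expressions: List[Expression] = field(default_factory=list)
    # hasSubject may point at any entity: a concept, an agent,
    # or another work/expression/manifestation/item.
    subjects: List[object] = field(default_factory=list)


# Invented sample data: Work1 is expressed as a map.
author = Agent("Author of Work1")
work1 = Work("Work1", creator=author, expressions=[Expression("map")])

# Work2 is a study *about* Work1 and about its author.
work2 = Work("Work2", subjects=[work1, author, Concept("cartography")])
```

Here "map" is only the form of Work1, yet Work1 as a whole (form
included) sits in the subject list of Work2, next to the author.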

Classification systems usually make a clear distinction between
resource content and resource form/type/format, and make it possible
to index all four of them separately without any confusion about what
is subject and what is form/type/format. All classification systems
created for information organization and retrieval contain two
different kinds of vocabulary, and in more sophisticated systems these
are kept in separate vocabulary facets and can easily be managed and
accessed separately in IR.

1) facets of subject fields/discipline vocabulary - often called main
tables or main schedules
2) facets of common concepts (form, place, time, persons, properties,
processes...) - often called auxiliary tables or schedules

In synthetic classifications common concepts represent up to 1/5 of the
whole classification vocabulary and they are freely combined with any
subject.

In the vocabulary of FORM one can find well-organized hierarchies of
vocabulary that can be used to denote different internal forms (e.g.
teaching aid) and external forms (e.g. text), different formats (e.g.
digital) and different carriers (e.g. Web page).

The reason for this is that in information organization
classifications are used for the practical purpose of collocating
information resources according to their content as well as the form
in which the content is expressed. So one can choose to collocate all
maps and within this class make a distinction between history,
politics, demography etc. Or one can collocate history documents and
within this subject area organize maps, textbooks, videos etc.
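
As a small illustration of the two collocation orders, assuming each
record carries a separate subject and form value (the records, field
names and the collocate() helper below are all made up for the
example):

```python
from itertools import groupby

# Invented sample records with subject and form indexed separately.
records = [
    {"title": "Historical atlas", "subject": "history", "form": "map"},
    {"title": "Electoral map", "subject": "politics", "form": "map"},
    {"title": "History textbook", "subject": "history", "form": "textbook"},
    {"title": "Population map", "subject": "demography", "form": "map"},
]


def collocate(recs, primary, secondary):
    """Group titles by the primary attribute, ordered by the secondary one."""
    ordered = sorted(recs, key=lambda r: (r[primary], r[secondary]))
    return {
        key: [r["title"] for r in group]
        for key, group in groupby(ordered, key=lambda r: r[primary])
    }


# All maps together, subdivided by subject.
by_form = collocate(records, "form", "subject")

# All history documents together, subdivided by form.
by_subject = collocate(records, "subject", "form")
```

The same records support either arrangement only because subject and
form were indexed as separate attributes in the first place.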

It is a matter of indexing policy and of the specific needs of a
resource collection and IR system whether the attributes of form,
place, language etc. will be added to the main subject concept in the
process of classification.

Aida

Received on Wednesday, 2 March 2005 22:15:13 UTC