- From: Aida Slavic <aida@acorweb.net>
- Date: Wed, 2 Aug 2006 17:03:53 +0100
- To: <public-esw-thes@w3.org>
Hi,
If I remember correctly there was a similar discussion in 2004.
My understanding was that the problem of structured classification
notation was to be ignored by SKOS at the time, and that complex notation
ought to be treated as a simple text string.
Jakob Voss wrote:
>SKOS should be able to express DDC, UDC and CC - but it must stay
>simple! So what do you suggest to express CC's "U:(W)" in the next SKOS?
I don't understand what 'simple' means in this case. DDC is a simple
enumerative classification with a largely non-expressive notation, which is
used as a text string to 'mark and park' books. UDC and CC are
analytico-synthetic classifications with a fully expressive, structured
notation whose parts should be searchable using Boolean operators.
To encode an expressive notation one needs:
- a way to encode facet indicators, or the separate parts of the notation,
independently of the notation itself
- a way to encode the relationships between the parts of a complex notation
- a way to encode the correct notation hierarchy independently of the notation (this
can be handled as a BT/NT relationship or as a hierarchy code)
If the first two are not possible in SKOS, then you cannot say that "SKOS expresses
classification", but rather that "SKOS expresses enumerative classification".
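To make the three requirements above concrete, here is a minimal sketch of a data model in which a compound notation is stored as separate parts rather than as one text string. The class and field names (Component, Relation, CompoundNotation, facet, broader) and the facet label "main" are hypothetical illustrations, not SKOS vocabulary or any published UDC format; the only data used are the UDC numbers 32 (politics) and 37 (education), both under class 3.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Component:
    notation: str                  # the simple number, e.g. "32"
    facet: str                     # requirement 1: facet indicator kept apart from the notation
    broader: Optional[str] = None  # requirement 3: hierarchy recorded explicitly, not parsed from the string


@dataclass
class Relation:
    symbol: str   # requirement 2: relationship symbol, e.g. ":" for a simple relation
    source: int   # index of the first component
    target: int   # index of the second component


@dataclass
class CompoundNotation:
    components: list[Component] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

    def to_string(self) -> str:
        # Assumes a simple chain in which each relation joins a component to the next one.
        parts = [self.components[0].notation]
        for rel in self.relations:
            parts.append(rel.symbol)
            parts.append(self.components[rel.target].notation)
        return "".join(parts)

    def contains(self, notation: str) -> bool:
        # Every part is searchable on its own, which a flat text string does not allow.
        return any(c.notation == notation for c in self.components)


politics_education = CompoundNotation(
    components=[
        Component("32", facet="main", broader="3"),
        Component("37", facet="main", broader="3"),
    ],
    relations=[Relation(":", source=0, target=1)],
)

print(politics_education.to_string())     # 32:37
print(politics_education.contains("37"))  # True
```

The flat string is only a rendering; searches such as "all compounds containing 37" run against the stored parts.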
For instance, the examples Jakob gave for DDC (which cover the only type of combination
DDC has)
<551.22>
<T2--551.22>
<T1--59827>
do not solve the problem of 32:37 (relationship between education and politics) in UDC,
where two main subjects are combined.
I think that any generalisations based on DDC or LCC, which are enumerative systems for
linear shelf ordering, may be wrong. This certainly made the MARC
classification format completely useless for classifications that are used
in IR (UDC in particular).
Also, there were some misinterpretations in Nabonita's mail that I would like to put
straight:
<library classification schemes. No doubt that DDC and UDC are most popular schemes but
<they have some serious limitations. Just for e.g. each subject requires a definite place
<in the array of subjects. But if we study carefully the notational system of DDC (UDC
<is based on DDC pattern) , we will find that 000-900 notations have been assigned to
<the subjects randomly. But due to the fixed notational systems, interpolation of
<newly emerging subjects between existing subjects becomes a serious issue.
- DDC and UDC have very different notational principles.
- CC (Colon Classification), DDC and UDC are very different when it comes to the
fundamental principles they are built on and how these are expressed in notation.
DDC lists compound concepts and assigns them a simple notational symbol; it does
not allow the combination of two subjects from the main schedules, and combination
with the auxiliary schedules is limited.
Most importantly, DDC does not contain a consistent set of facet indicators
in the notation, i.e. its notation is not fully expressive. E.g. 551.220959827
does not show where one number ends and the next begins. Furthermore,
"59827" from (T1--59827) does not have a constant meaning, i.e. its meaning changes
depending on the number it is attached to.
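The loss of boundaries in a DDC number can be shown with a small sketch. It assumes the number 551.220959827 was built by concatenating 551.22, 09 and 59827 (the three-part split is our reading of the example; nothing in the notation records it), and the helper build_ddc is hypothetical. Two different segmentations produce the identical string.

```python
def build_ddc(*parts: str) -> str:
    """Concatenate DDC components into one class number (hypothetical helper).

    The decimal point is placed after the third digit; nothing in the
    result records where one component ended and the next began.
    """
    digits = "".join(part.replace(".", "") for part in parts)
    return digits[:3] + "." + digits[3:] if len(digits) > 3 else digits


built = build_ddc("551.22", "09", "59827")
print(built)  # 551.220959827

# Other (hypothetical) segmentations yield exactly the same string,
# so the parts cannot be recovered from the notation alone.
print(build_ddc("551.2209", "59827") == built)  # True
print(build_ddc("551.220", "959827") == built)  # True
```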
UDC and CC are fully analytico-synthetic classifications and have fully
expressive notation. CC (being purely faceted) does not use, and UDC (being
partially faceted) avoids using, simple notation to express compound concepts.
UDC and CC have consistent rules for combining notations from the main schedules
with the auxiliary schedules, or any two or more subjects or their facets from
the main schedules. In UDC it is literally possible to combine any two or more
concepts (from the main and common auxiliary schedules), no matter where in the schedules
they appear, because the notation has a persistent meaning. E.g. (73) always means USA,
no matter which number it is attached to; (1/9) is the facet indicator for concepts of
place. Two or more simple numbers from the main schedules have to be connected with
relationship symbols.
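Because a UDC place auxiliary keeps its meaning wherever it appears, it can be extracted from any number without knowing what it is attached to. The sketch below assumes place auxiliaries are parenthesised numbers whose first digit is 1 to 9 (the (1/9) range above); the one-entry label table holds only the (73) = USA example from this mail, and the function name places is hypothetical.

```python
import re

# Place auxiliaries: a parenthesised number whose first digit is 1-9, i.e. the (1/9) range.
PLACE_AUX = re.compile(r"\(([1-9][0-9.]*)\)")

# Illustrative one-entry excerpt; a real system would load the full table of place.
PLACE_LABELS = {"73": "USA"}


def places(udc: str) -> list[str]:
    """Return the labels of all place auxiliaries in a UDC number,
    whatever main number they are attached to."""
    return [PLACE_LABELS.get(code, f"({code})") for code in PLACE_AUX.findall(udc)]


for number in ["32(73)", "37(73)", "32:37(73)"]:
    print(number, places(number))
```

The same lookup works for politics, education, or a combination of both, which is what makes such a facet searchable across the whole collection.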
Nabonita writes:
>So, the number of CC says that the subject deals with the influence of political factors in a
>geographical area. Where as in UDC the nature of relationship between two subject components
>is not so explicit.
In this example the principle of order is applied, i.e. the treated subject is
normally listed first and the subject of treatment second. In UDC, 32:91 means the
influence of geography on politics. This principle in indexing is known as the
"wall-picture principle". But there are other ways of expressing this more precisely.
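The principle of order can be read mechanically for a single colon relation: the first number is the treated subject and the second the subject of treatment. This is a minimal sketch; the labels come from the 32:91 example, and the function read_colon_relation is hypothetical and handles only one ":" (not "::" or other symbols).

```python
LABELS = {"32": "politics", "91": "geography"}  # from the 32:91 example


def read_colon_relation(udc: str) -> str:
    """Read 'A:B' by the principle of order: A is the treated subject,
    B the subject of treatment. Handles a single ':' only."""
    if udc.count(":") != 1:
        raise ValueError("expected exactly one ':' relation")
    treated, treating = udc.split(":")
    return f"influence of {LABELS[treating]} on {LABELS[treated]}"


print(read_colon_relation("32:91"))  # influence of geography on politics
```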
It is possible to express the type of relationship between two UDC numbers
in three ways:
a) in a limited way, using four symbols and a consistent principle of order: ":" (relation),
"::" (relation, fixed order), "[]" (subsumes), "/" (extension)
b) in a complex and detailed way, using the common auxiliaries of phase
relationships (-042), which contain a dozen different relationships and their
subgroupings
c) in a very sophisticated way, by applying Perreault's symbols for relationships
(from Perreault's "Towards a Theory for UDC")
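Option (a) lends itself to simple machine processing: a tokenizer can separate the numbers from the four relationship symbols so that each part is individually searchable. This is a sketch under stated assumptions: it covers only plain numbers and the symbols ":", "::", "[]" and "/"; auxiliaries such as (73), the phase relations of (b) and Perreault's symbols of (c) are out of scope. The names tokenize and relations are hypothetical.

```python
import re

# "::" must be tried before ":" so that it is not read as two simple relations.
TOKEN = re.compile(r"\s*(?:(?P<num>\d+(?:\.\d+)*)|(?P<sym>::|:|/|\[|\]))")

# Meanings as listed in (a); square brackets are kept as grouping tokens.
RELATION_NAMES = {":": "relation", "::": "relation, fixed order", "/": "extension"}


def tokenize(udc: str) -> list[tuple[str, str]]:
    """Split a UDC number into ('num', ...) and ('sym', ...) tokens."""
    udc = udc.strip()
    tokens, pos = [], 0
    while pos < len(udc):
        m = TOKEN.match(udc, pos)
        if m is None:
            raise ValueError(f"unsupported character {udc[pos]!r} at position {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def relations(udc: str) -> list[str]:
    """Name the relationship symbols that occur in a UDC number."""
    return [RELATION_NAMES[v] for _, v in tokenize(udc) if v in RELATION_NAMES]


print(tokenize("32::91"))
print(relations("32:37"))
print(tokenize("[32:37]:91"))
```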
Aida
Received on Wednesday, 2 August 2006 16:03:15 UTC