Re: SKOS and MeSH qualifiers

In message <Pine.CYG.4.58.0507120554440.41764@johndrake> on Tue, 12 Jul 
2005, Robert Watkins <rwatkins@foo-bar.org> wrote
>
>While I certainly agree with Leonard's desire to find a solution (if 
>deemed appropriate) for the more general case of MeSH, LCSH and other 
>schemes, I do not agree with his statement (in the case of MeSH):
>
>> In general none of the terms need be considered as "qualifying" any
>> other - they are brought together as equals.
>
>With MeSH, while the "qualifiers" may be full MeSH descriptors in their 
>own right, they are when used as qualifiers playing the role of 
>qualifying a MeSH descriptor and are thus not equal to the term they 
>are qualifying.

Yes, I agree with this. When I said "in general" in the quotation above 
I was generalising beyond the specific and restricted way in which 
concepts are pre-coordinated in MeSH as descriptor + qualifier.

>A document indexed with term T and qualifier Q must be found
>if searching for
>
>       term T
>       term T with qualifier Q
>       qualifier Q
>
>but _not_ when searching for
>
>       term Q
>       term Q with qualifier T
>
>(admitting of course that the document might _also_ be indexed with term
>Q, etc. but, for the sake of argument, not in this case). Is this the
>same for LCSH? Taking one of Leonard's examples,
>
>       Leukemia -- Animal models -- Congresses.
>
>would it be appropriate to find a document so indexed if searching for
>"Leukemia -- Animal models", or would that be considered a different
>pre-coordinated term?

It should be an option at the time of search to specify whether an exact 
match is required or whether the search string is to be treated as 
ending with a wildcard.
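
As a rough illustration of that option, here is a minimal sketch in 
Python. It assumes that a pre-coordinated string is split into elements 
on "--" and that the final full stop is not significant. The names 
parse_heading and matches are hypothetical and are not part of SKOS, 
MeSH or LCSH.

```python
def parse_heading(text):
    """Split 'A -- B -- C.' into a tuple of elements (assumed layout)."""
    return tuple(part.strip() for part in text.rstrip(".").split("--"))


def matches(query, heading, exact=False):
    """Return True if the heading satisfies the query.

    exact=True  : the query must equal the whole heading.
    exact=False : the query is treated as ending with a wildcard,
                  i.e. it must match the leading elements of the heading.
    """
    q = parse_heading(query)
    h = parse_heading(heading)
    if exact:
        return q == h
    return h[:len(q)] == q


heading = "Leukemia -- Animal models -- Congresses."
print(matches("Leukemia -- Animal models", heading))
print(matches("Leukemia -- Animal models", heading, exact=True))
```

With exact=False the query "Leukemia -- Animal models" finds the 
document indexed under the longer string; with exact=True it does not.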

>It might be useful to try to define the requirements of MeSH and LCSH in
>more abstract terms to see if indeed a general solution is appropriate.
>With MeSH, the qualifiers are only one level deep and have a minimum
>cardinality of 0 and no maximum cardinality at that level; with LCSH
>(and others?) it looks, from Leonard's examples, as if the level of
>depth is arbitrary but that at each level the minimum cardinality is 0
>and the maximum cardinality is 1. More graphically:
>
>MeSH                    LCSH
>----                    ----
>term T                  term T
>    refinement A            refinement A
>    refinement B                refinement B
>    refinement C                    refinement C
>    ...                                ...
>    refinement n                           refinement n
>
>Documents indexed with these refinements should be found if a search is
>done for term T alone or with any one refinement from each level of
>depth, with no restriction on depth of refinement. It's possible that a
>general solution might prove too complex, given that (if my analysis is
>correct) it would need to accommodate infinite levels of refinement and
>unrestricted cardinality at each level.

I am not very comfortable talking about "refinements", because I 
still think that in general we are bringing together a series of 
potentially equal concepts in an indexing string, the order of which is 
to some extent arbitrary (though standardised by rules of citation 
order).

However, accepting Robert's terminology, I think that his analysis above 
is reasonably correct. I have three comments:

1. It would be possible to simplify the structure if the MeSH version 
were normalised to give the entries:

term T -- refinement A
term T -- refinement B
etc.

2. For those MeSH concepts that can be used both as a main term 
(descriptor) and as a qualifier, we should have only a single entry in 
the controlled vocabulary, with a note (a sub-property?) specifying the 
ways in which that concept can be used. For those terms that can be used 
only as qualifiers, the note needs to say so.

In the non-MeSH case it should be assumed that terms can occupy any 
place in a pre-coordinated string, so long as the rules for citation 
order are observed.

3. I'm not sure how Robert's specification of "maximum cardinality 1" 
would apply to the example I gave where one of the concepts in the 
string incorporates an element of hierarchy, i.e.

>Leukemia -- Environmental aspects -- Massachusetts. Cape Cod.

In this case "Massachusetts" and "Cape Cod" are probably both present in 
the "place" facet of the controlled vocabulary, with the relationship

Massachusetts
NT Cape Cod

I suppose that in this case the substring "Massachusetts. Cape Cod" 
would be treated as a single occurrence of the concept. If there were 
another place to be indexed, it would be entered as a separate string, 
such as

Leukemia -- Environmental aspects -- New York.
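
The three comments above can be sketched roughly as follows, in Python. 
This is only an illustration under stated assumptions: every function 
and variable name is hypothetical, "concept X" and "concept Y" are 
placeholders rather than real MeSH entries, and the only NT 
relationship recorded is the one given above.

```python
# Hypothetical sketch: all names and the data layout are assumptions.

def normalise_mesh(descriptor, qualifiers):
    """Comment 1: emit one (descriptor, qualifier) entry per qualifier."""
    if not qualifiers:
        return [(descriptor,)]
    return [(descriptor, q) for q in qualifiers]


# Comment 2: a single vocabulary entry per concept, carrying a note on
# the ways in which it may be used. Placeholder concepts only.
USAGE_NOTE = {
    "concept X": {"descriptor", "qualifier"},
    "concept Y": {"qualifier"},
}


def may_be_used_as(concept, role):
    """Check the usage note for a concept; unknown concepts allow nothing."""
    return role in USAGE_NOTE.get(concept, set())


# Comment 3: NT (narrower term) relationships in the "place" facet.
NARROWER = {"Massachusetts": {"Cape Cod"}}


def parse_string(text):
    """Split a pre-coordinated string on '--', dropping the final full stop."""
    return tuple(part.strip() for part in text.rstrip(".").split("--"))


def is_single_place(element):
    """Treat 'Massachusetts. Cape Cod' as one occurrence of the place
    concept, provided each step is a narrower term of the one before it."""
    steps = [s.strip() for s in element.split(".")]
    return all(b in NARROWER.get(a, set()) for a, b in zip(steps, steps[1:]))


print(normalise_mesh("term T", ["refinement A", "refinement B"]))
print(parse_string("Leukemia -- Environmental aspects -- Massachusetts. Cape Cod."))
```

Under these assumptions the Cape Cod string still has only three 
elements, so the "maximum cardinality 1" rule holds at each level, and a 
second place such as New York goes into a separate string.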


I still wonder whether introducing the complication of handling strings 
and qualifiers is worthwhile or appropriate at this stage. I would 
prefer to see the SKOS specification for dealing with single concepts 
accepted and adopted, and user-oriented software made available soon, 
rather than risk delays by introducing these other issues now. MeSH and 
LCSH are important, but they were not developed in accordance with the 
current standards for thesauri, and it will clearly be a non-trivial 
job to accommodate their distinct structures.

Leonard
-- 
Willpower Information       (Partners: Dr Leonard D Will, Sheena E Will)
Information Management Consultants              Tel: +44 (0)20 8372 0092
27 Calshot Way, Enfield, Middlesex EN2 7BQ, UK. Fax: +44 (0)870 051 7276
L.Will@Willpowerinfo.co.uk               Sheena.Will@Willpowerinfo.co.uk
---------------- <URL:http://www.willpowerinfo.co.uk/> -----------------

Received on Tuesday, 12 July 2005 11:57:41 UTC