- From: Leonard Will <L.Will@willpowerinfo.co.uk>
- Date: Tue, 12 Jul 2005 12:51:52 +0100
- To: public-esw-thes@w3.org
In message <Pine.CYG.4.58.0507120554440.41764@johndrake> on Tue, 12 Jul
2005, Robert Watkins <rwatkins@foo-bar.org> wrote
>
>While I certainly agree with Leonard's desire to find a solution (if
>deemed appropriate) for the more general case of MeSH, LCSH and other
>schemes, I do not agree with his statement (in the case of MeSH):
>
>> In general none of the terms need be considered as "qualifying" any
>> other - they are brought together as equals.
>
>With MeSH, while the "qualifiers" may be full MeSH descriptors in their
>own right, they are when used as qualifiers playing the role of
>qualifying a MeSH descriptor and are thus not equal to the term they
>are qualifying.

Yes, I agree with this. When I said "in general" in the quotation above
I was generalising beyond the specific and restricted way in which
concepts are pre-coordinated in MeSH as descriptor + qualifier.

>A document indexed with term T and qualifier Q must be found
>if searching for
>
>    term T
>    term T with qualifier Q
>    qualifier Q
>
>but _not_ when searching for
>
>    term Q
>    term Q with qualifier T
>
>(admitting of course that the document might _also_ be indexed with term
>Q, etc. but, for the sake of argument, not in this case). Is this the
>same for LCSH? Taking one of Leonard's examples,
>
>    Leukemia -- Animal models -- Congresses.
>
>would it be appropriate to find a document so indexed if searching for
>"Leukemia -- Animal models", or would that be considered a different
>pre-coordinated term?

It should be an option at the time of search to specify whether an exact
match is required or whether the search string is to be treated as
ending with a wildcard.

>It might be useful to try to define the requirements of MeSH and LCSH in
>more abstract terms to see if indeed a general solution is appropriate.
>With MeSH, the qualifiers are only one level deep and have a minimum
>cardinality of 0 and no maximum cardinality at that level; with LCSH
>(and others?) it looks, from Leonard's examples, as if the level of
>depth is arbitrary but that at each level the minimum cardinality is 0
>and the maximum cardinality is 1. More graphically:
>
>MeSH                  LCSH
>----                  ----
>term T                term T
>  refinement A          refinement A
>  refinement B            refinement B
>  refinement C              refinement C
>  ...                         ...
>  refinement n                  refinement n
>
>Documents indexed with these refinements should be found if a search is
>done for term T alone or with any one refinement from each level of
>depth, with no restriction on depth of refinement. It's possible that a
>general solution might prove too complex, given that (if my analysis is
>correct) it would need to accommodate infinite levels of refinement and
>unrestricted cardinality at each level.

I don't feel very comfortable in talking about "refinements", because I
still think that in general we are bringing together a series of
potentially equal concepts in an indexing string, the order of which is
to some extent arbitrary (though standardised by rules of citation
order). However, accepting Robert's terminology, I think that his
analysis above is reasonably correct. I have three comments:

1. It would be possible to simplify the structure if the MeSH version
   were normalised to give the entries:

       term T -- refinement A
       term T -- refinement B
       etc.

2. For those MeSH concepts that can be used both as a main term
   (descriptor) and as a qualifier, we should have only a single entry
   in the controlled vocabulary, with a note (a sub-property ?)
   specifying the ways in which that concept can be used. For those
   terms that can be used only as qualifiers, there needs to be a note
   specifying this. In the non-MeSH case it should be assumed that terms
   can occupy any place in a pre-coordinated string, so long as the
   rules for citation order are observed.

3. I'm not sure how Robert's specification of "maximum cardinality 1"
   would apply to the example I gave where one of the concepts in the
   string incorporates an element of hierarchy, i.e.

>Leukemia -- Environmental aspects -- Massachusetts. Cape Cod.

   In this case "Massachusetts" and "Cape Cod" are probably both present
   in the "place" facet of the controlled vocabulary, with the
   relationship

       Massachusetts NT Cape Cod

   I suppose that in this case the substring "Massachusetts. Cape Cod"
   would be treated as a single occurrence of the concept. If there were
   another place to be indexed, it would be entered as a separate
   string, such as

       Leukemia -- Environmental aspects -- New York.

I still wonder whether introducing the complication of handling strings
and qualifiers is worthwhile or appropriate at this stage. I would
prefer to see the SKOS specification for dealing with single concepts
being accepted and adopted, and user-oriented software being made
available soon, rather than risking delays by introducing these other
issues at this stage. MeSH and LCSH are important, but they were not
developed in accordance with the current standards for thesauri, and it
will clearly be a non-trivial job to accommodate their distinct
structures.

Leonard

-- 
Willpower Information       (Partners: Dr Leonard D Will, Sheena E Will)
Information Management Consultants              Tel: +44 (0)20 8372 0092
27 Calshot Way, Enfield, Middlesex EN2 7BQ, UK. Fax: +44 (0)870 051 7276
L.Will@Willpowerinfo.co.uk               Sheena.Will@Willpowerinfo.co.uk
---------------- <URL:http://www.willpowerinfo.co.uk/> -----------------
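
The retrieval behaviour discussed in this message can be illustrated with a
minimal sketch. It covers two points only: Robert's rule that a document
indexed with term T and qualifier Q is found by "term T", "term T with
qualifier Q" or "qualifier Q" but not by "term Q", and the suggested search
option of exact match versus a trailing wildcard on a pre-coordinated
string. The names MeshEntry, mesh_match and string_match are hypothetical,
chosen for illustration; they are not part of SKOS, MeSH or LCSH, and the
data model (a descriptor with a set of qualifiers, a string as a tuple of
concepts) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeshEntry:
    """Hypothetical index entry: one descriptor plus zero or more qualifiers."""
    descriptor: str
    qualifiers: frozenset = frozenset()


def mesh_match(entry, term=None, qualifier=None):
    """Return True if the entry satisfies a search on term and/or qualifier.

    The roles are not interchangeable: a search term is compared only with
    the descriptor, and a search qualifier only with the qualifiers.
    """
    if term is None and qualifier is None:
        return False
    if term is not None and entry.descriptor != term:
        return False
    if qualifier is not None and qualifier not in entry.qualifiers:
        return False
    return True


def string_match(indexed, query, exact=False):
    """Match pre-coordinated strings held as tuples of concepts.

    With exact=False the query is treated as ending with a wildcard, so it
    matches any indexed string that begins with the query's concepts.
    """
    indexed, query = tuple(indexed), tuple(query)
    if exact:
        return indexed == query
    return indexed[:len(query)] == query


# Robert's example: a document indexed with term T and qualifier Q.
doc = MeshEntry("T", frozenset({"Q"}))
found = [
    mesh_match(doc, term="T"),
    mesh_match(doc, term="T", qualifier="Q"),
    mesh_match(doc, qualifier="Q"),
]
not_found = [
    mesh_match(doc, term="Q"),
    mesh_match(doc, term="Q", qualifier="T"),
]

# The LCSH example: exact match versus trailing wildcard.
heading = ("Leukemia", "Animal models", "Congresses")
query = ("Leukemia", "Animal models")
wildcard_hit = string_match(heading, query)
exact_hit = string_match(heading, query, exact=True)
```

Holding a string as a tuple of concepts keeps the citation order while
letting each concept be looked up on its own in the controlled vocabulary.
Under this assumed model, a hierarchical element such as "Massachusetts.
Cape Cod" would be stored as one item of the tuple, in line with treating
it as a single occurrence of the concept.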
Received on Tuesday, 12 July 2005 11:57:41 UTC