MeSH-related issues

Here is some information about the various MeSH-related issues raised 
this morning during the telecon.
I will be happy to answer specific questions.

-- Olivier


1) MeSH identifiers

The identifiers for main headings (descriptors) in MeSH are of the form 
D followed by 6 digits (called DUI or D number),
e.g., D009474 for Neurons.
(Other kinds of MeSH entities have other kinds of identifiers: Q numbers 
for qualifiers [or subheadings] and C numbers for supplementary concept 
records.)

These identifiers are unique, unlike the tree numbers, which identify 
the position of a main heading in the MeSH hierarchies.
For example, Neurons (D009474) appears in 2 hierarchies and therefore has 
2 tree numbers: A08.663 and A11.671

Nervous System [A08]               
    [...]
    Neurons [A08.663]           
        Dendrites [A08.663.256]  +       
        [...]
 
Cells [A11]               
    [...]
    Neurons [A11.671]           
        Autonomic Fibers, Postganglionic [A11.671.078]  +       
        [...]
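
A minimal sketch of why the DUI, not a tree number, should serve as the key. It uses only the values quoted above; the dictionary layout and the function name are illustrative assumptions, not part of any MeSH tooling.

```python
# DUI -> tree numbers, using only the values quoted in this message.
TREE_NUMBERS = {
    "D009474": ["A08.663", "A11.671"],  # Neurons
}


def dui_for_tree_number(tree_number, tree_numbers=TREE_NUMBERS):
    """Return the DUI whose tree numbers include `tree_number`, or None."""
    for dui, numbers in tree_numbers.items():
        if tree_number in numbers:
            return dui
    return None


# Two different positions in the hierarchies resolve to one descriptor.
same_descriptor = (
    dui_for_tree_number("A08.663") == dui_for_tree_number("A11.671")
)
```

Keying on a single tree number would either split Neurons into two entities or silently drop one of its positions.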

If I remember correctly, the version of MeSH that was available on OBO 
at some point mistakenly used one tree number instead of the DUI as the 
unique identifier. (It also transformed all the parent/child relations 
in MeSH into subClass relations, although most of them have different 
semantics.) I would urge you to use the official files instead.

The official MeSH files can be downloaded here:
http://www.nlm.nih.gov/mesh/

The MeSH identifiers can be easily extracted from the XML (or ASCII) 
version of MeSH.
I have a DUI|MainHeading list available upon request.
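
A sketch of how such a DUI|MainHeading list could be produced from the XML version. The element names (DescriptorRecord, DescriptorUI, DescriptorName/String, TreeNumber) are my assumption about the layout of the descriptor file and should be checked against the DTD shipped with the release you download; the inline sample is hand-written, not copied from the official file.

```python
import xml.etree.ElementTree as ET

# Hand-written sample following the ASSUMED layout of the descriptor XML.
SAMPLE = """<DescriptorRecordSet>
  <DescriptorRecord>
    <DescriptorUI>D009474</DescriptorUI>
    <DescriptorName><String>Neurons</String></DescriptorName>
    <TreeNumberList>
      <TreeNumber>A08.663</TreeNumber>
      <TreeNumber>A11.671</TreeNumber>
    </TreeNumberList>
  </DescriptorRecord>
</DescriptorRecordSet>"""


def dui_heading_lines(xml_text):
    """Return one 'DUI|MainHeading' string per descriptor record."""
    root = ET.fromstring(xml_text)
    lines = []
    for record in root.iter("DescriptorRecord"):
        # findtext with a plain tag only looks at direct children, so
        # identifiers nested deeper in the record are not picked up here.
        dui = record.findtext("DescriptorUI")
        name = record.findtext("DescriptorName/String")
        lines.append(f"{dui}|{name}")
    return lines


lines = dui_heading_lines(SAMPLE)
```

For the full file, a streaming parser such as `ET.iterparse` would avoid loading everything into memory at once.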


2) MeSH URIs

There are currently no URIs supported by NLM for MeSH main headings.
I would recommend making one up based on the unique identifier (DUI) 
referred to above.
The URL for the MeSH browser:
http://www.nlm.nih.gov/mesh/2007/MBrowser.html
could be used as the base URI if you wanted to (but don't expect to be 
able to dereference it properly).
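
A minimal sketch of minting such a URI from the DUI. The base URI below is a placeholder under example.org, not anything supported by NLM, and the function name is hypothetical; the check only enforces the "D followed by 6 digits" form described in section 1.

```python
import re

# Placeholder base URI (an assumption for illustration; NLM supports none).
BASE_URI = "http://example.org/mesh/"

# "D followed by 6 digits", as described in section 1.
DUI_PATTERN = re.compile(r"D\d{6}")


def mesh_uri(dui, base=BASE_URI):
    """Build a URI from a DUI, rejecting anything that is not a DUI."""
    if not DUI_PATTERN.fullmatch(dui):
        raise ValueError(f"not a descriptor identifier: {dui!r}")
    return base + dui


neurons_uri = mesh_uri("D009474")
```

Rejecting tree numbers at this point guards against the identifier mix-up described in section 1.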

Alternatively, knowing that MeSH is one of the resources available as 
part of NCBI's Entrez system, you could use an Entrez-like URI. For 
neuron, this would be:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=mesh&list_uids=68009474&dopt=Full
Note that NCBI seems to reidentify the main heading 
(list_uids=68009474), but the reidentification seems to parallel the 
original DUI (D009474). I am just guessing here, but could find out.
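
The guess above can be written down as follows. This is only a sketch of the pattern inferred from the single Neurons example (prefix "68" plus the six digits of the DUI); it is not a documented NCBI rule, and the function names are hypothetical.

```python
from urllib.parse import urlencode

ENTREZ_QUERY = "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi"


def guessed_entrez_uid(dui):
    """GUESS ONLY: '68' + the digits of the DUI, inferred from one example."""
    return "68" + dui[1:]


def guessed_entrez_url(dui):
    """Build an Entrez-like URL using the guessed identifier."""
    params = {
        "cmd": "Retrieve",
        "db": "mesh",
        "list_uids": guessed_entrez_uid(dui),
        "dopt": "Full",
    }
    return ENTREZ_QUERY + "?" + urlencode(params)


neurons_url = guessed_entrez_url("D009474")
```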


3) MeSH in SKOS

The work Scott and I referred to this morning is:
http://thesauri.cs.vu.nl/eswc06/
For the record, I have some issues with the representation they came up 
with, but nothing like the reservations I have about the OWL version of 
MeSH distributed as part of OBO.
I haven't seen Matthias' version.


4) MeSH documentation

The following URL points to a suite of training materials for MeSH and 
its use in indexing Medline citations:
http://www.nlm.nih.gov/bsd/disted/mesh/index.html

One useful paper about MeSH is also:
Nelson, Stuart J.; Johnston, Douglas; Humphreys, Betsy L.
Relationships in Medical Subject Headings.
In: Bean, Carol A.; Green, Rebecca, editors. Relationships in the 
organization of knowledge. New York: Kluwer Academic Publishers; 2001. 
p.171-184.
http://www.nlm.nih.gov/mesh/meshrels.html


5) Subset of MeSH for neurosciences

The first place to start is probably in the Anatomy tree (A), with 
descriptors such as Central Nervous System or Brain. From this starting 
point, say Brain, you can easily get all the descendants (simply by 
traversing the trees). This set will include all the descriptors that 
should be searched for when somebody is interested in searching Medline 
on the start concept (i.e., Brain).
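
Traversing the trees amounts to a prefix test on tree numbers. The sketch below assumes a heading-to-tree-numbers table and uses only the tree numbers quoted in section 1 (so it starts from Nervous System or Neurons rather than Brain); the table and function name are illustrative.

```python
# Heading -> tree numbers; only the tree numbers quoted in section 1.
TREE_NUMBERS = {
    "Nervous System": ["A08"],
    "Cells": ["A11"],
    "Neurons": ["A08.663", "A11.671"],
    "Dendrites": ["A08.663.256"],
    "Autonomic Fibers, Postganglionic": ["A11.671.078"],
}


def descendants(start, tree_numbers=TREE_NUMBERS):
    """Return headings located below any tree position of `start`."""
    # The trailing dot keeps "A08" from matching a sibling such as "A080".
    prefixes = [number + "." for number in tree_numbers[start]]
    found = set()
    for heading, numbers in tree_numbers.items():
        if heading == start:
            continue
        if any(n.startswith(p) for n in numbers for p in prefixes):
            found.add(heading)
    return found


below_neurons = descendants("Neurons")
below_nervous_system = descendants("Nervous System")
```

Because a heading can sit in several hierarchies, the descendants are collected over all of its tree numbers, not just one.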

From there you might want to explore the descriptors co-occurring with 
the start descriptor (or any concept in the set above), i.e., the 
descriptors used in conjunction with Brain in Medline citations. The 
frequency of co-occurrence can be used as a surrogate for the salience 
of the association. Semantic types can be used as a filter 
(e.g., Disease or Syndrome).
Co-occurrence information of MeSH main headings is available as part of 
the UMLS (table MRCOC.RRF), restricted to starred main headings (main 
topics), in the past 10 years of Medline.
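
To make the co-occurrence idea concrete, here is a sketch that counts co-occurring headings over a toy set of citations. The citations are invented for illustration and do not come from Medline or MRCOC.RRF; with the real table you would read the precomputed frequencies instead of counting them yourself.

```python
from collections import Counter

# Invented citations (citation label -> headings), for illustration only.
CITATIONS = {
    "cit1": {"Brain", "Neurons"},
    "cit2": {"Brain", "Brain Diseases"},
    "cit3": {"Brain", "Brain Diseases", "Valproic Acid"},
    "cit4": {"Neurons", "Dendrites"},
}


def cooccurring(start, citations=CITATIONS):
    """Rank headings by how often they are indexed together with `start`."""
    counts = Counter()
    for headings in citations.values():
        if start in headings:
            counts.update(headings - {start})
    return counts.most_common()


ranked = cooccurring("Brain")
```

A semantic-type filter would simply drop entries from the ranked list whose type is not in the set of interest.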

Alternatively, other MeSH hierarchies can be explored as well, e.g.:
- Brain Diseases (D001927), in the C tree
- Neurosciences (D009488), in the G tree
- ...

Finally, some of the pharmacological actions might be of interest
(e.g.,
        Central Nervous System Depressants [D27.505.696.277]  +       
        Central Nervous System Stimulants [D27.505.696.282]  +       
        Neurotransmitter Agents [D27.505.696.577]  +       )
In this case, the corresponding drugs are linked to these descriptors 
not hierarchically, but through the specific relation "pharmacological 
action" (PA), accessible in the XML file.
e.g.:
Valproic Acid (D014635)
Pharm. Action    Anticonvulsants
Pharm. Action    Antimanic Agents
Pharm. Action    Enzyme Inhibitors
Pharm. Action    GABA Agents
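
A sketch of reading the PA relation from the XML. The element names (PharmacologicalActionList, PharmacologicalAction, DescriptorReferredTo) are my assumption about the file layout and should be verified against the DTD; the sample record is hand-written from the listing above.

```python
import xml.etree.ElementTree as ET

# Hand-written record following the ASSUMED layout of the descriptor XML.
SAMPLE = """<DescriptorRecord>
  <DescriptorUI>D014635</DescriptorUI>
  <DescriptorName><String>Valproic Acid</String></DescriptorName>
  <PharmacologicalActionList>
    <PharmacologicalAction><DescriptorReferredTo>
      <DescriptorName><String>Anticonvulsants</String></DescriptorName>
    </DescriptorReferredTo></PharmacologicalAction>
    <PharmacologicalAction><DescriptorReferredTo>
      <DescriptorName><String>Antimanic Agents</String></DescriptorName>
    </DescriptorReferredTo></PharmacologicalAction>
    <PharmacologicalAction><DescriptorReferredTo>
      <DescriptorName><String>Enzyme Inhibitors</String></DescriptorName>
    </DescriptorReferredTo></PharmacologicalAction>
    <PharmacologicalAction><DescriptorReferredTo>
      <DescriptorName><String>GABA Agents</String></DescriptorName>
    </DescriptorReferredTo></PharmacologicalAction>
  </PharmacologicalActionList>
</DescriptorRecord>"""


def pharmacological_actions(record):
    """Return the names of the descriptors a drug is linked to via PA."""
    path = "PharmacologicalActionList/PharmacologicalAction"
    return [
        action.findtext("DescriptorReferredTo/DescriptorName/String")
        for action in record.findall(path)
    ]


actions = pharmacological_actions(ET.fromstring(SAMPLE))
```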


6) MeSH and Medline

In the Medline files, MeSH descriptors are listed as strings, not DUIs.
However, the consistency of Medline and MeSH is maintained by 
synchronizing both resources each year. All Medline records are updated 
to reflect changes in the names of the descriptors, if any.
As a corollary, a given version of Medline must be used with the 
corresponding version of MeSH.

We have developed code to transform Medline citations into lists of 
PMID|DUI, which we could share. We did this last year for one year of 
Medline (2004 or 2005, I believe), but could rerun it on a more recent 
version and/or a different subset of Medline.
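
The transformation can be sketched as a lookup from heading strings to DUIs. This is not the code mentioned above, only an illustration of the idea: the DUIs are the ones quoted in this message, while the PMIDs and the unmatched heading are placeholders invented for the example.

```python
# Heading string -> DUI, using the identifiers quoted in this message.
NAME_TO_DUI = {
    "Neurons": "D009474",
    "Brain Diseases": "D001927",
    "Neurosciences": "D009488",
    "Valproic Acid": "D014635",
}

# Placeholder citations (invented PMIDs), headings listed as strings.
CITATIONS = {
    "0000001": ["Neurons", "Valproic Acid"],
    "0000002": ["Brain Diseases", "Some Renamed Heading"],
}


def pmid_dui_lines(citations, name_to_dui):
    """Return ('PMID|DUI' lines, list of (PMID, heading) with no match)."""
    lines, unmatched = [], []
    for pmid, headings in citations.items():
        for heading in headings:
            dui = name_to_dui.get(heading)
            if dui is None:
                # Typically a sign that the Medline and MeSH versions differ.
                unmatched.append((pmid, heading))
            else:
                lines.append(f"{pmid}|{dui}")
    return lines, unmatched


lines, unmatched = pmid_dui_lines(CITATIONS, NAME_TO_DUI)
```

Reporting unmatched headings rather than dropping them makes a version mismatch between Medline and MeSH visible early.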

Received on Monday, 19 March 2007 17:13:14 UTC