Automatic Glossary Extraction

Automatic Glossary Extraction: Beyond Terminology Identification
Youngja Park, Roy J Byrd and Branimir K Boguraev
IBM Thomas J. Watson Research Center
{pyoungja, roybyrd, bkb}@us.ibm.com

http://www.alphaworks.ibm.com/g/g.nsf/img/semanticsdocs/$file/glossaryext.pdf

The paper has some interesting techniques.

Abstract

"This paper describes a method for automatically extracting
domain-specific glossaries from large document collections.
We show that, compared with current text analysis methods
for extracting technical terminology from text, our extracted
glossaries more successfully support applications requiring
knowledge of domain concepts. After presenting our methods,
we illustrate our output of GlossEx, our glossary extraction tool,
and present an informal evaluation of its performance."


Regards/Harvey

Received on Tuesday, 5 October 2004 22:53:54 UTC