Automatic Glossary Extraction

From: Harvey Bingham <hbingham@acm.org>
Date: Tue, 05 Oct 2004 09:36:41 -0400
Message-Id: <6.1.1.1.2.20041005092335.0e42a250@pop.rcn.com>
To: "WAI-EO" <w3c-wai-eo@w3.org>

Automatic Glossary Extraction: Beyond Terminology Identification
Youngja Park, Roy J Byrd and Branimir K Boguraev
IBM Thomas J. Watson Research Center
{pyoungja, roybyrd, bkb}@us.ibm.com

http://www.alphaworks.ibm.com/g/g.nsf/img/semanticsdocs/$file/glossaryext.pdf

The paper has some interesting techniques.

Abstract

"This paper describes a method for automatically extracting
domain-specific glossaries from large document collections.
We show that, compared with current text analysis methods
for extracting technical terminology from text, our extracted
glossaries more successfully support applications requiring
knowledge of domain concepts. After presenting our methods,
we illustrate our output of GlossEx, our glossary extraction tool,
and present an informal evaluation of its performance."


Regards/Harvey
Received on Tuesday, 5 October 2004 22:53:54 GMT
