- From: Francesco Sclano <francesco_sclano@yahoo.it>
- Date: Sat, 21 Oct 2006 23:29:03 +0200 (CEST)
- To: semantic-web@w3.org
TermExtractor, my master's thesis, is online at http://lcl2.di.uniroma1.it !

TermExtractor is a software package for the automatic building, validation and maintenance of glossaries in English. It extracts the terminology that is consensually referred to within a specific application domain.

The package takes as input a corpus of domain documents, parses the documents, and extracts a list of "syntactically plausible" terms (e.g. compounds, adjective-noun pairs, etc.). During parsing, greater importance is assigned to terms that appear with distinctive text layout (title, bold, italic, underlined, etc.).

Two entropy-based measures, called Domain Relevance and Domain Consensus, are then applied. Domain Consensus is used to select only the terms that are consensually referred to throughout the corpus documents. Domain Relevance is used to select only the terms that are relevant to the domain of interest; it is computed with reference to a set of contrastive terminologies from different domains. Finally, the extracted terms are further filtered using Lexical Cohesion, which measures the degree of association among all the words in a terminological string.

Accepted file formats are: txt, pdf, ps, dvi, tex, doc, rtf, ppt, xls, xml, html/htm, chm, wpd, and also zip archives.

--
Francesco Sclano
home page: http://lcl2.di.uniroma1.it/~sclano
msn: francesco_sclano@yahoo.it
skype: francesco978
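
To make the idea of an entropy-based consensus measure more concrete, here is a minimal sketch in Python. The message above does not give the formula TermExtractor uses, so everything here is an assumption for illustration: the function name domain_consensus, the representation of each document as a list of already-extracted candidate terms, and the choice of plain Shannon entropy (base 2) over the term's normalized per-document frequencies. Under these assumptions, a term spread evenly across many documents scores high, while a term confined to a single document scores zero.

```python
import math


def domain_consensus(term, documents):
    """Entropy of a term's distribution across corpus documents.

    Hypothetical illustration only: `documents` is assumed to be a list of
    documents, each given as a list of candidate terms already extracted
    from that document. This is not TermExtractor's actual implementation.
    """
    # How many times the term occurs in each document.
    counts = [doc.count(term) for doc in documents]
    total = sum(counts)
    if total == 0:
        return 0.0  # term never occurs in the corpus

    entropy = 0.0
    for count in counts:
        if count:
            p = count / total  # share of the term's occurrences in this document
            entropy -= p * math.log2(p)
    return entropy


# A toy corpus of four documents (invented data, for illustration only).
corpus = [
    ["semantic web", "ontology", "ontology"],
    ["semantic web", "glossary"],
    ["semantic web", "parser"],
    ["semantic web", "corpus"],
]

spread_score = domain_consensus("semantic web", corpus)  # occurs once in every document
local_score = domain_consensus("ontology", corpus)       # occurs only in the first document
```

A filter built on such a score would keep the terms whose value exceeds a chosen threshold; the threshold itself would have to be tuned on real data.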
Received on Sunday, 22 October 2006 15:59:23 UTC