
Easy language engineering with ESpotter named entity recognition

From: J.Zhu <J.Zhu@open.ac.uk>
Date: Tue, 20 Dec 2005 11:59:42 -0000
Message-ID: <2AF05AF70A86A6438445BC3AC057F498085E90C5@mir.open.ac.uk>
To: semantic-web <semantic-web@w3.org>
[Apologies for cross-posting]

 
Dear Colleagues and Friends,
 
I would like to announce a new named entity recognition tool called
ESpotter, a .NET application. With a single click you can extract
entities of various types from documents, e.g., "Open University" as an
organization and "Enrico Motta" as a person. You can select one or more
documents in plain text or HTML format and save the recognized entities
in an XML file for further processing.
 
The tool is based on the .NET Framework and can be downloaded from my
homepage at: http://kmi.open.ac.uk/people/jianhan/ESpotter/ESpotter.zip
Run the ESpotter.msi file to install (you may need to install .NET
Framework 1.0). The installation creates a shortcut to the ESpotter
executable file on your desktop. The following example XML output shows
entities of various types and their word offsets in a document.
 
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<ESpotter-Processed-Documents corpusSize="284">
  <Document id="0">
    <has-directory>D:\test.xml</has-directory>
    <has-url>D:\test.xml</has-url>
    <has-document-size>284</has-document-size>
    <mentions-location>
      <instance content="Australia" pos="108" />
    </mentions-location>
    <mentions-organization>
      <instance content="Monash University" pos="132" />
    </mentions-organization>
    <mentions-person>
      <instance content="Larry Stillman" pos="130" />
    </mentions-person>
    <mentions-research-area>
      <instance content="network" pos="238" alias="TechnologiesCommunity Informatics Research Network" />
    </mentions-research-area>
    <pn>
      <instance content="ICT" pos="22" />
    </pn>
  </Document>
</ESpotter-Processed-Documents>
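
For further processing, an output file like the one above can be read with any XML parser. The following is only a minimal sketch in Python, not part of ESpotter: the function name entities_by_type and the trimmed SAMPLE string are made up for illustration, and it assumes that every entity element is an <instance> with "content" and "pos" attributes, nested under a per-type container such as <mentions-person>, as in the example output.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed copy of the example output above (illustration only).
SAMPLE = """
<ESpotter-Processed-Documents corpusSize="284">
  <Document id="0">
    <has-document-size>284</has-document-size>
    <mentions-location>
      <instance content="Australia" pos="108" />
    </mentions-location>
    <mentions-person>
      <instance content="Larry Stillman" pos="130" />
    </mentions-person>
    <pn>
      <instance content="ICT" pos="22" />
    </pn>
  </Document>
</ESpotter-Processed-Documents>
"""


def entities_by_type(xml_text):
    """Group (content, word offset) pairs by the name of their container element."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for document in root.iter("Document"):
        for container in document:
            # Metadata elements such as <has-document-size> have no
            # <instance> children, so they contribute nothing here.
            for instance in container.findall("instance"):
                grouped[container.tag].append(
                    (instance.get("content"), int(instance.get("pos")))
                )
    return dict(grouped)


for entity_type, instances in entities_by_type(SAMPLE).items():
    print(entity_type, instances)
```

For a file saved by ESpotter, the same loop should work after replacing ET.fromstring(xml_text) with ET.parse(path).getroot().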
 
ESpotter uses an MS Access database file, ESpotterResources.mdb, to store
lexicon and pattern information. Currently ESpotter recognizes People,
Organization, Location, Research Area, Email, Telephone, Postal Code,
and other Proper Names. You can customize ESpotterResources.mdb to
recognize any type of entity you are interested in by adding new lexicon
entries and patterns. Lexicon entries and patterns are grouped into
different tables. When you add new lexicon entries or patterns, you can
create a new table and register it in the TableSchema table. New entity
types need to be registered in the TypeSchema table. Precision-based
domain adaptation is not used in this version of ESpotter and can be
ignored in the database file.
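
To give a feel for how lexicon entries and patterns can work together, here is a toy sketch in Python. It is not ESpotter's implementation and does not read ESpotterResources.mdb: the LEXICON and PATTERNS dictionaries are hypothetical in-memory stand-ins for lexicon and pattern tables, the email pattern is deliberately simplistic, and the address in the usage example is made up. Positions are reported as word offsets, as in the XML output.

```python
import re

# Hypothetical stand-ins for lexicon and pattern tables, keyed by entity type.
LEXICON = {
    "organization": ["Open University", "Monash University"],
    "person": ["Enrico Motta"],
}
PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
}

PUNCTUATION = ".,;:()"


def spot(text):
    """Return (entity type, content, word offset) tuples found in text."""
    words = [w.strip(PUNCTUATION) for w in text.split()]
    found = []
    # Lexicon lookup: exact match of a (possibly multi-word) entry.
    for entity_type, entries in LEXICON.items():
        for entry in entries:
            entry_words = entry.split()
            n = len(entry_words)
            for i in range(len(words) - n + 1):
                if words[i:i + n] == entry_words:
                    found.append((entity_type, entry, i))
    # Pattern lookup: each regular expression is tried on single words.
    for entity_type, pattern in PATTERNS.items():
        for i, word in enumerate(words):
            if pattern.match(word):
                found.append((entity_type, word, i))
    return sorted(found, key=lambda item: item[2])


text = "Enrico Motta works at the Open University, contact e.motta@example.org."
for entity in spot(text):
    print(entity)
```

In this sketch, supporting a new entity type only means adding a key to one of the two dictionaries, which loosely mirrors adding a table and registering the type in the database file.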
 
For developers interested in ESpotter, the installation includes a DLL
file, ESpotterClass.dll, for easy inclusion in a .NET application for
language engineering. An example is given in the Class1.cs file. More
information on using ESpotter for development is coming soon.
 
I hope you find the tool useful. Please send me any comments.
 
Regards,
Jianhan Zhu
-------------------------------------------------
Dr. Jianhan Zhu (Research Fellow)
Knowledge Media Institute
The Open University
Milton Keynes
United Kingdom
 
Tel: +44 (0)1908652073
WWW: http://kmi.open.ac.uk/people/jianhan
Received on Tuesday, 20 December 2005 12:00:09 GMT
