- From: J.Zhu <J.Zhu@open.ac.uk>
- Date: Tue, 20 Dec 2005 11:59:42 -0000
- To: semantic-web <semantic-web@w3.org>
- Message-ID: <2AF05AF70A86A6438445BC3AC057F498085E90C5@mir.open.ac.uk>
[Apologies for cross-posting]

Dear Colleagues and Friends,

I would like to announce a new named entity recognition tool called ESpotter, a .NET application. With a single click you can extract entities of various types from documents, e.g., "Open University" as an organization and "Enrico Motta" as a person. You can select one or more documents in plain text or HTML format and save the recognized entities in an XML file for further processing.

The tool is based on the .NET Framework and can be downloaded from my homepage at:

http://kmi.open.ac.uk/people/jianhan/ESpotter/ESpotter.zip

Run the ESpotter.msi file to install (you may need to install .NET Framework 1.0). The installation creates a shortcut to the ESpotter executable on your desktop.

The following example XML output shows entities of various types and their word offsets in a document:

  <?xml version="1.0" encoding="utf-8" standalone="yes"?>
  <ESpotter-Processed-Documents corpusSize="284">
    <Document id="0">
      <has-directory>D:\test.xml</has-directory>
      <has-url>D:\test.xml</has-url>
      <has-document-size>284</has-document-size>
      <mentions-location>
        <instance content="Australia" pos="108" />
      </mentions-location>
      <mentions-organization>
        <instance content="Monash University" pos="132" />
      </mentions-organization>
      <mentions-person>
        <instance content="Larry Stillman" pos="130" />
      </mentions-person>
      <mentions-research-area>
        <instance content="network" pos="238" alias="TechnologiesCommunity Informatics Research Network" />
      </mentions-research-area>
      <pn>
        <instance content="ICT" pos="22" />
      </pn>
    </Document>
  </ESpotter-Processed-Documents>

ESpotter uses an MS Access database file, ESpotterResources.mdb, to store lexicon and pattern information. Currently ESpotter recognizes people, organizations, locations, research areas, email addresses, telephone numbers, postal codes, and other proper names.
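For anyone who wants to post-process the saved XML, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes the structure shown in the example output: one <Document> element per input file, with each entity type as a child element (mentions-person, pn, and so on) holding <instance> elements whose content and pos attributes give the entity text and its word offset. The function name entities_by_type is hypothetical, and the assumption that other entity types (email, telephone, postal code) follow the same pattern is not confirmed by the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Example output copied from the announcement. A raw string keeps the
# backslash in "D:\test.xml" from being read as a tab escape.
SAMPLE = r"""<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<ESpotter-Processed-Documents corpusSize="284">
  <Document id="0">
    <has-directory>D:\test.xml</has-directory>
    <has-url>D:\test.xml</has-url>
    <has-document-size>284</has-document-size>
    <mentions-location>
      <instance content="Australia" pos="108" />
    </mentions-location>
    <mentions-organization>
      <instance content="Monash University" pos="132" />
    </mentions-organization>
    <mentions-person>
      <instance content="Larry Stillman" pos="130" />
    </mentions-person>
    <mentions-research-area>
      <instance content="network" pos="238" alias="TechnologiesCommunity Informatics Research Network" />
    </mentions-research-area>
    <pn>
      <instance content="ICT" pos="22" />
    </pn>
  </Document>
</ESpotter-Processed-Documents>"""


def entities_by_type(root):
    """Map each Document id to {element tag: [(content, word_offset), ...]}.

    Assumption (hypothetical helper): any child of <Document> that contains
    <instance> elements is an entity group; metadata children such as
    <has-url> contain none and are therefore skipped.
    """
    result = {}
    for doc in root.iter("Document"):
        groups = defaultdict(list)
        for child in doc:
            for inst in child.findall("instance"):
                groups[child.tag].append((inst.get("content"), int(inst.get("pos"))))
        result[doc.get("id")] = dict(groups)
    return result


if __name__ == "__main__":
    # Encode to bytes so the parser reads the XML encoding declaration itself.
    root = ET.fromstring(SAMPLE.encode("utf-8"))
    for doc_id, groups in entities_by_type(root).items():
        # Flatten all groups and list mentions in word-offset order.
        mentions = [(pos, tag, text) for tag, items in groups.items() for text, pos in items]
        for pos, tag, text in sorted(mentions):
            print(doc_id, pos, tag, text)
```

To read a file saved by ESpotter instead of the inline sample, ET.parse(path).getroot() returns the same root element; the path is whatever you chose when saving.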
You can easily customize the lexicon and patterns in the ESpotterResources.mdb file to recognize any type of entity you are interested in by adding new lexicon entries and patterns. Lexicon entries and patterns are grouped into different tables. When you add new lexicon entries or patterns, you can create a new table and register it in the TableSchema table. New entity types need to be registered in the TypeSchema table. Precision values for domain adaptation are not used in this version of ESpotter and can be ignored in the database file.

For developers interested in ESpotter, the installation includes a DLL file, ESpotterClass.dll, for easy inclusion in a .NET application for language engineering. An example is given in the Class1.cs file. More information on using ESpotter for development is coming soon.

I hope you find the tool useful; please send me any comments.

Regards,
Jianhan Zhu

-------------------------------------------------
Dr. Jianhan Zhu (Research Fellow)
Knowledge Media Institute
The Open University
Milton Keynes
United Kingdom
Tel: +44 (0)1908652073
WWW: http://kmi.open.ac.uk/people/jianhan
Received on Tuesday, 20 December 2005 12:00:09 UTC