- From: James Cooper <jwcnmr@us.ibm.com>
- Date: Thu, 14 Oct 2004 10:58:05 -0400
- To: public-swls-ws@w3.org
- Message-ID: <OF98D69932.B1C85898-ON85256F2D.005210E5-85256F2D.005238FD@us.ibm.com>
Automatic discovery and annotation of organic chemical names in patents

James W. Cooper, Stephen S. Boyer, and Anni R. Coden

We have designed a series of algorithms to recognize and annotate organic chemical names in technical documents, and have applied this system to one year of US patents. The system uses only two small dictionaries and is primarily rule-based. Once we have extracted these names, we can use one of several commercial products to convert them to SMILES strings, which can then be loaded into a searchable database. We can then use this database to allow searches of the patents by chemical substructure rather than by chemical name, thus providing a much more thorough search of the compounds mentioned in the patents.

James W. Cooper
Advanced Information Retrieval and Analysis
IBM T. J. Watson Research Center
jwcnmr@watson.ibm.com
914-784-7285
http://flahdo.watson.ibm.com/
http://www.research.ibm.com/people/j/jwcnmr/
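
As a rough illustration of what a dictionary-plus-rules recognizer of this kind might look like, here is a minimal Python sketch. It is not the system described in the abstract. The two dictionaries (STEMS and AFFIXES), their contents, and the function names segment, is_chemical and find_names are all assumptions made for this example. The rule used here is that a token counts as a chemical name if its letters split completely into dictionary pieces, including at least one stem and at least one affix, and that adjacent chemical tokens are joined into one name. Conversion to SMILES and loading into a database are left out.

```python
import re

# Hypothetical dictionary 1: stems of systematic organic names (illustrative only).
STEMS = {"meth", "eth", "prop", "but", "pent", "hex", "benz", "phen",
         "chlor", "brom", "fluor", "hydrox", "amin", "nitr", "acet", "cyclo"}

# Hypothetical dictionary 2: multipliers, linkers and endings (illustrative only).
AFFIXES = {"di", "tri", "tetra", "yl", "o", "y", "an", "ane", "ene", "yne",
           "ol", "one", "ic", "ate", "ide", "ine", "oic"}

PUNCT = ".,;:"


def segment(word):
    """Return the first split of `word` into dictionary pieces, or None.

    Longer pieces are tried first; the search backtracks if the rest of
    the word cannot be split.
    """
    if not word:
        return []
    for end in range(len(word), 0, -1):
        piece = word[:end]
        if piece in STEMS or piece in AFFIXES:
            rest = segment(word[end:])
            if rest is not None:
                return [piece] + rest
    return None


def is_chemical(token):
    """Rule: letters split fully into pieces, with >= 1 stem and >= 1 affix."""
    core = re.sub(r"[^a-z]", "", token.lower())  # drop locants, hyphens, digits
    pieces = segment(core)
    if pieces is None:
        return False
    return any(p in STEMS for p in pieces) and any(p in AFFIXES for p in pieces)


def find_names(text):
    """Return chemical names found in `text`, joining adjacent chemical tokens."""
    names, run = [], []
    for raw in text.split():
        token = raw.strip(PUNCT)
        chemical = is_chemical(token)
        if chemical:
            run.append(token)
        # A non-chemical token or trailing punctuation closes the current name.
        if run and (not chemical or raw[-1] in PUNCT):
            names.append(" ".join(run))
            run = []
    if run:
        names.append(" ".join(run))
    return names


if __name__ == "__main__":
    sentence = ("The mixture of 2-chloroethanol and ethyl acetate "
                "was heated with methylbenzene.")
    print(find_names(sentence))
```

A sketch this small will produce false positives (for example, ordinary words that happen to split into a stem and an ending) and will miss any name whose pieces are not in the toy dictionaries, so a real recognizer would need considerably more rules than are shown here.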
Received on Thursday, 14 October 2004 19:40:37 UTC