Brief position statement-Discovery of Chemical Names

Automatic discovery and annotation of organic chemical names in patents

James W. Cooper, Stephen S. Boyer, and Anni R. Coden

We have designed a series of algorithms to recognize and annotate organic
chemical names in technical documents, and have applied this system to one
year of US patents. The system uses only two small dictionaries and is
primarily rule-based. Once the names have been extracted, any of several
commercial products can convert them to SMILES strings, which can then be
loaded into a searchable database. That database allows the patents to be
searched by chemical substructure rather than by chemical name, giving a
much more thorough search of the compounds mentioned in the patents.
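
To make the recognition step concrete, here is a minimal sketch of
dictionary-plus-rules name spotting. It is not the system described above:
the two word lists (STEMS, SUFFIXES), the rules, and the function names are
all assumptions made for illustration, and a real recognizer would need far
more rules. Conversion to SMILES and the substructure database are left to
external tools and are not shown.

```python
import re

# Hypothetical mini-dictionaries; the statement does not list the real ones.
STEMS = ("meth", "eth", "prop", "but", "pent", "hex", "benz", "phen",
         "chlor", "brom", "fluor")
SUFFIXES = ("ane", "ene", "yne", "ol", "one", "yl", "ate", "ide", "oic")

# A candidate token: letters and digits, with internal commas or hyphens
# so that locants such as "1,2-" stay attached to the name.
WORD = re.compile(r"[A-Za-z0-9][A-Za-z0-9,\-]*[A-Za-z0-9]")
LOCANT = re.compile(r"\d+(,\d+)*-")


def looks_chemical(token):
    """Illustrative rule: a stem plus either a chemical suffix or a locant."""
    word = token.lower()
    has_stem = any(stem in word for stem in STEMS)
    has_suffix = word.endswith(SUFFIXES)
    has_locant = LOCANT.match(word) is not None
    return has_stem and (has_suffix or has_locant)


def annotate(text):
    """Return (start, end, name) for each token that passes the rule."""
    return [(m.start(), m.end(), m.group())
            for m in WORD.finditer(text)
            if looks_chemical(m.group())]


if __name__ == "__main__":
    sample = "The mixture contained 2-chloroethanol and benzene."
    for start, end, name in annotate(sample):
        print(start, end, name)
```

Requiring both a stem and a suffix (or locant) keeps ordinary words such as
"method" from matching on the stem alone, though a rule this small will
still produce false positives on real patent text.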



James W. Cooper
Advanced Information Retrieval and Analysis
IBM T J Watson Research Center
jwcnmr@watson.ibm.com
914-784-7285
http://flahdo.watson.ibm.com/
http://www.research.ibm.com/people/j/jwcnmr/

Received on Thursday, 14 October 2004 19:40:37 UTC