- From: Eric Miller <em@w3.org>
- Date: Wed, 15 Mar 2006 13:43:32 -0500
- To: Brian Osborne <osborne1@optonline.net>
- Cc: Eric Neumann <eneumann@alum.mit.edu>, public-semweb-lifesci@w3.org
On Mar 15, 2006, at 12:49 PM, Brian Osborne wrote:

> Eric et al.,
>
> Working on writing up some use cases. Chembank is a nice compound database
> for demonstration purposes since it associates some fraction of its
> compounds with MeSH Diseases terms
> (http://chembank.broad.harvard.edu/chemistry/search/input/ontology.htm), it
> refers to this ontology as Therapeutic Indication. They also use GO
> Biological Process.
>
> A year or so ago you could access its pages by GET, now it looks like
> it's doing a POST - is this a problem for our programmers? No description of
> any API, as far as I can see.

POST-only access and no API certainly make it more difficult to reuse any of this data :(

Regarding when to use GET vs POST, I've found the following resource useful...

[[
An important principle of Web architecture is that all important resources be identifiable by URI. The finding discusses the relationship between the URI addressability of a resource and the choice between HTTP GET and POST methods with HTTP URIs. HTTP GET promotes URI addressability so, designers should adopt it for safe operations such as simple queries. POST is appropriate for other types of applications where a user request has the potential to change the state of the resource (or of related resources). The finding explains how to choose between HTTP GET and POST for an application taking into account architectural, security, and practical considerations.
]]
-- http://www.w3.org/2001/tag/doc/whenToUseGet.html

A bit of browsing around suggests there are at least some GETable resources, so there might be some data one could glean, e.g.

http://chembank.broad.harvard.edu/chemistry/search/input/moleculeName.htm

Search on '*sulfide*' and then hit 'search' to add Substructure.
This yields, for example, the following search result:

disulfiram / ChemBankID: 2038 - http://chembank.broad.harvard.edu/chemistry/viewMolecule.htm?cbid=2038

which points to "find similar molecules" - http://chembank.broad.harvard.edu/chemistry/findSimilarMolecules.htm?cbid=2038

The system seems session-based, but at least parts of the data seem scrapeable. As you seem to be exploring the Piggy-bank scraper idea further (per the simile general list), the Open World cat scraper [1] is an example of a session-based, multi-page scraper that could be adapted to at least parts of the data on this site.

[1] http://potlach.org/2005/10/scrapers/

--
eric miller                      http://www.w3.org/people/em/
semantic web activity lead       http://www.w3.org/2001/sw/
w3c world wide web consortium    http://www.w3.org/
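
A minimal sketch of why the GET-addressable pages mentioned above are easier to reuse: each molecule page can be named by a URI built from its ChemBank ID alone. The two URL patterns are copied from the message; the function names are hypothetical, the pages may no longer exist in this form, and nothing here performs a network request or handles the session-based parts of the site.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Base path as quoted in the message (assumption: it may have changed since 2006).
BASE = "http://chembank.broad.harvard.edu/chemistry/"


def molecule_url(cbid: int) -> str:
    """Build the GET URL of a molecule page from its ChemBank ID."""
    return BASE + "viewMolecule.htm?" + urlencode({"cbid": cbid})


def similar_molecules_url(cbid: int) -> str:
    """Build the GET URL of the "find similar molecules" page."""
    return BASE + "findSimilarMolecules.htm?" + urlencode({"cbid": cbid})


def cbid_from_url(url: str) -> Optional[int]:
    """Recover the ChemBank ID from a URL, or None if no numeric cbid is present."""
    values = parse_qs(urlparse(url).query).get("cbid")
    if values and values[0].isdigit():
        return int(values[0])
    return None
```

For example, `molecule_url(2038)` reproduces the disulfiram link quoted above, and `cbid_from_url` recovers `2038` from either URL, which is the kind of round trip a POST-only interface does not allow.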
Received on Wednesday, 15 March 2006 18:43:33 UTC