Re: Use-case detail from Eric Miller on 2006-03-15 (public-semweb-lifesci@w3.org from March 2006)

From: Eric Miller <em@w3.org>
Date: Wed, 15 Mar 2006 13:43:32 -0500
To: Brian Osborne <osborne1@optonline.net>
Cc: Eric Neumann <eneumann@alum.mit.edu>, public-semweb-lifesci@w3.org
Message-Id: <DC42506B-2F75-429F-8A5D-7E44F441C75E@w3.org>

On Mar 15, 2006, at 12:49 PM, Brian Osborne wrote:

> Eric et al.,
>
> Working on writing up some use cases. Chembank is a nice compound  
> database
> for demonstration purposes since it associates some fraction of its
> compounds with MeSH Diseases terms (
> http://chembank.broad.harvard.edu/chemistry/search/input/ontology.htm), it
> refers to this ontology as Therapeutic Indication. They also use GO
> Biological Process.
>
> A year or so ago you could access its pages by GET, now it looks like
> it's doing a POST - is this a problem for our programmers? No
> description of any API, as far as I can see.

POST-only access and no API certainly make it more difficult to
reuse any of this data :(

Regarding when to use GET vs POST, I've found the following resource  
useful...

[[
An important principle of Web architecture is that all important  
resources be identifiable by URI. The finding discusses the  
relationship between the URI addressability of a resource and the  
choice between HTTP GET and POST methods with HTTP URIs. HTTP GET  
promotes URI addressability so, designers should adopt it for safe  
operations such as simple queries. POST is appropriate for other  
types of applications where a user request has the potential to  
change the state of the resource (or of related resources). The  
finding explains how to choose between HTTP GET and POST for an  
application taking into account architectural, security, and  
practical considerations.
]]
-- http://www.w3.org/2001/tag/doc/whenToUseGet.html
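
To make the distinction concrete, here is a minimal sketch using only
the Python standard library. It builds the same lookup as a GET and as
a POST without sending anything. The viewMolecule.htm URL and the cbid
parameter are taken from the example later in this message; whether
the site would accept a POST at that address is purely an assumption
for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# URL and parameter name come from the example in this message;
# nothing here is sent over the network.
BASE = "http://chembank.broad.harvard.edu/chemistry/viewMolecule.htm"


def build_get(base, params):
    """GET: the query is part of the URI, so the result is linkable."""
    return Request(base + "?" + urlencode(params))


def build_post(base, params):
    """POST: the same parameters travel in the body, not in the URI."""
    return Request(base, data=urlencode(params).encode("ascii"))


get_req = build_get(BASE, {"cbid": 2038})
post_req = build_post(BASE, {"cbid": 2038})

# The GET request is fully described by its URI.
print(get_req.get_method(), get_req.full_url)
# The POST request's URI alone says nothing about which molecule was asked for.
print(post_req.get_method(), post_req.full_url, post_req.data)
```

The GET form can be bookmarked, cited, or handed to another tool as a
plain URI; the POST form cannot, which is the addressability point the
finding makes.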

From a bit of browsing around, it looks like there are at least some
GETable resources, so there might be some data one could glean,

e.g.

http://chembank.broad.harvard.edu/chemistry/search/input/moleculeName.htm

Search on '*sulfide*' and then hit 'search' to add Substructure. This
yields, for example, the following search result:

disulfiram / ChemBankID: 2038
- http://chembank.broad.harvard.edu/chemistry/viewMolecule.htm?cbid=2038

which points to "find similar molecules"
- http://chembank.broad.harvard.edu/chemistry/findSimilarMolecules.htm?cbid=2038

The system seems session based, but at least parts of the data seem  
scrapeable.
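
As a rough sketch of what scraping the GETable parts might look like,
the following uses only the Python standard library. It pulls ChemBank
IDs out of links that carry a cbid query parameter and sets up a
cookie-aware opener so that a session cookie, if the site uses one,
would be kept across requests. The sample markup is hypothetical (the
real page structure has not been checked), and the helper names
extract_cbids and make_session_opener are made up for this example.

```python
from html.parser import HTMLParser
from http.cookiejar import CookieJar
from urllib.parse import parse_qs, urljoin, urlparse
from urllib.request import HTTPCookieProcessor, build_opener


class MoleculeLinkParser(HTMLParser):
    """Collect cbid values from <a href="..."> links, in document order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.cbids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links, then read the cbid query parameter.
        query = parse_qs(urlparse(urljoin(self.base_url, href)).query)
        for cbid in query.get("cbid", []):
            if cbid not in self.cbids:
                self.cbids.append(cbid)


def extract_cbids(html, base_url):
    """Return the distinct ChemBank IDs linked from an HTML fragment."""
    parser = MoleculeLinkParser(base_url)
    parser.feed(html)
    return parser.cbids


def make_session_opener():
    """Build an opener that stores cookies and resends them on later requests."""
    return build_opener(HTTPCookieProcessor(CookieJar()))


BASE_URL = "http://chembank.broad.harvard.edu/chemistry/"
# Hypothetical markup, standing in for a real search-result page.
SAMPLE = (
    '<a href="viewMolecule.htm?cbid=2038">disulfiram</a>'
    '<a href="findSimilarMolecules.htm?cbid=2038">find similar molecules</a>'
    '<a href="search/input/moleculeName.htm">new search</a>'
)
print(extract_cbids(SAMPLE, BASE_URL))
```

A real run would fetch pages with make_session_opener().open(url) and
feed the decoded response to extract_cbids; that step is left out here
because it depends on how the site's sessions actually behave.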

As you seem to be exploring the Piggy Bank scraper idea further (per
the SIMILE general list), the Open WorldCat scraper [1] is an example
of a session-based, multi-page scraper that could be adapted to at
least parts of the data on this site.

[1] http://potlach.org/2005/10/scrapers/

--
eric miller                              http://www.w3.org/people/em/
semantic web activity lead               http://www.w3.org/2001/sw/
w3c world wide web consortium            http://www.w3.org/

Received on Wednesday, 15 March 2006 18:43:33 UTC