Re: Google Summer of Code from Justin Clark-Casey on 2018-01-19 (public-bioschemas@w3.org from January 2018)

From: Justin Clark-Casey <jc955@cam.ac.uk>
Date: Fri, 19 Jan 2018 17:05:36 +0000
To: public-bioschemas@w3.org
Message-ID: <262bd008-0e70-228b-9e29-f1190345fd1b@cam.ac.uk>

On 19/01/18 14:04, Gray, Alasdair J G wrote:
>>
>> Perhaps there could be 3 proposed projects.
>>
> Please add these to the document.

Okay, I added "Bioschemas Common Crawl" and "SPARQL over Bioschemas" to the doc and somewhat rewrote the Buzzbang description. Federico, I put you down as a
mentor of "SPARQL over Bioschemas" (of course, please change the name and everything else as necessary). I put myself down as one potential mentor of "Bioschemas Common Crawl".

There is a demarcation question here. In my description I have assumed that the crawl will put data into something with triplestore functionality and produce
an RDF dump, but do no further processing (i.e. it is not attempting to construct a 'knowledge graph'). Further processing is left to downstream projects such
as Buzzbang and "SPARQL over Bioschemas" for now. One could argue that there should be one common knowledge graph, but I see this as a very complex undertaking
beyond the scope of a GSoC project at this point (to be honest, all these projects are already hard, imo).

I put skills such as OWL, SPARQL, etc. down for "SPARQL over Bioschemas", but as these are going to be pretty rare, I suggest making them optional if GSoC allows.

--
Justin Clark-Casey
Research Software Engineer, InterMine life sciences data integration, U of Cambridge
http://twitter.com/justincc http://justincc.org

Received on Friday, 19 January 2018 17:06:22 UTC