What to do at the Biohackathon Re: [Urgent] Proposed changes to Bioschemas

Dear Bioschemas Community,

In approximately one week, on November 12th, the Biohackathon in Paris
will start. The latest proposal has generated very long comment threads,
and key discussions had to be moved from the Google doc to e-mail
lists. The proposal contains necessary but drastic changes from drafts
0.3-0.5 for the Protein part, and these changes have not been tested yet.

I believe we did not have enough time to finalize the discussions and
think through all the consequences by yesterday's deadline. We still
don't know whether we will have a Datarecord or not, or how things are
going to be placed in the schema.org hierarchy.

Most Core Data Resources cannot afford to invest effort in unfunded
projects; we must therefore try to keep their investment low. If we ask
them now to implement this proposal, we cannot ask them in a few months
to change the implementation again.

This leaves the question of what should be done at the Biohackathon:

- Should we ask people to first implement pure schema.org types (e.g.
DataCatalog, Dataset, Action, etc.) and not yet any of the Bioschemas
gene or protein ones?

- Should we take the opportunity to present the proposal and its open
questions, to gather feedback and do a reality check?
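
To make the first option concrete, here is a minimal sketch of what
"pure schema.org" markup could look like when generated from Python. It
is only an illustration under stated assumptions: the helper name
dataset_markup and all example.org values are hypothetical placeholders,
and it uses only the existing schema.org types Dataset and DataCatalog
and the includedInDataCatalog property.

```python
import json


def dataset_markup(name, description, url, catalog_name, catalog_url):
    """Build a schema.org Dataset description as a JSON-LD dictionary.

    Hypothetical helper for illustration; not part of any Bioschemas tool.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        # Link the data set to the resource (catalog) that hosts it.
        "includedInDataCatalog": {
            "@type": "DataCatalog",
            "name": catalog_name,
            "url": catalog_url,
        },
    }


# All values below are placeholders, not a real resource.
markup = dataset_markup(
    name="Example protein entries",
    description="Placeholder description of a protein data set.",
    url="https://example.org/dataset/proteins",
    catalog_name="Example Data Resource",
    catalog_url="https://example.org/",
)

# The serialized result would be embedded in a web page inside a
# <script type="application/ld+json"> element.
print(json.dumps(markup, indent=2))
```

Markup of this kind uses no life-science-specific types, so it should
not need to change if the gene and protein proposal changes later.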


Regards,
Jerven

On 10/25/2018 05:27 PM, Gray, Alasdair J G wrote:
> Dear Bioschemas Community
> 
> It has been a little over a year since we had the last community 
> face-to-face, during which time we have achieved a lot, with 44 
> resources publishing markup on over 6 million web pages [1]. Just over a 
> month ago we also saw the launch of Google's Dataset Search [2] through 
> which we should see some of the promised benefit from the Bioschemas markup.
> 
> In the last few weeks, there have been several active discussions on 
> github issues (210, 215, 217, 218, 220, 221, 222, 223) [3] relating to 
> the extension of schema.org types and properties for life sciences. 
> Bioschemas is intended as a simple markup mechanism that should be easy 
> for providers to implement and for tools to consume. In practice we 
> have made things hard.
> 
> Based upon our experiences, including those from running several 
> tutorials with groups outside of the Bioschemas “family”, we have 
> identified two main problems with our current community approach of 
> using existing life sciences ontology classes.
> 
>   * The first problem is that generic consumers of the markup, e.g.
>     search engines such as Google, will not understand the life sciences
>     ontology classes; these services only understand types and
>     properties in the schema.org vocabulary and this will not change.
>     Consequently, under the current approach, these generic services
>     will not be able to distinguish between a BioChemEntity that is a
>     Protein and one that is a Gene; they will just understand them all
>     as BioChemEntity. Thus there will be no benefit to the resources
>     (e.g. individual databases) from these services (e.g. Google)
>     consuming the markup.
> 
>   * The second is that the choice of classes and terms from specific
>     life sciences ontologies is not compatible with the nature of the
>     schema.org vocabulary. This leads to logical inconsistencies for
>     services that consume the markup.
> 
> To overcome these challenges, we propose that a limited number of new 
> types and properties should be added to schema.org as hosted 
> extensions. These have been developed in discussion with Dan Brickley 
> (chair of the schema.org community group) and will serve as bridging 
> terms between the generic schema.org vocabulary and the more specific 
> life sciences ontologies. We anticipate that there will be further 
> types proposed in the future, e.g. chemical.
> 
> The proposal is available in the following google document. Only comment 
> permissions have been granted so that the original proposal is unchanged.
> 
> https://docs.google.com/document/d/1Cw9K25N1l-Lbet1cahJuFtYgNKiF76apGcCqJPSeuZg/edit?usp=sharing
> 
> So that these changes can be in place by the Biohackathon, we request 
> that any comments on these proposals be made by 1 November.
> 
> Best regards
> 
> Alasdair, Leyla, Sarala, Nick, Carole, and Rafa
> 
> [1] http://bioschemas.org/liveDeploys/
> 
> [2] https://toolbox.google.com/datasetsearch
> 
> [3] https://github.com/BioSchemas/specifications/issues/
> 
> --
> Alasdair J G Gray
> Associate Professor in Computer Science,
> School of Mathematical and Computer Sciences
> Heriot-Watt University, Edinburgh, UK.
> 
> Email: A.J.G.Gray@hw.ac.uk
> Web: http://www.macs.hw.ac.uk/~ajg33
> ORCID: http://orcid.org/0000-0002-5711-4872
> Office: Earl Mountbatten Building 1.39
> Twitter: @gray_alasdair
> 

Received on Friday, 2 November 2018 13:52:12 UTC