Re: Protein representation with and without BioChemEntity

From: Melanie Courtot <mcourtot@ebi.ac.uk>
Date: Fri, 3 Nov 2017 11:44:45 +0000
To: public-bioschemas@w3.org
Message-ID: <1ed19606-6f8a-ad6e-b9b7-1e2a07db5402@ebi.ac.uk>
Hi Michel,

By "a step further" I was trying to make two points (and, looking back, my
email was probably not explicit enough):

- Bioschemas should probably steer clear of debating the best choice of
ontology. In general, and as I mentioned during the last meeting, we
probably don't want to constrain which resources are used, as
different communities/users/groups will have their favorite ones and
will probably not want to recode. Leyla mentions existing mappings, and
we could rely on the OxO tool [1] as well. It'd be interesting to see if
we could make this work with validation tools.

- I don't think we should declare many Bioschemas-specific properties,
provided there is a mechanism for reusing existing relations. For
example, the current example at
uses an "isContainedIn" property, which I don't think Bioschemas should
redefine (and I see both SIO and RO already have that relation).


[1] https://www.ebi.ac.uk/spot/oxo/index

On 03/11/2017 09:52, Michel Dumontier wrote:
> SIO is a perfectly acceptable choice, if I say so myself :)
> m.
> On Thu, Nov 2, 2017 at 11:02 AM, Melanie Courtot <mcourtot@ebi.ac.uk> wrote:
>     Hi,
>     I'm wondering if we could take it a step further, and instead of
>     defining specific properties we could just reuse terms from RO (or
>     elsewhere)?
>     For example,
>     "http://semanticscience.org/resource/is-transcribed-from" could be
>     replaced by http://purl.obolibrary.org/obo/RO_0002510,
>     "transcribed from", and "isContainedIn" could be
>     http://purl.obolibrary.org/obo/RO_0001018, "contained in".
>     Cheers,
>     Melanie
>     -- 
>     Mélanie Courtot, PhD
>     GA4GH/BioSamples Project lead
>     European Bioinformatics Institute (EMBL-EBI)
>     On 01/11/2017 16:18, Justin Clark-Casey wrote:
>>     Direct term reuse sounds like a good choice to me, especially as
>>     a) it's the mechanism schema.org itself has for adding existing
>>     ontology classes and terms to structured data,
>>     b) it will make applications much easier to write, as they can use
>>     existing general tooling,
>>     c) it allows us to do everything we were doing with
>>     AdditionalProperty, and
>>     d) it still allows us to define profiles without having to move
>>     everything through schema.org.
>>     -- Justin
>>     On Wed, Nov 1, 2017 at 3:56 PM, Gray, Alasdair J G
>>     <A.J.G.Gray@hw.ac.uk> wrote:
>>         Hi All,
>>         Apologies for the delay in sending this email. I have been
>>         working with Carole on submitting an Implementation Study
>>         proposal to the Data Platform for more work on Bioschemas.
>>         For representing a specific bioscience type, e.g. a protein,
>>         we currently have a proposal for using a generic wrapper
>>         approach that we then specialise, e.g. BioChemEntity
>>         specialised with a Protein profile.
>>         Protein profile
>>         http://bioschemas.org/specifications/Protein/specification/
>>         BioChemEntity type
>>         http://bioschemas.org/specifications/BioChemEntity/specification/
>>         To help understand the advantages and disadvantages of this
>>         approach, Kenneth and I have drawn up an example of marking up
>>         a specific protein, first using the current proposal and
>>         second using a specific ProteinEntity type. Below are the
>>         examples and some analysis of them.
>>         *BioChemEntity Example*
>>         Minimum markup using BioChemEntity
>>         https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntity-min.jsonld
>>         Minimum + Recommended markup using BioChemEntity
>>         https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntity-min%2Brec.jsonld
>>         One thing to note is that the minimum + recommended markup is
>>         not an additive extension of the minimum markup. Because the
>>         properties are expressed through the AdditionalProperty
>>         relationship, you need to use a JSON array and add the
>>         properties from the recommended level within the existing array.
>>         An advantage of this approach is that it reuses terms from
>>         existing ontologies and lets us represent types that do not
>>         currently exist in Schema.org, e.g. Genes, Chemicals, etc.
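A minimal Python sketch of why the minimum + recommended markup is not additive under the additionalProperty pattern, treating JSON-LD as plain Python dicts. It uses schema.org's additionalProperty/PropertyValue shape (propertyID, value); the property IDs and values are hypothetical placeholders, not the contents of the linked example files, and add_recommended is an illustrative helper.

```python
import json

# Hypothetical minimum markup in the additionalProperty style: each
# ontology-backed property is a schema.org PropertyValue in one JSON array.
# Property IDs and values are placeholders, not the linked example files.
minimum = {
    "@context": "http://schema.org",
    "@type": "BioChemEntity",
    "name": "example protein",
    "additionalProperty": [
        {"@type": "PropertyValue",
         "propertyID": "http://example.org/prop/minimum", "value": "A"},
    ],
}

# Hypothetical recommended-level property, expressed the same way.
recommended = {
    "additionalProperty": [
        {"@type": "PropertyValue",
         "propertyID": "http://example.org/prop/recommended", "value": "B"},
    ],
}

# Naively layering the recommended markup on top is not additive: the
# second "additionalProperty" array replaces the first, losing value "A".
naive = {**minimum, **recommended}


def add_recommended(base, extra):
    """Return a copy of `base` with `extra`'s PropertyValues appended to the
    existing additionalProperty array, i.e. the edit the markup requires."""
    merged = dict(base)
    merged["additionalProperty"] = (base.get("additionalProperty", [])
                                    + extra.get("additionalProperty", []))
    return merged


combined = add_recommended(minimum, recommended)
print([pv["value"] for pv in naive["additionalProperty"]])  # → ['B']
print(json.dumps(combined, indent=2))
```

Anyone extending the minimum markup therefore has to edit inside the existing array rather than simply adding new keys.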
>>         *ProteinEntity example*
>>         Minimum markup using ProteinEntity
>>         https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/ProteinEntity-min.jsonld
>>         Minimum + Recommended markup using ProteinEntity
>>         https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/ProteinEntity-min%2Brec.jsonld
>>         While the markup in these examples using ProteinEntity is
>>         easier to interpret, the number of items that need to be
>>         changed to mark up another protein is the same as in the
>>         BioChemEntity approach. The simplified markup should enable
>>         easier adoption, although we could support the current
>>         BioChemEntity proposal by highlighting, on the Bioschemas
>>         site, which terms need to be changed.
>>         A major downside of this approach is that we would need to
>>         add all the types to Schema.org or host them at Bioschemas.org.
>>         Although these could be mapped to existing terms, we would
>>         still be accused of duplicating existing ontology terms.
>>         *Direct term reuse example*
>>         Last week, I showed the above examples to Dan (we were at
>>         ISWC together). He pointed out that the additionalProperty
>>         relation was added to allow the use of property/value pairs
>>         where the properties do not exist in an ontology. In our case,
>>         the properties we are using already come from ontologies, so
>>         Dan suggested that we just use them directly. Note that the
>>         example also exploits the fact that an entity can be given
>>         multiple types.
>>         Minimum markup using BioChemEntity and term reuse
>>         https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntityAlt-min.jsonld
>>         Minimum + Recommended markup using BioChemEntity and term reuse
>>         https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntityAlt-min%2Brec.jsonld
>>         As you will see, this seems to have the advantages of both
>>         the above approaches. The markup is more straightforward than
>>         in the additionalProperty approach, while still reusing
>>         existing domain ontologies. Building tools that exploit the
>>         markup should also be much more straightforward.
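A minimal Python sketch of the direct-reuse shape described above, again treating JSON-LD as plain Python dicts. It uses the two RO relations proposed in this thread as property keys and gives the entity two types; the ontology class IRI, entity IRIs and values are hypothetical placeholders, and has_type and to_direct_reuse are illustrative helpers, not part of any Bioschemas tooling.

```python
# RO relations proposed in this thread, reused directly as property keys.
TRANSCRIBED_FROM = "http://purl.obolibrary.org/obo/RO_0002510"  # "transcribed from"
CONTAINED_IN = "http://purl.obolibrary.org/obo/RO_0001018"      # "contained in"

# Hypothetical minimum markup with direct term reuse. "@type" lists two
# types: the Bioschemas type plus a placeholder ontology class.
minimum = {
    "@context": "http://schema.org",
    "@type": ["BioChemEntity", "http://example.org/ontology/Protein"],
    "name": "example protein",
    TRANSCRIBED_FROM: {"@id": "http://example.org/gene/1"},
}

# Hypothetical recommended-level property: just another top-level key.
recommended = {CONTAINED_IN: {"@id": "http://example.org/organism/1"}}

# Extending the minimum markup is now a plain, additive merge.
combined = {**minimum, **recommended}


def has_type(node, type_name):
    """'@type' may be a single string or a list of types."""
    types = node.get("@type", [])
    return type_name in ([types] if isinstance(types, str) else types)


def to_direct_reuse(node):
    """Illustrative helper: turn additionalProperty PropertyValues into
    top-level properties keyed by their propertyID. Later duplicates win,
    and PropertyValue fields other than propertyID/value are dropped."""
    direct = {k: v for k, v in node.items() if k != "additionalProperty"}
    for pv in node.get("additionalProperty", []):
        direct[pv["propertyID"]] = pv["value"]
    return direct


# Hypothetical legacy markup in the additionalProperty style.
legacy = {
    "@context": "http://schema.org",
    "@type": "BioChemEntity",
    "name": "example protein",
    "additionalProperty": [
        {"@type": "PropertyValue", "propertyID": CONTAINED_IN,
         "value": {"@id": "http://example.org/organism/1"}},
    ],
}

print(sorted(k for k in combined if not k.startswith("@")))
print(to_direct_reuse(legacy)[CONTAINED_IN])  # → {'@id': 'http://example.org/organism/1'}
```

A consumer of `combined` can read either relation with an ordinary key lookup, whereas with additionalProperty it would have to scan the PropertyValue array for a matching propertyID; the conversion helper also suggests why direct reuse loses nothing that additionalProperty could express here.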
>>         I invite you all to review and comment on these different
>>         examples. Do we believe that the BioChemEntity with term
>>         reuse (the third set of examples) is an appropriate path
>>         going forward?
>>         Best regards
>>         Alasdair
>>         PS Sorry for the long email
>>         Alasdair J G Gray
>>         Fellow of the Higher Education Academy
>>         Assistant Professor in Computer Science,
>>         School of Mathematical and Computer Sciences
>>         (Athena SWAN Bronze Award)
>>         Heriot-Watt University, Edinburgh UK.
>>         Email: A.J.G.Gray@hw.ac.uk
>>         Web: http://www.macs.hw.ac.uk/~ajg33
>>         ORCID: http://orcid.org/0000-0002-5711-4872
>>         Office: Earl Mountbatten Building 1.39
>>         Twitter: @gray_alasdair
> -- 
> Michel Dumontier
> Distinguished Professor of Data Science
> Maastricht University
> http://dumontierlab.com
Received on Friday, 3 November 2017 11:45:12 UTC
