Re: Protein representation with and without BioChemEntity

From: Melanie Courtot <mcourtot@ebi.ac.uk>
Date: Fri, 3 Nov 2017 13:43:52 +0000
To: Michel Dumontier <michel.dumontier@gmail.com>, Leyla Garcia <ljgarcia@ebi.ac.uk>
Cc: public-bioschemas@w3.org
Message-ID: <bb809c34-ee01-a8df-411b-46f06cb1a9f8@ebi.ac.uk>
+1
As said below, either existing mappings or some tool (OxO, BioPortal
or something else) would do; I didn't mean to add a constraint there.

For the relations, at least, users will need a way to get to a "known"
entity for validation. Do we know what that could look like?
Let's say, for example, that some datasets use
"http://semanticscience.org/resource/is-transcribed-from" and some use
"http://purl.obolibrary.org/obo/RO_0002510" (I'm sure there are
relations other than those in SIO and RO; I'm using those as working
examples, not saying either SIO or RO should be required). How would
that work for the protein profile?
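
One possible shape for this, as a minimal sketch only: a validator could keep an equivalence table from the relation IRIs it recognises to a single canonical IRI, and rewrite incoming markup before checking it. The table below is hand-written from the two IRIs used as working examples in this thread and is an assumption, not an agreed mapping; in practice it might be filled from existing mappings or a tool such as OxO or BioPortal (no API for either is shown here). The names `canonical_property` and `normalize_markup` and the example.org gene IRI are hypothetical.

```python
# Hypothetical equivalence table: each recognised relation IRI maps to one
# canonical IRI. RO is used as the canonical side purely as an example.
SIO_TRANSCRIBED_FROM = "http://semanticscience.org/resource/is-transcribed-from"
RO_TRANSCRIBED_FROM = "http://purl.obolibrary.org/obo/RO_0002510"

CANONICAL = {
    SIO_TRANSCRIBED_FROM: RO_TRANSCRIBED_FROM,
    RO_TRANSCRIBED_FROM: RO_TRANSCRIBED_FROM,
}


def canonical_property(iri):
    """Return the canonical IRI for a recognised relation, or None."""
    return CANONICAL.get(iri)


def normalize_markup(record):
    """Rewrite recognised relation keys of a JSON-LD-like dict.

    Returns (normalized_record, unknown_iris). Keys that are not in the
    table are copied unchanged; those that look like IRIs are also
    collected so that a validator can report them.
    """
    normalized, unknown = {}, []
    for key, value in record.items():
        target = canonical_property(key)
        if target is not None:
            normalized[target] = value
        else:
            normalized[key] = value
            if key.startswith("http"):
                unknown.append(key)
    return normalized, unknown


# Two datasets describing the same (made-up) gene with different relations.
from_sio, _ = normalize_markup(
    {"@type": "BioChemEntity", SIO_TRANSCRIBED_FROM: "http://example.org/gene/1"}
)
from_ro, _ = normalize_markup(
    {"@type": "BioChemEntity", RO_TRANSCRIBED_FROM: "http://example.org/gene/1"}
)
```

After normalization both records carry the same key, so a profile check would only need to know one IRI per relation, while publishers keep whichever ontology they already use.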

Cheers,
Melanie

On 03/11/2017 13:29, Michel Dumontier wrote:
> yes of course, i was equally surprised to see SIO there in the first 
> place.
>
> m.
>
> On Fri, Nov 3, 2017 at 1:50 PM, Leyla Garcia <ljgarcia@ebi.ac.uk> wrote:
>
>     Hi all,
>
>     IMHO, if we want to be as inclusive as possible, we should not
>     impose anything outside the scope of type and profile
>     specifications. So, if we are not to impose RO, SIO, PRO, SO or
>     any other, we should not impose OxO either. We might find out that
>     other groups also have their own preferences there (BioPortal
>     mapping, for instance).
>
>     Regards,
>
>
>
>     On 03/11/2017 11:44, Melanie Courtot wrote:
>>     Hi Michel,
>>
>>     By "a step further" I was trying to make 2 points (and looking
>>     back my email was probably not explicit enough):
>>
>>     - Bioschemas should probably steer clear of debating the best
>>     choice of ontology. In general, and as I mentioned during the
>>     last meeting, we probably don't want to constrain which resources
>>     are being used as different communities/users/groups will have
>>     their favorite ones, and will probably not want to recode. Leyla
>>     mentions existing mappings, and we could rely on the OxO tool
>>     [1] as well. It'd be interesting to see if we could make this
>>     work with validation tools.
>>
>>     - I don't think we should declare many Bioschemas specific
>>     properties provided there is a mechanism for reusing existing
>>     relations. In the current example at
>>     https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntityAlt-min.jsonld
>>     there is an "isContainedIn" property, which I don't think
>>     Bioschemas should redefine, for example (I see both SIO and RO
>>     have the relation already).
>>
>>     Cheers,
>>     Melanie
>>
>>     [1] https://www.ebi.ac.uk/spot/oxo/index
>>
>>     On 03/11/2017 09:52, Michel Dumontier wrote:
>>>     SIO is a perfectly acceptable choice, if I say so myself :)
>>>
>>>     m.
>>>
>>>     On Thu, Nov 2, 2017 at 11:02 AM, Melanie Courtot
>>>     <mcourtot@ebi.ac.uk> wrote:
>>>
>>>         Hi,
>>>
>>>         I'm wondering if we could take it a step further and,
>>>         instead of defining specific properties, just reuse
>>>         terms from RO (or another ontology)?
>>>
>>>         For example,
>>>         "http://semanticscience.org/resource/is-transcribed-from"
>>>         could be replaced by
>>>         http://purl.obolibrary.org/obo/RO_0002510, "transcribed
>>>         from", and "isContainedIn" could be
>>>         http://purl.obolibrary.org/obo/RO_0001018, "contained in".
>>>
>>>         Cheers,
>>>         Melanie
>>>
>>>         -- 
>>>         Mélanie Courtot, PhD
>>>         GA4GH/BioSamples Project lead
>>>         European Bioinformatics Institute (EMBL-EBI)
>>>
>>>
>>>
>>>         On 01/11/2017 16:18, Justin Clark-Casey wrote:
>>>>         Direct term reuse sounds like a good choice to me,
>>>>         especially as
>>>>
>>>>         a) it's the mechanism that schema.org themselves have for
>>>>         adding existing ontology classes and terms to the
>>>>         structured data
>>>>         b) it will make applications much easier to write, as they
>>>>         can use existing general tooling
>>>>         c) it allows us to do everything we were doing with
>>>>         AdditionalProperty and
>>>>         d) it still allows us to define profiles without having to
>>>>         move everything through schema.org
>>>>
>>>>         -- Justin
>>>>
>>>>         On Wed, Nov 1, 2017 at 3:56 PM, Gray, Alasdair J G
>>>>         <A.J.G.Gray@hw.ac.uk> wrote:
>>>>
>>>>             Hi All,
>>>>
>>>>             Apologies for the delay in sending this email. I have
>>>>             been working with Carole on submitting an
>>>>             Implementation Study proposal to the Data Platform for
>>>>             more work on Bioschemas.
>>>>
>>>>             For representing a specific bioscience type, e.g. a
>>>>             protein, we currently have a proposal for using a
>>>>             generic wrapper approach that we then specialise, e.g.
>>>>             BioChemEntity specialised with a Protein profile.
>>>>
>>>>             Protein profile
>>>>             http://bioschemas.org/specifications/Protein/specification/
>>>>             BioChemEntity type
>>>>             http://bioschemas.org/specifications/BioChemEntity/specification/
>>>>
>>>>             To help understand the various advantages and
>>>>             disadvantages of this approach, Kenneth and I have
>>>>             drawn up an example of marking up a specific protein
>>>>             first using the current proposal and second as if we
>>>>             were to do the same with a specific ProteinEntity. Below are
>>>>             the examples and some analysis of them.
>>>>
>>>>             *BioChemEntity Example*
>>>>             Minimum markup using BioChemEntity
>>>>             https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntity-min.jsonld
>>>>
>>>>             Minimum + Recommended markup using BioChemEntity
>>>>             https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntity-min%2Brec.jsonld
>>>>
>>>>             One thing to note is that the minimum + recommended
>>>>             markup is not an additive extension of the minimum
>>>>             markup. Due to the use of the AdditionalProperty
>>>>             relationship, you need to use a JSON array and add the
>>>>             properties from the recommended level within the
>>>>             existing array.
>>>>
>>>>             An advantage of this approach is that it reuses terms
>>>>             from existing ontologies and we can represent types
>>>>             that do not currently exist in Schema.org, e.g. Genes,
>>>>             Chemicals, etc.
>>>>
>>>>             *ProteinEntity example*
>>>>             Minimum markup using ProteinEntity
>>>>             https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/ProteinEntity-min.jsonld
>>>>
>>>>             Minimum + Recommended markup using ProteinEntity
>>>>             https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/ProteinEntity-min%2Brec.jsonld
>>>>
>>>>             While the markup in these examples using ProteinEntity
>>>>             is easier to interpret, the number of items that need
>>>>             to be changed to mark up another protein is the same as
>>>>             in the BioChemEntity approach. The simplified markup
>>>>             should enable easier adoption, although we could help
>>>>             the current proposal of using BioChemEntity by using
>>>>             highlighting on the Bioschemas site to show which terms
>>>>             need to be changed.
>>>>
>>>>             A major downside of this approach is that we would need
>>>>             to add all the types to Schema.org or host them at
>>>>             Bioschemas.org.
>>>>             While these could be mapped to existing terms, we would
>>>>             be accused of duplicating existing ontology terms.
>>>>
>>>>             *Direct term reuse example*
>>>>             Last week, I showed the above examples to Dan (we were
>>>>             at ISWC together). He pointed out that the
>>>>             additionalProperty relation was added to allow the use
>>>>             of property/value pairs where the properties do not
>>>>             exist in an ontology. We are in the situation where the
>>>>             properties we are using come from ontologies. Dan
>>>>             suggested that we just use them directly. Note that the
>>>>             example also exploits the fact that you can define
>>>>             multiple types.
>>>>
>>>>             Minimum markup using BioChemEntity and term reuse
>>>>             https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntityAlt-min.jsonld
>>>>
>>>>             Minimum + Recommended markup using BioChemEntity and
>>>>             term reuse
>>>>             https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntityAlt-min%2Brec.jsonld
>>>>
>>>>             As you will see, this seems to have the advantages of
>>>>             both the above approaches. The markup is more
>>>>             straightforward than the additionalProperty approach,
>>>>             but exploits reusing existing domain ontologies. The
>>>>             tooling and exploitation will be much more straightforward.
>>>>
>>>>             I invite you all to review and comment on these
>>>>             different examples. Do we believe that the
>>>>             BioChemEntity with term reuse (the third set of
>>>>             examples) is an appropriate path going forward?
>>>>
>>>>             Best regards
>>>>
>>>>             Alasdair
>>>>
>>>>             PS Sorry for the long email
>>>>
>>>>             Alasdair J G Gray
>>>>
>>>>             Fellow of the Higher Education Academy
>>>>             Assistant Professor in Computer Science,
>>>>             School of Mathematical and Computer Sciences
>>>>             (Athena SWAN Bronze Award)
>>>>             Heriot-Watt University, Edinburgh UK.
>>>>
>>>>             Email: A.J.G.Gray@hw.ac.uk
>>>>             Web: http://www.macs.hw.ac.uk/~ajg33
>>>>             ORCID: http://orcid.org/0000-0002-5711-4872
>>>>             Office: Earl Mountbatten Building 1.39
>>>>             Twitter: @gray_alasdair
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>     -- 
>>>     Michel Dumontier
>>>     Distinguished Professor of Data Science
>>>     Maastricht University
>>>     http://dumontierlab.com
>>
>
Received on Friday, 3 November 2017 13:44:19 UTC
