
Re: Protein representation with and without BioChemEntity

From: Michel Dumontier <michel.dumontier@gmail.com>
Date: Fri, 3 Nov 2017 10:52:55 +0100
Message-ID: <CALcEXf51KdzCcsQDuZnvzy0CFJ-B7qqV9ciFyDEAB2E4MNP_-w@mail.gmail.com>
To: Melanie Courtot <mcourtot@ebi.ac.uk>
Cc: Justin Clark-Casey <justinccdev@gmail.com>, "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>, "public-bioschemas@w3.org" <public-bioschemas@w3.org>
SIO is a perfectly acceptable choice, if I say so myself :)

m.

On Thu, Nov 2, 2017 at 11:02 AM, Melanie Courtot <mcourtot@ebi.ac.uk> wrote:

> Hi,
>
> I'm wondering if we could take it a step further, and instead of defining
> specific properties we could just reuse terms from RO (or elsewhere)?
>
> For example, "http://semanticscience.org/resource/is-transcribed-from"
> could be replaced by http://purl.obolibrary.org/obo/RO_0002510, "transcribed
> from", and "isContainedIn" could be http://purl.obolibrary.org/obo/RO_0001018,
> "contained in".
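As a rough illustration of the substitution suggested here, the Python sketch below renames property keys throughout a JSON-LD structure, swapping the two terms named in the email for their RO equivalents. It is only a sketch: the helper name `reuse_ro_terms` and the sample node are hypothetical, and a real migration would also need to update any `@context` that aliases these keys.

```python
import json

# Only the two substitutions proposed in the email; every other key is left unchanged.
RO_REPLACEMENTS = {
    "http://semanticscience.org/resource/is-transcribed-from":
        "http://purl.obolibrary.org/obo/RO_0002510",  # RO "transcribed from"
    "isContainedIn":
        "http://purl.obolibrary.org/obo/RO_0001018",  # RO "contained in"
}


def reuse_ro_terms(node):
    """Recursively rename property keys in a JSON-LD structure (hypothetical helper)."""
    if isinstance(node, dict):
        return {RO_REPLACEMENTS.get(key, key): reuse_ro_terms(value)
                for key, value in node.items()}
    if isinstance(node, list):
        return [reuse_ro_terms(item) for item in node]
    return node  # strings, numbers, etc. are returned unchanged


if __name__ == "__main__":
    # Hypothetical protein node; the "@id" values are placeholders, not real records.
    protein = {
        "@type": "BioChemEntity",
        "name": "Example protein",
        "http://semanticscience.org/resource/is-transcribed-from": {"@id": "#gene-1"},
        "isContainedIn": {"@id": "#organism-1"},
    }
    print(json.dumps(reuse_ro_terms(protein), indent=2))
```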
>
> Cheers,
> Melanie
>
> --
> Mélanie Courtot, PhD
> GA4GH/BioSamples Project lead
> European Bioinformatics Institute (EMBL-EBI)
>
>
>
> On 01/11/2017 16:18, Justin Clark-Casey wrote:
>
> Direct term reuse sounds like a good choice to me, especially as
>
> a) it's the mechanism that schema.org themselves use to add existing
> ontology classes and terms to the structured data,
> b) it will make applications much easier to write, as they can use existing
> general tooling,
> c) it allows us to do everything we were doing with AdditionalProperty, and
> d) it still allows us to define profiles without having to move everything
> through schema.org.
>
> -- Justin
>
> On Wed, Nov 1, 2017 at 3:56 PM, Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk>
> wrote:
>
>> Hi All,
>>
>> Apologies for the delay in sending this email. I have been working with
>> Carole on submitting an Implementation Study proposal to the Data Platform
>> for more work on Bioschemas.
>>
>> For representing a specific bioscience type, e.g. a protein, we currently
>> have a proposal for using a generic wrapper approach that we then
>> specialise, e.g. BioChemEntity specialised with a Protein profile.
>>
>> Protein profile
>> http://bioschemas.org/specifications/Protein/specification/
>> BioChemEntity type
>> http://bioschemas.org/specifications/BioChemEntity/specification/
>>
>> To help understand the various advantages and disadvantages of this
>> approach, Kenneth and I have drawn up an example of marking up a specific
>> protein, first using the current proposal and second using a dedicated
>> ProteinEntity type. Below are the examples and some analysis of them.
>>
>> *BioChemEntity Example*
>> Minimum markup using BioChemEntity
>> https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntity-min.jsonld
>>
>> Minimum + Recommended markup using BioChemEntity
>> https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntity-min%2Brec.jsonld
>>
>> One thing to note is that the minimum + recommended markup is not an
>> additive extension of the minimum markup. Because the properties are
>> wrapped in the additionalProperty relationship, which takes a JSON array,
>> the properties from the recommended level have to be added within that
>> existing array.
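To make the point about non-additive markup concrete, here is a minimal Python sketch. It assumes each additionalProperty entry is a schema.org PropertyValue with `name` and `value` fields; the property names, values, and the helper `add_recommended` are hypothetical placeholders, not the contents of the linked BioChemEntity examples.

```python
import copy

# Minimum-level markup (placeholder names/values). The domain properties live
# inside the single "additionalProperty" array as PropertyValue objects.
MINIMUM = {
    "@context": "http://schema.org",
    "@type": "BioChemEntity",
    "name": "Example protein",
    "additionalProperty": [
        {"@type": "PropertyValue", "name": "minimumPropertyA", "value": "a"},
    ],
}

# Hypothetical recommended-level properties, in the same PropertyValue shape.
RECOMMENDED = [
    {"@type": "PropertyValue", "name": "recommendedPropertyB", "value": "b"},
    {"@type": "PropertyValue", "name": "recommendedPropertyC", "value": "c"},
]


def add_recommended(minimum, recommended):
    """Return min+rec markup: the new entries go inside the existing array."""
    markup = copy.deepcopy(minimum)  # leave the minimum example untouched
    markup["additionalProperty"].extend(copy.deepcopy(recommended))
    return markup


if __name__ == "__main__":
    full = add_recommended(MINIMUM, RECOMMENDED)
    print(sorted(full) == sorted(MINIMUM))  # no new top-level keys
    print(len(full["additionalProperty"]))  # the array itself has grown
```

Because the extra properties are nested in the same list, a min+rec document cannot be produced by adding top-level fields to the minimum one; the existing array has to be edited.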
>>
>> An advantage of this approach is that it reuses terms from existing
>> ontologies and we can represent types that do not currently exist in
>> Schema.org, e.g. Genes, Chemicals, etc.
>>
>> *ProteinEntity example*
>> Minimum markup using ProteinEntity
>> https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/ProteinEntity-min.jsonld
>>
>> Minimum + Recommended markup using ProteinEntity
>> https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/ProteinEntity-min%2Brec.jsonld
>>
>> While the markup in these examples using ProteinEntity is easier to
>> interpret, the number of items that need to be changed to mark up another
>> protein is the same as in the BioChemEntity approach. The simplified markup
>> should enable easier adoption, although we could support the current
>> BioChemEntity proposal by highlighting, on the Bioschemas site, which terms
>> need to be changed.
>>
>> A major downside of this approach is that we would need to add all the
>> types to Schema.org or host them at Bioschemas.org. Even if these were
>> mapped to existing terms, we would still be accused of duplicating existing
>> ontology terms.
>>
>> *Direct term reuse example*
>> Last week, I showed the above examples to Dan (we were at ISWC together).
>> He pointed out that the additionalProperty relation was added to allow the
>> use of property/value pairs where the properties do not exist in an
>> ontology. In our case, the properties we are using do come from
>> ontologies, so Dan suggested that we just use them directly. Note that
>> the example also exploits the fact that an entity can be given multiple
>> types.
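The Python sketch below contrasts the two shapes. It assumes (this is not taken from the linked examples) that each PropertyValue `name` already holds a full ontology property IRI. It lifts those name/value pairs into ordinary JSON-LD keys and adds a second entry to `@type`. The helper `to_direct_terms` and the extra type IRI `http://example.org/ProteinClass` are hypothetical placeholders; the RO IRI is the "transcribed from" term mentioned earlier in the thread.

```python
def to_direct_terms(node, extra_types=()):
    """Lift additionalProperty name/value pairs to top-level keys (hypothetical helper).

    Assumes every PropertyValue "name" is already a full ontology property IRI.
    In this simple sketch a repeated name overwrites the earlier value.
    """
    direct = {k: v for k, v in node.items() if k != "additionalProperty"}
    for prop in node.get("additionalProperty", []):
        direct[prop["name"]] = prop["value"]
    # JSON-LD allows "@type" to be a list, so the node can carry several types.
    types = direct.get("@type", [])
    types = [types] if isinstance(types, str) else list(types)
    direct["@type"] = types + [t for t in extra_types if t not in types]
    return direct


if __name__ == "__main__":
    wrapped = {
        "@type": "BioChemEntity",
        "name": "Example protein",
        "additionalProperty": [
            {"@type": "PropertyValue",
             "name": "http://purl.obolibrary.org/obo/RO_0002510",  # "transcribed from"
             "value": {"@id": "#gene-1"}},  # placeholder identifier
        ],
    }
    print(to_direct_terms(wrapped, extra_types=["http://example.org/ProteinClass"]))
```

In the direct shape, recommended-level properties would simply be extra top-level keys, and a generic JSON-LD consumer can read a value by its property IRI without first unpacking PropertyValue objects.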
>>
>> Minimum markup using BioChemEntity and term reuse
>> https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntityAlt-min.jsonld
>>
>> Minimum + Recommended markup using BioChemEntity and term reuse
>> https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntityAlt-min%2Brec.jsonld
>>
>> As you will see, this seems to have the advantages of both the above
>> approaches. The markup is more straightforward than the additionalProperty
>> approach, but still reuses existing domain ontologies. Building tooling on
>> top of it, and exploiting the data, will also be much more straightforward.
>>
>> I invite you all to review and comment on these different examples. Do we
>> believe that the BioChemEntity with term reuse (the third set of examples)
>> is an appropriate path going forward?
>>
>> Best regards
>>
>> Alasdair
>>
>> PS Sorry for the long email
>>
>> Alasdair J G Gray
>>
>> Fellow of the Higher Education Academy
>> Assistant Professor in Computer Science,
>> School of Mathematical and Computer Sciences
>> (Athena SWAN Bronze Award)
>> Heriot-Watt University, Edinburgh UK.
>>
>> Email: A.J.G.Gray@hw.ac.uk
>> Web: http://www.macs.hw.ac.uk/~ajg33
>> ORCID: http://orcid.org/0000-0002-5711-4872
>> Office: Earl Mountbatten Building 1.39
>> Twitter: @gray_alasdair
>>
>
>
>


-- 
Michel Dumontier
Distinguished Professor of Data Science
Maastricht University
http://dumontierlab.com
Received on Friday, 3 November 2017 09:53:41 UTC
