Re: Protein representation with and without BioChemEntity

From: Michel Dumontier <michel.dumontier@gmail.com>
Date: Fri, 3 Nov 2017 14:29:04 +0100
Message-ID: <CALcEXf50KDBjkYYBGPxmyiySnzouEZXR6_7pxevbp+0=Auvd3g@mail.gmail.com>
To: Leyla Garcia <ljgarcia@ebi.ac.uk>
Cc: Melanie Courtot <mcourtot@ebi.ac.uk>, public-bioschemas@w3.org
Yes, of course. I was equally surprised to see SIO there in the first place.


On Fri, Nov 3, 2017 at 1:50 PM, Leyla Garcia <ljgarcia@ebi.ac.uk> wrote:

> Hi all,
> IMHO, if we want to be as inclusive as possible, we should not impose
> anything outside the scope of type and profile specifications. So, if we
> are not to impose RO, SIO, PRO, SO or any other, we should not impose OxO
> either. We might find out that other groups also have their own
> preferences there (Bioportal mapping, for instance).
> Regards,
> On 03/11/2017 11:44, Melanie Courtot wrote:
> Hi Michel,
> By "a step further" I was trying to make 2 points (and looking back my
> email was probably not explicit enough):
> - Bioschemas should probably steer clear of debating the best choice of
> ontology. In general, and as I mentioned during the last meeting, we
> probably don't want to constrain which resources are being used as
> different communities/users/groups will have their favorite ones, and will
> probably not want to recode. Leyla mentions existing mappings, and we
> could rely on the OxO tool [1] as well. It'd be interesting to see if we
> could make this work with validation tools.
> - I don't think we should declare many Bioschemas-specific properties
> provided there is a mechanism for reusing existing relations. In the
> current example at
> https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntityAlt-min.jsonld
> there is an "isContainedIn" which, for example, I don't think Bioschemas
> should redefine (both SIO and RO already have the relation).
> Cheers,
> Melanie
> [1] https://www.ebi.ac.uk/spot/oxo/index
> On 03/11/2017 09:52, Michel Dumontier wrote:
> SIO is a perfectly acceptable choice, if I say so myself :)
> m.
> On Thu, Nov 2, 2017 at 11:02 AM, Melanie Courtot <mcourtot@ebi.ac.uk>
> wrote:
>> Hi,
>> I'm wondering if we could take it a step further and, instead of defining
>> specific properties, just reuse terms from RO (or elsewhere)?
>> For example, http://semanticscience.org/resource/is-transcribed-from could
>> be replaced by http://purl.obolibrary.org/obo/RO_0002510, "transcribed
>> from", and "isContainedIn" could be
>> http://purl.obolibrary.org/obo/RO_0001018, "contained in".
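As a rough illustration of the substitution suggested above, the following Python sketch rewrites the property keys of a JSON-LD-style record. Only the two term replacements come from this thread; the sample record, its placeholder values, and the function name `reuse_ro_terms` are hypothetical.

```python
# Term replacements suggested in this thread: the SIO property and the
# Bioschemas-specific "isContainedIn" are swapped for RO relations.
TERM_REPLACEMENTS = {
    "http://semanticscience.org/resource/is-transcribed-from":
        "http://purl.obolibrary.org/obo/RO_0002510",  # "transcribed from"
    "isContainedIn":
        "http://purl.obolibrary.org/obo/RO_0001018",  # "contained in"
}


def reuse_ro_terms(markup):
    """Return a copy of `markup` with known property keys replaced by RO IRIs."""
    return {TERM_REPLACEMENTS.get(key, key): value for key, value in markup.items()}


# Hypothetical record; identifiers and values are placeholders only.
sample = {
    "@type": "BioChemEntity",
    "identifier": "example-protein",
    "isContainedIn": {"@id": "http://example.org/organism/1"},
    "http://semanticscience.org/resource/is-transcribed-from":
        {"@id": "http://example.org/gene/1"},
}

rewritten = reuse_ro_terms(sample)
```

Keys that have no entry in the table, such as `@type` and `identifier`, pass through unchanged, so the same function could be applied to records that already use RO terms.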
>> Cheers,
>> Melanie
>> --
>> Mélanie Courtot, PhD
>> GA4GH/BioSamples Project lead
>> European Bioinformatics Institute (EMBL-EBI)
>> On 01/11/2017 16:18, Justin Clark-Casey wrote:
>> Direct term reuse sounds like a good choice to me, especially as
>> a) it's the mechanism that schema.org themselves have to add existing
>> ontology classes and terms to the structured data
>> b) will make applications much easier to write as they can use existing
>> general tooling
>> c) allows us to do everything we were doing with AdditionalProperty and
>> d) still allows us to define profiles without having to move everything
>> through schema.org
>> -- Justin
>> On Wed, Nov 1, 2017 at 3:56 PM, Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk>
>> wrote:
>>> Hi All,
>>> Apologies for the delay in sending this email. I have been working with
>>> Carole on submitting an Implementation Study proposal to the Data Platform
>>> for more work on Bioschemas.
>>> For representing a specific bioscience type, e.g. a protein, we
>>> currently have a proposal for using a generic wrapper approach that we then
>>> specialise, e.g. BioChemEntity specialised with a Protein profile.
>>> Protein profile
>>> http://bioschemas.org/specifications/Protein/specification/
>>> BioChemEntity type
>>> http://bioschemas.org/specifications/BioChemEntity/specification/
>>> To help understand the various advantages and disadvantages of this
>>> approach, Kenneth and I have drawn up an example of marking up a specific
>>> protein first using the current proposal and second if we were to do the
>>> same with a specific ProteinEntity. Below are the examples and some
>>> analysis of them.
>>> *BioChemEntity Example*
>>> Minimum markup using BioChemEntity
>>> https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntity-min.jsonld
>>> Minimum + Recommended markup using BioChemEntity
>>> https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntity-min%2Brec.jsonld
>>> One thing to note is that the minimum + recommended markup is not an
>>> additive extension of the minimum markup. Because the additionalProperty
>>> relationship is used, the properties sit in a JSON array, and the
>>> recommended-level properties must be added inside that existing array.
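A minimal Python sketch of why the two levels do not combine additively when additionalProperty is used. The property names, values, and the helper `merge_levels` are hypothetical and are not taken from the linked example files; the `http://schema.org` context is also an assumption.

```python
# Hypothetical minimum-level markup (illustrative names and values only).
minimum = {
    "@context": "http://schema.org",
    "@type": "BioChemEntity",
    "identifier": "example-protein",
    "additionalProperty": [
        {"@type": "PropertyValue", "name": "exampleMinimumProperty", "value": "a"},
    ],
}

# Hypothetical recommended-level additions, also expressed via additionalProperty.
recommended_extras = {
    "additionalProperty": [
        {"@type": "PropertyValue", "name": "exampleRecommendedProperty", "value": "b"},
    ],
}

# A naive top-level merge overwrites the array, losing the minimum properties.
naive = {**minimum, **recommended_extras}


def merge_levels(base, extras):
    """Merge two markup levels, appending inside the additionalProperty array."""
    merged = dict(base)
    for key, value in extras.items():
        if key == "additionalProperty":
            merged[key] = list(base.get(key, [])) + list(value)
        else:
            merged[key] = value
    return merged


combined = merge_levels(minimum, recommended_extras)
```

Here `naive` ends up with only the recommended property, while `combined` keeps both entries in the one array, which is the editing step the paragraph above describes.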
>>> An advantage of this approach is that it reuses terms from existing
>>> ontologies and we can represent types that do not currently exist in
>>> Schema.org, e.g. Genes, Chemicals, etc.
>>> *ProteinEntity example*
>>> Minimum markup using ProteinEntity
>>> https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/ProteinEntity-min.jsonld
>>> Minimum + Recommended markup using ProteinEntity
>>> https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/ProteinEntity-min%2Brec.jsonld
>>> While the markup in these examples using ProteinEntity is easier to
>>> interpret, the number of items that need to be changed to mark up another
>>> protein is the same as in the BioChemEntity approach. The simplified markup
>>> should enable easier adoption, although we could help the current proposal
>>> of using BioChemEntity by using highlighting on the Bioschemas site to show
>>> which terms need to be changed.
>>> A major downside of this approach is that we would need to add all the
>>> types to Schema.org or host them at Bioschemas.org. While these could be
>>> mapped to existing terms, we would be accused of duplicating existing
>>> ontology terms.
>>> *Direct term reuse example*
>>> Last week, I showed the above examples to Dan (we were at ISWC
>>> together). He pointed out that the additionalProperty relation was added to
>>> allow the use of property/value pairs where the properties do not exist in
>>> an ontology. We are in the situation where the properties we are using come
>>> from ontologies. Dan suggested that we just use them directly. Note that
>>> the example also exploits the fact that you can define multiple types.
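To make the contrast concrete, here is a hedged Python sketch that converts an additionalProperty-style record into one that uses ontology terms directly as property keys and declares more than one type. The record contents, the `http://example.org/...` IRIs, and the helper `to_direct_terms` are hypothetical; the RO IRI is the "contained in" relation mentioned elsewhere in this thread, and `propertyID` is the schema.org PropertyValue property assumed to carry the term IRI.

```python
def to_direct_terms(entity, extra_types=()):
    """Flatten additionalProperty entries into top-level ontology properties."""
    direct = {k: v for k, v in entity.items() if k != "additionalProperty"}
    declared = entity.get("@type", [])
    if isinstance(declared, str):
        declared = [declared]
    # JSON-LD allows @type to be an array, so several types can be declared.
    direct["@type"] = list(declared) + list(extra_types)
    for prop in entity.get("additionalProperty", []):
        # Use the ontology term IRI itself as the property key.
        direct[prop["propertyID"]] = prop["value"]
    return direct


# Hypothetical wrapper-style record (placeholder identifiers and values).
wrapped = {
    "@context": "http://schema.org",
    "@type": "BioChemEntity",
    "identifier": "example-protein",
    "additionalProperty": [
        {
            "@type": "PropertyValue",
            "propertyID": "http://purl.obolibrary.org/obo/RO_0001018",
            "value": {"@id": "http://example.org/organism/1"},
        }
    ],
}

direct = to_direct_terms(wrapped, extra_types=["http://example.org/ontology/Protein"])
```

The resulting record has no nested PropertyValue array, so further properties can be added as ordinary keys without editing an existing array.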
>>> Minimum markup using BioChemEntity and term reuse
>>> https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntityAlt-min.jsonld
>>> Minimum + Recommended markup using BioChemEntity and term reuse
>>> https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntityAlt-min%2Brec.jsonld
>>> As you will see, this seems to have the advantages of both the above
>>> approaches. The markup is more straightforward than the additionalProperty
>>> approach, but exploits reusing existing domain ontologies. The tooling and
>>> exploitation will be much more straightforward.
>>> I invite you all to review and comment on these different examples. Do
>>> we believe that the BioChemEntity with term reuse (the third set of
>>> examples) is an appropriate path going forward?
>>> Best regards
>>> Alasdair
>>> PS Sorry for the long email
>>> Alasdair J G Gray
>>> Fellow of the Higher Education Academy
>>> Assistant Professor in Computer Science,
>>> School of Mathematical and Computer Sciences
>>> (Athena SWAN Bronze Award)
>>> Heriot-Watt University, Edinburgh UK.
>>> Email: A.J.G.Gray@hw.ac.uk
>>> Web: http://www.macs.hw.ac.uk/~ajg33
>>> ORCID: http://orcid.org/0000-0002-5711-4872
>>> Office: Earl Mountbatten Building 1.39
>>> Twitter: @gray_alasdair
> --
> Michel Dumontier
> Distinguished Professor of Data Science
> Maastricht University
> http://dumontierlab.com

Michel Dumontier
Distinguished Professor of Data Science
Maastricht University
Received on Friday, 3 November 2017 13:29:50 UTC
