Re: Protein representation with and without BioChemEntity

Hi Anders,

My proposal only covers how to structure the markup for a protein.

We hope to use the same approach for other biological entities in the future. For that we would need to identify what the appropriate type is and the key properties required for discoverability.

Best regards

Alasdair

On 1 Nov 2017, at 18:05, Anders Riutta <anders.riutta@gladstone.ucsf.edu> wrote:

Hi Alasdair,

Some biologists speak in terms of gene products, for example, describing FMO3 (ncbigene:2328) as catalyzing the conversion of nicotine to nicotine-N-oxide (https://www.wikipathways.org/index.php/Pathway:WP1600), even though it's actually the FMO3 enzyme, not the gene, that does the catalysis.

Could I describe this using the scheme you're proposing?

Thanks,
Anders Riutta

On Wed, Nov 1, 2017 at 8:56 AM, Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk> wrote:
Hi All,

Apologies for the delay in sending this email. I have been working with Carole on submitting an Implementation Study proposal to the Data Platform for more work on Bioschemas.

For representing a specific bioscience type, e.g. a protein, we currently have a proposal for using a generic wrapper approach that we then specialise, e.g. BioChemEntity specialised with a Protein profile.

Protein profile
http://bioschemas.org/specifications/Protein/specification/
BioChemEntity type
http://bioschemas.org/specifications/BioChemEntity/specification/

To help understand the various advantages and disadvantages of this approach, Kenneth and I have drawn up an example of marking up a specific protein, first using the current proposal and second as it would look with a specific ProteinEntity type. Below are the examples and some analysis of them.

BioChemEntity Example
Minimum markup using BioChemEntity
https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntity-min.jsonld

Minimum + Recommended markup using BioChemEntity
https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntity-min%2Brec.jsonld

One thing to note is that the minimum + recommended markup is not an additive extension of the minimum markup. Because of the use of the additionalProperty relationship, you need to use a JSON array and add the properties from the recommended level within the existing array.
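
To illustrate the point about the array, here is a minimal Python sketch. The property names and values are placeholders invented for illustration and are not taken from the linked example files; only the additionalProperty / PropertyValue pattern comes from Schema.org, and the helper function names are hypothetical. It shows that a simple key-wise merge of the two levels loses the minimum-level entries, whereas extending the existing array keeps them.

```python
import copy
import json

# Hypothetical minimum-level markup. The values are placeholders and do not
# reproduce the linked BioChemEntity-min.jsonld file.
minimum = {
    "@context": "http://schema.org",
    "@type": "BioChemEntity",
    "name": "Example protein",
    "additionalProperty": [
        {"@type": "PropertyValue", "name": "exampleMinimumProperty", "value": "A"},
    ],
}

# Hypothetical recommended-level additions (again placeholders).
recommended = {
    "description": "An example description",
    "additionalProperty": [
        {"@type": "PropertyValue", "name": "exampleRecommendedProperty", "value": "B"},
    ],
}


def naive_merge(base, extra):
    """Key-wise merge: the later additionalProperty array replaces the earlier one."""
    merged = copy.deepcopy(base)
    merged.update(copy.deepcopy(extra))
    return merged


def array_aware_merge(base, extra):
    """Merge that extends the existing additionalProperty array instead of replacing it."""
    merged = copy.deepcopy(base)
    for key, value in extra.items():
        if key == "additionalProperty":
            merged.setdefault(key, []).extend(copy.deepcopy(value))
        else:
            merged[key] = copy.deepcopy(value)
    return merged


lost = naive_merge(minimum, recommended)
kept = array_aware_merge(minimum, recommended)
print(len(lost["additionalProperty"]))  # 1: the minimum-level entry was overwritten
print(len(kept["additionalProperty"]))  # 2: both levels are present
print(json.dumps(kept, indent=2))
```

In other words, anyone generating the recommended level from the minimum level has to edit inside the existing array rather than append new top-level keys.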

An advantage of this approach is that it reuses terms from existing ontologies and we can represent types that do not currently exist in Schema.org, e.g. Genes, Chemicals, etc.

ProteinEntity example
Minimum markup using ProteinEntity
https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/ProteinEntity-min.jsonld

Minimum + Recommended markup using ProteinEntity
https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/ProteinEntity-min%2Brec.jsonld

While the markup in these examples using ProteinEntity is easier to interpret, the number of items that need to be changed to mark up another protein is the same as in the BioChemEntity approach. The simplified markup should enable easier adoption, although we could help the current BioChemEntity proposal by using highlighting on the Bioschemas site to show which terms need to be changed.

A major downside of this approach is that we would need to add all the types to Schema.org or host them at Bioschemas.org. While these could be mapped to existing terms, we would be accused of duplicating existing ontology terms.

Direct term reuse example
Last week, I showed the above examples to Dan (we were at ISWC together). He pointed out that the additionalProperty relation was added to allow the use of property/value pairs where the properties do not exist in an ontology. We are in the situation where the properties we are using come from ontologies. Dan suggested that we just use them directly. Note that the example also exploits the fact that you can define multiple types.

Minimum markup using BioChemEntity and term reuse
https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntityAlt-min.jsonld

Minimum + Recommended markup using BioChemEntity and term reuse
https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntityAlt-min%2Brec.jsonld

As you will see, this seems to have the advantages of both the above approaches. The markup is more straightforward than the additionalProperty approach, but exploits reusing existing domain ontologies. The tooling and exploitation will be much more straightforward.
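
As a rough sketch of the difference between the additionalProperty style and direct term reuse, the Python below rewrites one into the other. Everything under example.org is a made-up placeholder rather than a real ontology term, the function name to_direct_terms is hypothetical, and the output is not meant to reproduce the linked BioChemEntityAlt files. It only assumes that each PropertyValue carries the ontology IRI in propertyID (a Schema.org property of PropertyValue).

```python
import json

# Hypothetical markup in the additionalProperty style. The example.org IRIs
# are placeholders, not real ontology terms.
wrapped = {
    "@context": "http://schema.org",
    "@type": "BioChemEntity",
    "name": "Example protein",
    "additionalProperty": [
        {
            "@type": "PropertyValue",
            "name": "exampleProperty",
            "propertyID": "http://example.org/ontology/exampleProperty",
            "value": "http://example.org/entity/123",
        }
    ],
}


def to_direct_terms(markup, extra_types):
    """Replace PropertyValue wrappers with direct use of the ontology IRIs.

    Also adds the given extra types alongside the existing @type, since a
    JSON-LD node may declare multiple types.
    """
    direct = {k: v for k, v in markup.items() if k != "additionalProperty"}
    types = markup["@type"]
    if isinstance(types, str):
        types = [types]
    direct["@type"] = [*types, *extra_types]
    for pv in markup.get("additionalProperty", []):
        # The ontology IRI becomes the property key itself.
        direct[pv["propertyID"]] = pv["value"]
    return direct


direct = to_direct_terms(wrapped, ["http://example.org/ontology/Protein"])
print(json.dumps(direct, indent=2))
```

The resulting object is flat: each ontology property appears once as a key, so adding recommended properties becomes a matter of adding keys rather than editing inside an array.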

I invite you all to review and comment on these different examples. Do we believe that the BioChemEntity with term reuse (the third set of examples) is an appropriate path going forward?

Best regards

Alasdair

PS Sorry for the long email

Alasdair J G Gray

Fellow of the Higher Education Academy
Assistant Professor in Computer Science,
School of Mathematical and Computer Sciences
(Athena SWAN Bronze Award)
Heriot-Watt University, Edinburgh UK.

Email: A.J.G.Gray@hw.ac.uk
Web: http://www.macs.hw.ac.uk/~ajg33
ORCID: http://orcid.org/0000-0002-5711-4872
Office: Earl Mountbatten Building 1.39
Twitter: @gray_alasdair



Received on Thursday, 2 November 2017 09:49:50 UTC