Re: Protein representation with a Bioschemas context ()

I tend to agree that fixing IRIs is perhaps too extreme. They can be frozen, thawed, modified and replaced as more data improves their information value and as genes and their products get better described. Properties, and what they belong to, do need to be fixed. But I also see that one needs to identify the right IRIs and stick to them in order to move forward in a snapshot kind of way... imho.
-------- Original message --------
From: Andra Waagmeester <andra@micelio.be>
Date: 15/11/2017 06:04 (GMT+00:00)
To: Leyla Garcia <ljgarcia@ebi.ac.uk>
Cc: Justin Clark-Casey <jc955@cam.ac.uk>, public-bioschemas@w3.org
Subject: Re: Protein representation with a Bioschemas context ()
Hi Leyla,
I think I like your suggestion, which seems to allow reuse of external IRIs. What I don't understand is why the IRIs will be agreed upon and then fixed. Doesn't this limit the potential use cases?
Take for example the property "associatedDisease", which is now linked to "http://semanticscience.org/resource/SIO_000983.rdf". That term is labelled as a gene-disease association, not as a protein-disease association. I understand the rationale here, but pedantically speaking the association is quite implicit, since technically the disease association is with an underlying gene and not with the protein. The point is that if the Bioschemas protein community agrees that this IRI is explicitly linked to the minted "associatedDisease", we will not be able to use more expressive properties if they exist or emerge.
Wouldn't the best option simply be to be strict on the type Protein, but for the remaining properties use the complete ontological space out there, without any limitations?
Andra
 

 "transcribedFrom". With this choice, we are only able to map protein-coding genes. So if we want to map non-protein-coding genes, and a Gene entity is introduced in Bioschemas, do we also introduce the property "transcribes"?
Likewise with associatedDisease: why is this not associatedPhenotype?




On Tue, Nov 14, 2017 at 2:57 PM, Leyla Garcia <ljgarcia@ebi.ac.uk> wrote:
Hi,



Nice to get that many comments!



So, it looks like we are talking about something like https://github.com/BioSchemas/specifications/blob/master/Protein/examples/ProteinEntity-with-context.json where the context containing Gene and so on will become the Bioschemas context, and the IRIs will be agreed and then fixed. That example includes a third-party property, which is always possible whenever schema.org or Bioschemas do not provide a better option.



Regards





On 14/11/2017 12:41, Justin Clark-Casey wrote:


I agree.  As Alasdair and Franck say, I feel that a major benefit of schema.org is in providing agreed upon minimal terms that aid findability.



Pragmatically, data sources would always be free to use their own terms and additionalTypes (I don't think that bioschemas can or should forbid this), but they should be aware that there are agreed upon terms that will make their data findable/usable by a distributed community, rather than only by a few applications that are especially aware of their markup.



I also agree with Stephen that relying on a central collator is too much overhead.  To me, this introduces a single point of failure that conflicts with the spirit of the web.



-- Justin Clark-Casey



On 14/11/17 12:02, Gray, Alasdair J G wrote:


Dear All,



I think Franck’s email clearly explains the situation here.



Schema.org <http://schema.org> is about everyone buying in to use a common set of terms to mark up their content. If they buy in to that, then they get the benefit. Otherwise you are just on the linked data web.



Bioschemas is about making Schema.org <http://schema.org> relevant for the life sciences. We have agreed as a community that we prefer to reuse an existing ontology term than mint our own. However, to me, it means that we do need to select a single ontology term. It is through this agreement that we will see benefit whilst also keeping the route to adoption straightforward.



Alasdair




On 14 Nov 2017, at 10:21, Franck Michel <franck.michel@cnrs.fr <mailto:franck.michel@cnrs.fr>> wrote:



Dear all,



I'd like to bring a few elements into the discussion wrt. aliases.



In JSON-LD, aliases are just a handy short-cut notation with a local scope: an alias just applies within the scope of the context where it is defined. And more importantly, an alias should not bear any meaning. The first thing a consumer app does with JSON-LD is to expand all terms, which immediately removes all aliases.



Hence, if I use the Bioschemas.org <http://bioschemas.org/> default context:

"@context": { "Gene": { "@id": "http://purl.obolibrary.org/obo/SO_0000704" }, ... }

I will typically write:  "@type": [ "BioChemEntity", "Gene" ]



But I may well write a document with a custom alias:

"@context": { "GeneAlias": { "@id": "http://purl.obolibrary.org/obo/SO_0000704" }, ... }

and write:   "@type": [ "BioChemEntity", "GeneAlias" ]

With:

"@context": { "obo": { "@id": "http://purl.obolibrary.org/obo/" }, ... }

I would write:   "@type": [ "BioChemEntity", "obo:SO_0000704" ]

Or I could even not use any alias:   "@type": [ "BioChemEntity", "http://purl.obolibrary.org/obo/SO_0000704" ]



These are all equivalent from the point of view of a data consumer.
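
To make the equivalence concrete, here is a rough Python sketch of what term expansion does to the four variants above. It is only an illustration under simplifying assumptions: expand_term is a hypothetical helper written for this example, not part of any JSON-LD library, and it handles only flat contexts with aliases and prefixes. A real consumer would use a full JSON-LD processor.

```python
# Simplified illustration only: NOT a full JSON-LD processor.
# expand_term is a hypothetical helper made up for this example.

GENE_IRI = "http://purl.obolibrary.org/obo/SO_0000704"


def expand_term(term, context):
    """Resolve a type name to an absolute IRI using a flat @context."""
    # Case 1: the term is an alias defined directly in the context.
    if term in context:
        definition = context[term]
        return definition["@id"] if isinstance(definition, dict) else definition
    # Case 2: a compact IRI such as "obo:SO_0000704" whose prefix is defined.
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context and not suffix.startswith("//"):
        base = context[prefix]
        base = base["@id"] if isinstance(base, dict) else base
        return base + suffix
    # Case 3: already an absolute IRI (or an unknown term); leave unchanged.
    return term


# The four ways of writing the gene type, each paired with its context.
variants = [
    ({"Gene": {"@id": GENE_IRI}}, "Gene"),
    ({"GeneAlias": {"@id": GENE_IRI}}, "GeneAlias"),
    ({"obo": {"@id": "http://purl.obolibrary.org/obo/"}}, "obo:SO_0000704"),
    ({}, GENE_IRI),
]

# Collect the distinct IRIs the variants expand to.
expanded = {expand_term(term, context) for context, term in variants}
print(expanded)
```

After expansion the alias names are gone and all four variants end up as the same single IRI, which is the only thing a consumer gets to see.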



In my view, the default context should be a useful guide for those annotating data with Bioschemas.org <http://bioschemas.org/> markup, but alias names should not matter at all. What matters is the URIs to which aliases resolve.



I feel like the solution of agreed pre-defined URIs, whatever the aliases used, is more sustainable. After all, schema.org <http://schema.org/> advocates for the use of specific agreed-upon terms. If one uses them, their pages are more likely to be discoverable. They can choose to use other terms if this is convenient for them, but then there is no guarantee that the pages will be discovered as easily.



Franck.




Alasdair J G Gray

Fellow of the Higher Education Academy

Assistant Professor in Computer Science,

School of Mathematical and Computer Sciences

(Athena SWAN Bronze Award)

Heriot-Watt University, Edinburgh UK.



Email: A.J.G.Gray@hw.ac.uk <mailto:A.J.G.Gray@hw.ac.uk>

Web: http://www.macs.hw.ac.uk/~ajg33

ORCID: http://orcid.org/0000-0002-5711-4872

Office: Earl Mountbatten Building 1.39

Twitter: @gray_alasdair






















Received on Wednesday, 15 November 2017 06:55:07 UTC