Re: Protein representation with a Bioschemas context from Stephen Anyango on 2017-11-14 (public-bioschemas@w3.org from November 2017)

From: Stephen Anyango <anyango@ebi.ac.uk>
Date: Tue, 14 Nov 2017 11:56:31 +0000
To: public-bioschemas@w3.org
Message-ID: <1ec48755-cd19-801f-4ced-e2213b7fb4c5@ebi.ac.uk>
Hello,

As a data provider, we have no strong preference for a particular
ontology/IRI. The important thing, I believe, is consistency, which is
easy to document and hence to reference. Option 2 appears to put the
overhead on tool developers/data consumers, and not necessarily on the
data providers.

Kind regards,

Stephen Anyango
PDBe
EMBL-EBI

On 14-Nov-17 11:03 AM, Leyla Garcia wrote:
> Hi all,
>
> Would it be right to summarize our current options as the following?
>
> 1. There will be a unique Bioschemas context which will define
> recommended aliases together with officially agreed, mandatory IRIs for
> all the different profiles. If data providers want to use alternative
> IRIs, they can do so via additionalType. Data consumers can go to any
> data provider and get the markup directly from them, as all data
> providers will use the agreed IRIs.
>
> 2. There will be a Bioschemas context template that will state the
> officially agreed, mandatory aliases and the recommended canonical
> IRIs. Data providers can use alternative IRIs. Bioschemas will provide
> a translation service that will take any markup to the canonical
> form. Data consumers would retrieve the markup via the Bioschemas
> translator (otherwise they will end up with all sorts of IRIs).
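
A rough sketch of what the translation step in option 2 could involve,
in Python. Everything here is an assumption for illustration: the
mapping table TO_CANONICAL and the helper names are hypothetical, and
the IRI pairs are simply read off the two n-quads examples quoted
further down in this thread, not taken from any agreed Bioschemas
mapping. Only the triples that differ between the two examples are
included.

```python
# Hypothetical mapping table: alternative IRI -> canonical IRI.
# The pairs are assumptions read off the two n-quads examples in this thread.
OBO = "http://purl.obolibrary.org/obo/"
SIO = "http://semanticscience.org/resource/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

TO_CANONICAL = {
    OBO + "OGI_0000004": OBO + "SO_0000704",    # gene type
    OBO + "NCIT_C17021": OBO + "PR_000000001",  # protein type
    OBO + "RO_0002510": SIO + "SIO_010081",     # protein-to-gene relation
}
CANONICAL = set(TO_CANONICAL.values()) | {
    RDF_TYPE,
    "http://schema.org/BioChemEntity",
}


def to_canonical(term):
    """Return the canonical IRI for a vocabulary term, or fail validation."""
    if term in CANONICAL:
        return term
    if term in TO_CANONICAL:
        return TO_CANONICAL[term]
    raise ValueError(f"no mapping to a canonical IRI for {term}")


def canonicalize(triples):
    """Rewrite predicates, and the objects of rdf:type, to canonical IRIs."""
    result = set()
    for s, p, o in triples:
        p = to_canonical(p)
        if p == RDF_TYPE:
            o = to_canonical(o)
        result.add((s, p, o))
    return result


GENE = "http://identifiers.org/ncbigene/25"
PROTEIN = "http://identifiers.org/uniprot/P00519"

provider_a = {
    (GENE, RDF_TYPE, OBO + "SO_0000704"),
    (PROTEIN, RDF_TYPE, OBO + "PR_000000001"),
    (PROTEIN, SIO + "SIO_010081", GENE),
}
provider_b = {
    (GENE, RDF_TYPE, OBO + "OGI_0000004"),
    (PROTEIN, RDF_TYPE, OBO + "NCIT_C17021"),
    (PROTEIN, OBO + "RO_0002510", GENE),
}

same_before = provider_a == provider_b
same_after = canonicalize(provider_a) == canonicalize(provider_b)
```

The two providers' triples differ as published and only match after
the mapping step, which is the extra work that falls on the translator
(or on every consumer that bypasses it).
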
>
> I would say option 1 is what schema.org actually does. Rather than 
> using myOntology:citation, if I want to comply with schema.org, I use 
> schema:citation and so on.
>
> I would still like to hear some thoughts from schema.org people, as
> well as from other data providers. As a data provider, we are happy
> either way; we can accommodate.
>
> I would also suggest that the Governance group propose a date for voting
> on this matter as, at some point, we have to make a decision. No
> pressure, but it would be great if such a decision could be reached before
> the first week of December so it can be included in the poster we will
> have at SWAT4LS.
>
> Regards,
>
>
> On 14/11/2017 10:21, Franck Michel wrote:
>> Dear all,
>>
>> I'd like to bring a few elements into the discussion wrt. aliases.
>>
>> In JSON-LD, aliases are just a handy short-cut notation with a local 
>> scope: an alias just applies within the scope of the context where it 
>> is defined. And more importantly, an alias should not bear any 
>> meaning. The first thing a consumer app does with JSON-LD is to 
>> expand all terms, which immediately removes all aliases.
>>
>> Hence, if I use the Bioschemas.org default context:
>>    "@context": { "Gene": { "@id": "http://purl.obolibrary.org/obo/SO_0000704" }, ... }
>> I will typically write:  "@type": [ "BioChemEntity", "Gene" ]
>>
>> But I may well write a document with a custom alias:
>>    "@context": { "GeneAlias": { "@id": "http://purl.obolibrary.org/obo/SO_0000704" }, ... }
>> and write:   "@type": [ "BioChemEntity", "GeneAlias" ]
>> With:
>>    "@context": { "obo": { "@id": "http://purl.obolibrary.org/obo/" }, ... }
>> I would write:   "@type": [ "BioChemEntity", "obo:SO_0000704" ]
>> Or I could even not use any alias:
>>    "@type": [ "BioChemEntity", "http://purl.obolibrary.org/obo/SO_0000704" ]
>>
>> These are all equivalent from the point of view of a data consumer.
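
To illustrate the equivalence described above, here is a minimal Python
sketch. It is not a JSON-LD processor: the expand_term helper is
hypothetical and handles only aliases, compact IRIs (prefix:suffix) and
absolute IRIs. The mapping of "BioChemEntity" to
http://schema.org/BioChemEntity is an assumption based on the n-quads
quoted further down, since the contexts above elide it with "...".

```python
# Simplified term expansion: NOT a full JSON-LD processor, just enough
# to show that alias names disappear once terms are expanded.
SO_GENE = "http://purl.obolibrary.org/obo/SO_0000704"
BIOCHEM = "http://schema.org/BioChemEntity"  # assumed, see lead-in


def expand_term(term, context):
    """Resolve an alias, a compact IRI (prefix:suffix) or an absolute IRI."""
    definition = context.get(term)
    if definition is not None:
        # A term definition is either a plain IRI string or {"@id": IRI}.
        return definition["@id"] if isinstance(definition, dict) else definition
    prefix, sep, suffix = term.partition(":")
    if sep and not suffix.startswith("//") and prefix in context:
        return expand_term(prefix, context) + suffix
    return term  # already an absolute IRI (or an unknown term, left as-is)


def expand_types(doc):
    """Expand every @type value of a document and sort the result."""
    return sorted(expand_term(t, doc["@context"]) for t in doc["@type"])


docs = [
    # Default alias
    {"@context": {"BioChemEntity": BIOCHEM, "Gene": {"@id": SO_GENE}},
     "@type": ["BioChemEntity", "Gene"]},
    # Custom alias
    {"@context": {"BioChemEntity": BIOCHEM, "GeneAlias": {"@id": SO_GENE}},
     "@type": ["BioChemEntity", "GeneAlias"]},
    # Prefix plus compact IRI
    {"@context": {"BioChemEntity": BIOCHEM,
                  "obo": {"@id": "http://purl.obolibrary.org/obo/"}},
     "@type": ["BioChemEntity", "obo:SO_0000704"]},
    # No alias for the gene type at all
    {"@context": {"BioChemEntity": BIOCHEM},
     "@type": ["BioChemEntity", SO_GENE]},
]

expanded = [expand_types(d) for d in docs]
all_equivalent = all(e == expanded[0] for e in expanded)
```

All four documents expand to the same pair of type IRIs, so a consumer
working on expanded data never sees the alias names.
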
>>
>> In my view, the default context should be a useful guide for those 
>> annotating data with Bioschemas.org markup, but alias names should 
>> not matter at all. What matters is the URIs to which aliases resolve.
>>
>> I feel that the solution of agreed, pre-defined URIs, whatever the
>> aliases used, is more sustainable. After all, schema.org advocates
>> the use of specific agreed-upon terms. If people use them, their
>> pages are more likely to be discoverable. They can choose to use other
>> terms if this is convenient for them, but then there is no guarantee
>> that the pages will be discovered as easily.
>>
>> Franck.
>>
>>
>> Le 13/11/2017 à 19:02, Leyla Garcia a écrit :
>>> Hi all,
>>>
>>> Rather than relying on Bioschemas clients to do the hard work of
>>> mapping, I was thinking of leaving this to Bioschemas itself. So, if a
>>> client wants to retrieve the, let's say, "canonical" Bioschemas 
>>> markup (which will use the recommended ontology terms as defined by 
>>> main providers for recommended and minimum properties) then this 
>>> client will use a Bioschemas provided tool. If a client is happy 
>>> with a customized Bioschemas mark up (using whichever preferred 
>>> ontology terms but always the predefined aliases) then this client 
>>> will go directly to the source. Any optional property with no alias 
>>> will remain as provided. Whenever possible, data providers will 
>>> prefer schema.org and Bioschemas named properties.
>>>
>>> In this way we support freedom of choice of ontology terms, but also
>>> support collation of information from multiple sources (a soft way of
>>> referring to data integration).
>>>
>>> How does it sound? How would that work for Bioschemas? A canonical
>>> transformation tool/web service would have to be provided, as well as
>>> servers and maintenance. How would this work for schema.org/Google? Dan,
>>> via Alasdair, kind of proposed the use of third-party properties. How
>>> about this alias-based way?
>>>
>>> Regards,
>>>
>>> On 13/11/2017 16:00, Melanie Courtot wrote:
>>>> How does that currently work for schema.org, and could the same be 
>>>> used with Bioschemas?
>>>>
>>>> Looking at Bioschemas as a markup language for existing data, we
>>>> should aim for the lowest adoption threshold possible, including
>>>> unconstrained ontology terms, keeping required properties minimal, 
>>>> and not having an overly complicated structure with many new 
>>>> properties; I worry that otherwise people will just not use it.
>>>>
>>>>
>>>>
>>>> On 10/11/2017 18:10, Justin Clark-Casey wrote:
>>>>> 'Data integration' is probably too strong a phrase for what I have
>>>>> in mind.  I'm really thinking about discovery and how a search
>>>>> engine (for example) may know/integrate that 2 different data
>>>>> sources are talking about the same thing, so that the user gets
>>>>> a useful/linked set of search results.
>>>>>
>>>>> If a user wanted to find proteins transcribed by gene 'ABL1' 
>>>>> (following the examples), then I think it would be a lot simpler 
>>>>> if all the JSON-LD uses the term
>>>>> "http://semanticscience.org/resource/is-transcribed-from".  
>>>>> Otherwise a search engine and maybe other applications would need 
>>>>> to be aware of all the mappings to other terms (I know OLS 
>>>>> can/will provide this but this will increase application complexity).
>>>>>
>>>>> I should be clear that this is thought programming on my part, I 
>>>>> haven't actually tried to implement anything yet :)  It could well 
>>>>> be that there's a lot of value in sources using whatever terms are 
>>>>> optimal for them, and that costs of trying to co-ordinate IRIs are 
>>>>> too high.  But I do want to debate the possible tradeoffs.
>>>>>
>>>>> On Fri, Nov 10, 2017 at 5:39 PM, Melanie Courtot
>>>>> <mcourtot@ebi.ac.uk> wrote:
>>>>>
>>>>>     Is data integration really a use case for Bioschemas? The
>>>>>     stated goal of Bioschemas is to extend schema.org to provide
>>>>>     markup for pages, and IIRC the use cases discussed at the
>>>>>     last meeting were about discovery and retrieval.
>>>>>
>>>>>     Cheers,
>>>>>     Melanie
>>>>>
>>>>>
>>>>>
>>>>>     On 10/11/2017 16:30, Justin Clark-Casey wrote:
>>>>>
>>>>>
>>>>>
>>>>>         On 10/11/17 14:21, ljgarcia wrote:
>>>>>
>>>>>             Hi,
>>>>>
>>>>>                     I thought we did not want to impose any IRI.
>>>>>                     Is there any reason why
>>>>>                     we should?
>>>>>
>>>>>
>>>>>                 But then we sacrifice the interoperability and
>>>>>                 understanding that we
>>>>>                 are striving for. If you look at the n-quads for
>>>>>                 the two examples
>>>>>                 (included at the end of this email) then you will
>>>>>                 see a different set
>>>>>                 of triples.
>>>>>
>>>>>
>>>>>             If there are mappings between the terms, the
>>>>>             interoperability we want could still be
>>>>>             achieved, could it not? With mappings, we can still
>>>>>             transform any n-quads to the, let's say, canonical
>>>>>             Bioschemas-defined form. Would this not be a way? If a
>>>>>             mapping cannot be found, then validation fails.
>>>>>             Bioschemas should then use mapping tools and clearly
>>>>>             state which mapping tool is used.
>>>>>
>>>>>
>>>>>         If consuming applications have to use term mappings then
>>>>>         this will make them much harder to write, and in some
>>>>>         cases might make it impossible to integrate some
>>>>>         information. This might only be a problem for code that is
>>>>>         trying to integrate data across websites, but this is an
>>>>>         important use case.
>>>>>
>>>>>         At least for mandatory properties and types, and major
>>>>>         profiles (gene, protein, etc.), I would like to see
>>>>>         pre-agreed IRIs, rather than free choice or emerging
>>>>>         consensus.  In some ways, I don't think this is so
>>>>>         different from what we are doing with DataCatalog, Sample,
>>>>>         TrainingMaterial, etc.
>>>>>
>>>>>
>>>>>             Regards,
>>>>>
>>>>>             On 2017-11-10 14:07, Gray, Alasdair J G wrote:
>>>>>
>>>>>                     On 10 Nov 2017, at 13:28, Leyla Garcia
>>>>>                     <ljgarcia@ebi.ac.uk> wrote:
>>>>>                     I was under the same impression as Melanie.
>>>>>                     We agree on aliases
>>>>>                     but providers can decide what is their
>>>>>                     preferred IRI for any of
>>>>>                     them. A Bioschemas Protein context would just
>>>>>                     provide a default
>>>>>                     context that can also be used as a template
>>>>>                     where IRIs (but not
>>>>>                     aliases) can be modified. And of course,
>>>>>                     anyone could add more
>>>>>                     aliases, Bioschemas will just not parse those
>>>>>                     outside the
>>>>>                     default/template provided context.
>>>>>
>>>>>                     I thought we did not want to impose any IRI.
>>>>>                     Is there any reason why
>>>>>                     we should?
>>>>>
>>>>>
>>>>>                 But then we sacrifice the interoperability and
>>>>>                 understanding that we
>>>>>                 are striving for. If you look at the n-quads for
>>>>>                 the two examples
>>>>>                 (included at the end of this email) then you will
>>>>>                 see a different set
>>>>>                 of triples. Aliases are only defined within the
>>>>>                 document. When you
>>>>>                 interpret them they give you different meanings.
>>>>>                 If we go down this
>>>>>                 route, we would need to make our tooling either
>>>>>                 aware of all the
>>>>>                 possible terms that will be used or mapping-aware.
>>>>>
>>>>>                 Alasdair
>>>>>
>>>>>                 http://tinyurl.com/y9mu423y
>>>>>
>>>>>                 <http://identifiers.org/ncbigene/25> <http://schema.org/name> "ABL1" .
>>>>>                 <http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/SO_0000704> .
>>>>>                 <http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
>>>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "ABL" .
>>>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "JTK7" .
>>>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/description> "Non-receptor tyrosine-protein kinase that plays a role..." .
>>>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/name> "ABL1" .
>>>>>                 <http://identifiers.org/uniprot/P00519> <http://semanticscience.org/resource/SIO_000001> <http://pfam.xfam.org/clan/CL0001> .
>>>>>                 <http://identifiers.org/uniprot/P00519> <http://semanticscience.org/resource/SIO_010081> <http://identifiers.org/ncbigene/25> .
>>>>>                 <http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/PR_000000001> .
>>>>>                 <http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
>>>>>
>>>>>                 http://tinyurl.com/yd5snze2
>>>>>
>>>>>                 <http://identifiers.org/ncbigene/25> <http://schema.org/name> "ABL1" .
>>>>>                 <http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/OGI_0000004> .
>>>>>                 <http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
>>>>>                 <http://identifiers.org/uniprot/P00519> <http://purl.obolibrary.org/obo/RO_0002510> <http://identifiers.org/ncbigene/25> .
>>>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "ABL" .
>>>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "JTK7" .
>>>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/description> "Non-receptor tyrosine-protein kinase that plays a role..." .
>>>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/name> "ABL1" .
>>>>>                 <http://identifiers.org/uniprot/P00519> <http://semanticscience.org/resource/SIO_000001> <http://pfam.xfam.org/clan/CL0001> .
>>>>>                 <http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/NCIT_C17021> .
>>>>>                 <http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
>>>>>
>>>>>                 Alasdair J G Gray
>>>>>
>>>>>                  Fellow of the Higher Education Academy
>>>>>                 Assistant Professor in Computer Science,
>>>>>                 School of Mathematical and Computer Sciences
>>>>>                 (Athena SWAN Bronze Award)
>>>>>                 Heriot-Watt University, Edinburgh UK.
>>>>>
>>>>>                 Email: A.J.G.Gray@hw.ac.uk
>>>>>                 Web: http://www.macs.hw.ac.uk/~ajg33
>>>>>                 ORCID: http://orcid.org/0000-0002-5711-4872
>>>>>                 Office: Earl Mountbatten Building 1.39
>>>>>                 Twitter: @gray_alasdair
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
Received on Tuesday, 14 November 2017 11:56:59 UTC