Re: Protein representation with a Bioschemas context from Leyla Garcia on 2017-11-14 (public-bioschemas@w3.org from November 2017)

From: Leyla Garcia <ljgarcia@ebi.ac.uk>
Date: Tue, 14 Nov 2017 11:03:36 +0000
To: Franck Michel <franck.michel@cnrs.fr>, public-bioschemas@w3.org, Andra Waagmeester <andra@micelio.be>, Melanie Courtot <mcourtot@ebi.ac.uk>, Justin Clark-Casey <justinccdev@gmail.com>
Message-ID: <0489f072-e8c0-c171-aa8a-7f2149c0746f@ebi.ac.uk>
Hi all,

Would it be right to summarize our current options as follows?

1. There will be a single Bioschemas context that defines recommended
aliases together with officially agreed, mandatory IRIs for all the
different profiles. If data providers want to use alternative IRIs,
they can do so via additionalType. Data consumers can go to any data
provider and get the markup directly from them, as all data providers
will use the agreed IRIs.

2. There will be a Bioschemas context template that states the
officially agreed, mandatory aliases and the recommended canonical IRIs.
Data providers can use alternative IRIs. Bioschemas will provide a
translation service that converts any markup to the canonical form.
Data consumers would retrieve the markup via the Bioschemas translator
(otherwise they would end up with all sorts of IRIs).
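Option 2 implies a translation step. A rough Python sketch, assuming the markup has already been expanded to (subject, predicate, object) triples and that the translator holds a lookup table from provider IRIs to canonical IRIs. The table, `canonicalize` and `MappingError` are hypothetical; the three IRI pairs are taken from the two ABL1 examples quoted later in this thread purely for illustration, not as vetted ontology mappings. Following the suggestion later in the thread that validation fails when no mapping exists, an unknown vocabulary IRI raises an error instead of passing through.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OBO = "http://purl.obolibrary.org/obo/"
SIO = "http://semanticscience.org/resource/"

# Hypothetical mapping table: provider IRI -> canonical Bioschemas IRI.
# Pairs borrowed from the thread's two examples, for illustration only.
CANONICAL_MAP: Dict[str, str] = {
    OBO + "OGI_0000004": OBO + "SO_0000704",
    OBO + "NCIT_C17021": OBO + "PR_000000001",
    OBO + "RO_0002510": SIO + "SIO_010081",
}

# IRIs assumed to be canonical already (rdf:type, schema.org terms, targets).
CANONICAL_IRIS: Set[str] = set(CANONICAL_MAP.values()) | {
    RDF_TYPE,
    "http://schema.org/name",
    "http://schema.org/BioChemEntity",
}


class MappingError(ValueError):
    """No canonical form is known for an IRI, so validation fails."""


def to_canonical(iri: str) -> str:
    """Return the canonical IRI, or raise MappingError if none is known."""
    if iri in CANONICAL_IRIS:
        return iri
    if iri in CANONICAL_MAP:
        return CANONICAL_MAP[iri]
    raise MappingError(f"no canonical mapping for {iri}")


def canonicalize(triples: Iterable[Triple]) -> Set[Triple]:
    """Rewrite predicates and rdf:type objects to their canonical IRIs."""
    result: Set[Triple] = set()
    for s, p, o in triples:
        p = to_canonical(p)
        # Only rdf:type objects are vocabulary terms; other objects are data.
        if p == RDF_TYPE:
            o = to_canonical(o)
        result.add((s, p, o))
    return result


GENE = "http://identifiers.org/ncbigene/25"
PROTEIN = "http://identifiers.org/uniprot/P00519"

# Markup from a provider that chose the "alternative" IRIs.
provider_markup = [
    (GENE, RDF_TYPE, OBO + "OGI_0000004"),
    (PROTEIN, OBO + "RO_0002510", GENE),
    (PROTEIN, "http://schema.org/name", "ABL1"),
]

for triple in sorted(canonicalize(provider_markup)):
    print(triple)
```

Note that this follows the "validation fails" variant; the 13/11 proposal further down instead leaves optional properties without an alias as provided, which would mean passing unmapped predicates through rather than raising.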

I would say option 1 is what schema.org actually does. Rather than using 
myOntology:citation, if I want to comply with schema.org, I use 
schema:citation and so on.

I would still like to hear some thoughts from schema.org people as well
as from other data providers. As a data provider, we are happy either
way; we can accommodate either option.

I would also suggest that the Governance group propose a date for voting
on this matter since, at some point, we have to make a decision. No
pressure, but it would be great if a decision could be reached before
the first week of December so it can be included in the poster we will
have at SWAT4LS.

Regards,


On 14/11/2017 10:21, Franck Michel wrote:
> Dear all,
>
> I'd like to bring a few elements into the discussion wrt. aliases.
>
> In JSON-LD, aliases are just a handy short-cut notation with a local 
> scope: an alias just applies within the scope of the context where it 
> is defined. And more importantly, an alias should not bear any 
> meaning. The first thing a consumer app does with JSON-LD is to expand 
> all terms, which immediately removes all aliases.
>
> Hence, if I use the Bioschemas.org default context:
>     "@context": { "Gene": { "@id": "http://purl.obolibrary.org/obo/SO_0000704" }, ... }
> I will typically write:
>     "@type": [ "BioChemEntity", "Gene" ]
>
> But I may well write a document with a custom alias:
>     "@context": { "GeneAlias": { "@id": "http://purl.obolibrary.org/obo/SO_0000704" }, ... }
> and write:
>     "@type": [ "BioChemEntity", "GeneAlias" ]
> With:
>     "@context": { "obo": { "@id": "http://purl.obolibrary.org/obo/" }, ... }
> I would write:
>     "@type": [ "BioChemEntity", "obo:SO_0000704" ]
> Or I could even not use any alias:
>     "@type": [ "BioChemEntity", "http://purl.obolibrary.org/obo/SO_0000704" ]
>
> These are all equivalent from the point of view of a data consumer.
>
> In my view, the default context should be a useful guide for those 
> annotating data with Bioschemas.org markup, but alias names should not 
> matter at all. What matters is the URIs to which aliases resolve.
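A minimal Python sketch of the point above, assuming a deliberately simplified expansion step: the helper `expand_term` is hypothetical and handles only plain aliases, compact IRIs (`prefix:suffix`) and absolute IRIs, not the full JSON-LD expansion algorithm. Each of the four `@type` spellings resolves to the same IRI, so the alias name itself carries no meaning once expanded.

```python
from typing import Dict, Union

# A context maps a term either to an IRI string or to {"@id": IRI}.
Context = Dict[str, Union[str, Dict[str, str]]]


def _iri_of(definition: Union[str, Dict[str, str]]) -> str:
    """Return the IRI of a term definition (simple or expanded form)."""
    return definition["@id"] if isinstance(definition, dict) else definition


def expand_term(term: str, context: Context) -> str:
    """Expand one @type value. Simplified: not the full JSON-LD algorithm."""
    # 1. A term defined in the context is replaced by its IRI.
    if term in context:
        return _iri_of(context[term])
    # 2. A compact IRI "prefix:suffix" whose prefix is defined in the context.
    prefix, sep, suffix = term.partition(":")
    if sep and not suffix.startswith("//") and prefix in context:
        return _iri_of(context[prefix]) + suffix
    # 3. Anything else (e.g. an absolute IRI) is kept as-is.
    return term


SO_GENE = "http://purl.obolibrary.org/obo/SO_0000704"

# The four variants from the message: (local context, @type value).
variants = [
    ({"Gene": {"@id": SO_GENE}}, "Gene"),
    ({"GeneAlias": {"@id": SO_GENE}}, "GeneAlias"),
    ({"obo": {"@id": "http://purl.obolibrary.org/obo/"}}, "obo:SO_0000704"),
    ({}, SO_GENE),
]

expanded = {expand_term(value, ctx) for ctx, value in variants}
print(expanded)  # a single IRI: the alias names have disappeared
```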
>
> I feel that the solution of agreed, pre-defined URIs, whatever the
> aliases used, is more sustainable. After all, schema.org advocates
> the use of specific agreed-upon terms. If one uses them, one's pages
> are more likely to be discoverable. One can choose to use other terms
> if this is convenient, but then there is no guarantee that the pages
> will be discovered as easily.
>
> Franck.
>
>
> On 13/11/2017 19:02, Leyla Garcia wrote:
>> Hi all,
>>
>> Rather than relying on Bioschemas clients to do the hard work of
>> mapping, I was thinking of leaving this to Bioschemas itself. So, if a
>> client wants to retrieve the, let's say, "canonical" Bioschemas
>> markup (which will use the recommended ontology terms as defined by
>> the main providers for recommended and minimum properties), then this
>> client will use a Bioschemas-provided tool. If a client is happy with
>> a customized Bioschemas markup (using whichever ontology terms are
>> preferred, but always the predefined aliases), then this client will
>> go directly to the source. Any optional property with no alias will
>> remain as provided. Whenever possible, data providers will prefer
>> schema.org and Bioschemas named properties.
>>
>> In this way we support freedom of ontology terms choice, but also 
>> support collation of information from multiple sources (soft way to 
>> refer to data integration).
>>
>> How does that sound? How would that work for Bioschemas? A canonical
>> transformation tool/web service would need to be provided, along with
>> servers and maintenance. How would this work for schema.org/Google?
>> Dan, via Alasdair, kind of proposed the use of third-party properties.
>> How about this alias-based approach?
>>
>> Regards,
>>
>> On 13/11/2017 16:00, Melanie Courtot wrote:
>>> How does that currently work for schema.org, and could the same be 
>>> used with Bioschemas?
>>>
>>> Looking at Bioschemas as a markup language for existing data, we 
>>> should aim for the lowest adoption threshold possible, including 
>>> unconstrained ontology terms, keeping required properties minimal, 
>>> and not having an overly complicated structure with many new 
>>> properties; I worry that otherwise people will just not use it.
>>>
>>>
>>>
>>> On 10/11/2017 18:10, Justin Clark-Casey wrote:
>>>> 'Data integration' is probably too strong a phrase for what I have 
>>>> in mind.  I'm really thinking about discovery and how a search 
>>>> engine (for example) may know/integrate that 2 different data 
>>>> sources are talking about the same thing, so that the user gets a
>>>> useful/linked set of search results.
>>>>
>>>> If a user wanted to find proteins transcribed by gene 'ABL1' 
>>>> (following the examples), then I think it would be a lot simpler if 
>>>> all the JSON-LD uses the term
>>>> "http://semanticscience.org/resource/is-transcribed-from".  
>>>> Otherwise a search engine and maybe other applications would need 
>>>> to be aware of all the mappings to other terms (I know OLS can/will 
>>>> provide this but this will increase application complexity).
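To make the consumer-side cost concrete, a small Python sketch with a hypothetical helper and data: the two predicate IRIs are the ones used in the ABL1 examples quoted further down this thread, treating them as equivalent is an assumption, and the second protein IRI under example.org is invented for the demonstration. With one agreed IRI the query needs a single predicate; otherwise the application must carry every equivalent IRI that any source might use.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

GENE = "http://identifiers.org/ncbigene/25"
SIO_PRED = "http://semanticscience.org/resource/SIO_010081"
RO_PRED = "http://purl.obolibrary.org/obo/RO_0002510"


def proteins_linked_to(triples: Iterable[Triple], gene: str,
                       predicates: Set[str]) -> Set[str]:
    """Subjects linked to `gene` through any of the given predicate IRIs."""
    return {s for s, p, o in triples if o == gene and p in predicates}


# Triples merged from two sources; the second subject is hypothetical.
merged = [
    ("http://identifiers.org/uniprot/P00519", SIO_PRED, GENE),  # source A
    ("http://example.org/protein/abl1-b", RO_PRED, GENE),       # source B
]

# With one agreed IRI, the consumer queries a single predicate...
agreed_only = proteins_linked_to(merged, GENE, {SIO_PRED})
# ...otherwise it must know every equivalent IRI the sources might use.
mapping_aware = proteins_linked_to(merged, GENE, {SIO_PRED, RO_PRED})

print(sorted(agreed_only))
print(sorted(mapping_aware))
```

The first query silently misses source B; only the mapping-aware query finds both, which is the extra application complexity described above.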
>>>>
>>>> I should be clear that this is thought programming on my part, I 
>>>> haven't actually tried to implement anything yet :)  It could well 
>>>> be that there's a lot of value in sources using whatever terms are 
>>>> optimal for them, and that the costs of trying to co-ordinate IRIs are
>>>> too high.  But I do want to debate the possible tradeoffs.
>>>>
>>>> On Fri, Nov 10, 2017 at 5:39 PM, Melanie Courtot
>>>> <mcourtot@ebi.ac.uk> wrote:
>>>>
>>>>     Is data integration really a use case for Bioschemas? The
>>>>     stated goal of Bioschemas is to extend schema.org to provide
>>>>     markup for pages, and IIRC the
>>>>     use cases discussed at the last meeting were about discovery
>>>>     and retrieval.
>>>>
>>>>     Cheers,
>>>>     Melanie
>>>>
>>>>
>>>>
>>>>     On 10/11/2017 16:30, Justin Clark-Casey wrote:
>>>>
>>>>
>>>>
>>>>         On 10/11/17 14:21, ljgarcia wrote:
>>>>
>>>>             Hi,
>>>>
>>>>                     I thought we did not want to impose any IRI. Is
>>>>                     there any reason why
>>>>                     we should?
>>>>
>>>>
>>>>                 But then we sacrifice the interoperability and
>>>>                 understanding that we
>>>>                 are striving for. If you look at the n-quads for
>>>>                 the two examples
>>>>                 (included at the end of this email) then you will
>>>>                 see a different set
>>>>                 of triples.
>>>>
>>>>
>>>>             If there are mappings between the terms, the
>>>>             interoperability we want could still be achieved,
>>>>             couldn't it? With mappings, we can still transform any
>>>>             n-quads to the, let's say, canonical Bioschemas-defined
>>>>             form. Would this not be a way? If a mapping cannot be
>>>>             found, then validation fails. Bioschemas should then
>>>>             use mapping tools and clearly state which mapping tool
>>>>             is used.
>>>>
>>>>
>>>>         If consuming applications have to use term mappings then
>>>>         this will make them much harder to write, and in some cases
>>>>         might make it impossible to integrate some information. 
>>>>         This might only be a problem for code that is trying to
>>>>         integrate data across websites, but this is an important
>>>>         use case.
>>>>
>>>>         At least for mandatory properties and types, and major
>>>>         profiles (gene, protein, etc.), I would like to see
>>>>         pre-agreed IRIs, rather than free choice or emerging
>>>>         consensus.  In some ways, I don't think this is so
>>>>         different from what we are doing with DataCatalog, Sample,
>>>>         TrainingMaterial, etc.
>>>>
>>>>
>>>>             Regards,
>>>>
>>>>             On 2017-11-10 14:07, Gray, Alasdair J G wrote:
>>>>
>>>>                     On 10 Nov 2017, at 13:28, Leyla Garcia
>>>>                     <ljgarcia@ebi.ac.uk> wrote:
>>>>                     I was under the same impression as Melanie.
>>>>                     We agree on aliases, but providers can decide
>>>>                     which IRI they prefer for each of them. A
>>>>                     Bioschemas Protein context would just provide a
>>>>                     default context that can also be used as a
>>>>                     template where IRIs (but not aliases) can be
>>>>                     modified. And of course, anyone could add more
>>>>                     aliases; Bioschemas will just not parse those
>>>>                     outside the provided default/template context.
>>>>
>>>>                     I thought we did not want to impose any IRI. Is
>>>>                     there any reason why
>>>>                     we should?
>>>>
>>>>
>>>>                 But then we sacrifice the interoperability and
>>>>                 understanding that we
>>>>                 are striving for. If you look at the n-quads for
>>>>                 the two examples
>>>>                 (included at the end of this email) then you will
>>>>                 see a different set
>>>>                 of triples. Aliases are only defined within the
>>>>                 document; when you interpret them, they give you
>>>>                 different meanings. If we go down this route, we
>>>>                 would need to make our tooling either aware of all
>>>>                 possible terms that might be used, or mapping-aware.
>>>>
>>>>                 Alasdair
>>>>
>>>>                 http://tinyurl.com/y9mu423y
>>>>
>>>>                 <http://identifiers.org/ncbigene/25> <http://schema.org/name> "ABL1" .
>>>>                 <http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/SO_0000704> .
>>>>                 <http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
>>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "ABL" .
>>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "JTK7" .
>>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/description> "Non-receptor tyrosine-protein kinase that plays a role..." .
>>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/name> "ABL1" .
>>>>                 <http://identifiers.org/uniprot/P00519> <http://semanticscience.org/resource/SIO_000001> <http://pfam.xfam.org/clan/CL0001> .
>>>>                 <http://identifiers.org/uniprot/P00519> <http://semanticscience.org/resource/SIO_010081> <http://identifiers.org/ncbigene/25> .
>>>>                 <http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/PR_000000001> .
>>>>                 <http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
>>>>
>>>>                 http://tinyurl.com/yd5snze2
>>>>
>>>>                 <http://identifiers.org/ncbigene/25> <http://schema.org/name> "ABL1" .
>>>>                 <http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/OGI_0000004> .
>>>>                 <http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
>>>>                 <http://identifiers.org/uniprot/P00519> <http://purl.obolibrary.org/obo/RO_0002510> <http://identifiers.org/ncbigene/25> .
>>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "ABL" .
>>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "JTK7" .
>>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/description> "Non-receptor tyrosine-protein kinase that plays a role..." .
>>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/name> "ABL1" .
>>>>                 <http://identifiers.org/uniprot/P00519> <http://semanticscience.org/resource/SIO_000001> <http://pfam.xfam.org/clan/CL0001> .
>>>>                 <http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/NCIT_C17021> .
>>>>                 <http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
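The difference between the two examples can be checked mechanically. A Python sketch, assuming a minimal line parser (hypothetical `parse_ntriples`) that handles only the simple N-Triples lines above: no blank nodes, language tags, datatypes or escaped quotes. It parses both sets and compares them.

```python
import re
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Simple N-Triples line: <s> <p> <o> .  or  <s> <p> "literal" .
# Objects keep their <...> or "..." delimiters so IRIs and literals differ.
NT_LINE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(<[^>]*>|"[^"]*")\s*\.$')


def parse_ntriples(text: str) -> Set[Triple]:
    """Parse the restricted N-Triples subset used in the two examples."""
    triples: Set[Triple] = set()
    for line in text.strip().splitlines():
        match = NT_LINE.match(line.strip())
        if match is None:
            raise ValueError(f"unsupported line: {line!r}")
        triples.add((match.group(1), match.group(2), match.group(3)))
    return triples


EXAMPLE_1 = '''
<http://identifiers.org/ncbigene/25> <http://schema.org/name> "ABL1" .
<http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/SO_0000704> .
<http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
<http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "ABL" .
<http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "JTK7" .
<http://identifiers.org/uniprot/P00519> <http://schema.org/description> "Non-receptor tyrosine-protein kinase that plays a role..." .
<http://identifiers.org/uniprot/P00519> <http://schema.org/name> "ABL1" .
<http://identifiers.org/uniprot/P00519> <http://semanticscience.org/resource/SIO_000001> <http://pfam.xfam.org/clan/CL0001> .
<http://identifiers.org/uniprot/P00519> <http://semanticscience.org/resource/SIO_010081> <http://identifiers.org/ncbigene/25> .
<http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/PR_000000001> .
<http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
'''

EXAMPLE_2 = '''
<http://identifiers.org/ncbigene/25> <http://schema.org/name> "ABL1" .
<http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/OGI_0000004> .
<http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
<http://identifiers.org/uniprot/P00519> <http://purl.obolibrary.org/obo/RO_0002510> <http://identifiers.org/ncbigene/25> .
<http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "ABL" .
<http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "JTK7" .
<http://identifiers.org/uniprot/P00519> <http://schema.org/description> "Non-receptor tyrosine-protein kinase that plays a role..." .
<http://identifiers.org/uniprot/P00519> <http://schema.org/name> "ABL1" .
<http://identifiers.org/uniprot/P00519> <http://semanticscience.org/resource/SIO_000001> <http://pfam.xfam.org/clan/CL0001> .
<http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/NCIT_C17021> .
<http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
'''

ex1 = parse_ntriples(EXAMPLE_1)
ex2 = parse_ntriples(EXAMPLE_2)
print(len(ex1 & ex2), "shared triples")
for label, diff in (("only in example 1", ex1 - ex2), ("only in example 2", ex2 - ex1)):
    for s, p, o in sorted(diff):
        print(label + ":", s, p, o)
```

With the data above, this reports 8 shared triples and 3 triples unique to each example: the gene type, the protein type and the protein-to-gene relation.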
>>>>
>>>>                 Alasdair J G Gray
>>>>
>>>>                  Fellow of the Higher Education Academy
>>>>                 Assistant Professor in Computer Science,
>>>>                 School of Mathematical and Computer Sciences
>>>>                 (Athena SWAN Bronze Award)
>>>>                 Heriot-Watt University, Edinburgh UK.
>>>>
>>>>                 Email: A.J.G.Gray@hw.ac.uk
>>>>                 Web: http://www.macs.hw.ac.uk/~ajg33
>>>>                 ORCID: http://orcid.org/0000-0002-5711-4872
>>>>                 Office: Earl Mountbatten Building 1.39
>>>>                 Twitter: @gray_alasdair
>>>>
>>>
>>
>
Received on Tuesday, 14 November 2017 11:04:13 UTC