Re: Protein representation with a Bioschemas context

Dear all,

I'd like to bring a few elements into the discussion regarding aliases.

In JSON-LD, aliases are just a handy shortcut notation with a local 
scope: an alias applies only within the scope of the context where it 
is defined. More importantly, an alias should not carry any meaning of 
its own. The first thing a consumer app does with JSON-LD is to expand 
all terms, which immediately removes all aliases.

Hence, if I use the Bioschemas.org default context:
    "@context": { "Gene": { "@id": "http://purl.obolibrary.org/obo/SO_0000704" } ... }
I will typically write:
    "@type": [ "BioChemEntity", "Gene" ]

But I may well write a document with a custom alias:
    "@context": { "GeneAlias": { "@id": "http://purl.obolibrary.org/obo/SO_0000704" } ... }
and write:
    "@type": [ "BioChemEntity", "GeneAlias" ]
With:
    "@context": { "obo": { "@id": "http://purl.obolibrary.org/obo/" } ... }
I would write:
    "@type": [ "BioChemEntity", "obo:SO_0000704" ]
Or I could even use no alias at all:
    "@type": [ "BioChemEntity", "http://purl.obolibrary.org/obo/SO_0000704" ]

These are all equivalent from the point of view of a data consumer.
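
As a rough illustration, here is a toy Python sketch of why the four 
notations above end up identical once terms are expanded. It is not the 
full JSON-LD expansion algorithm (a real consumer would use a JSON-LD 
processor). The helper names term_iri and expand_type are made up for 
this example, and the mapping of "BioChemEntity" to 
http://schema.org/BioChemEntity is an assumption taken from the n-quads 
quoted further down in this thread.

```python
def term_iri(context, term):
    """Return the IRI a context entry maps to, or None if the term is undefined."""
    definition = context.get(term)
    if isinstance(definition, dict):
        return definition.get("@id")
    return definition  # plain string mapping, or None


def expand_type(context, value):
    """Toy expansion of one @type value: alias first, then prefix:suffix, else as-is."""
    iri = term_iri(context, value)
    if iri is not None:
        return iri
    prefix, sep, suffix = value.partition(":")
    # "http://..." also contains a colon, so skip anything that looks absolute.
    if sep and not suffix.startswith("//"):
        base = term_iri(context, prefix)
        if base is not None:
            return base + suffix
    return value  # already an absolute IRI (or an unknown term)


# Assumed mapping for BioChemEntity, shared by all four example documents.
SCHEMA = {"BioChemEntity": {"@id": "http://schema.org/BioChemEntity"}}
GENE = "http://purl.obolibrary.org/obo/SO_0000704"

# (context, @type values) for the four notations shown above.
documents = [
    ({**SCHEMA, "Gene": {"@id": GENE}}, ["BioChemEntity", "Gene"]),
    ({**SCHEMA, "GeneAlias": {"@id": GENE}}, ["BioChemEntity", "GeneAlias"]),
    ({**SCHEMA, "obo": {"@id": "http://purl.obolibrary.org/obo/"}},
     ["BioChemEntity", "obo:SO_0000704"]),
    (SCHEMA, ["BioChemEntity", GENE]),
]

expanded = [[expand_type(ctx, t) for t in types] for ctx, types in documents]

# Every notation expands to the same pair of IRIs.
assert all(e == expanded[0] for e in expanded)
```

Once expanded, nothing of "Gene", "GeneAlias" or "obo:" is left for the 
consumer to see; only the IRIs remain.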

In my view, the default context should be a useful guide for those 
annotating data with Bioschemas.org markup, but alias names should not 
matter at all. What matters is the URIs to which aliases resolve.

I feel that the solution of agreed pre-defined URIs, whatever the 
aliases used, is more sustainable. After all, schema.org advocates 
the use of specific agreed-upon terms. If one uses them, one's pages 
are more likely to be discoverable. Providers can choose to use other 
terms if this is convenient for them, but then there is no guarantee 
that the pages will be discovered as easily.
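
To make the trade-off concrete, here is a small Python sketch of a 
consumer looking for genes in two sources. The two gene type IRIs are 
the ones appearing in the two n-quads listings quoted further down. The 
MAPPINGS table and the find_genes helper are hypothetical, and treating 
the two IRIs as equivalent is an assumption made only for this example.

```python
# Type IRIs taken from the two n-quads listings quoted in this thread.
AGREED_GENE = "http://purl.obolibrary.org/obo/SO_0000704"
OTHER_GENE = "http://purl.obolibrary.org/obo/OGI_0000004"
BIOCHEM = "http://schema.org/BioChemEntity"
ABL1_GENE = "http://identifiers.org/ncbigene/25"

# Hypothetical equivalence table that a consumer would have to maintain
# if each provider were free to pick its own IRIs.
MAPPINGS = {OTHER_GENE: AGREED_GENE}

# subject IRI -> set of rdf:type IRIs, one dict per data source.
source_a = {ABL1_GENE: {AGREED_GENE, BIOCHEM}}
source_b = {ABL1_GENE: {OTHER_GENE, BIOCHEM}}


def find_genes(source, mappings=None):
    """Return subjects typed as a gene, optionally normalising types first."""
    mappings = mappings or {}
    return sorted(
        subject
        for subject, types in source.items()
        if AGREED_GENE in {mappings.get(t, t) for t in types}
    )


# With the agreed IRI, a plain lookup is enough.
assert find_genes(source_a) == [ABL1_GENE]
# With a provider-chosen IRI, the same lookup silently misses the gene...
assert find_genes(source_b) == []
# ...unless the consumer also knows the mapping.
assert find_genes(source_b, MAPPINGS) == [ABL1_GENE]
```

The lookup itself is trivial in both cases; the cost lies in building 
and maintaining the mapping table for every term a provider might choose.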

Franck.


On 13/11/2017 at 19:02, Leyla Garcia wrote:
> Hi all,
>
> Rather than relying on Bioschemas clients to do the hard work of 
> mapping, I was thinking of leaving this to Bioschemas itself. So, if a 
> client wants to retrieve the, let's say, "canonical" Bioschemas markup 
> (which will use the recommended ontology terms as defined by main 
> providers for recommended and minimum properties) then this client 
> will use a Bioschemas-provided tool. If a client is happy with a 
> customized Bioschemas markup (using whichever ontology terms are 
> preferred, but always the predefined aliases) then this client will go 
> directly to the source. Any optional property with no alias will 
> remain as provided. Whenever possible, data providers will prefer 
> schema.org and Bioschemas named properties.
>
> In this way we support freedom of ontology terms choice, but also 
> support collation of information from multiple sources (soft way to 
> refer to data integration).
>
> How does it sound? How would that work for Bioschemas? A canonical 
> transforming tool/web service would need to be provided, as well as 
> servers and maintenance. How would this work for schema.org/Google? 
> Dan, via Alasdair, kind of proposed the use of third-party properties. 
> How would this alias-based way fit with that?
>
> Regards,
>
> On 13/11/2017 16:00, Melanie Courtot wrote:
>> How does that currently work for schema.org, and could the same be 
>> used with Bioschemas?
>>
>> Looking at Bioschemas as a markup language for existing data, we 
>> should aim for the lowest adoption threshold possible, including 
>> unconstrained ontology terms, keeping required properties minimal, 
>> and not having an overly complicated structure with many new 
>> properties; I worry that otherwise people will just not use it.
>>
>>
>>
>> On 10/11/2017 18:10, Justin Clark-Casey wrote:
>>> 'Data integration' is probably too strong a phrase for what I have 
>>> in mind.  I'm really thinking about discovery and how a search 
>>> engine (for example) may know/integrate that 2 different data 
>>> sources are talking about the same thing, so that the user gets a 
>>> useful/linked set of search results.
>>>
>>> If a user wanted to find proteins transcribed by gene 'ABL1' 
>>> (following the examples), then I think it would be a lot simpler if 
>>> all the JSON-LD uses the term
>>> "http://semanticscience.org/resource/is-transcribed-from".  
>>> Otherwise a search engine and maybe other applications would need to 
>>> be aware of all the mappings to other terms (I know OLS can/will 
>>> provide this but this will increase application complexity).
>>>
>>> I should be clear that this is thought programming on my part, I 
>>> haven't actually tried to implement anything yet :)  It could well 
>>> be that there's a lot of value in sources using whatever terms are 
>>> optimal for them, and that costs of trying to co-ordinate IRIs are 
>>> too high.  But I do want to debate the possible tradeoffs.
>>>
>>> On Fri, Nov 10, 2017 at 5:39 PM, Melanie Courtot <mcourtot@ebi.ac.uk> 
>>> wrote:
>>>
>>>     Is data integration really a use case for Bioschemas?  The
>>>     stated goal of Bioschemas is to extend schema.org to provide
>>>     markup for pages, and IIRC the use cases discussed at the last
>>>     meeting were about discovery and retrieval.
>>>
>>>     Cheers,
>>>     Melanie
>>>
>>>
>>>
>>>     On 10/11/2017 16:30, Justin Clark-Casey wrote:
>>>
>>>
>>>
>>>         On 10/11/17 14:21, ljgarcia wrote:
>>>
>>>             Hi,
>>>
>>>                     I thought we did not want to impose any IRI. Is
>>>                     there any reason why
>>>                     we should?
>>>
>>>
>>>                 But then we sacrifice the interoperability and
>>>                 understanding that we
>>>                 are striving for. If you look at the n-quads for the
>>>                 two examples
>>>                 (included at the end of this email) then you will
>>>                 see a different set
>>>                 of triples.
>>>
>>>
>>>             If there are mappings between the terms, the
>>>             interoperability we want could still be achieved,
>>>             could it not? With mappings, we can still
>>>             transform any n-quads to the, let's say, canonical
>>>             Bioschemas defined form. Would this not be a way? If a
>>>             mapping cannot be found, then validation fails.
>>>             Bioschemas should then use mapping tools and clearly
>>>             state which mapping tool is used.
>>>
>>>
>>>         If consuming applications have to use term mappings then
>>>         this will make them much harder to write, and in some cases
>>>         might make it impossible to integrate some information. 
>>>         This might only be a problem for code that is trying to
>>>         integrate data across websites, but this is an important use
>>>         case.
>>>
>>>         At least for mandatory properties and types, and major
>>>         profiles (gene, protein, etc.), I would like to see
>>>         pre-agreed IRIs, rather than free choice or emerging
>>>         consensus.  In some ways, I don't think this is so different
>>>         from what we are doing with DataCatalog, Sample,
>>>         TrainingMaterial, etc.
>>>
>>>
>>>             Regards,
>>>
>>>             On 2017-11-10 14:07, Gray, Alasdair J G wrote:
>>>
>>>                     On 10 Nov 2017, at 13:28, Leyla Garcia
>>>                     <ljgarcia@ebi.ac.uk> wrote:
>>>                     I was under the same impression as Melanie. We
>>>                     agree on aliases
>>>                     but providers can decide what their preferred
>>>                     IRI is for any of
>>>                     them. A Bioschemas Protein context would just
>>>                     provide a default
>>>                     context that can also be used as a template
>>>                     where IRIs (but not
>>>                     aliases) can be modified. And of course, anyone
>>>                     could add more
>>>                     aliases; Bioschemas will just not parse those
>>>                     outside the
>>>                     default/template provided context.
>>>
>>>                     I thought we did not want to impose any IRI. Is
>>>                     there any reason why
>>>                     we should?
>>>
>>>
>>>                 But then we sacrifice the interoperability and
>>>                 understanding that we
>>>                 are striving for. If you look at the n-quads for the
>>>                 two examples
>>>                 (included at the end of this email) then you will
>>>                 see a different set
>>>                 of triples. Aliases are only defined within the
>>>                 document. When you
>>>                 interpret them they give you different meanings. If
>>>                 we go down this
>>>                 route, we would need to build our tooling either
>>>                 with knowledge of all possible terms that will be
>>>                 used, or to be mapping-aware.
>>>
>>>                 Alasdair
>>>
>>>                 http://tinyurl.com/y9mu423y
>>>
>>>                 <http://identifiers.org/ncbigene/25> <http://schema.org/name> "ABL1" .
>>>                 <http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/SO_0000704> .
>>>                 <http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "ABL" .
>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "JTK7" .
>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/description> "Non-receptor tyrosine-protein kinase that plays a role..." .
>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/name> "ABL1" .
>>>                 <http://identifiers.org/uniprot/P00519> <http://semanticscience.org/resource/SIO_000001> <http://pfam.xfam.org/clan/CL0001> .
>>>                 <http://identifiers.org/uniprot/P00519> <http://semanticscience.org/resource/SIO_010081> <http://identifiers.org/ncbigene/25> .
>>>                 <http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/PR_000000001> .
>>>                 <http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
>>>
>>>                 http://tinyurl.com/yd5snze2
>>>
>>>                 <http://identifiers.org/ncbigene/25> <http://schema.org/name> "ABL1" .
>>>                 <http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/OGI_0000004> .
>>>                 <http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
>>>                 <http://identifiers.org/uniprot/P00519> <http://purl.obolibrary.org/obo/RO_0002510> <http://identifiers.org/ncbigene/25> .
>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "ABL" .
>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "JTK7" .
>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/description> "Non-receptor tyrosine-protein kinase that plays a role..." .
>>>                 <http://identifiers.org/uniprot/P00519> <http://schema.org/name> "ABL1" .
>>>                 <http://identifiers.org/uniprot/P00519> <http://semanticscience.org/resource/SIO_000001> <http://pfam.xfam.org/clan/CL0001> .
>>>                 <http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/NCIT_C17021> .
>>>                 <http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
>>>
>>>                 Alasdair J G Gray
>>>
>>>                  Fellow of the Higher Education Academy
>>>                 Assistant Professor in Computer Science,
>>>                 School of Mathematical and Computer Sciences
>>>                 (Athena SWAN Bronze Award)
>>>                 Heriot-Watt University, Edinburgh UK.
>>>
>>>                 Email: A.J.G.Gray@hw.ac.uk
>>>                 Web: http://www.macs.hw.ac.uk/~ajg33
>>>                 ORCID: http://orcid.org/0000-0002-5711-4872
>>>                 Office: Earl Mountbatten Building 1.39
>>>                 Twitter: @gray_alasdair
>>>
>>
>

Received on Tuesday, 14 November 2017 10:22:33 UTC