Re: Protein representation with a Bioschemas context from Justin Clark-Casey on 2017-11-10 (public-bioschemas@w3.org from November 2017)

From: Justin Clark-Casey <justinccdev@gmail.com>
Date: Fri, 10 Nov 2017 18:10:31 +0000
To: Melanie Courtot <mcourtot@ebi.ac.uk>
Cc: public-bioschemas@w3.org
Message-ID: <CAME9NR9xOV=LO26wZmB5E5pP36XxpVGd1fMFnWBnWGHM-oRXhw@mail.gmail.com>
'Data integration' is probably too strong a phrase for what I have in
mind.  I'm really thinking about discovery and how a search engine (for
example) may know/integrate that 2 different data sources are talking about
the same thing, so that the user gets a useful/linked set of search
results.

If a user wanted to find proteins transcribed by gene 'ABL1' (following the
examples), then I think it would be a lot simpler if all the JSON-LD uses
the term
"http://semanticscience.org/resource/is-transcribed-from".  Otherwise a
search engine and maybe other applications would need to be aware of all
the mappings to other terms (I know OLS can/will provide this but this will
increase application complexity).
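
As a rough illustration of the point, here is a minimal Python sketch. It
assumes that SIO_010081 and RO_0002510 are the two sources' predicates for the
protein-to-gene link, as in the n-quads quoted further down this thread. The
function and constant names are made up for the example, and the regex only
handles the simple IRI-only triples used here, not N-Quads in general.

```python
import re

# Predicates taken from the two example outputs quoted later in this thread.
# Treating them as equivalent is an assumption made for this illustration.
SIO_TRANSCRIBED_FROM = "http://semanticscience.org/resource/SIO_010081"
RO_TRANSCRIBED_FROM = "http://purl.obolibrary.org/obo/RO_0002510"
ABL1_GENE = "http://identifiers.org/ncbigene/25"

SOURCE_A = """
<http://identifiers.org/ncbigene/25> <http://schema.org/name> "ABL1" .
<http://identifiers.org/uniprot/P00519> <http://semanticscience.org/resource/SIO_010081> <http://identifiers.org/ncbigene/25> .
"""

SOURCE_B = """
<http://identifiers.org/ncbigene/25> <http://schema.org/name> "ABL1" .
<http://identifiers.org/uniprot/P00519> <http://purl.obolibrary.org/obo/RO_0002510> <http://identifiers.org/ncbigene/25> .
"""

# Matches only triples whose subject, predicate and object are all IRIs;
# lines with literal objects (such as the name triples) are skipped.
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s+\.$")


def iri_triples(text):
    """Yield (subject, predicate, object) for each IRI-only triple line."""
    for line in text.splitlines():
        match = TRIPLE.match(line.strip())
        if match:
            yield match.groups()


def proteins_for_gene(text, gene, predicates):
    """Return subjects linked to `gene` by any predicate in `predicates`."""
    return {s for s, p, o in iri_triples(text) if p in predicates and o == gene}


# With one agreed IRI, the consumer only needs to know a single term,
# but it finds nothing in a source that chose a different one.
one_term = {SIO_TRANSCRIBED_FROM}
found_in_a = proteins_for_gene(SOURCE_A, ABL1_GENE, one_term)
missed_in_b = proteins_for_gene(SOURCE_B, ABL1_GENE, one_term)

# To cover both sources, the consumer has to carry a mapping of equivalents.
mapped_terms = {SIO_TRANSCRIBED_FROM, RO_TRANSCRIBED_FROM}
found_in_b = proteins_for_gene(SOURCE_B, ABL1_GENE, mapped_terms)
```

Here `found_in_a` and `found_in_b` both contain the UniProt P00519 IRI, while
`missed_in_b` is empty. The `mapped_terms` set stands in for whatever a
mapping service would supply, and every consuming application would need its
own copy of that knowledge.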

I should be clear that this is thought programming on my part, I haven't
actually tried to implement anything yet :)  It could well be that there's
a lot of value in sources using whatever terms are optimal for them, and
that costs of trying to co-ordinate IRIs are too high.  But I do want to
debate the possible tradeoffs.

On Fri, Nov 10, 2017 at 5:39 PM, Melanie Courtot <mcourtot@ebi.ac.uk> wrote:

> Is data integration really a use case for Bioschemas?  The stated goal of
> Bioschemas is to extend schema.org to provide markup for pages, and IIRC
> the use cases discussed at the last meeting were about discovery and
> retrieval.
>
> Cheers,
> Melanie
>
>
>
> On 10/11/2017 16:30, Justin Clark-Casey wrote:
>
>>
>>
>> On 10/11/17 14:21, ljgarcia wrote:
>>
>>> Hi,
>>>
>>>>> I thought we did not want to impose any IRI. Is there any reason why
>>>>> we should?
>>>>>
>>>>
>>>> But then we sacrifice the interoperability and understanding that we
>>>> are striving for. If you look at the n-quads for the two examples
>>>> (included at the end of this email) then you will see a different set
>>>> of triples.
>>>>
>>>
>>> If there are mappings between the terms, the interoperability we want
>>> to achieve could still be achieved, could it not? With mappings, we can
>>> still transform any n-quads to the, let's say, canonical Bioschemas-defined
>>> form. Would this not be a way? If a mapping cannot be found, then
>>> validation fails. Bioschemas should then use mapping tools and clearly
>>> state which mapping tool is used.
>>>
>>
>> If consuming applications have to use term mappings then this will make
>> them much harder to write, and in some cases might make it impossible to
>> integrate some information.  This might only be a problem for code that is
>> trying to integrate data across websites, but this is an important use case.
>>
>> At least for mandatory properties and types, and major profiles (gene,
>> protein, etc.), I would like to see pre-agreed IRIs, rather than free
>> choice or emerging consensus.  In some ways, I don't think this is so
>> different from what we are doing with DataCatalog, Sample,
>> TrainingMaterial, etc.
>>
>>
>>> Regards,
>>>
>>> On 2017-11-10 14:07, Gray, Alasdair J G wrote:
>>>
>>>> On 10 Nov 2017, at 13:28, Leyla Garcia <ljgarcia@ebi.ac.uk> wrote:
>>>>> I was under the same impression as Melanie. We agree on aliases
>>>>> but providers can decide what is their preferred IRI for any of
>>>>> them. A Bioschemas Protein context would just provide a default
>>>>> context that can also be used as a template where IRIs (but not
>>>>> aliases) can be modified. And of course, anyone could add more
>>>>> aliases, Bioschemas will just not parse those outside the
>>>>> default/template provided context.
>>>>>
>>>>> I thought we did not want to impose any IRI. Is there any reason why
>>>>> we should?
>>>>>
>>>>
>>>> But then we sacrifice the interoperability and understanding that we
>>>> are striving for. If you look at the n-quads for the two examples
>>>> (included at the end of this email) then you will see a different set
>>>> of triples. Aliases are only defined within the document. When you
>>>> interpret them they give you different meanings. If we go down this
>>>> route, we would need to make our tooling either aware of all possible
>>>> terms that will be used or mapping-aware.
>>>>
>>>> Alasdair
>>>>
>>>> http://tinyurl.com/y9mu423y
>>>>
>>>> <http://identifiers.org/ncbigene/25> <http://schema.org/name> "ABL1" .
>>>>
>>>> <http://identifiers.org/ncbigene/25>
>>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>>> <http://purl.obolibrary.org/obo/SO_0000704> .
>>>> <http://identifiers.org/ncbigene/25>
>>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>>> <http://schema.org/BioChemEntity> .
>>>> <http://identifiers.org/uniprot/P00519>
>>>> <http://schema.org/alternateName> "ABL" .
>>>> <http://identifiers.org/uniprot/P00519>
>>>> <http://schema.org/alternateName> "JTK7" .
>>>> <http://identifiers.org/uniprot/P00519>
>>>> <http://schema.org/description> "Non-receptor tyrosine-protein kinase
>>>> that plays a role..." .
>>>> <http://identifiers.org/uniprot/P00519> <http://schema.org/name>
>>>> "ABL1" .
>>>> <http://identifiers.org/uniprot/P00519>
>>>> <http://semanticscience.org/resource/SIO_000001>
>>>> <http://pfam.xfam.org/clan/CL0001> .
>>>> <http://identifiers.org/uniprot/P00519>
>>>> <http://semanticscience.org/resource/SIO_010081>
>>>> <http://identifiers.org/ncbigene/25> .
>>>> <http://identifiers.org/uniprot/P00519>
>>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>>> <http://purl.obolibrary.org/obo/PR_000000001> .
>>>> <http://identifiers.org/uniprot/P00519>
>>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>>> <http://schema.org/BioChemEntity> .
>>>>
>>>> http://tinyurl.com/yd5snze2
>>>>
>>>> <http://identifiers.org/ncbigene/25> <http://schema.org/name> "ABL1" .
>>>>
>>>> <http://identifiers.org/ncbigene/25>
>>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>>> <http://purl.obolibrary.org/obo/OGI_0000004> .
>>>> <http://identifiers.org/ncbigene/25>
>>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>>> <http://schema.org/BioChemEntity> .
>>>> <http://identifiers.org/uniprot/P00519>
>>>> <http://purl.obolibrary.org/obo/RO_0002510>
>>>> <http://identifiers.org/ncbigene/25> .
>>>> <http://identifiers.org/uniprot/P00519>
>>>> <http://schema.org/alternateName> "ABL" .
>>>> <http://identifiers.org/uniprot/P00519>
>>>> <http://schema.org/alternateName> "JTK7" .
>>>> <http://identifiers.org/uniprot/P00519>
>>>> <http://schema.org/description> "Non-receptor tyrosine-protein kinase
>>>> that plays a role..." .
>>>> <http://identifiers.org/uniprot/P00519> <http://schema.org/name>
>>>> "ABL1" .
>>>> <http://identifiers.org/uniprot/P00519>
>>>> <http://semanticscience.org/resource/SIO_000001>
>>>> <http://pfam.xfam.org/clan/CL0001> .
>>>> <http://identifiers.org/uniprot/P00519>
>>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>>> <http://purl.obolibrary.org/obo/NCIT_C17021> .
>>>> <http://identifiers.org/uniprot/P00519>
>>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>>> <http://schema.org/BioChemEntity> .
>>>>
>>>> Alasdair J G Gray
>>>>
>>>>  Fellow of the Higher Education Academy
>>>> Assistant Professor in Computer Science,
>>>> School of Mathematical and Computer Sciences
>>>> (Athena SWAN Bronze Award)
>>>> Heriot-Watt University, Edinburgh UK.
>>>>
>>>> Email: A.J.G.Gray@hw.ac.uk
>>>> Web: http://www.macs.hw.ac.uk/~ajg33
>>>> ORCID: http://orcid.org/0000-0002-5711-4872
>>>> Office: Earl Mountbatten Building 1.39
>>>> Twitter: @gray_alasdair
>>>>
>>>
>>>
>>
>
>
Received on Friday, 10 November 2017 18:10:59 UTC