- From: Stephen Anyango <anyango@ebi.ac.uk>
- Date: Tue, 14 Nov 2017 11:56:31 +0000
- To: public-bioschemas@w3.org
- Message-ID: <1ec48755-cd19-801f-4ced-e2213b7fb4c5@ebi.ac.uk>
Hello,
As a data provider, we have no strong preference for a particular
ontology/IRI. The important thing, I believe, is consistency, which is
easy to document and therefore easy to reference. Option 2 appears to put
overhead on the tool developers/data consumers, and not necessarily on
the data providers.
Kind regards,
Stephen Anyango
PDBe
EMBL-EBI
On 14-Nov-17 11:03 AM, Leyla Garcia wrote:
> Hi all,
>
> Would it be right to summarize our current options as the following?
>
> 1. There will be a unique Bioschemas context which will define
> recommended aliases together with officially mandatory agreed IRIs for
> all different profiles. If data providers want to use alternative
> IRIs, they can do so via additionalType. Data consumers can go to any
> data provider and directly get the mark up from them as all data
> providers will use the agreed IRIs.
>
> 2. There will be a Bioschemas context template that will state the
> officially mandatory agreed aliases and the recommended canonical
> IRIs. Data providers can use alternative IRIs. Bioschemas will provide
> a translation service that will take any mark up to the canonical
> form. Data consumers would retrieve the mark up via Bioschemas
> translator (otherwise they will end up with all sort of IRIs).
>
> I would say option 1 is what schema.org actually does. Rather than
> using myOntology:citation, if I want to comply with schema.org, I use
> schema:citation and so on.
>
> I still would like to hear some thoughts from schema.org people, as
> well as from other data providers. As a data provider, we are happy
> either way, we can accommodate.
>
> I would also suggest that the Governance group propose a date for voting
> on this matter as, at some point, we have to make a decision. No
> pressure, but it would be great if such decision can be reached before
> the first week of December so it can be included in the poster we will
> have at SWAT4LS.
>
> Regards,
>
>
> On 14/11/2017 10:21, Franck Michel wrote:
>> Dear all,
>>
>> I'd like to bring a few elements into the discussion wrt. aliases.
>>
>> In JSON-LD, aliases are just a handy short-cut notation with a local
>> scope: an alias just applies within the scope of the context where it
>> is defined. And more importantly, an alias should not bear any
>> meaning. The first thing a consumer app does with JSON-LD is to
>> expand all terms, which immediately removes all aliases.
>>
>> Hence, if I use the Bioschemas.org default context:
>> "@context": { "Gene": { "@id":
>> "http://purl.obolibrary.org/obo/SO_0000704" } ... }
>> I will typically write: "@type": [ "BioChemEntity", "Gene" ]
>>
>> But I may well write a document with a custom alias:
>> "@context": { "GeneAlias": { "@id":
>> "http://purl.obolibrary.org/obo/SO_0000704" } ... }
>> and write: "@type": [ "BioChemEntity", "GeneAlias" ]
>> With:
>> "@context": { "obo": { "@id": "http://purl.obolibrary.org/obo/" }
>> ... }
>> I would write: "@type": [ "BioChemEntity", "obo:SO_0000704" ]
>> Or I could even not use any alias: "@type": [ "BioChemEntity",
>> "http://purl.obolibrary.org/obo/SO_0000704" ]
>>
>> These are all equivalent from the point of view of a data consumer.
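To make the equivalence concrete, here is a minimal Python sketch. It is not a real JSON-LD processor: `expand_term` and `expand_types` are hypothetical helpers that only handle flat aliases and prefixes, and the mapping of "BioChemEntity" to http://schema.org/BioChemEntity is an assumption taken from the n-quads quoted further down in this thread.

```python
# Simplified illustration only: real JSON-LD expansion covers many more
# cases than these hypothetical helpers.
GENE_IRI = "http://purl.obolibrary.org/obo/SO_0000704"
BCE_IRI = "http://schema.org/BioChemEntity"  # assumed from the n-quads in the thread


def expand_term(term, context):
    """Resolve one term against a flat context of aliases and prefixes."""
    definition = context.get(term)
    if definition is not None:
        # Alias defined either as {"@id": iri} or as a plain IRI string.
        return definition["@id"] if isinstance(definition, dict) else definition
    if ":" in term:
        prefix, suffix = term.split(":", 1)
        base = context.get(prefix)
        if base is not None and not suffix.startswith("//"):
            base_iri = base["@id"] if isinstance(base, dict) else base
            return base_iri + suffix  # compact IRI such as obo:SO_0000704
    return term  # treated as an absolute IRI and left untouched


def expand_types(doc):
    """Return the sorted, fully expanded @type values of a document."""
    return sorted(expand_term(t, doc["@context"]) for t in doc["@type"])


docs = [
    {"@context": {"BioChemEntity": BCE_IRI, "Gene": {"@id": GENE_IRI}},
     "@type": ["BioChemEntity", "Gene"]},
    {"@context": {"BioChemEntity": BCE_IRI, "GeneAlias": {"@id": GENE_IRI}},
     "@type": ["BioChemEntity", "GeneAlias"]},
    {"@context": {"BioChemEntity": BCE_IRI,
                  "obo": {"@id": "http://purl.obolibrary.org/obo/"}},
     "@type": ["BioChemEntity", "obo:SO_0000704"]},
    {"@context": {"BioChemEntity": BCE_IRI},
     "@type": ["BioChemEntity", GENE_IRI]},
]

expanded = [expand_types(d) for d in docs]
all_equivalent = all(e == expanded[0] for e in expanded)
```

Under these assumptions, all four documents expand to the same pair of type IRIs, so the alias names themselves carry no information for the consumer.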
>>
>> In my view, the default context should be a useful guide for those
>> annotating data with Bioschemas.org markup, but alias names should
>> not matter at all. What matters is the URIs to which aliases resolve.
>>
>> I feel like the solution of agreed pre-defined URIs, whatever the
>> aliases used, is more sustainable. After all, schema.org advocates
>> for the use of specific agreed-upon terms. If one uses them, their
>> pages are more likely to be discoverable. They can choose to use other
>> terms if this is convenient for them, but then there is no guarantee
>> that the pages will be discovered as easily.
>>
>> Franck.
>>
>>
>> Le 13/11/2017 à 19:02, Leyla Garcia a écrit :
>>> Hi all,
>>>
>>> Rather than relying on Bioschemas clients to do the hard work on
>>> mapping, I was thinking to leave this to Bioschemas itself. So, if a
>>> client wants to retrieve the, let's say, "canonical" Bioschemas
>>> markup (which will use the recommended ontology terms as defined by
>>> main providers for recommended and minimum properties) then this
>>> client will use a Bioschemas provided tool. If a client is happy
>>> with a customized Bioschemas mark up (using whichever preferred
>>> ontology terms but always the predefined aliases) then this client
>>> will go directly to the source. Any optional property with no alias
>>> will remain as provided. Whenever possible, data providers will
>>> prefer schema.org and Bioschemas named properties.
>>>
>>> In this way we support freedom of ontology terms choice, but also
>>> support collation of information from multiple sources (soft way to
>>> refer to data integration).
>>>
>>> How does it sound? How would that work for Bioschemas? A canonical
>>> transforming tool/web service would have to be provided, as well as
>>> servers and maintenance. How would this work for schema.org/Google?
>>> Dan, via Alasdair, kind of proposed the use of third-party properties.
>>> How about this alias-based way?
>>>
>>> Regards,
>>>
>>> On 13/11/2017 16:00, Melanie Courtot wrote:
>>>> How does that currently work for schema.org, and could the same be
>>>> used with Bioschemas?
>>>>
>>>> Looking at Bioschemas as a markup language for existing data, we
>>>> should aim for the lowest adoption threshold possible, including
>>>> unconstrained ontology terms, keeping required properties minimal,
>>>> and not having an overly complicated structure with many new
>>>> properties; I worry that otherwise people will just not use it.
>>>>
>>>>
>>>>
>>>> On 10/11/2017 18:10, Justin Clark-Casey wrote:
>>>>> 'Data integration' is probably too strong a phrase for what I have
>>>>> in mind. I'm really thinking about discovery and how a search
>>>>> engine (for example) may know/integrate that 2 different data
>>>>> sources are talking about the same thing, so that the user gets
>>>>> a useful/linked set of search results.
>>>>>
>>>>> If a user wanted to find proteins transcribed by gene 'ABL1'
>>>>> (following the examples), then I think it would be a lot simpler
>>>>> if all the JSON-LD uses the term
>>>>> "http://semanticscience.org/resource/is-transcribed-from".
>>>>> Otherwise a search engine and maybe other applications would need
>>>>> to be aware of all the mappings to other terms (I know OLS
>>>>> can/will provide this but this will increase application complexity).
>>>>>
>>>>> I should be clear that this is thought programming on my part, I
>>>>> haven't actually tried to implement anything yet :) It could well
>>>>> be that there's a lot of value in sources using whatever terms are
>>>>> optimal for them, and that costs of trying to co-ordinate IRIs are
>>>>> too high. But I do want to debate the possible tradeoffs.
>>>>>
>>>>> On Fri, Nov 10, 2017 at 5:39 PM, Melanie Courtot
>>>>> <mcourtot@ebi.ac.uk> wrote:
>>>>>
>>>>> Is data integration really a use case for Bioschemas? The
>>>>> stated goal of Bioschemas is to extend schema.org
>>>>> to provide markup for pages, and IIRC the
>>>>> use cases discussed at the last meeting were about discovery
>>>>> and retrieval.
>>>>>
>>>>> Cheers,
>>>>> Melanie
>>>>>
>>>>>
>>>>>
>>>>> On 10/11/2017 16:30, Justin Clark-Casey wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 10/11/17 14:21, ljgarcia wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I thought we did not want to impose any IRI.
>>>>> Is there any reason why
>>>>> we should?
>>>>>
>>>>>
>>>>> But then we sacrifice the interoperability and
>>>>> understanding that we
>>>>> are striving for. If you look at the n-quads for
>>>>> the two examples
>>>>> (included at the end of this email) then you will
>>>>> see a different set
>>>>> of triples.
>>>>>
>>>>>
>>>>> If there are mappings between the terms, the
>>>>> interoperability we want to achieve could still be
>>>>> achieved, couldn't it? With mappings, we can still
>>>>> transform any n-quads to the, let's say, canonical
>>>>> Bioschemas-defined form. Would this not be a way? If a
>>>>> mapping cannot be found, then validation fails.
>>>>> Bioschemas should then use mapping tools and clearly
>>>>> state which mapping tool is used.
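The mapping-based transformation described above could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the choice of which IRIs count as "canonical", the `MAPPINGS` table (pairing the terms that play the same role in the two n-quads examples in this thread, not a curated ontology mapping), and the helper names `to_canonical` and `canonicalize`.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OBO = "http://purl.obolibrary.org/obo/"
SIO = "http://semanticscience.org/resource/"
GENE = "http://identifiers.org/ncbigene/25"
PROTEIN = "http://identifiers.org/uniprot/P00519"

# Hypothetical canonical vocabulary: terms accepted as-is.
CANONICAL = {
    RDF_TYPE,
    "http://schema.org/name",
    "http://schema.org/BioChemEntity",
    OBO + "SO_0000704",
    OBO + "PR_000000001",
    SIO + "SIO_010081",
}

# Hypothetical mapping table: alternative IRI -> canonical IRI.
MAPPINGS = {
    OBO + "OGI_0000004": OBO + "SO_0000704",
    OBO + "NCIT_C17021": OBO + "PR_000000001",
    OBO + "RO_0002510": SIO + "SIO_010081",
}


class MappingNotFound(ValueError):
    """Raised when an IRI is neither canonical nor mappable."""


def to_canonical(iri):
    if iri in CANONICAL:
        return iri
    if iri in MAPPINGS:
        return MAPPINGS[iri]
    raise MappingNotFound(iri)  # "validation fails"


def canonicalize(triples):
    """Rewrite predicates, and the objects of rdf:type, to canonical IRIs."""
    result = set()
    for s, p, o in triples:
        new_o = to_canonical(o) if p == RDF_TYPE else o
        result.add((s, to_canonical(p), new_o))
    return result


alternative = {
    (GENE, RDF_TYPE, OBO + "OGI_0000004"),
    (PROTEIN, OBO + "RO_0002510", GENE),
    (PROTEIN, RDF_TYPE, OBO + "NCIT_C17021"),
}
canonical_form = canonicalize(alternative)
```

The cost is visible even in this toy version: someone has to own and maintain the mapping table, and any term missing from it stops the whole transformation.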
>>>>>
>>>>>
>>>>> If consuming applications have to use term mappings then
>>>>> this will make them much harder to write, and in some
>>>>> cases might make it impossible to integrate some
>>>>> information. This might only be a problem for code that is
>>>>> trying to integrate data across websites, but this is an
>>>>> important use case.
>>>>>
>>>>> At least for mandatory properties and types, and major
>>>>> profiles (gene, protein, etc.), I would like to see
>>>>> pre-agreed IRIs, rather than free choice or emerging
>>>>> consensus. In some ways, I don't think this is so
>>>>> different from what we are doing with DataCatalog, Sample,
>>>>> TrainingMaterial, etc.
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> On 2017-11-10 14:07, Gray, Alasdair J G wrote:
>>>>>
>>>>> On 10 Nov 2017, at 13:28, Leyla Garcia
>>>>> <ljgarcia@ebi.ac.uk> wrote:
>>>>> I was under the same impression as Melanie.
>>>>> We agree on aliases
>>>>> but providers can decide what is their
>>>>> preferred IRI for any of
>>>>> them. A Bioschemas Protein context would just
>>>>> provide a default
>>>>> context that can also be used as a template
>>>>> where IRIs (but not
>>>>> aliases) can be modified. And of course,
>>>>> anyone could add more
>>>>> aliases, Bioschemas will just not parse those
>>>>> outside the
>>>>> default/template provided context.
>>>>>
>>>>> I thought we did not want to impose any IRI.
>>>>> Is there any reason why
>>>>> we should?
>>>>>
>>>>>
>>>>> But then we sacrifice the interoperability and
>>>>> understanding that we
>>>>> are striving for. If you look at the n-quads for
>>>>> the two examples
>>>>> (included at the end of this email) then you will
>>>>> see a different set
>>>>> of triples. Aliases are only defined within the document. When you
>>>>> interpret them they give you different meanings. If we go down this
>>>>> route, we would need to make our tooling either aware of all
>>>>> possible terms that will be used, or mapping-aware.
>>>>>
>>>>> Alasdair
>>>>>
>>>>> http://tinyurl.com/y9mu423y
>>>>>
>>>>> <http://identifiers.org/ncbigene/25> <http://schema.org/name> "ABL1" .
>>>>> <http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/SO_0000704> .
>>>>> <http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
>>>>> <http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "ABL" .
>>>>> <http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "JTK7" .
>>>>> <http://identifiers.org/uniprot/P00519> <http://schema.org/description> "Non-receptor tyrosine-protein kinase that plays a role..." .
>>>>> <http://identifiers.org/uniprot/P00519> <http://schema.org/name> "ABL1" .
>>>>> <http://identifiers.org/uniprot/P00519> <http://semanticscience.org/resource/SIO_000001> <http://pfam.xfam.org/clan/CL0001> .
>>>>> <http://identifiers.org/uniprot/P00519> <http://semanticscience.org/resource/SIO_010081> <http://identifiers.org/ncbigene/25> .
>>>>> <http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/PR_000000001> .
>>>>> <http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
>>>>>
>>>>> http://tinyurl.com/yd5snze2
>>>>>
>>>>> <http://identifiers.org/ncbigene/25> <http://schema.org/name> "ABL1" .
>>>>> <http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/OGI_0000004> .
>>>>> <http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
>>>>> <http://identifiers.org/uniprot/P00519> <http://purl.obolibrary.org/obo/RO_0002510> <http://identifiers.org/ncbigene/25> .
>>>>> <http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "ABL" .
>>>>> <http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "JTK7" .
>>>>> <http://identifiers.org/uniprot/P00519> <http://schema.org/description> "Non-receptor tyrosine-protein kinase that plays a role..." .
>>>>> <http://identifiers.org/uniprot/P00519> <http://schema.org/name> "ABL1" .
>>>>> <http://identifiers.org/uniprot/P00519> <http://semanticscience.org/resource/SIO_000001> <http://pfam.xfam.org/clan/CL0001> .
>>>>> <http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/NCIT_C17021> .
>>>>> <http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
>>>>>
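One way to see how the two sets of triples differ is a plain set comparison. The Python sketch below uses a three-triple excerpt from each of the listings above; the regular expression is a deliberately naive assumption that only handles IRIs and simple quoted literals, not the full N-Quads grammar, and `parse` is a hypothetical helper.

```python
import re

# Naive pattern: <subject> <predicate> (<object IRI> | "literal") .
TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+(<[^>]+>|"[^"]*")\s*\.$')


def parse(text):
    """Parse simple one-line triples into a set of (s, p, o) tuples."""
    triples = set()
    for line in text.strip().splitlines():
        match = TRIPLE.match(line.strip())
        if match:
            triples.add(match.groups())
    return triples


first = parse("""
<http://identifiers.org/ncbigene/25> <http://schema.org/name> "ABL1" .
<http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/SO_0000704> .
<http://identifiers.org/uniprot/P00519> <http://semanticscience.org/resource/SIO_010081> <http://identifiers.org/ncbigene/25> .
""")

second = parse("""
<http://identifiers.org/ncbigene/25> <http://schema.org/name> "ABL1" .
<http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/OGI_0000004> .
<http://identifiers.org/uniprot/P00519> <http://purl.obolibrary.org/obo/RO_0002510> <http://identifiers.org/ncbigene/25> .
""")

shared = first & second          # triples both providers agree on
only_in_first = first - second   # terms chosen only by the first provider
only_in_second = second - first  # terms chosen only by the second provider
```

In this excerpt only the schema.org name triple is shared; the gene type and the gene-protein relation come out as unrelated statements unless the consumer knows how the terms map to each other.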
>>>>> Alasdair J G Gray
>>>>>
>>>>> Fellow of the Higher Education Academy
>>>>> Assistant Professor in Computer Science,
>>>>> School of Mathematical and Computer Sciences
>>>>> (Athena SWAN Bronze Award)
>>>>> Heriot-Watt University, Edinburgh UK.
>>>>>
>>>>> Email: A.J.G.Gray@hw.ac.uk
>>>>> Web: http://www.macs.hw.ac.uk/~ajg33
>>>>> ORCID: http://orcid.org/0000-0002-5711-4872
>>>>> Office: Earl Mountbatten Building 1.39
>>>>> Twitter: @gray_alasdair
>>>>>
>>>>
>>>
>>
>
Received on Tuesday, 14 November 2017 11:56:59 UTC