- From: Justin Clark-Casey <justinccdev@gmail.com>
- Date: Fri, 10 Nov 2017 18:10:31 +0000
- To: Melanie Courtot <mcourtot@ebi.ac.uk>
- Cc: public-bioschemas@w3.org
- Message-ID: <CAME9NR9xOV=LO26wZmB5E5pP36XxpVGd1fMFnWBnWGHM-oRXhw@mail.gmail.com>
'Data integration' is probably too strong a phrase for what I have in mind. I'm really thinking about discovery, and how a search engine (for example) may recognise that two different data sources are talking about the same thing, so that the user gets a useful, linked set of search results.

If a user wanted to find proteins transcribed from gene 'ABL1' (following the examples), then I think it would be a lot simpler if all the JSON-LD used the term "http://semanticscience.org/resource/is-transcribed-from". Otherwise a search engine, and maybe other applications, would need to be aware of all the mappings to other terms (I know OLS can/will provide this, but it will increase application complexity).

I should be clear that this is thought programming on my part; I haven't actually tried to implement anything yet :) It could well be that there's a lot of value in sources using whatever terms are optimal for them, and that the costs of trying to co-ordinate IRIs are too high. But I do want to debate the possible tradeoffs.

On Fri, Nov 10, 2017 at 5:39 PM, Melanie Courtot <mcourtot@ebi.ac.uk> wrote:

> Is data integration really a use case for Bioschemas? The stated goal of
> Bioschemas is to extend schema.org to provide markup for pages, and IIRC
> the use cases discussed at the last meeting were about discovery and
> retrieval.
>
> Cheers,
> Melanie
>
> On 10/11/2017 16:30, Justin Clark-Casey wrote:
>
>> On 10/11/17 14:21, ljgarcia wrote:
>>
>>> Hi,
>>>
>>>>> I thought we did not want to impose any IRI. Is there any reason why
>>>>> we should?
>>>>
>>>> But then we sacrifice the interoperability and understanding that we
>>>> are striving for. If you look at the n-quads for the two examples
>>>> (included at the end of this email) then you will see a different set
>>>> of triples.
>>>
>>> If there are mappings between the terms, that interoperability we want
>>> to achieve could still be achieved, could not it?
>>> With mappings, we still can transform any n-quads to the, let's say,
>>> canonical Bioschemas defined form. Would this not be a way? If a
>>> mapping cannot be found, then validation fails. Bioschemas should then
>>> use mapping tools and clearly state what the use mappings tool is.
>>
>> If consuming applications have to use term mappings then this will make
>> them much harder to write, and in some cases might make it impossible to
>> integrate some information. This might only be a problem for code that is
>> trying to integrate data across websites, but this is an important use case.
>>
>> At least for mandatory properties and types, and major profiles (gene,
>> protein, etc.), I would like to see pre-agreed IRIs, rather than free
>> choice or emerging consensus. In some ways, I don't think this is so
>> different from what we are doing with DataCatalog, Sample,
>> TrainingMaterial, etc.
>>
>>> Regards,
>>>
>>> On 2017-11-10 14:07, Gray, Alasdair J G wrote:
>>>
>>>> On 10 Nov 2017, at 13:28, Leyla Garcia <ljgarcia@ebi.ac.uk> wrote:
>>>>> I was under the same impression than Melanie. We agree on aliases
>>>>> but providers can decide what is their preferred IRI for any of
>>>>> them. A Bioschemas Protein context would just provide a default
>>>>> context that can also be used as a template where IRIs (but not
>>>>> aliases) can be modified. And of course, anyone could add more
>>>>> aliases, Bioschemas will just not parse those outside the
>>>>> default/template provided context.
>>>>>
>>>>> I thought we did not want to impose any IRI. Is there any reason why
>>>>> we should?
>>>>
>>>> But then we sacrifice the interoperability and understanding that we
>>>> are striving for. If you look at the n-quads for the two examples
>>>> (included at the end of this email) then you will see a different set
>>>> of triples. Aliases are only defined within the document. When you
>>>> interpret them they give you different meanings.
>>>> If we go down this route, we would need to make our tooling with
>>>> knowledge of either all possible terms that will be used or mapping
>>>> aware.
>>>>
>>>> Alasdair
>>>>
>>>> http://tinyurl.com/y9mu423y
>>>>
>>>> <http://identifiers.org/ncbigene/25> <http://schema.org/name> "ABL1" .
>>>> <http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/SO_0000704> .
>>>> <http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
>>>> <http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "ABL" .
>>>> <http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "JTK7" .
>>>> <http://identifiers.org/uniprot/P00519> <http://schema.org/description> "Non-receptor tyrosine-protein kinase that plays a role..." .
>>>> <http://identifiers.org/uniprot/P00519> <http://schema.org/name> "ABL1" .
>>>> <http://identifiers.org/uniprot/P00519> <http://semanticscience.org/resource/SIO_000001> <http://pfam.xfam.org/clan/CL0001> .
>>>> <http://identifiers.org/uniprot/P00519> <http://semanticscience.org/resource/SIO_010081> <http://identifiers.org/ncbigene/25> .
>>>> <http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/PR_000000001> .
>>>> <http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
>>>>
>>>> http://tinyurl.com/yd5snze2
>>>>
>>>> <http://identifiers.org/ncbigene/25> <http://schema.org/name> "ABL1" .
>>>> <http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/OGI_0000004> .
>>>> <http://identifiers.org/ncbigene/25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
>>>> <http://identifiers.org/uniprot/P00519> <http://purl.obolibrary.org/obo/RO_0002510> <http://identifiers.org/ncbigene/25> .
>>>> <http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "ABL" .
>>>> <http://identifiers.org/uniprot/P00519> <http://schema.org/alternateName> "JTK7" .
>>>> <http://identifiers.org/uniprot/P00519> <http://schema.org/description> "Non-receptor tyrosine-protein kinase that plays a role..." .
>>>> <http://identifiers.org/uniprot/P00519> <http://schema.org/name> "ABL1" .
>>>> <http://identifiers.org/uniprot/P00519> <http://semanticscience.org/resource/SIO_000001> <http://pfam.xfam.org/clan/CL0001> .
>>>> <http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/NCIT_C17021> .
>>>> <http://identifiers.org/uniprot/P00519> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/BioChemEntity> .
>>>>
>>>> Alasdair J G Gray
>>>>
>>>> Fellow of the Higher Education Academy
>>>> Assistant Professor in Computer Science,
>>>> School of Mathematical and Computer Sciences
>>>> (Athena SWAN Bronze Award)
>>>> Heriot-Watt University, Edinburgh UK.
>>>>
>>>> Email: A.J.G.Gray@hw.ac.uk
>>>> Web: http://www.macs.hw.ac.uk/~ajg33
>>>> ORCID: http://orcid.org/0000-0002-5711-4872
>>>> Office: Earl Mountbatten Building 1.39
>>>> Twitter: @gray_alasdair
>>>>
>>>> -------------------------
>>>>
>>>> _HERIOT-WATT UNIVERSITY IS THE TIMES & THE SUNDAY TIMES INTERNATIONAL
>>>> UNIVERSITY OF THE YEAR 2018_
>>>>
>>>> Founded in 1821, Heriot-Watt is a leader in ideas and solutions. With
>>>> campuses and students across the entire globe we span the world,
>>>> delivering innovation and educational excellence in business,
>>>> engineering, design and the physical, social and life sciences.
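To make the tradeoff discussed in this thread a little more concrete, here is a minimal Python sketch. It uses only the triples that differ between the two quoted examples, and shows that a consumer querying on a single pre-agreed predicate misses the second source unless it also carries a mapping table. The names `TERM_MAPPINGS`, `normalise` and `proteins_transcribed_from` are hypothetical, and the mapping itself (treating RO_0002510 as equivalent to SIO_010081) is an assumption made purely for illustration, not something the thread or Bioschemas has agreed.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
GENE = "http://identifiers.org/ncbigene/25"
PROTEIN = "http://identifiers.org/uniprot/P00519"
SIO_010081 = "http://semanticscience.org/resource/SIO_010081"
RO_0002510 = "http://purl.obolibrary.org/obo/RO_0002510"

# Only the triples that differ between the two quoted n-quads listings.
example_1 = {
    (GENE, RDF_TYPE, "http://purl.obolibrary.org/obo/SO_0000704"),
    (PROTEIN, RDF_TYPE, "http://purl.obolibrary.org/obo/PR_000000001"),
    (PROTEIN, SIO_010081, GENE),
}
example_2 = {
    (GENE, RDF_TYPE, "http://purl.obolibrary.org/obo/OGI_0000004"),
    (PROTEIN, RDF_TYPE, "http://purl.obolibrary.org/obo/NCIT_C17021"),
    (PROTEIN, RO_0002510, GENE),
}

# ASSUMPTION: a hypothetical mapping table that treats the two predicates
# as equivalent. A real consumer would have to obtain this from somewhere.
TERM_MAPPINGS = {RO_0002510: SIO_010081}


def normalise(triples, mappings):
    """Rewrite predicate and object IRIs to their mapped form, if any."""
    return {(s, mappings.get(p, p), mappings.get(o, o)) for s, p, o in triples}


def proteins_transcribed_from(triples, gene, predicate=SIO_010081):
    """Return subjects linked to `gene` by the single agreed predicate."""
    return {s for s, p, o in triples if p == predicate and o == gene}


# The first source answers the query directly.
found_1 = proteins_transcribed_from(example_1, GENE)
# The second source uses a different predicate, so nothing is found...
found_2 = proteins_transcribed_from(example_2, GENE)
# ...until the consumer applies the mapping table first.
found_2_mapped = proteins_transcribed_from(normalise(example_2, TERM_MAPPINGS), GENE)

print(found_1, found_2, found_2_mapped)
```

With pre-agreed IRIs the `normalise` step and its mapping table are not needed at all; with free choice of IRIs, every consuming application needs its own equivalent of them.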
Received on Friday, 10 November 2017 18:10:59 UTC