- From: Anders Riutta <anders.riutta@gladstone.ucsf.edu>
- Date: Thu, 9 Nov 2017 14:11:58 -0800
- To: "public-bioschemas@w3.org" <public-bioschemas@w3.org>
- Message-ID: <CAJEHyTm5cr6Uwi6yF-NU-hPk1CcRfjXP1BOAWn+sRKzr-PrdrQ@mail.gmail.com>
Hi,

I share Carole's hesitation to mint new IRIs if they'll be exactMatches for existing IRIs (this xkcd comic <https://xkcd.com/927/> is cited so often as to be a cliche, but there is some truth to it). I also like Leyla's idea of focusing our efforts on creating one or more shared JSON-LD contexts that reflect the consensus of the Bioschemas community. The terms in this context or contexts can have a consistent casing convention of our choosing, and the IRIs can be exclusively third-party, pre-existing IRIs.

> We have to select terms from existing ontologies, i.e. we will be selecting one ontology over another

Alasdair's concern above is justified. Selecting one ontology over another can be sensitive, and it can be tricky to accommodate the subtle variations in meaning that different sub-communities attach to certain terms, especially when there are multiple IRIs with a 98% overlap in meaning but a 2% difference that is just enough to prevent them from being exactMatches. However, judiciously endorsing selected IRIs as well thought out and reflective of existing practice could actually be a significant source of value that the Bioschemas community could provide, because it would be an efficient and transparent process for recognizing and forming consensus. Note that we don't have to choose one ontology in toto over another; we can pick and choose terms from multiple ontologies, as appropriate.

In regard to Leyla's options 1 and 2, it seems there are two concerns: how to mark up an existing API vs. a newly created one.

For a new API, the creators can easily use both the *terms* and *IRIs* from Bioschemas by formatting their JSON like this <https://github.com/ariutta/specifications/blob/ariutta-demo/Protein/examples/ProteinEntityNew.json>, where "http://bioschemas.org/context.jsonld" would point to something like this file <https://github.com/ariutta/specifications/blob/ariutta-demo/context.jsonld>.
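To make the new-API case above a bit more concrete, here is a rough Python sketch. It assumes a shared context that maps Bioschemas-preferred terms to pre-existing IRIs and shows how a document written with those terms resolves to the IRIs. The example.org IRIs, the sample values, and the `expand_terms` helper are all hypothetical and only for illustration; the helper is a toy stand-in for JSON-LD expansion that handles flat term-to-IRI mappings, not a real JSON-LD processor, and it is not taken from the linked demo files.

```python
import json

# Hypothetical shared context: preferred terms mapped to pre-existing,
# third-party IRIs. The example.org IRIs are placeholders, not IRIs
# endorsed by Bioschemas.
BIOSCHEMAS_CONTEXT = {
    "Protein": "http://example.org/ontology-a/Protein",
    "transcribedFrom": "http://example.org/ontology-b/transcribedFrom",
    "name": "http://schema.org/name",
}

# A new API can emit documents that use the shared terms directly and
# simply reference the shared context by URL.
new_api_doc = {
    "@context": "http://bioschemas.org/context.jsonld",
    "@type": "Protein",
    "name": "Example protein",
    "transcribedFrom": "http://example.org/gene/1",
}


def expand_terms(doc, context):
    """Replace each known term with its IRI (toy stand-in for expansion)."""
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue  # the context is consumed during expansion
        if key == "@type":
            # Type values are terms too, so map the value rather than the key.
            expanded[key] = context.get(value, value)
        else:
            # Unknown keys are left untouched.
            expanded[context.get(key, key)] = value
    return expanded


expanded = expand_terms(new_api_doc, BIOSCHEMAS_CONTEXT)
print(json.dumps(expanded, indent=2))
```

A real deployment would fetch the context from its URL and use a proper JSON-LD library rather than a hand-rolled lookup; the point here is only that the document carries short terms while the context carries the third-party IRIs.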
For a pre-existing API, the terms usually cannot be changed, but the creators can still integrate with Bioschemas by adding a new JSON-LD context that maps their terms to the *IRIs* endorsed by Bioschemas. That could look roughly like this <https://github.com/ariutta/specifications/blob/ariutta-demo/Protein/examples/ProteinEntityPreexisting.json>.

Creating one or more Bioschemas JSON-LD contexts, each made up of our preferred terms mapped to pre-existing IRIs, will support a gradual convergence of both terms and IRIs in our community. New APIs can match both our terms and endorsed IRIs, while pre-existing APIs can keep their terms but use our endorsed IRIs. The output from these pre-existing APIs can be transformed to match Bioschemas terms by expanding with the JSON-LD context specified by the API creators and then compacting with the Bioschemas context.

The Bioschemas JSON-LD context could be a single file, or it could be a combined context, where "http://bioschemas.org/context.jsonld" might point to a collection of contexts like this:

[
  "http://schema.org/",
  "http://bioschemas.org/Protein/context.jsonld",
  "http://bioschemas.org/LabProtocol/context.jsonld",
  ...
]

Regards,
Anders

On Thu, Nov 9, 2017 at 8:00 AM, Leyla Garcia <ljgarcia@ebi.ac.uk> wrote:

> Hi,
>
> On 09/11/2017 14:25, Gray, Alasdair J G wrote:
>
> Hi
>
> Unless I’m mistaken, the decision is now about presentation. Options 1 and 2 are both equivalent when you expand them out.
>
> Mmm, maybe it is not just about presentation. In order to make easier things for tools and validators, we would need to agree on a set of predefined the aliases. Let's suppose Bioschemas recommends the aliases "Protein" and "transcribedFrom" but a mark up uses "EnzymeProtein" and "comesFromGene". The Bioschemas validation and tools would not know what to do with those "unknown" aliases.
>
> If we do not want to impose any predefined aliases, then yes, the two options are the same.
> And then Bioschemas tools and validators will need to come up with a strategy to figure it out what corresponds to one profile or the other and when two different aliases refer to the same concept.
>
> Regards,
>
> I personally like 2 as it makes the json-ld very readable and also explicitly declares where each property is from.
> https://github.com/BioSchemas/specifications/blob/master/Protein/examples/ProteinEntity-with-context.json
>
> Alasdair
>
> On 9 Nov 2017, at 13:44, Leyla Garcia <ljgarcia@ebi.ac.uk> wrote:
>
> Hi,
>
> In that case, our options are reduced to:
>
> 1. eliminate the long property names by introducing shorthands in the context, something like the latest commit to my example https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntityAlt-min%2Brec.jsonld <https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntityAlt-min+rec.jsonld>
>
> 2. using a context with predefined aliases linking to the preferred ontology by the data provider (see https://github.com/BioSchemas/specifications/blob/master/Protein/examples/ProteinEntity-with-context.json).
>
> Any preferences?
>
> Regards,
>
> On 09/11/2017 13:31, Carole Goble wrote:
>
> Bioschemas has already been publicly accused of running parallel ontology efforts and we were very clear that we were not going to reinvent ontologies. So I am very reticent about doing so.
>
> Rafa and I are already involved in a Researchschemas initiative with EOSC and have lines of enquiry for joining up with initiatives in biodiversity and geosciences. It would be good if we didn’t end up with many many parallel activities.
> But instead a converged one
>
> Carole
>
> Sent from my iPhone by
> Professor Carole Goble
> The University of Manchester
> UK
>
> On 9 Nov 2017, at 11:13, Leyla Garcia <ljgarcia@ebi.ac.uk> wrote:
>
> Hi all,
>
> On 09/11/2017 10:47, Gray, Alasdair J G wrote:
>
> Hi All,
>
> Leyla, thanks for providing a concrete example from which we can base our discussions.
>
> Points in favour of Leyla’s proposal:
> - Properties and types defined in Bioschemas namespace
> - json-ld validates using the structured data markup tool
>
> It will depending on whether schema.org is before (validates but all schema terms are moved to the bioschemas namespace) or after (does not validate but the namespace are correctly conserved)
>
> - We don’t need to choose one ontology over another
>
> Points against Leyla’s proposal
> - We are minting our own ontology terms
>
> We can avoid that by using a context with just predefined aliases (see https://github.com/BioSchemas/specifications/blob/master/Protein/examples/ProteinEntity-with-context.json). But then, Google does not know anything about all those possible types that could be associated to the aliases.
>
> Minting our own terms (I would not say ontology) makes things easier as Google would need to know only schema.org and Bioschemas. BUT, then maybe Google does not want to open that door as Bioschemas would become a somehow parallel vocabulary and other projects/groups might want to do something similar... OR maybe Google will prefer all to be moved as proper types to schema.org.
>
> Also, keep in mind that schema.org mints terms already covered by ontologies. Citations for instance are covered by the Bibliographic Ontology (BIBO) and the Semantic Publishing and Referencing (SPAR) ontologies.
>
> Regards,
>
> We can of course eliminate the long property names by introducing shorthands in the context, something like the latest commit to my example https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntityAlt-min%2Brec.jsonld <https://github.com/BioSchemas/specifications/blob/master/PhysicalEntity/examples/BioChemEntityAlt-min+rec.jsonld>
> This could be expanded to something similar to the full context that Leyla used, but instead of creating new Bioschema terms, we would reused terms from existing ontologies
>
> Points in favour of Alasdair's proposal:
> - We are not minting our own ontology terms
> - With full context like Leyla’s the example would validate
>
> Points against Alasdair's proposal
> - We have to select terms from existing ontologies, i.e. we will be selecting one ontology over another
>
> Ultimately, with all these proposals someone adopting will need to edit the same number of characters, and we should highlight somehow what these are.
>
> I think we are in broad agreement that we can move away from using the additionalProperties.
>
> What we still need to determine is are we going to mint terms in the Bioschemas namespace (that could eventually be pushed to schema.org) or select terms from existing ontologies. Opinions on this last point please.
>
> Alasdair
>
> Alasdair J G Gray
> Fellow of the Higher Education Academy
> Assistant Professor in Computer Science,
> School of Mathematical and Computer Sciences
> (Athena SWAN Bronze Award)
> Heriot-Watt University, Edinburgh UK.
>
> Email: A.J.G.Gray@hw.ac.uk
> Web: http://www.macs.hw.ac.uk/~ajg33
> ORCID: http://orcid.org/0000-0002-5711-4872
> Office: Earl Mountbatten Building 1.39
> Twitter: @gray_alasdair
Received on Thursday, 9 November 2017 22:12:28 UTC