Re: Bioschemas profile specification: how to name a profile

From: ljgarcia <ljgarcia@ebi.ac.uk>
Date: Wed, 04 Jul 2018 21:24:46 +0100
To: Justin Clark-Casey <justinccdev@gmail.com>
Cc: Franck Michel <fmichel@i3s.unice.fr>, Melanie Courtot <mcourtot@ebi.ac.uk>, "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>, public-bioschemas@w3.org
Message-ID: <33664a45496e4979e694312154788b2a@ebi.ac.uk>

Hi all,

As Justin said, there should be no concern about using the short or long
form of a URL in option (a).
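To make the equivalence concrete, here is a minimal Python sketch. It assumes a context that maps the term `Protein` to http://purl.obolibrary.org/obo/PR_000000001, as in option (a). The helpers `expand_type` and `expanded_types` are hypothetical and handle only simple term definitions; they are not the full JSON-LD expansion algorithm that a real processor implements.

```python
# Minimal sketch, assuming a context like the one proposed in option (a).
# Simplified: handles only terms defined as a plain IRI string or as an
# object with "@id". This is NOT the full JSON-LD expansion algorithm
# (prefixes, @vocab and @base are ignored).

CONTEXT = {
    "Protein": {"@id": "http://purl.obolibrary.org/obo/PR_000000001"},
}


def expand_type(value, context):
    """Return the absolute IRI for a single @type value (hypothetical helper)."""
    definition = context.get(value)
    if isinstance(definition, dict) and "@id" in definition:
        return definition["@id"]
    if isinstance(definition, str):
        return definition
    # Not a defined term: treat the value as an absolute IRI already.
    return value


def expanded_types(node, context):
    """Collect the expanded @type IRIs of a node (@type as string or list)."""
    raw = node.get("@type", [])
    if isinstance(raw, str):
        raw = [raw]
    return {expand_type(t, context) for t in raw}


# The same protein marked up with the shorthand and with the full IRI.
compact = {"@type": "Protein", "name": "Example protein"}
full = {"@type": "http://purl.obolibrary.org/obo/PR_000000001",
        "name": "Example protein"}

print(expanded_types(compact, CONTEXT) == expanded_types(full, CONTEXT))  # True
```

A consumer that compares expanded IRIs like this does not depend on which shorthand a page happens to use; in practice a complete JSON-LD processor would do the expansion.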

Option (b) could work as well. My concern there is whether explicitly
defining those terms in Bioschemas amounts to redefining what has
already been properly defined in a well-known ontology. A point in its
favor is that we do not have to choose one ontology term over the
others, as we must in (a).

I do not see (c) as an alternative for naming profiles, but rather as a
way of supporting mappings and the like.
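As an illustration of that mapping role, here is a sketch that turns a DefinedTerm's `sameAs` links into a lookup table, so that markup using any of the equivalent ontology IRIs can be normalised to the chosen one. It assumes the DefinedTerm is shaped like Melanie's example quoted below, but with `sameAs` written as a JSON array: with two repeated `sameAs` keys in one object, most JSON parsers (including Python's `json` module) keep only the last value. The names `build_mapping` and `normalise` are hypothetical.

```python
# Sketch: normalising type IRIs via a DefinedTerm's sameAs links.
# Assumes "sameAs" is a list (or a single string); function names are
# hypothetical and not part of any Bioschemas tooling.

PROTEIN_TERM = {
    "@type": "DefinedTerm",
    "@id": "http://purl.obolibrary.org/obo/PR_000000001",
    "name": "Protein",
    "inDefinedTermSet": "http://bioschemas.org/terms",
    "sameAs": [
        "http://purl.obolibrary.org/obo/NCIT_C17021",
        "http://semanticscience.org/resource/SIO_010043",
    ],
}


def build_mapping(terms):
    """Map each term's @id and every sameAs IRI to the term's @id."""
    mapping = {}
    for term in terms:
        same_as = term.get("sameAs", [])
        if isinstance(same_as, str):
            same_as = [same_as]
        for iri in [term["@id"], *same_as]:
            mapping[iri] = term["@id"]
    return mapping


def normalise(iri, mapping):
    """Return the chosen IRI for iri, or iri unchanged if it is unmapped."""
    return mapping.get(iri, iri)


mapping = build_mapping([PROTEIN_TERM])
print(normalise("http://semanticscience.org/resource/SIO_010043", mapping))
# http://purl.obolibrary.org/obo/PR_000000001
```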

Regards,

On 2018-07-04 19:29, Justin Clark-Casey wrote:
> On Fri, 29 Jun 2018 at 11:30, Franck Michel <franck.michel@cnrs.fr>
> wrote:
> 
>> Second sub-thread: How to name a profile?
>> 
>> Three different options are being discussed.
>> (a) the context defines the profile name to be the chosen type URI,
>> e.g. "Protein": { "@id":
>> "http://purl.obolibrary.org/obo/PR_000000001" [1] }
>> (b) the context defines a type within namespace
>> http://bioschemas.org like http://bioschemas.org/Protein. This is a
>> hollow shell that just denotes we're talking about a Bioschemas
>> profile.
>> (c) We use the new schema.org [2] concepts of defined term and
>> defined term set, such as in the example provided by Mélanie:
>> 
>> {
>>   "@type": "DefinedTerm",
>>   "@id": "http://purl.obolibrary.org/obo/PR_000000001",
>>   "name": "Protein",
>>   "inDefinedTermSet": "http://bioschemas.org/terms",
>>   "description": "An amino acid chain that is produced de novo by ribosome-mediated translation of a genetically-encoded mRNA.",
>>   "sameAs": [
>>     "http://purl.obolibrary.org/obo/NCIT_C17021",
>>     "http://semanticscience.org/resource/SIO_010043"
>>   ]
>> }
>>
>> Here are a few thoughts with respect to these options:
>> 
>> My concern with (a) is that a JSON-LD context is just a handy way to
>> write data: the string "Protein" is mere shorthand and could be
>> named anything else. A webpage may use it this way:
>> "@type": "Protein"
>> But it would be perfectly equivalent to not use the context and
>> write this instead:
>> "@type": "http://purl.obolibrary.org/obo/PR_000000001"
>> My point is that a tool extracting Bioschemas markup should not rely
>> on the use of any specific shorthand.
>> Besides, doing so would force using Bioschemas with JSON-LD only,
>> but what about webpages using other markup formats? Unless I'm
>> missing something here?
> 
> This shouldn't be a concern - a JSON-LD parser would recognize these
> definitions as being equivalent.  The @context is just there to save
> people having to write out full form URLs or definitions each time.
> 
> As for JSON-LD, it is the only markup format supported by Bioschemas.
> However, I believe some older event markup is written in RDFa.  It
> shouldn't really matter, as a parser can translate RDFa into the
> equivalent JSON-LD.
> 
>> Hence, I'm more inclined to go for (b) that defines a hollow shell
>> for each profile such as http://bioschemas.org/Protein. The
>> advantage is that it will always look the same whether a webpage
>> uses the Bioschemas context or not. And this works the same across
>> markup formats, JSON-LD, RDFa etc.
>> 
>> (c) seems an interesting alternative. Instead of defining a JSON-LD
>> context, we would define a Bioschemas vocabulary by means of
>> DefinedTerms. For now, I don't quite understand how we would refer
>> to the "Protein" defined term in a webpage markup. Any clues?
>> Advantage: this solution avoids defining a Bioschemas profile as a
>> type (option (b)), which makes the distinction between a type and a
>> profile quite unclear.
>> Still, I agree with Justin that there is a need for specific code to
>> cope with such DefinedTerms. However, is this really an issue since,
>> in any case, a Bioschemas extractor tool will have to know the
>> profile specifications to figure out what to look for. Also, this
>> is not much different from the additionalProperty case: there has to
>> be some specific code to cope with it too. Right?
> 
> Yes, I think Bioschemas tools, such as validators, will need to
> recognize certain fields and check cardinality, mandatory properties,
> etc.  How far this needs to go may depend on the application.  A search
> engine might largely not validate additionalType and just try to work
> with whatever's there.  I don't think any of the profiles specify
> particular additionalProperties (?), so it might still be a
> free-for-all, with the more difficult findability story that this
> implies.
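The kind of check described here can be sketched as follows. The profile, its properties and the cardinality labels below are hypothetical and do not reproduce any published Bioschemas profile; the sketch only shows how a validator might flag missing mandatory properties and cardinality violations.

```python
# Sketch of profile validation: mandatory properties and cardinality.
# The PROFILE below is HYPOTHETICAL, for illustration only.
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertySpec:
    required: bool
    cardinality: str  # "ONE" = at most one value; "MANY" = a list is allowed


PROFILE = {
    "name": PropertySpec(required=True, cardinality="ONE"),
    "identifier": PropertySpec(required=True, cardinality="MANY"),
    "description": PropertySpec(required=False, cardinality="ONE"),
}


def validate(item, profile):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    for prop, spec in profile.items():
        if prop not in item:
            if spec.required:
                problems.append(f"missing required property '{prop}'")
            continue
        value = item[prop]
        # A list with more than one value violates single cardinality.
        if spec.cardinality == "ONE" and isinstance(value, list) and len(value) > 1:
            problems.append(f"'{prop}' allows one value, found {len(value)}")
    return problems


item = {"name": ["A", "B"], "description": "Example"}
for problem in validate(item, PROFILE):
    print(problem)
# 'name' allows one value, found 2
# missing required property 'identifier'
```

A lenient consumer such as a search engine could simply ignore the returned problems, while a strict validator would report them.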
> 
> I advocate (b) because it seems simpler than the alternatives, and I
> believe the barrier to doing Bioschemas markup has to be as low as
> possible.
> 
>> Franck.
>> 
>> On 28/06/2018 at 19:40, Justin Clark-Casey wrote:
>> 
>> On Thu, 28 Jun 2018 at 16:42, ljgarcia <ljgarcia@ebi.ac.uk> wrote:
>> > Hi,
>> >
>> > What Melanie suggests is useful to describe profiles: they would
>> > become a DefinedTerm. That would also help to avoid type/profile
>> > confusion. We would then talk about DefinedTerms. If we find a way
>> > to also describe the properties accepted, with their restrictions,
>> > that would be even better. That might be a good subject for a
>> > different discussion.
>>
>> This means there will have to be special Bioschemas code that knows
>> to look in a DefinedTerm somewhere for this information.  I still
>> think using a subtype to signify a profile will be simpler.
>>
>> I also disagree with Alasdair in that I think there should be a
>> http://bioschemas.org/Protein type.  This would be an empty type that
>> just signifies we're talking about a Bioschemas-defined protein, so
>> it isn't treading on anybody's toes.  This would have information
>> saying it's defined by http://purl.obolibrary.org/obo/PR_000000001
>> and listing its sameAs terms.  Without this, there's not much point
>> having a Bioschemas context, and requiring people to use this
>> specific string every time is cumbersome, especially if every group
>> chooses something from a different ontology.  This makes writing and
>> consuming markup harder.
>>
>> > The question remains: how do we choose one term over others to
>> > associate it with a profile/DefinedTerm?
>>
>> I suggest having members of each specification group propose which
>> term they want and then come to consensus via discussion and/or
>> vote.
>>
>> > Regards,
>> 
>> On 2018-06-28 15:45, Melanie Courtot wrote:
>>> Hi,
>>> 
>>> We could consider using the defined terms,
>>> https://dataliberate.com/2018/06/18/schema-org-introduces-defined-terms/,
>>> to do that.
>>> 
>>> So have a protein be defined as
>>> 
>>> {
>>>   "@type": "DefinedTerm",
>>>   "@id": "http://purl.obolibrary.org/obo/PR_000000001",
>>>   "name": "Protein",
>>>   "inDefinedTermSet": "http://bioschemas.org/terms",
>>>   "description": "An amino acid chain that is produced de novo by ribosome-mediated translation of a genetically-encoded mRNA.",
>>>   "sameAs": [
>>>     "http://purl.obolibrary.org/obo/NCIT_C17021",
>>>     "http://semanticscience.org/resource/SIO_010043"
>>>   ]
>>> }
>>> 
>>> (Using random examples of sameAs from
>>> https://www.ebi.ac.uk/ols/search?q=protein)
>>> 
>>> Cheers,
>>> Melanie
>>> 
>>> ---
>>> Melanie Courtot, PhD
>>> EMBL-EBI
>>> GA4GH/BioSamples project lead
>>> 
>>>> On 28 Jun 2018, at 15:18, ljgarcia <ljgarcia@ebi.ac.uk> wrote:
>>>> Hi,
>>>> 
>>>> I understood Franck's question in a different way.
>>>> 
>>>> Alasdair says
>>>> 
>>>>> I also agree that a context file should be provided which has the
>>>>> chosen types and terms in it, i.e. the context file would define
>>>>> Protein to be the URI http://purl.obolibrary.org/obo/PR_000000001.
>>>> 
>>>> I think what Franck is asking is how to choose
>>>> http://purl.obolibrary.org/obo/PR_000000001 over other possible
>>>> terms to define a Protein. For the taxon case, as with proteins,
>>>> there are multiple possibilities. Franck, is this your question?
>>>> If it is, I do not think there is any agreement on how to choose,
>>>> other than going for well-known ontologies broadly accepted by the
>>>> community of interest, ideally with the term mapped to other
>>>> possible ones.
>>>> 
>>>> Regards,
>>>> 
>>>> On 2018-06-28 11:50, Gray, Alasdair J G wrote:
>>>> On 27 Jun 2018, at 19:19, Justin Clark-Casey <justinccdev@gmail.com>
>>>> wrote:
>>>>
>>>> > I think we should have mandatory known @types and properties.  In
>>>> > my view, Bioschemas should be as easy as possible to write and
>>>> > consume.  Multiple options will increase cognitive load on writers
>>>> > (which one do I choose?  Why are these 2 examples using these
>>>> > different terms?) and open the door to greater inconsistency.
>>>> > Non-mandatory types will also raise the barriers for writing
>>>> > Bioschemas software that will have to be aware of equivalent
>>>> > mappings.
>>>>
>>>> I completely agree that we should have a single approved type for
>>>> each profile, and likewise a single chosen term for each property.
>>>> This is the whole point of having the profiles.
>>>>
>>>> > I would go one step further and say that Bioschemas should provide
>>>> > an http://bioschemas.org [1] context that will define types such as
>>>> > Taxon, rather than blessing particular ontology terms.
>>>>
>>>> I also agree that a context file should be provided which has the
>>>> chosen types and terms in it, i.e. the context file would define
>>>> Protein to be the URI http://purl.obolibrary.org/obo/PR_000000001.
>>>> To be completely explicit, we would not be defining a type in the
>>>> bioschemas namespace, e.g. http://bioschemas.org/Protein.
>>>>
>>>> > This context can also document equivalent terms in different
>>>> > ontologies.
>>>>
>>>> I like the idea that this also contains mappings to the equivalent
>>>> terms in other ontologies.
>>>>
>>>> Alasdair
>>>> Alasdair J G Gray
>>>> Fellow of the Higher Education Academy
>>>> Assistant Professor in Computer Science,
>>>> School of Mathematical and Computer Sciences
>>>> (Athena SWAN Bronze Award)
>>>> Heriot-Watt University, Edinburgh UK.
>>>> Email: A.J.G.Gray@hw.ac.uk
>>>> Web: http://www.macs.hw.ac.uk/~ajg33 [3]
>>>> ORCID: http://orcid.org/0000-0002-5711-4872
>>>> Office: Earl Mountbatten Building 1.39
>>>> Twitter: @gray_alasdair
>>>> Links:
>>>> ------
>>>> [1] http://bioschemas.org/
> 
> 
> 
> Links:
> ------
> [1] http://purl.obolibrary.org/obo/PR_000000001
> [2] http://schema.org
> [3] http://www.macs.hw.ac.uk/%7Eajg33