Re: Bioschemas profile specification: how to name a profile

On Fri, 29 Jun 2018 at 11:30, Franck Michel <franck.michel@cnrs.fr> wrote:

>
>
> *Second sub-thread: How to name a profile?* Three different options are
> being discussed.
> (a) the context defines the profile name to be the chosen type URI, e.g.
> "Protein": { "@id": "http://purl.obolibrary.org/obo/PR_000000001" }
> (b) the context defines a type within namespace http://bioschemas.org
> like http://bioschemas.org/Protein. This is a hollow shell that just
> denotes we're talking about a Bioschemas profile.
> (c) We use the new schema.org concepts of defined term and defined term
> set, such as in the example provided by Mélanie:
>            "@type": "DefinedTerm",
>            "@id": "http://purl.obolibrary.org/obo/PR_000000001",
>            "name": "Protein",
>            "inDefinedTermSet": "http://bioschemas.org/terms",
>            "description": "An amino acid chain that is produced de novo by ribosome-mediated translation of a genetically-encoded mRNA.",
>            "sameAs": [
>                "http://purl.obolibrary.org/obo/NCIT_C17021",
>                "http://semanticscience.org/resource/SIO_010043"
>            ]
>
> Here are a few thoughts with respect to these options:
>
> My concern with (a) is that a JSON-LD context is just a handy way to write
> data: the string "Protein" is merely a shorthand and could be named anything
> else. A webpage may use it this way:
>     "@type": "Protein"
> But it would be perfectly equivalent to *not* use the context and write
> this instead:
>     "@type": "http://purl.obolibrary.org/obo/PR_000000001"
> My point is that a tool extracting Bioschemas markup should *not* rely on
> the use of any specific shorthand.
> Besides, doing so would force using Bioschemas with JSON-LD only, but what
> about webpages using other markup formats? Unless I'm missing something
> here?
>

This shouldn't be a concern: a JSON-LD parser would recognize these
definitions as being equivalent.  The @context is just there to save people
having to write out full-form URLs or definitions each time.
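
To make the equivalence concrete, here is a minimal Python sketch.  It is
not a real JSON-LD processor: the expand_type helper and the CONTEXT
dictionary are made up for illustration and only handle the simple
term-to-URI case discussed here.  A real tool should use a proper JSON-LD
library instead.

```python
# Simplified illustration only. Real JSON-LD expansion has many more rules
# (nested contexts, @vocab, compact IRIs, ...).

PROTEIN_IRI = "http://purl.obolibrary.org/obo/PR_000000001"

# Hypothetical Bioschemas context in the style of option (a):
# the shorthand "Protein" is mapped to the chosen type URI.
CONTEXT = {"Protein": {"@id": PROTEIN_IRI}}


def expand_type(value, context):
    """Return the full IRI for an @type value, resolving context shorthands."""
    definition = context.get(value)
    if definition is None:
        return value  # not a known term: assume it is already a full IRI
    if isinstance(definition, dict):
        return definition["@id"]
    return definition  # the context maps the term directly to an IRI string


# The same statement written with and without the context.
with_shorthand = {"@context": CONTEXT, "@type": "Protein"}
with_full_iri = {"@type": PROTEIN_IRI}

expanded_a = expand_type(with_shorthand["@type"], with_shorthand.get("@context", {}))
expanded_b = expand_type(with_full_iri["@type"], with_full_iri.get("@context", {}))

print(expanded_a == expanded_b)  # prints True
```

A consumer that compares expanded URIs rather than the literal "@type"
string never depends on which shorthand the page author picked.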

As for JSON-LD, it is the only format supported by Bioschemas.
However, I believe some older event markup is written in RDFa.  It
shouldn't really matter, as a parser can translate RDFa into the equivalent
JSON-LD.


> Hence, I'm more inclined to go for (b) that defines a hollow shell for
> each profile such as http://bioschemas.org/Protein. The advantage is that
> it will always look the same whether a webpage uses the Bioschemas context
> or not. And this works the same across markup formats, JSON-LD, RDFa etc.
>
> (c) seems an interesting alternative. Instead of defining a JSON-LD
> context, we would define a Bioschemas vocabulary by means of DefinedTerms.
> For now, I don't quite understand how we would refer to the "Protein"
> defined term in a webpage markup. Any clues?
> Advantage: this solution avoids defining a Bioschemas profile as a type
> (option (b)), which makes the distinction between a type and a profile
> quite unclear.
> Still, I agree with Justin that there is a need for specific code to cope
> with such DefinedTerms. However, is this really an issue, given that a
> Bioschemas extractor tool will in any case have to know the profile
> specifications to figure out what to look for? Also, this is not much
> different from the additionalProperty case: there has to be some specific
> code to cope with it too. Right?
>
>
Yes, I think Bioschemas tools such as validators will need to recognize
certain fields and check them for cardinality, whether they are mandatory,
etc.  How far this needs to go may depend on the application.  A search
engine might largely not validate additionalType and just try to work with
whatever's there.  I don't think any of the profiles specify particular
additionalProperties (?) so it might still be a free-for-all, with the more
difficult findability story that this implies.
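
As a rough idea of the kind of check a validator would do, here is a small
Python sketch.  The PROTEIN_PROFILE table, its property names and its
cardinality bounds are invented for illustration and are not taken from any
actual Bioschemas profile; the validate and count_values helpers are
likewise hypothetical.

```python
# Hypothetical profile specification: property name -> (minimum, maximum) count.
# A maximum of None means "no upper bound". These names and bounds are made up
# for illustration and do not come from a real Bioschemas profile.
PROTEIN_PROFILE = {
    "identifier": (1, 1),   # mandatory, exactly one
    "name": (1, 1),         # mandatory, exactly one
    "sameAs": (0, None),    # optional, any number
}


def count_values(value):
    """Number of values a property carries: absent -> 0, list -> len, else 1."""
    if value is None:
        return 0
    if isinstance(value, list):
        return len(value)
    return 1


def validate(markup, profile):
    """Return a list of readable problems; the list is empty if markup conforms."""
    problems = []
    for prop, (minimum, maximum) in profile.items():
        n = count_values(markup.get(prop))
        if n < minimum:
            problems.append(f"{prop}: expected at least {minimum}, found {n}")
        elif maximum is not None and n > maximum:
            problems.append(f"{prop}: expected at most {maximum}, found {n}")
    return problems


# Example markup that misses a mandatory property and repeats another.
example = {
    "@type": "http://bioschemas.org/Protein",
    "name": ["Protein A", "Protein B"],
}

for problem in validate(example, PROTEIN_PROFILE):
    print(problem)
```

A search engine could skip this step entirely and index whatever it finds,
while a stricter validator would refuse markup that yields any problems.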

I advocate (b) because it seems simpler than the alternatives, and I
believe the barrier to doing Bioschemas markup has to be as low as possible.

> Franck.
>
>
> On 28/06/2018 at 19:40, Justin Clark-Casey wrote:
>
> On Thu, 28 Jun 2018 at 16:42, ljgarcia <ljgarcia@ebi.ac.uk> wrote:
>
>> Hi,
>>
>> What Melanie suggests is useful for describing profiles: each would become
>> a DefinedTerm. That would help as well to avoid type/profile confusion.
>> We would talk then about DefinedTerms. If we find a way to also
>> describe the accepted properties with their restrictions, that would be
>> even better. That might be a good subject for a different discussion.
>>
>
> This means there will have to be special Bioschemas code that knows to
> look in a DefinedTerm somewhere for this information.  I still think using
> a subtype to signify a profile will be simpler.
>
> I also disagree with Alasdair in that I think there should be a
> http://bioschemas.org/Protein type.  This would be an empty type that just
> signifies we're talking about a Bioschemas-defined protein, so it isn't
> treading on anybody's toes.  This would have information saying it's
> defined by http://purl.obolibrary.org/obo/PR_000000001 and giving its
> same-as terms.  Without this, there's not much point having a bioschemas context,
> and requiring people to use this specific string every time is cumbersome,
> especially if every group chooses something from a different ontology.
> This makes writing and consuming markup harder.
>
>
>> The question remains. How do we choose a term over others to associate
>> it to a profile/DefinedTerm?
>>
>
> I suggest having members of each specification group propose which term
> they want and then come to consensus via discussion and/or vote.
>
>
>> Regards,
>>
>>
>> On 2018-06-28 15:45, Melanie Courtot wrote:
>> > Hi,
>> >
>> > We could consider using the defined terms,
>> >
>> https://dataliberate.com/2018/06/18/schema-org-introduces-defined-terms/,
>> > to do that.
>> >
>> > So have a protein be defined as
>> >
>> >            "@type": "DefinedTerm",
>> >            "@id": "http://purl.obolibrary.org/obo/PR_000000001",
>> >            "name": "Protein",
>> >            "inDefinedTermSet": "http://bioschemas.org/terms",
>> >            "description": "An amino acid chain that is produced de novo by ribosome-mediated translation of a genetically-encoded mRNA.",
>> >            "sameAs": [
>> >                "http://purl.obolibrary.org/obo/NCIT_C17021",
>> >                "http://semanticscience.org/resource/SIO_010043"
>> >            ]
>> >
>> > (Using random examples of sameAs from
>> > https://www.ebi.ac.uk/ols/search?q=protein)
>> >
>> > Cheers,
>> > Melanie
>> >
>> > ---
>> > Melanie Courtot, PhD
>> > EMBL-EBI
>> > GA4GH/BioSamples project lead
>> >
>> >> On 28 Jun 2018, at 15:18, ljgarcia <ljgarcia@ebi.ac.uk> wrote:
>> >> Hi,
>> >>
>> >> I understood Franck's question in a different way.
>> >>
>> >> Alasdair says
>> >>
>> >>> I also agree that a context file should be provided which has the
>> >>> chosen types and terms in it, i.e. the context file would define
>> >>> Protein to be the URI http://purl.obolibrary.org/obo/PR_000000001.
>> >>
>> >> I think what Franck is asking is how to choose
>> >> http://purl.obolibrary.org/obo/PR_000000001 over other possible
>> >> terms to define a Protein. The taxon case, like the protein case,
>> >> has multiple possibilities. Franck, is this your question? If it
>> >> is, I do not think there is any agreement on how to choose, other
>> >> than going for well-known ontologies broadly accepted by the
>> >> community of interest, even better if the term is mapped to other
>> >> possible ones.
>> >>
>> >> Regards,
>> >>
>> >> On 2018-06-28 11:50, Gray, Alasdair J G wrote:
>> >>> On 27 Jun 2018, at 19:19, Justin Clark-Casey <justinccdev@gmail.com>
>> >>> wrote:
>> >>> I think we should have mandatory known @types and properties.  In
>> >>> my view, Bioschemas should be as easy as possible to write and
>> >>> consume.  Multiple options will increase cognitive load on writers
>> >>> (which one do I choose?  Why are these 2 examples using these
>> >>> different terms?) and open the door to greater inconsistency.
>> >>> Non-mandatory types will also raise the barriers for writing
>> >>> Bioschemas software that will have to be aware of equivalent
>> >>> mappings.
>> >>
>> >> I completely agree that we should have a single approved type for
>> >> each profile, and likewise for each property a single chosen term.
>> >> This is the whole point of having the profiles.
>> >>
>> >>> I would go one step further and say that Bioschemas should provide
>> >>> an http://bioschemas.org [1] context that will define types such as
>> >>> Taxon, rather than blessing particular ontology terms.
>> >>
>> >> I also agree that a context file should be provided which has the
>> >> chosen types and terms in it, i.e. the context file would define
>> >> Protein to be the URI http://purl.obolibrary.org/obo/PR_000000001.
>> >> To be completely explicit, we would not be defining a type in the
>> >> bioschemas namespace, e.g. http://bioschemas.org/Protein.
>> >>
>> >>> This context can also document equivalent terms in different
>> >>> ontologies.
>> >>
>> >> I like the idea that this also contains mappings to the equivalent
>> >> terms in other ontologies.
>> >>
>> >> Alasdair
>> >> Alasdair J G Gray
>> >> Fellow of the Higher Education Academy
>> >> Assistant Professor in Computer Science,
>> >> School of Mathematical and Computer Sciences
>> >> (Athena SWAN Bronze Award)
>> >> Heriot-Watt University, Edinburgh UK.
>> >> Email: A.J.G.Gray@hw.ac.uk
>> >> Web: http://www.macs.hw.ac.uk/~ajg33
>> >> ORCID: http://orcid.org/0000-0002-5711-4872
>> >> Office: Earl Mountbatten Building 1.39
>> >> Twitter: @gray_alasdair
>> >> Links:
>> >> ------
>> >> [1] http://bioschemas.org/
>> >
>>
>
>

Received on Wednesday, 4 July 2018 18:30:25 UTC