Re: BioChemEntity.additionalType (was Re: [Proposal] CategoryCode...)

From: ljgarcia <ljgarcia@ebi.ac.uk>
Date: Thu, 22 Mar 2018 18:35:35 +0000
To: Justin Clark-Casey <jc955@cam.ac.uk>
Cc: public-bioschemas@w3.org
Message-ID: <e8af2bf50890bc0cb093da7e7faf9035@ebi.ac.uk>
Hi Justin,

Yes, something should be updated, we are working on it this week and the 
next one.

Regards,

On 2018-03-22 17:50, Justin Clark-Casey wrote:
> If it's 'might', should this now be an optional property rather than
> recommended?
> 
> Also, fyi (or Kenneth's i), on the drafts page [1] v0.4 of proteins is
> still linked rather than v0.5 [2].
> 
> [1] http://bioschemas.org/specifications/drafts
> [2] https://github.com/BioSchemas/specifications/blob/master/Protein/proteinProfileSpecification.html
> 
> On 22/03/18 16:56, ljgarcia wrote:
>> Hi Justin,
>> 
>> The additionalType might be useful for data providers or consumers who 
>> want to link to other resources that do not necessarily use Bioschemas. 
>> It is also a way to link to your own ontology. For instance, I imagine 
>> WikiData would prefer to use its own types; if those are specified as 
>> additionalType, a record can then refer to the corresponding protein, 
>> gene, or similar type in WikiData.
>> 
>> Cheers,
>> 
>> On 2018-03-22 16:11, Justin Clark-Casey wrote:
>>> Thanks Leyla for the edits, much appreciated.
>>> 
>>> I hadn't realized from the old examples that BioChemEntity was now
>>> being specified through multiple inheritance directly, e.g.
>>> 
>>> {
>>>     "@context": "http://schema.org",
>>>     "@type": ["BioChemEntity", "http://purl.obolibrary.org/obo/PR_000000001"],
>>>     "additionalType": "http://semanticscience.org/resource/SIO_010043",
>>>     ...
>>> }
>>> 
>>> with "http://purl.obolibrary.org/obo/PR_000000001" as the mandatory
>>> string for protein.  In this case, though, what is the purpose of also
>>> giving additionalType (a recommended property)?  To optionally specify
>>> the type further in a less controlled manner?
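
A minimal Python sketch of how a consumer might read both kinds of type declaration from a record like the one above. The `as_list` and `declared_types` helpers are hypothetical names introduced for illustration, not part of any Bioschemas tooling, and the elided `...` fields are simply left out.

```python
import json

# Record adapted from the example above; the elided "..." fields are omitted.
record = json.loads("""
{
    "@context": "http://schema.org",
    "@type": ["BioChemEntity", "http://purl.obolibrary.org/obo/PR_000000001"],
    "additionalType": "http://semanticscience.org/resource/SIO_010043"
}
""")


def as_list(value):
    """JSON-LD allows a single value or an array; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def declared_types(rec):
    """Hypothetical helper: separate types given in @type from additionalType."""
    return {
        "type": as_list(rec.get("@type")),
        "additionalType": as_list(rec.get("additionalType")),
    }


types = declared_types(record)
```

Keeping the two lists apart lets a consumer treat `@type` as the controlled part of the declaration and `additionalType` as optional extra links.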
>>> 
>>> On Wed, Mar 21, 2018 at 6:53 PM, ljgarcia <ljgarcia@ebi.ac.uk> wrote:
>>> 
>>>> Hi Justin,
>>>> 
>>>> Thanks for this initiative, nice summary for additionalProperty and
>>>> its alternative via direct reuse of properties coined in other
>>>> controlled vocabularies.
>>>> 
>>>> There were some issues regarding the use of additionalType, so I made
>>>> some edits to the first and second sections. Feel free to ping me
>>>> if you have any questions or want to discuss further.
>>>> 
>>>> Regards,
>>>> 
>>>> On 2018-03-21 17:59, Justin Clark-Casey wrote:
>>>> I think you're right, this could be 2 distinct things.
>>>> 
>>>> I recently read "Schema.org: Evolution of Structured Data on the
>>>> Web" [1] and it was very illuminating as to the philosophy of
>>>> schema.org. Namely that:
>>>> 
>>>> * Things should be much easier for the data publishers and harder
>>>> for the consumers
>>>> * Developers chiefly implement by adapting examples (we knew this)
>>>> * Getting initial adoption is much more important than getting the
>>>> structures optimal upfront.  Once there is adoption, that's
>>>> justification to improve structure if necessary.
>>>> 
>>>> So I agree with you - specifying sample relations through
>>>> additionalProperty is easiest and specifying more universal
>>>> per-profile relations (e.g. amino acid sequence on protein) could be
>>>> done through direct additional relations to make validation easier.
>>>> 
>>>> To get additional relations (and the general BioChemEntity/DataRecord
>>>> mechanisms) more straight in my head, I published a wiki page [2].
>>>> Apologies for any mistakes, please anybody feel free to edit/extend
>>>> and I will do so as necessary.  I ended up repeating quite a bit of
>>>> what Alasdair originally wrote [3] and what is in examples, but I do
>>>> find it useful to have this stuff in findable wiki form (Google docs
>>>> aren't exposed to search engines afaik).
>>>> 
>>>> [1] https://queue.acm.org/detail.cfm?id=2857276
>>>> [2] https://github.com/BioSchemas/specifications/wiki/Adding-profile-specific-relations-to-BioChemEntity-and-DataRecord
>>>> [3] https://lists.w3.org/Archives/Public/public-bioschemas/2017Nov/0001.html
>>>> 
>>>> On 19/03/18 18:14, ljgarcia wrote:
>>>> Hi all,
>>>> 
>>>> I think we are talking about two different things here.
>>>> 
>>>> For Samples, directly using additionalProperty seems the easiest
>>>> option, as this reduces the requirements for small labs providing
>>>> samples. They do not have to agree on any predefined terms or
>>>> properties, just provide key-value pairs via additionalProperty. Most
>>>> likely they will not include information regarding a CategoryCode;
>>>> this would be added whenever possible by BioSamples. @Luca, @Matt,
>>>> please correct me if I am wrong. For the Samples case, it is a +1 on
>>>> my side for accepting CategoryCode as a possible range for the
>>>> valueReference property on PropertyValue.
>>>> 
>>>> For other groups/profiles, what Justin mentions makes sense and is
>>>> useful. We use that approach (or an approximation; I still need to
>>>> tune a few things there) in the Protein profile.
>>>> 
>>>> What do you think? Do we have two topics here? If so, let's separate
>>>> them first. In any case, I will take a deeper look at Justin's
>>>> examples later; I got a bit lost when I saw SampleDataRecord and
>>>> also the schema:RangeIncludes.
>>>> 
>>>> Regards,
>>>> 
>>>> On 2018-03-19 17:47, Justin Clark-Casey wrote:
>>>> So, last Friday at the Samples event, Leyla, Rafa and myself were
>>>> talking about the alternative of specifying additional properties
>>>> using a second context, rather than through additionalProperty.  The
>>>> original discussion in November was at [1] but I don't think it was
>>>> fully formalized (and the example links are now broken).  But under
>>>> this approach, I think the above would instead be something like
>>>> 
>>>> {
>>>> "@context": ["http://schema.org", "http://bioschemas.org/samples"],
>>>> "@type": ["SampleDataRecord"],
>>>> "diagnosisAvailable": [
>>>> "http://purl.bioontology.org/ontology/ICD10/C00-C97.9",
>>>> "http://purl.bioontology.org/ontology/ICD10/D00-D09.9"
>>>> ]
>>>> }
>>>> 
>>>> with http://bioschemas.org/samples as
>>>> 
>>>> {
>>>> "@context": {
>>>> "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
>>>> "rdfs": "http://www.w3.org/2000/01/rdf-schema#"
>>>> },
>>>> "@id": "http://bioschemas.org/samples",
>>>> "@graph": [
>>>> {
>>>> "@id": "http://bioschemas.org/samples/SampleDataRecord",
>>>> "@type": "rdfs:Class",
>>>> "rdfs:subClassOf": { "@id": "http://schema.org/DataRecord" }
>>>> },
>>>> {
>>>> "@id": "http://bioschemas.org/samples/diagnosisAvailable",
>>>> "@type": "rdf:Property",
>>>> "rdfs:label": "Diagnosis available",
>>>> "http://schema.org/domainIncludes": [
>>>> {
>>>> "@id": "http://bioschemas.org/samples/SampleDataRecord"
>>>> }
>>>> ],
>>>> "http://schema.org/rangeIncludes": [
>>>> {
>>>> "@id": "http://schema.org/URL"
>>>> }
>>>> ]
>>>> }
>>>> ]
>>>> }
>>>> 
>>>> See [2] for schema.org's own type specification file.
>>>> 
>>>> Pros:
>>>> 
>>>> * Using existing validation tools should be easier, as this
>>>> definition uses standard schema.org mechanisms to define
>>>> additional properties, rather than the additionalProperty escape
>>>> hatch.
>>>> 
>>>> * Information such as name and label can go in the bioschemas.org
>>>> file rather than be repeated in the data record text
>>>> 
>>>> * Easier to put different language translations in the
>>>> bioschemas.org file
>>>> 
>>>> Cons:
>>>> 
>>>> * Applications may need to rely on the URL itself (purl.org above)
>>>> to retrieve information such as the human-readable name for the
>>>> categoryCode itself (e.g. "IN SITU NEOPLASMS").  This is good
>>>> semantic web practice I believe, but may reduce reliability.  Possibly
>>>> this information could also be served from http://bioschemas.org as a
>>>> similar set of property definitions.
>>>> 
>>>> * Perhaps not quite so easy to add arbitrary additional properties,
>>>> though a data provider could always define and serve a third context
>>>> themselves, or embed it inline.
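
As a rough illustration of the validation point in the Pros above, here is a Python sketch that checks a record's properties against the `domainIncludes` entries of a vocabulary graph. It uses plain dictionary lookups rather than a JSON-LD processor, it only knows the terms in the cut-down graph (so ordinary schema.org properties would also be reported), and `VOCAB_BASE`, `find_term` and `undeclared_properties` are hypothetical names introduced for this example.

```python
# Sketch only: plain dictionary lookups, no JSON-LD processor or context expansion.
VOCAB_BASE = "http://bioschemas.org/samples/"  # assumed prefix for vocabulary terms

# Cut-down copy of the vocabulary graph discussed above (only the fields used here).
vocab_graph = [
    {"@id": VOCAB_BASE + "SampleDataRecord"},
    {
        "@id": VOCAB_BASE + "diagnosisAvailable",
        "http://schema.org/domainIncludes": [{"@id": VOCAB_BASE + "SampleDataRecord"}],
        "http://schema.org/rangeIncludes": [{"@id": "http://schema.org/URL"}],
    },
]

record = {
    "@context": ["http://schema.org", "http://bioschemas.org/samples"],
    "@type": ["SampleDataRecord"],
    "diagnosisAvailable": [
        "http://purl.bioontology.org/ontology/ICD10/C00-C97.9",
        "http://purl.bioontology.org/ontology/ICD10/D00-D09.9",
    ],
}


def find_term(graph, term):
    """Return the vocabulary entry whose @id is VOCAB_BASE + term, or None."""
    for entry in graph:
        if entry.get("@id") == VOCAB_BASE + term:
            return entry
    return None


def undeclared_properties(graph, rec):
    """List record keys that the vocabulary does not allow on the record's types."""
    rec_types = {VOCAB_BASE + t for t in rec.get("@type", [])}
    problems = []
    for key in rec:
        if key.startswith("@"):
            continue  # JSON-LD keywords are not vocabulary terms
        entry = find_term(graph, key) or {}
        domains = {d["@id"] for d in entry.get("http://schema.org/domainIncludes", [])}
        if not (domains & rec_types):
            problems.append(key)
    return problems


problems = undeclared_properties(vocab_graph, record)
```

A misspelled property name, or a mismatch between the class name in the record and the one listed under `domainIncludes`, would show up in `problems`.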
>>>> 
>>>> Thoughts?  Would especially like Leyla (though I know she's on
>>>> holiday), Rafa, Alasdair, Dan, etc. to weigh in.
>>>> 
>>>> [1] https://lists.w3.org/Archives/Public/public-bioschemas/2017Nov/thread.html
>>>> [2] https://schema.org/version/latest/schema.jsonld
>>>> 
>>>> -- Justin Clark-Casey, http://justincc.org
>>>> 
>>>> Research Software Engineer, Intermine, Cambridge
>>>> 
>>>> ELIXIR UK Node technical co-ordinator
>>>> 
>>>> On Mon, Mar 19, 2018 at 11:21 AM, Philippe <proccaserra@gmail.com>
>>>> wrote:
>>>> 
>>>> Hi Luca,
>>>> 
>>>> I am including a snippet from the notes so people can get a feel
>>>> for how things could look:
>>>> 
>>>> {
>>>> "@context": "http://schema.org",
>>>> "@type": ["DataRecord"],
>>>> "additionalProperty": [
>>>> {
>>>> "@type": "PropertyValue",
>>>> "name": "diagnosis_available",
>>>> "value": "urn:miriam:icd:C00-C97",
>>>> "valueReference": [
>>>> {
>>>> "@type": "CategoryCode",
>>>> "name": "Malignant neoplasms",
>>>> "url": "http://purl.bioontology.org/ontology/ICD10/C00-C97.9",
>>>> "codeValue": "C00-C97.9"
>>>> }
>>>> ]
>>>> },
>>>> {
>>>> "@type": "PropertyValue",
>>>> "name": "diagnosis_available",
>>>> "value": "urn:miriam:icd:D00-D09",
>>>> "valueReference": [
>>>> {
>>>> "@type": "CategoryCode",
>>>> "name": "In situ neoplasms",
>>>> "url": "http://purl.bioontology.org/ontology/ICD10/D00-D09.9",
>>>> "codeValue": "D00-D09.9"
>>>> }
>>>> ]
>>>> }
>>>> ]
>>>> }
>>>> 
>>>> I also include the link to the schema.org CategoryCode:
>>>> https://pending.schema.org/CategoryCode and their JSON-LD
>>>> snippet
>>>> 
>>>> {
>>>>  "@context": "http://schema.org/",
>>>>  "@type": "CategoryCode",
>>>>  "codeValue": "Man",
>>>>  "inCodeSet": "http://id.loc.gov/vocabulary/resourceTypes"
>>>> }
>>>> 
>>>> Question: Should the 'inCodeSet' attribute be used instead?
>>>> 
>>>> Best
>>>> 
>>>> Philippe
>>>> 
>>>> On 19/03/2018 11:10, Luca Cherubin wrote:
>>>> 
>>>> Hi everybody,
>>>> 
>>>> During the Hackathon event last week with various Biobanks
>>>> representatives we had the chance to use Bioschemas profiles and
>>>> types to support BioBanks use cases for metadata sharing.
>>>> 
>>>> As you may know, in the Sample profile we proposed a solution for
>>>> linking ontology terms to a PropertyValue using CategoryCode as
>>>> valid type for the valueReference field. Note that CategoryCode is
>>>> already a proposed schema.org type but in the
>>>> bioschemas/samples specification we propose that it should be an
>>>> acceptable value for valueReference.
>>>> 
>>>> To support BioBank use cases, we are using DataRecord and they
>>>> need to use the same CategoryCode strategy to describe all the
>>>> PropertyValue associated with a DataRecord.
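
A short Python sketch of what the strategy described above could look like when a DataRecord is assembled programmatically. The `coded_property` helper is a hypothetical name introduced here, and the example values are the ICD-10 ones from the snippet posted in this thread; this is one possible shape, not an agreed Bioschemas API.

```python
import json


def coded_property(name, value, code_name, code_url, code_value):
    """Build a PropertyValue whose valueReference is a single CategoryCode."""
    return {
        "@type": "PropertyValue",
        "name": name,
        "value": value,
        "valueReference": [
            {
                "@type": "CategoryCode",
                "name": code_name,
                "url": code_url,
                "codeValue": code_value,
            }
        ],
    }


# One DataRecord carrying a single ontology-linked key-value pair.
data_record = {
    "@context": "http://schema.org",
    "@type": ["DataRecord"],
    "additionalProperty": [
        coded_property(
            "diagnosis_available",
            "urn:miriam:icd:C00-C97",
            "Malignant neoplasms",
            "http://purl.bioontology.org/ontology/ICD10/C00-C97.9",
            "C00-C97.9",
        ),
    ],
}

serialized = json.dumps(data_record, indent=2)
```

Because every PropertyValue goes through the same helper, the ontology link always sits in the same place (`valueReference`), which is the consistency the proposal is after.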
>>>> 
>>>> In our opinion this is a very strong use case for supporting the
>>>> use of CategoryCode as valid type for valueReference for any
>>>> PropertyValue in Bioschemas/schema.org, not only for the
>>>> Sample profile. We can see this being very useful in other areas
>>>> where there is a need for a flexible linking of ontology terms to
>>>> values.
>>>> 
>>>> We would like to get your feedback on this.
>>>> 
>>>> Best regards,
>>>> 
>>>> Luca and Matt
>>> 
>> 
Received on Thursday, 22 March 2018 18:36:03 UTC
