Re: [Proposal] CategoryCode as valid type for valueReference for any PropertyValue in Bioschemas/schema.org

From: Justin Clark-Casey <justinccdev@gmail.com>
Date: Thu, 22 Mar 2018 16:11:49 +0000
Message-ID: <CAME9NR-8mdWTKujiaJ9tU951fhR9N5GnbCjjcGsAesNY=aJdBQ@mail.gmail.com>
To: ljgarcia <ljgarcia@ebi.ac.uk>
Cc: Justin Clark-Casey <jc955@cam.ac.uk>, public-bioschemas@w3.org
Thanks Leyla for the edits, much appreciated.

I hadn't realized from the old examples that BioChemEntity was now being
specified through multiple inheritance directly, e.g.

{
    "@context": "http://schema.org",
    "@type": ["BioChemEntity", "http://purl.obolibrary.org/obo/PR_000000001"],
    "additionalType": "http://semanticscience.org/resource/SIO_010043",
    ...
}

with "http://purl.obolibrary.org/obo/PR_000000001" as the mandatory string
for protein.  In this case, though, what is the purpose of also giving
additionalType (a recommended property)?  To optionally specify the type
further in a less controlled manner?

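To make the question concrete: a consumer that reads the JSON-LD as plain JSON has to look in two places to learn what a node claims to be. Here is a minimal sketch, assuming plain dictionaries and no JSON-LD processor; `declared_types` is a hypothetical helper for illustration, not part of any Bioschemas tooling.

```python
def declared_types(node):
    """Collect every type term or IRI that a JSON-LD node declares.

    Looks at both "@type" and the schema.org "additionalType" property.
    Either one may hold a single string or a list of strings.
    """
    def as_list(value):
        if value is None:
            return []
        return value if isinstance(value, list) else [value]

    return as_list(node.get("@type")) + as_list(node.get("additionalType"))


# The protein example from this message, without the elided properties.
protein = {
    "@context": "http://schema.org",
    "@type": ["BioChemEntity", "http://purl.obolibrary.org/obo/PR_000000001"],
    "additionalType": "http://semanticscience.org/resource/SIO_010043",
}

for type_term in declared_types(protein):
    print(type_term)
```

With the example above this lists three entries, two from "@type" and one from additionalType, which is why the two mechanisms look redundant from the consumer side.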

On Wed, Mar 21, 2018 at 6:53 PM, ljgarcia <ljgarcia@ebi.ac.uk> wrote:

> Hi Justin,
>
> Thanks for this initiative, nice summary for additionalProperty and its
> alternative via direct reuse of properties coined in other controlled
> vocabularies.
>
> There were some issues regarding the use of additionalType, so I made some
> edits to the first and second sections. Feel free to ping me if you have
> any questions or want to discuss further.
>
> Regards,
>
>
> On 2018-03-21 17:59, Justin Clark-Casey wrote:
>
>> I think you're right, this could be 2 distinct things.
>>
>> I recently read "Schema.org: Evolution of Structured Data on the Web"
>> [1] and it was very illuminating as to the philosophy of schema.org.
>> Namely that:
>>
>> * Things should be much easier for the data publishers and harder for
>> the consumers
>> * Developers chiefly implement by adapting examples (we knew this)
>> * Getting initial adoption is much more important than getting the
>> structures optimal upfront.  Once there is adoption, that's
>> justification to improve structure if necessary.
>>
>> So I agree with you - specifying sample relations through
>> additionalProperty is easiest and specifying more universal
>> per-profile relations (e.g. amino acid sequence on protein) could be
>> done through direct additional relations to make validation easier.
>>
>> To get additional relations (and the general BioChemEntity/DataRecord
>> mechanisms) more straight in my head, I published a wiki page [2].
>> Apologies for any mistakes, please anybody feel free to edit/extend
>> and I will do so as necessary.  I ended up repeating quite a bit of
>> what Alasdair originally wrote [3] and what is in examples, but I do
>> find it useful to have this stuff in findable wiki form (Google docs
>> aren't exposed to search engines afaik).
>>
>> [1] https://queue.acm.org/detail.cfm?id=2857276
>> [2] https://github.com/BioSchemas/specifications/wiki/Adding-profile-specific-relations-to-BioChemEntity-and-DataRecord
>> [3] https://lists.w3.org/Archives/Public/public-bioschemas/2017Nov/0001.html
>>
>> On 19/03/18 18:14, ljgarcia wrote:
>>
>>> Hi all,
>>>
>>> I think we are talking about two different things here.
>>>
>>> For Samples, directly using additionalProperty seems the easiest option,
>>> as this reduces requirements for small labs providing samples. They do not
>>> have to agree on any predefined terms or properties, just provide
>>> key-value pairs via additionalProperty. Most likely, they will not
>>> include information regarding a CategoryCode; this would be added
>>> whenever possible by BioSamples. @Luca, @Matt, please correct me if I am
>>> wrong. For the Samples case, it is a +1 on my side for accepting
>>> CategoryCode as a possible range for valueReference property on
>>> PropertyValue.
>>>
>>> For other groups/profiles, what Justin mentions makes sense and is
>>> useful. We use that approach (or an approximation; I still need to tune a
>>> few things there) in the Protein profile.
>>>
>>> What do you think? Do we have two topics here? If so, let's separate
>>> them first. In any case, I will take a deeper look at Justin's examples
>>> later; I got a bit lost when I saw SampleDataRecord and also the
>>> schema:rangeIncludes.
>>>
>>> Regards,
>>>
>>>
>>> On 2018-03-19 17:47, Justin Clark-Casey wrote:
>>>
>>>> So, last Friday at the Samples event, Leyla, Rafa and I were
>>>> talking about the alternative of specifying additional properties
>>>> using a second context, rather than through additionalProperty.  The
>>>> original discussion in November was at [1] but I don't think it was fully
>>>> formalized (and the example links are now broken).  But under this
>>>> approach, I think the above would instead be something like
>>>>
>>>> {
>>>>     "@context": ["http://schema.org", "http://bioschemas.org/samples"],
>>>>     "@type": ["SampleDataRecord"],
>>>>     "diagnosisAvailable": [
>>>>         "http://purl.bioontology.org/ontology/ICD10/C00-C97.9",
>>>>         "http://purl.bioontology.org/ontology/ICD10/D00-D09.9"
>>>>     ]
>>>> }
>>>>
>>>> with http://bioschemas.org/samples as
>>>>
>>>> {
>>>>   "@context": {
>>>>     "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
>>>>     "rdfs": "http://www.w3.org/2000/01/rdf-schema#"
>>>>   },
>>>>   "@id": "http://bioschemas.org/samples",
>>>>   "@graph": [
>>>>     {
>>>>       "@id": "http://bioschemas.org/samples/SampleDataRecord",
>>>>       "@type": "rdfs:Class",
>>>>       "rdfs:subClassOf": { "@id": "http://schema.org/DataRecord" }
>>>>     },
>>>>     {
>>>>       "@id": "http://bioschemas.org/samples/diagnosisAvailable",
>>>>       "@type": "rdf:Property",
>>>>       "rdfs:label": "Diagnosis available",
>>>>       "http://schema.org/domainIncludes": [
>>>>         {
>>>>           "@id": "http://bioschemas.org/samples/SampleDataRecord"
>>>>         }
>>>>       ],
>>>>       "http://schema.org/rangeIncludes": [
>>>>         {
>>>>           "@id": "http://schema.org/URL"
>>>>         }
>>>>       ]
>>>>     }
>>>>   ]
>>>> }
>>>>
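One hedged illustration of how such a vocabulary file could help validation: read the property definitions and check which keys of a record are declared for its type. This is only a sketch under stated assumptions. The helper names are hypothetical, terms are resolved by simply appending them to an assumed base IRI (a real JSON-LD processor would resolve them through the context), and the property is declared as `rdf:Property`, the class that RDF itself defines.

```python
BASE = "http://bioschemas.org/samples/"  # assumed base IRI for short terms

VOCAB = {
    "@context": {
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    },
    "@graph": [
        {
            "@id": BASE + "SampleDataRecord",
            "@type": "rdfs:Class",
            "rdfs:subClassOf": {"@id": "http://schema.org/DataRecord"},
        },
        {
            "@id": BASE + "diagnosisAvailable",
            "@type": "rdf:Property",
            "rdfs:label": "Diagnosis available",
            "http://schema.org/domainIncludes": [{"@id": BASE + "SampleDataRecord"}],
            "http://schema.org/rangeIncludes": [{"@id": "http://schema.org/URL"}],
        },
    ],
}


def properties_by_domain(vocab):
    """Map each class IRI to the set of property IRIs that list it in domainIncludes."""
    result = {}
    for node in vocab["@graph"]:
        if node.get("@type") != "rdf:Property":
            continue
        for domain in node.get("http://schema.org/domainIncludes", []):
            result.setdefault(domain["@id"], set()).add(node["@id"])
    return result


def undeclared_properties(record, vocab, base=BASE):
    """Return the record keys that the vocabulary does not declare for the record's types."""
    types = record["@type"] if isinstance(record["@type"], list) else [record["@type"]]
    by_domain = properties_by_domain(vocab)
    declared = set()
    for type_term in types:
        declared |= by_domain.get(base + type_term, set())
    # Keys starting with "@" are JSON-LD keywords, not properties.
    return sorted(
        key for key in record
        if not key.startswith("@") and base + key not in declared
    )


record = {
    "@context": ["http://schema.org", "http://bioschemas.org/samples"],
    "@type": ["SampleDataRecord"],
    "diagnosisAvailable": [
        "http://purl.bioontology.org/ontology/ICD10/C00-C97.9",
        "http://purl.bioontology.org/ontology/ICD10/D00-D09.9",
    ],
}

print(undeclared_properties(record, VOCAB))
```

Note that this check only knows about the one vocabulary, so ordinary schema.org properties such as "name" would also be reported as undeclared unless schema.org's own definitions were loaded the same way.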
>>>> See [2] for schema.org [1]'s own type specification file.
>>>>
>>>> Pros:
>>>>   * Using existing validation tools should be easier, as this
>>>> definition uses standard schema.org [1] mechanisms to define
>>>> additional properties, rather than the additionalProperty escape
>>>> hatch.
>>>>   * Information such as name and label can go in the bioschemas.org
>>>> [7] file rather than be repeated in the data record text
>>>>
>>>>   * Easier to put in different language translations to the
>>>> bioschemas.org [7] file
>>>>
>>>> Cons:
>>>>
>>>>   * Applications may need to rely on the URL itself (purl.org [8] above)
>>>> to retrieve information such as the human-readable name for the
>>>> categoryCode itself (e.g. "IN SITU NEOPLASMS").  This is good semantic
>>>> web practice I believe, but may reduce reliability.  Possibly this
>>>> information could also be served from http://bioschemas.org as a
>>>> similar set of property definitions.
>>>>
>>>>   * Perhaps not quite so easy to add arbitrary additional properties,
>>>> though a data provider could always define and serve a third context
>>>> themselves, or embed it inline.
>>>>
>>>> Thoughts?  Would especially like Leyla (though I know she's on
>>>> holiday), Rafa, Alasdair, Dan, etc. to weigh in.
>>>>
>>>> [1] https://lists.w3.org/Archives/Public/public-bioschemas/2017Nov/thread.html
>>>> [2] https://schema.org/version/latest/schema.jsonld
>>>>
>>>> -- Justin Clark-Casey, http://justincc.org
>>>>
>>>> Research Software Engineer, Intermine, Cambridge
>>>>
>>>> ELIXIR UK Node technical co-ordinator
>>>>
>>>> On Mon, Mar 19, 2018 at 11:21 AM, Philippe <proccaserra@gmail.com>
>>>> wrote:
>>>>
>>>> Hi Luca,
>>>>>
>>>>> I am including a snippet from the notes so people can get a feel
>>>>> for how things could look:
>>>>>
>>>>> {
>>>>>   "@context": "http://schema.org",
>>>>>   "@type": ["DataRecord"],
>>>>>   "additionalProperty": [
>>>>>     {
>>>>>       "@type": "PropertyValue",
>>>>>       "name": "diagnosis_available",
>>>>>       "value": "urn:miriam:icd:C00-C97",
>>>>>       "valueReference": [
>>>>>         {
>>>>>           "@type": "CategoryCode",
>>>>>           "name": "Malignant neoplasms",
>>>>>           "url": "http://purl.bioontology.org/ontology/ICD10/C00-C97.9",
>>>>>           "codeValue": "C00-C97.9"
>>>>>         }
>>>>>       ]
>>>>>     },
>>>>>     {
>>>>>       "@type": "PropertyValue",
>>>>>       "name": "diagnosis_available",
>>>>>       "value": "urn:miriam:icd:D00-D09",
>>>>>       "valueReference": [
>>>>>         {
>>>>>           "@type": "CategoryCode",
>>>>>           "name": "In situ neoplasms",
>>>>>           "url": "http://purl.bioontology.org/ontology/ICD10/D00-D09.9",
>>>>>           "codeValue": "D00-D09.9"
>>>>>         }
>>>>>       ]
>>>>>     }
>>>>>   ]
>>>>> }
>>>>>
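For a feel of what the consumer side of this structure might involve, here is a minimal sketch that flattens the additionalProperty entries into (name, value, code, url) rows. It assumes the record is already parsed into Python dictionaries; `flatten_additional_properties` is a hypothetical helper, not an existing Bioschemas function.

```python
def flatten_additional_properties(record):
    """Turn additionalProperty entries into (name, value, codeValue, url) rows.

    A PropertyValue without any CategoryCode reference still yields one row,
    with None for the code and the url.
    """
    rows = []
    for prop in record.get("additionalProperty", []):
        refs = prop.get("valueReference", [])
        if isinstance(refs, dict):  # tolerate a single object instead of a list
            refs = [refs]
        codes = [ref for ref in refs if ref.get("@type") == "CategoryCode"]
        if not codes:
            rows.append((prop.get("name"), prop.get("value"), None, None))
        for code in codes:
            rows.append(
                (prop.get("name"), prop.get("value"), code.get("codeValue"), code.get("url"))
            )
    return rows


data_record = {
    "@context": "http://schema.org",
    "@type": ["DataRecord"],
    "additionalProperty": [
        {
            "@type": "PropertyValue",
            "name": "diagnosis_available",
            "value": "urn:miriam:icd:C00-C97",
            "valueReference": [
                {
                    "@type": "CategoryCode",
                    "name": "Malignant neoplasms",
                    "url": "http://purl.bioontology.org/ontology/ICD10/C00-C97.9",
                    "codeValue": "C00-C97.9",
                }
            ],
        },
        {
            "@type": "PropertyValue",
            "name": "diagnosis_available",
            "value": "urn:miriam:icd:D00-D09",
            "valueReference": [
                {
                    "@type": "CategoryCode",
                    "name": "In situ neoplasms",
                    "url": "http://purl.bioontology.org/ontology/ICD10/D00-D09.9",
                    "codeValue": "D00-D09.9",
                }
            ],
        },
    ],
}

for row in flatten_additional_properties(data_record):
    print(row)
```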
>>>>> I also include the link to the schema.org [1] CategoryCode type,
>>>>> https://pending.schema.org/CategoryCode [4], and its JSON-LD
>>>>> snippet:
>>>>>
>>>>> {
>>>>>   "@context": "http://schema.org/",
>>>>>   "@type": "CategoryCode",
>>>>>   "codeValue": "Man",
>>>>>   "inCodeSet": "http://id.loc.gov/vocabulary/resourceTypes"
>>>>> }
>>>>>
>>>>> Question: should the 'inCodeSet' attribute be used instead?
>>>>>
>>>>> Best
>>>>>
>>>>> Philippe
>>>>>
>>>>> On 19/03/2018 11:10, Luca Cherubin wrote:
>>>>>
>>>>> Hi everybody,
>>>>>>
>>>>>> During the Hackathon event last week with various Biobanks
>>>>>> representatives we had the chance to use Bioschemas profiles and
>>>>>> types to support BioBanks use cases for metadata sharing.
>>>>>>
>>>>>> As you may know, in the Sample profile we proposed a solution for
>>>>>> linking ontology terms to a PropertyValue using CategoryCode as a
>>>>>> valid type for the valueReference field. Note that CategoryCode is
>>>>>> already a proposed schema.org [1] type, but in the
>>>>>> bioschemas/samples specification we propose that it should be an
>>>>>> acceptable value for valueReference.
>>>>>>
>>>>>> To support BioBank use cases, we are using DataRecord, and they
>>>>>> need to use the same CategoryCode strategy to describe all the
>>>>>> PropertyValues associated with a DataRecord.
>>>>>>
>>>>>> In our opinion this is a very strong use case for supporting the
>>>>>> use of CategoryCode as a valid type for valueReference for any
>>>>>> PropertyValue in Bioschemas/schema.org [1], not only for the
>>>>>> Sample profile. We can see this being very useful in other areas
>>>>>> where there is a need for a flexible linking of ontology terms to
>>>>>> values.
>>>>>>
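As a rough illustration of the proposal from the publisher side, the same PropertyValue shape could be produced by one small function and attached to any type, not only Sample. This is a sketch under stated assumptions: `property_value_with_code` is a hypothetical helper, and the field choices simply mirror the examples circulated in this thread.

```python
import json


def property_value_with_code(name, value, term_name, term_url, code_value):
    """Build a PropertyValue whose valueReference is a single CategoryCode."""
    return {
        "@type": "PropertyValue",
        "name": name,
        "value": value,
        "valueReference": [
            {
                "@type": "CategoryCode",
                "name": term_name,
                "url": term_url,
                "codeValue": code_value,
            }
        ],
    }


# The same helper serves a DataRecord here; only "@type" would differ for a Sample.
data_record = {
    "@context": "http://schema.org",
    "@type": ["DataRecord"],
    "additionalProperty": [
        property_value_with_code(
            "diagnosis_available",
            "urn:miriam:icd:D00-D09",
            "In situ neoplasms",
            "http://purl.bioontology.org/ontology/ICD10/D00-D09.9",
            "D00-D09.9",
        )
    ],
}

print(json.dumps(data_record, indent=2))
```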
>>>>>> We would like to get your feedback on this.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Luca and Matt
>>>>>>
>>>>>
>>>>
>>>>
>>>> Links:
>>>> ------
>>>> [1] http://schema.org
>>>> [2] http://purl.bioontology.org/ontology/ICD10/C00-C97.9
>>>> [3] http://purl.bioontology.org/ontology/ICD10/D00-D09.9
>>>> [4] https://pending.schema.org/CategoryCode
>>>> [5] http://schema.org/
>>>> [6] http://id.loc.gov/vocabulary/resourceTypes
>>>> [7] http://bioschemas.org
>>>> [8] http://purl.org
>>>>
>>>
>>>
>
Received on Thursday, 22 March 2018 16:12:14 UTC
