
Re: [Proposal] CategoryCode as valid type for valueReference for any PropertyValue in Bioschemas/schema.org

From: Justin Clark-Casey <jc955@cam.ac.uk>
Date: Wed, 21 Mar 2018 17:59:13 +0000
To: public-bioschemas@w3.org
Message-ID: <021ba8d7-5b6e-268c-413a-25f8d0c810cb@cam.ac.uk>
I think you're right, this could be 2 distinct things.

I recently read "Schema.org: Evolution of Structured Data on the Web" [1] and it was very illuminating as to the philosophy of schema.org.  Namely that:

* Things should be much easier for the data publishers and harder for the consumers
* Developers chiefly implement by adapting examples (we knew this)
* Getting initial adoption is much more important than getting the structures optimal upfront.  Once there is adoption, that's justification to improve 
structure if necessary.

So I agree with you - specifying sample relations through additionalProperty is easiest and specifying more universal per-profile relations (e.g. amino acid 
sequence on protein) could be done through direct additional relations to make validation easier.
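
To make the two styles concrete, here is a minimal Python sketch (standard library only) that expresses the same fact both ways. The helper names are hypothetical, and the second context URL, the SampleDataRecord type and the diagnosisAvailable property are taken from the proposal quoted further down; none of them is a published Bioschemas definition.

```python
import json

SCHEMA_ORG = "http://schema.org"
# Assumption: this context URL comes from the quoted proposal; it is not a published file.
SAMPLES_CONTEXT = "http://bioschemas.org/samples"


def as_additional_property(name, term_urls):
    """Generic style: one PropertyValue per term under additionalProperty."""
    return {
        "@context": SCHEMA_ORG,
        "@type": ["DataRecord"],
        "additionalProperty": [
            {"@type": "PropertyValue", "name": name, "value": url}
            for url in term_urls
        ],
    }


def as_direct_relation(prop, term_urls):
    """Profile style: a named property that a second context would define."""
    return {
        "@context": [SCHEMA_ORG, SAMPLES_CONTEXT],
        "@type": ["SampleDataRecord"],
        prop: list(term_urls),
    }


terms = [
    "http://purl.bioontology.org/ontology/ICD10/C00-C97.9",
    "http://purl.bioontology.org/ontology/ICD10/D00-D09.9",
]
generic = as_additional_property("diagnosis_available", terms)
direct = as_direct_relation("diagnosisAvailable", terms)

print(json.dumps(generic, indent=2))
print(json.dumps(direct, indent=2))
```

The generic form needs no agreement on property names, while the direct form is shorter and can be checked against a property definition, which is the trade-off described above.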

To get additional relations (and the general BioChemEntity/DataRecord mechanisms) more straight in my head, I published a wiki page [2].  Apologies for any 
mistakes, please anybody feel free to edit/extend and I will do so as necessary.  I ended up repeating quite a bit of what Alasdair originally wrote [3] and 
what is in examples, but I do find it useful to have this stuff in findable wiki form (Google docs aren't exposed to search engines afaik).

[1] https://queue.acm.org/detail.cfm?id=2857276
[2] https://github.com/BioSchemas/specifications/wiki/Adding-profile-specific-relations-to-BioChemEntity-and-DataRecord
[3] https://lists.w3.org/Archives/Public/public-bioschemas/2017Nov/0001.html
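
For the Samples case in the quoted thread below, where a lab provides plain key-value pairs and a CategoryCode is attached later, a minimal sketch of that enrichment step might look like this. The function name and the lookup table are hypothetical; the table only stands in for whatever ontology lookup a service such as BioSamples would really use, and the second property in the sample record is invented for illustration.

```python
# Hypothetical lookup from a property value to CategoryCode details.
# The single entry is copied from the example in the quoted thread.
KNOWN_CODES = {
    "urn:miriam:icd:C00-C97": {
        "name": "Malignant neoplasms",
        "url": "http://purl.bioontology.org/ontology/ICD10/C00-C97.9",
        "codeValue": "C00-C97.9",
    },
}


def add_category_codes(record, known_codes=KNOWN_CODES):
    """Return a copy of record in which each PropertyValue with a known value
    gains a CategoryCode valueReference; other entries are copied unchanged."""
    enriched = dict(record)
    props = []
    for pv in record.get("additionalProperty", []):
        pv = dict(pv)  # shallow copy so the input record is not modified
        code = known_codes.get(pv.get("value"))
        if code is not None and "valueReference" not in pv:
            pv["valueReference"] = [{"@type": "CategoryCode", **code}]
        props.append(pv)
    enriched["additionalProperty"] = props
    return enriched


record = {
    "@context": "http://schema.org",
    "@type": ["DataRecord"],
    "additionalProperty": [
        {"@type": "PropertyValue", "name": "diagnosis_available",
         "value": "urn:miriam:icd:C00-C97"},
        # Invented example of a property with no matching ontology term.
        {"@type": "PropertyValue", "name": "storage_temperature",
         "value": "-80"},
    ],
}
enriched = add_category_codes(record)
```

Entries without a known term stay as bare key-value pairs, so publishers are never required to supply a CategoryCode themselves.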

On 19/03/18 18:14, ljgarcia wrote:
> Hi all,
> 
> I think we are talking about two different things here.
> 
> For Samples, directly using additionalProperty seems the easiest option as this reduces requirements for small labs providing samples. They do not have to agree 
> on any predefined terms or properties, just provide key-value pairs via additionalProperty. Most likely, they will not include information regarding a 
> CategoryCode; this would be added whenever possible by BioSamples. @Luca, @Matt, please correct me if I am wrong. For the Samples case, it is a +1 on my 
> side for accepting CategoryCode as a possible range for the valueReference property on PropertyValue.
> 
> For other groups/profiles, what Justin mentions makes sense and is useful. We use that approach (or an approximation, I still need to tune a few things there) in 
> the Protein profile.
> 
> What do you think? Do we have two topics here? If so, let's separate them first. In any case, I will take a deeper look at Justin's examples later; I got a bit 
> lost when I saw SampleDataRecord and also schema:rangeIncludes.
> 
> Regards,
> 
> 
> On 2018-03-19 17:47, Justin Clark-Casey wrote:
>> So, last Friday at the Samples event, Leyla, Rafa and I were
>> talking about the alternative of specifying additional properties
>> using a second context, rather than through additionalProperty.  The
>> original discussion in November was at [1] but I don't think it was fully
>> formalized (and the example links are now broken).  But under this
>> approach, I think the above would instead be something like
>>
>> {
>>     "@context": ["http://schema.org", "http://bioschemas.org/samples"],
>>     "@type": ["SampleDataRecord"],
>>     "diagnosisAvailable": [
>>         "http://purl.bioontology.org/ontology/ICD10/C00-C97.9",
>>         "http://purl.bioontology.org/ontology/ICD10/D00-D09.9"
>>     ]
>> }
>>
>> with http://bioschemas.org/samples as
>>
>> {
>>   "@context": {
>>     "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
>>     "rdfs": "http://www.w3.org/2000/01/rdf-schema#"
>>   },
>>   "@id": "http://bioschemas.org/samples",
>>   "@graph": [
>>     {
>>       "@id": "http://bioschemas.org/samples/SampleDataRecord",
>>       "@type": "rdfs:Class",
>>       "rdfs:subClassOf": { "@id": "http://schema.org/DataRecord" }
>>     },
>>     {
>>       "@id": "http://bioschemas.org/samples/diagnosisAvailable",
>>       "@type": "rdf:Property",
>>       "rdfs:label": "Diagnosis available",
>>       "http://schema.org/domainIncludes": [
>>         {
>>           "@id": "http://bioschemas.org/samples/SampleDataRecord"
>>         }
>>       ],
>>       "http://schema.org/rangeIncludes": [
>>         {
>>           "@id": "http://schema.org/URL"
>>         }
>>       ]
>>     }
>>   ]
>> }
>>
>> See [2] for schema.org [1]'s own type specification file.
>>
>> Pros:
>>   * Using existing validation tools should be easier, as this
>> definition uses standard schema.org [1] mechanisms to define
>> additional properties, rather than the additionalProperty escape
>> hatch.
>>   * Information such as name and label can go in the bioschemas.org
>> [7] file rather than be repeated in the data record text
>>
>>   * Easier to put in different language translations to the
>> bioschemas.org [7] file
>>
>> Cons:
>>
>>   * Applications may need to rely on the URL itself (purl.org [8] above)
>> to retrieve information such as the human-readable name for the
>> CategoryCode itself (e.g. "IN SITU NEOPLASMS").  This is good semantic
>> web practice I believe, but may reduce reliability.  Possibly this
>> information could also be served from http://bioschemas.org as a
>> similar set of property definitions.
>>
>>   * Perhaps not quite so easy to add arbitrary additional properties,
>> though a data provider could always define and serve a third context
>> themselves, or embed it inline.
>>
>> Thoughts?  Would especially like Leyla (though I know she's on
>> holiday), Rafa, Alasdair, Dan, etc. to weigh in.
>>
>> [1]
>> https://lists.w3.org/Archives/Public/public-bioschemas/2017Nov/thread.html
>> [2] https://schema.org/version/latest/schema.jsonld
>>
>> -- 
>>
>> Justin Clark-Casey, http://justincc.org
>>
>> Research Software Engineer, Intermine, Cambridge
>>
>> ELIXIR UK Node technical co-ordinator
>>
>> On Mon, Mar 19, 2018 at 11:21 AM, Philippe <proccaserra@gmail.com>
>> wrote:
>>
>>> Hi Luca,
>>>
>>> I am including a snippet from the notes so people can get a feel
>>> for how things could look:
>>>
>>> {
>>>   "@context": "http://schema.org",
>>>   "@type": ["DataRecord"],
>>>   "additionalProperty": [
>>>     {
>>>       "@type": "PropertyValue",
>>>       "name": "diagnosis_available",
>>>       "value": "urn:miriam:icd:C00-C97",
>>>       "valueReference": [
>>>         {
>>>           "@type": "CategoryCode",
>>>           "name": "Malignant neoplasms",
>>>           "url": "http://purl.bioontology.org/ontology/ICD10/C00-C97.9",
>>>           "codeValue": "C00-C97.9"
>>>         }
>>>       ]
>>>     },
>>>     {
>>>       "@type": "PropertyValue",
>>>       "name": "diagnosis_available",
>>>       "value": "urn:miriam:icd:D00-D09",
>>>       "valueReference": [
>>>         {
>>>           "@type": "CategoryCode",
>>>           "name": "In situ neoplasms",
>>>           "url": "http://purl.bioontology.org/ontology/ICD10/D00-D09.9",
>>>           "codeValue": "D00-D09.9"
>>>         }
>>>       ]
>>>     }
>>>   ]
>>> }
>>>
>>> I also include the link to the schema.org [1] CategoryCode type,
>>> https://pending.schema.org/CategoryCode [4], and their JSON-LD
>>> snippet:
>>>
>>> {
>>>   "@context": "http://schema.org/",
>>>   "@type": "CategoryCode",
>>>   "codeValue": "Man",
>>>   "inCodeSet": "http://id.loc.gov/vocabulary/resourceTypes"
>>> }
>>>
>>> Question: should the 'inCodeSet' attribute be used instead?
>>>
>>> Best
>>>
>>> Philippe
>>>
>>> On 19/03/2018 11:10, Luca Cherubin wrote:
>>>
>>>> Hi everybody,
>>>>
>>>> During the Hackathon event last week with various Biobanks
>>>> representatives we had the chance to use Bioschemas profiles and
>>>> types to support BioBanks use cases for metadata sharing.
>>>>
>>>> As you may know, in the Sample profile we proposed a solution for
>>>> linking ontology terms to a PropertyValue using CategoryCode as
>>>> valid type for the valueReference field. Note that CategoryCode is
>>>> already a proposed schema.org [1] type but in the
>>>> bioschemas/samples specification we propose that it should be an
>>>> acceptable value for valueReference.
>>>>
>>>> To support BioBank use cases, we are using DataRecord and they
>>>> need to use the same CategoryCode strategy to describe all the
>>>> PropertyValue associated with a DataRecord.
>>>>
>>>> In our opinion this is a very strong use case for supporting the
>>>> use of CategoryCode as valid type for valueReference for any
>>>> PropertyValue in Bioschemas/schema.org [1], not only for the
>>>> Sample profile. We can see this being very useful in other areas
>>>> where there is a need for a flexible linking of ontology terms to
>>>> values.
>>>>
>>>> We would like to get your feedback on this.
>>>>
>>>> Best regards,
>>>>
>>>> Luca and Matt
>>
>>
>>
>> Links:
>> ------
>> [1] http://schema.org
>> [2] http://purl.bioontology.org/ontology/ICD10/C00-C97.9
>> [3] http://purl.bioontology.org/ontology/ICD10/D00-D09.9
>> [4] https://pending.schema.org/CategoryCode
>> [5] http://schema.org/
>> [6] http://id.loc.gov/vocabulary/resourceTypes
>> [7] http://bioschemas.org
>> [8] http://purl.org
> 
Received on Wednesday, 21 March 2018 17:59:47 UTC
