Re: [Proposal] CategoryCode as valid type for valueReference for any PropertyValue in Bioschemas/schema.org from ljgarcia on 2018-03-21 (public-bioschemas@w3.org from March 2018)

From: ljgarcia <ljgarcia@ebi.ac.uk>
Date: Wed, 21 Mar 2018 18:53:36 +0000
To: Justin Clark-Casey <jc955@cam.ac.uk>
Cc: public-bioschemas@w3.org
Message-ID: <c9fc19277b33212c44a1ff201d32a4f1@ebi.ac.uk>

Hi Justin,

Thanks for this initiative. It is a nice summary of additionalProperty
and its alternative, the direct reuse of properties coined in other
controlled vocabularies.
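
To make the contrast concrete, here is a minimal Python sketch that builds
the same diagnosis information both ways. The helper names are made up for
illustration, the values are the ones used further down in this thread, and
the second context URL is only the one proposed in the thread (I am not
assuming it is actually served anywhere).

```python
import json

ICD10 = "http://purl.bioontology.org/ontology/ICD10/"


def via_additional_property(name, value, code, label):
    """Option 1: a generic key-value pair carried by additionalProperty."""
    return {
        "@context": "http://schema.org",
        "@type": "DataRecord",
        "additionalProperty": [
            {
                "@type": "PropertyValue",
                "name": name,
                "value": value,
                "valueReference": [
                    {
                        "@type": "CategoryCode",
                        "name": label,
                        "url": ICD10 + code,
                        "codeValue": code,
                    }
                ],
            }
        ],
    }


def via_direct_property(prop, codes):
    """Option 2: a profile-specific property defined in a second context."""
    return {
        # Second context as proposed in the thread; hypothetical.
        "@context": ["http://schema.org", "http://bioschemas.org/samples"],
        "@type": "SampleDataRecord",
        prop: [ICD10 + code for code in codes],
    }


if __name__ == "__main__":
    option1 = via_additional_property(
        "diagnosis_available", "urn:miriam:icd:C00-C97",
        "C00-C97.9", "Malignant neoplasms",
    )
    option2 = via_direct_property("diagnosisAvailable", ["C00-C97.9", "D00-D09.9"])
    print(json.dumps(option1, indent=2))
    print(json.dumps(option2, indent=2))
```

Option 1 keeps the human-readable label next to the value; option 2 is
shorter but leaves the label to be resolved from the term URL or from the
vocabulary file.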

There were some issues regarding the use of additionalType, so I made
some edits to the first and second sections. Feel free to ping me if
you have any questions or want to discuss further.

Regards,

On 2018-03-21 17:59, Justin Clark-Casey wrote:
> I think you're right, this could be 2 distinct things.
> 
> I recently read "Schema.org: Evolution of Structured Data on the Web"
> [1] and it was very illuminating as to the philosophy of schema.org.
> Namely that:
> 
> * Things should be much easier for the data publishers and harder for
> the consumers
> * Developers chiefly implement by adapting examples (we knew this)
> * Getting initial adoption is much more important than getting the
> structures optimal upfront.  Once there is adoption, that's
> justification to improve structure if necessary.
> 
> So I agree with you - specifying sample relations through
> additionalProperty is easiest and specifying more universal
> per-profile relations (e.g. amino acid sequence on protein) could be
> done through direct additional relations to make validation easier.
> 
> To get additional relations (and the general BioChemEntity/DataRecord
> mechanisms) straighter in my head, I published a wiki page [2].
> Apologies for any mistakes; anybody please feel free to edit/extend,
> and I will do so as necessary.  I ended up repeating quite a bit of
> what Alasdair originally wrote [3] and what is in the examples, but I
> do find it useful to have this stuff in findable wiki form (Google
> Docs aren't exposed to search engines afaik).
> 
> [1] https://queue.acm.org/detail.cfm?id=2857276
> [2]
> https://github.com/BioSchemas/specifications/wiki/Adding-profile-specific-relations-to-BioChemEntity-and-DataRecord
> [3] 
> https://lists.w3.org/Archives/Public/public-bioschemas/2017Nov/0001.html
> 
> On 19/03/18 18:14, ljgarcia wrote:
>> Hi all,
>> 
>> I think we are talking about two different things here.
>> 
>> For Samples, directly using additionalProperty seems the easiest 
>> option, as this reduces the requirements for small labs providing 
>> samples. They do not have to agree on any predefined terms or 
>> properties, just provide key-value pairs via additionalProperty. 
>> Most likely, they will not include any CategoryCode information; 
>> that would be added whenever possible by BioSamples. @Luca, @Matt, 
>> please correct me if I am wrong. For the Samples case, it is a +1 on 
>> my side for accepting CategoryCode as a possible range for the 
>> valueReference property on PropertyValue.
>> 
>> For other groups/profiles, what Justin mentions makes sense and is 
>> useful. We use that approach (or an approximation; I still need to 
>> tune a few things there) in the Protein profile.
>> 
>> What do you think? Do we have two topics here? If so, let's separate 
>> them first. In any case, I will take a deeper look at Justin's 
>> examples later; I got a bit lost when I saw SampleDataRecord and 
>> also schema:rangeIncludes.
>> 
>> Regards,
>> 
>> 
>> On 2018-03-19 17:47, Justin Clark-Casey wrote:
>>> So, last Friday at the Samples event, Leyla, Rafa and I were
>>> talking about the alternative of specifying additional properties
>>> using a second context, rather than through additionalProperty.  The
>>> original discussion in November was at [1] but I don't think it was
>>> fully formalized (and the example links are now broken).  But under
>>> this approach, I think the above would instead be something like
>>> 
>>> {
>>>     "@context": ["http://schema.org", "http://bioschemas.org/samples"],
>>>     "@type": ["SampleDataRecord"],
>>>     "diagnosisAvailable": [
>>>         "http://purl.bioontology.org/ontology/ICD10/C00-C97.9",
>>>         "http://purl.bioontology.org/ontology/ICD10/D00-D09.9"
>>>     ]
>>> }
>>> 
>>> with http://bioschemas.org/samples as
>>> 
>>> {
>>>   "@context": {
>>>     "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
>>>     "rdfs": "http://www.w3.org/2000/01/rdf-schema#"
>>>   },
>>>   "@id": "http://bioschemas.org/samples",
>>>   "@graph": [
>>>     {
>>>       "@id": "http://bioschemas.org/samples/SampleDataRecord",
>>>       "@type": "rdfs:Class",
>>>       "rdfs:subClassOf": { "@id": "http://schema.org/DataRecord" }
>>>     },
>>>     {
>>>       "@id": "http://bioschemas.org/samples/diagnosisAvailable",
>>>       "@type": "rdf:Property",
>>>       "rdfs:label": "Diagnosis available",
>>>       "http://schema.org/domainIncludes": [
>>>         {
>>>           "@id": "http://bioschemas.org/samples/SampleDataRecord"
>>>         }
>>>       ],
>>>       "http://schema.org/rangeIncludes": [
>>>         {
>>>           "@id": "http://schema.org/URL"
>>>         }
>>>       ]
>>>     }
>>>   ]
>>> }
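
Inline note on vocabulary files of this kind: when they are written by
hand, it is easy for a class ID to be spelled one way where it is defined
and another way where it is referenced. Below is a minimal Python sketch of
a check for that, assuming the file has already been loaded into a list of
nodes. The function name and the tiny example graph (with a deliberately
misspelt reference) are mine, for illustration only.

```python
SCHEMA_DOMAIN = "http://schema.org/domainIncludes"
SCHEMA_RANGE = "http://schema.org/rangeIncludes"


def undefined_references(graph, namespace):
    """Return IDs in `namespace` that are referenced by domainIncludes or
    rangeIncludes but never defined as a node in the graph."""
    defined = {node["@id"] for node in graph if "@id" in node}
    missing = set()
    for node in graph:
        for key in (SCHEMA_DOMAIN, SCHEMA_RANGE):
            for ref in node.get(key, []):
                target = ref["@id"]
                if target.startswith(namespace) and target not in defined:
                    missing.add(target)
    return sorted(missing)


NS = "http://bioschemas.org/samples/"

# Illustrative graph: the domainIncludes target is misspelt on purpose.
graph = [
    {"@id": NS + "SampleDataRecord", "@type": "rdfs:Class"},
    {
        "@id": NS + "diagnosisAvailable",
        "@type": "rdf:Property",
        SCHEMA_DOMAIN: [{"@id": NS + "SamplesDataRecord"}],
        SCHEMA_RANGE: [{"@id": "http://schema.org/URL"}],
    },
]

if __name__ == "__main__":
    for target in undefined_references(graph, NS):
        print("referenced but not defined:", target)
```

Only IDs inside the given namespace are checked, so references to
schema.org types such as http://schema.org/URL are left alone.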
>>> 
>>> See [2] for schema.org [1]'s own type specification file.
>>> 
>>> Pros:
>>> 
>>>   * Using existing validation tools should be easier, as this
>>> definition uses standard schema.org [1] mechanisms to define
>>> additional properties, rather than the additionalProperty escape
>>> hatch.
>>> 
>>>   * Information such as name and label can go in the bioschemas.org
>>> [7] file rather than be repeated in the data record text.
>>> 
>>>   * It is easier to add translations into other languages to the
>>> bioschemas.org [7] file.
>>> 
>>> Cons:
>>> 
>>>   * Applications may need to rely on the URL itself (purl.org [8]
>>> above) to retrieve information such as the human-readable name for
>>> the category code itself (e.g. "IN SITU NEOPLASMS").  This is good
>>> semantic web practice I believe, but may reduce reliability.
>>> Possibly this information could also be served from
>>> http://bioschemas.org as a similar set of property definitions.
>>> 
>>>   * Perhaps not quite so easy to add arbitrary additional properties,
>>> though a data provider could always define and serve a third context
>>> themselves, or embed it inline.
>>> 
>>> Thoughts?  Would especially like Leyla (though I know she's on
>>> holiday), Rafa, Alasdair, Dan, etc. to weigh in.
>>> 
>>> [1]
>>> https://lists.w3.org/Archives/Public/public-bioschemas/2017Nov/thread.html
>>> [2] https://schema.org/version/latest/schema.jsonld
>>> 
>>> -- Justin Clark-Casey, http://justincc.org
>>> 
>>> Research Software Engineer, Intermine, Cambridge
>>> 
>>> ELIXIR UK Node technical co-ordinator
>>> 
>>> On Mon, Mar 19, 2018 at 11:21 AM, Philippe <proccaserra@gmail.com>
>>> wrote:
>>> 
>>>> Hi Luca,
>>>> 
>>>> I am including a snippet from the notes so people can get a feel
>>>> for how things could look:
>>>> 
>>>> {
>>>>   "@context": "http://schema.org",
>>>>   "@type": ["DataRecord"],
>>>>   "additionalProperty": [
>>>>     {
>>>>       "@type": "PropertyValue",
>>>>       "name": "diagnosis_available",
>>>>       "value": "urn:miriam:icd:C00-C97",
>>>>       "valueReference": [
>>>>         {
>>>>           "@type": "CategoryCode",
>>>>           "name": "Malignant neoplasms",
>>>>           "url": "http://purl.bioontology.org/ontology/ICD10/C00-C97.9",
>>>>           "codeValue": "C00-C97.9"
>>>>         }
>>>>       ]
>>>>     },
>>>>     {
>>>>       "@type": "PropertyValue",
>>>>       "name": "diagnosis_available",
>>>>       "value": "urn:miriam:icd:D00-D09",
>>>>       "valueReference": [
>>>>         {
>>>>           "@type": "CategoryCode",
>>>>           "name": "In situ neoplasms",
>>>>           "url": "http://purl.bioontology.org/ontology/ICD10/D00-D09.9",
>>>>           "codeValue": "D00-D09.9"
>>>>         }
>>>>       ]
>>>>     }
>>>>   ]
>>>> }
>>>> 
>>>> I also include the link to the schema.org [1] CategoryCode type,
>>>> https://pending.schema.org/CategoryCode [4], and their JSON-LD
>>>> snippet:
>>>> 
>>>> {
>>>>   "@context": "http://schema.org/",
>>>>   "@type": "CategoryCode",
>>>>   "codeValue": "Man",
>>>>   "inCodeSet": "http://id.loc.gov/vocabulary/resourceTypes"
>>>> }
>>>> 
>>>> Question: should the 'inCodeSet' attribute be used instead?
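
Inline note on the inCodeSet question: as far as I can tell the two are not
mutually exclusive, since url is a general schema.org property and
inCodeSet is specific to CategoryCode, so a term could carry both. A
minimal Python sketch, with a hypothetical helper name, and with the ICD10
code set URL being my assumption rather than an agreed identifier:

```python
def category_code(code, name, url=None, in_code_set=None):
    """Build a CategoryCode node; url and inCodeSet are both optional."""
    node = {"@type": "CategoryCode", "codeValue": code, "name": name}
    if url is not None:
        node["url"] = url  # points at the individual term
    if in_code_set is not None:
        node["inCodeSet"] = in_code_set  # points at the code set as a whole
    return node


# Assumed identifier for the ICD10 code set; not confirmed anywhere in
# this thread.
ICD10_CODE_SET = "http://purl.bioontology.org/ontology/ICD10"

term = category_code(
    "D00-D09.9",
    "In situ neoplasms",
    url=ICD10_CODE_SET + "/D00-D09.9",
    in_code_set=ICD10_CODE_SET,
)

if __name__ == "__main__":
    print(term)
```

Keeping url for the term and inCodeSet for the vocabulary would let a
consumer group terms by code set without parsing the term URL.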
>>>> 
>>>> Best
>>>> 
>>>> Philippe
>>>> 
>>>> On 19/03/2018 11:10, Luca Cherubin wrote:
>>>> 
>>>>> Hi everybody,
>>>>> 
>>>>> During the Hackathon event last week with various Biobank
>>>>> representatives, we had the chance to use Bioschemas profiles and
>>>>> types to support Biobank use cases for metadata sharing.
>>>>> 
>>>>> As you may know, in the Sample profile we proposed a solution for
>>>>> linking ontology terms to a PropertyValue using CategoryCode as a
>>>>> valid type for the valueReference field. Note that CategoryCode is
>>>>> already a proposed schema.org [1] type, but in the
>>>>> bioschemas/samples specification we propose that it should be an
>>>>> acceptable value for valueReference.
>>>>> 
>>>>> To support Biobank use cases, we are using DataRecord, and they
>>>>> need to use the same CategoryCode strategy to describe all the
>>>>> PropertyValues associated with a DataRecord.
>>>>> 
>>>>> In our opinion this is a very strong use case for supporting the
>>>>> use of CategoryCode as a valid type for valueReference for any
>>>>> PropertyValue in Bioschemas/schema.org [1], not only for the
>>>>> Sample profile. We can see this being very useful in other areas
>>>>> where there is a need for flexible linking of ontology terms to
>>>>> values.
>>>>> 
>>>>> We would like to get your feedback on this.
>>>>> 
>>>>> Best regards,
>>>>> 
>>>>> Luca and Matt
>>> 
>>> 
>>> 
>>> Links:
>>> ------
>>> [1] http://schema.org
>>> [2] http://purl.bioontology.org/ontology/ICD10/C00-C97.9
>>> [3] http://purl.bioontology.org/ontology/ICD10/D00-D09.9
>>> [4] https://pending.schema.org/CategoryCode
>>> [5] http://schema.org/
>>> [6] http://id.loc.gov/vocabulary/resourceTypes
>>> [7] http://bioschemas.org
>>> [8] http://purl.org
>>
Received on Wednesday, 21 March 2018 18:54:03 UTC