- From: Justin Clark-Casey <jc955@cam.ac.uk>
- Date: Wed, 21 Mar 2018 17:59:13 +0000
- To: public-bioschemas@w3.org
I think you're right, this could be 2 distinct things.
I recently read "Schema.org: Evolution of Structured Data on the Web" [1] and it was very illuminating as to the philosophy of schema.org. Namely that:
* Things should be much easier for the data publishers and harder for the consumers
* Developers chiefly implement by adapting examples (we knew this)
* Getting initial adoption is much more important than getting the structures optimal upfront. Once there is adoption, that's justification to improve
structure if necessary.
So I agree with you - specifying sample relations through additionalProperty is easiest and specifying more universal per-profile relations (e.g. amino acid
sequence on protein) could be done through direct additional relations to make validation easier.
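
To make the two options concrete, here is a rough Python sketch (standard library only) that rewrites additionalProperty entries as direct properties. The function names `to_camel_case` and `to_direct_properties`, the snake_case to camelCase mapping, and the choice of the CategoryCode `url` as the direct value are all my own assumptions for illustration, not anything agreed in a profile.

```python
import json


def to_camel_case(name):
    """Turn a key such as 'diagnosis_available' into 'diagnosisAvailable'."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def to_direct_properties(record):
    """Return a copy of `record` with additionalProperty entries rewritten
    as direct properties (hypothetical mapping, for illustration only)."""
    result = {k: v for k, v in record.items() if k != "additionalProperty"}
    for prop in record.get("additionalProperty", []):
        key = to_camel_case(prop["name"])
        # Prefer the ontology URL(s) from valueReference; fall back to the raw value.
        refs = prop.get("valueReference", [])
        values = [ref["url"] for ref in refs if "url" in ref] or [prop.get("value")]
        result.setdefault(key, []).extend(values)
    return result


# Cut-down version of the DataRecord example quoted further down this thread.
record = {
    "@context": "http://schema.org",
    "@type": ["DataRecord"],
    "additionalProperty": [
        {
            "@type": "PropertyValue",
            "name": "diagnosis_available",
            "value": "urn:miriam:icd:C00-C97",
            "valueReference": [{
                "@type": "CategoryCode",
                "url": "http://purl.bioontology.org/ontology/ICD10/C00-C97.9",
                "codeValue": "C00-C97.9",
            }],
        },
        {
            "@type": "PropertyValue",
            "name": "diagnosis_available",
            "value": "urn:miriam:icd:D00-D09",
            "valueReference": [{
                "@type": "CategoryCode",
                "url": "http://purl.bioontology.org/ontology/ICD10/D00-D09.9",
                "codeValue": "D00-D09.9",
            }],
        },
    ],
}

direct = to_direct_properties(record)
print(json.dumps(direct, indent=2))
```

This only reshapes the properties. A real conversion would also have to add the second context and switch "@type" to whatever the profile defines, which I have left out here.
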
To get additional relations (and the general BioChemEntity/DataRecord mechanisms) straight in my head, I published a wiki page [2]. Apologies for any
mistakes; anybody please feel free to edit/extend it, and I will do so too as necessary. I ended up repeating quite a bit of what Alasdair originally wrote [3] and
what is in the examples, but I do find it useful to have this stuff in findable wiki form (Google Docs aren't exposed to search engines, afaik).
[1] https://queue.acm.org/detail.cfm?id=2857276
[2] https://github.com/BioSchemas/specifications/wiki/Adding-profile-specific-relations-to-BioChemEntity-and-DataRecord
[3] https://lists.w3.org/Archives/Public/public-bioschemas/2017Nov/0001.html
On 19/03/18 18:14, ljgarcia wrote:
> Hi all,
>
> I think we are talking about two different things here.
>
> For Samples, directly using additionalProperty seems the easiest option as this reduces the requirements for small labs providing samples. They do not have to agree
> on any predefined terms or properties, just provide key-value pairs via additionalProperty. Most likely, they will not include information regarding a
> CategoryCode; this would be added whenever possible by BioSamples. @Luca, @Matt, please correct me if I am wrong. For the Samples case, it is a +1 on my
> side for accepting CategoryCode as a possible range for the valueReference property on PropertyValue.
>
> For other groups/profiles, what Justin mentions makes sense and is useful. We use that approach (or an approximation; I still need to tune a few things there) in
> the Protein profile.
>
> What do you think? Do we have two topics here? If so, let's separate them first. In any case, I will take a deeper look at Justin's examples later; I got a bit
> lost when I saw SampleDataRecord and also the schema:rangeIncludes.
>
> Regards,
>
>
> On 2018-03-19 17:47, Justin Clark-Casey wrote:
>> So, last Friday at the Samples event, Leyla, Rafa and I were
>> talking about the alternative of specifying additional properties
>> using a second context, rather than through additionalProperty. The
>> original discussion in November was at [1] but I don't think it was
>> fully formalized (and the example links are now broken). But under
>> this approach, I think the above would instead be something like
>>
>> {
>>   "@context": ["http://schema.org", "http://bioschemas.org/samples"],
>>   "@type": ["SampleDataRecord"],
>>   "diagnosisAvailable": [
>>     "http://purl.bioontology.org/ontology/ICD10/C00-C97.9",
>>     "http://purl.bioontology.org/ontology/ICD10/D00-D09.9"
>>   ]
>> }
>>
>> with http://bioschemas.org/samples as
>>
>> {
>>   "@context": {
>>     "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
>>     "rdfs": "http://www.w3.org/2000/01/rdf-schema#"
>>   },
>>   "@id": "http://bioschemas.org/samples",
>>   "@graph": [
>>     {
>>       "@id": "http://bioschemas.org/samples/SampleDataRecord",
>>       "@type": "rdfs:Class",
>>       "rdfs:subClassOf": { "@id": "http://schema.org/DataRecord" }
>>     },
>>     {
>>       "@id": "http://bioschemas.org/samples/diagnosisAvailable",
>>       "@type": "rdf:Property",
>>       "rdfs:label": "Diagnosis available",
>>       "http://schema.org/domainIncludes": [
>>         {
>>           "@id": "http://bioschemas.org/samples/SampleDataRecord"
>>         }
>>       ],
>>       "http://schema.org/rangeIncludes": [
>>         {
>>           "@id": "http://schema.org/URL"
>>         }
>>       ]
>>     }
>>   ]
>> }
>>
>> See [2] for schema.org [1]'s own type specification file.
>>
>> Pros:
>> * Using existing validation tools should be easier, as this
>> definition uses standard schema.org [1] mechanisms to define
>> additional properties, rather than the additionalProperty escape
>> hatch.
>> * Information such as name and label can go in the bioschemas.org
>> [7] file rather than be repeated in the data record text
>>
>> * Easier to put in different language translations to the
>> bioschemas.org [7] file
>>
>> Cons:
>>
>> * Applications may need to rely on the URL itself (purl.org [8] above)
>> to retrieve information such as the human-readable name for the
>> categoryCode itself (e.g. "IN SITU NEOPLASMS"). This is good semantic
>> web practice I believe, but may reduce reliability. Possibly this
>> information could also be served from http://bioschemas.org as a
>> similar set of property definitions.
>>
>> * Perhaps not quite so easy to add arbitrary additional properties,
>> though a data provider could always define and serve a third context
>> themselves, or embed it inline.
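
As a rough illustration of the validation point in the Pros list, here is a toy Python check (standard library only) that reads property definitions shaped like the ones above and flags record keys that are undefined or used on the wrong type. The function name `check_record` and the matching on local names are my own assumptions, and this is nowhere near a real JSON-LD validator: it does no context expansion at all.

```python
def check_record(record, vocabulary):
    """Return a list of problems found in `record` (toy check, not a real validator)."""
    # Index property definitions by local name, e.g. 'diagnosisAvailable',
    # mapping each one to the local names of the types it may be used on.
    properties = {}
    for node in vocabulary["@graph"]:
        if node.get("@type") == "rdf:Property":
            local_name = node["@id"].rsplit("/", 1)[-1]
            domains = {
                d["@id"].rsplit("/", 1)[-1]
                for d in node.get("http://schema.org/domainIncludes", [])
            }
            properties[local_name] = domains

    types = record.get("@type", [])
    if isinstance(types, str):
        types = [types]

    problems = []
    for key in record:
        if key.startswith("@"):
            continue  # JSON-LD keywords are not vocabulary properties
        if key not in properties:
            problems.append(f"unknown property: {key}")
        elif not properties[key] & set(types):
            problems.append(f"{key} is not defined for {types}")
    return problems


# Trimmed copy of the definitions above (illustrative only).
vocabulary = {
    "@graph": [
        {
            "@id": "http://bioschemas.org/samples/SampleDataRecord",
            "@type": "rdfs:Class",
        },
        {
            "@id": "http://bioschemas.org/samples/diagnosisAvailable",
            "@type": "rdf:Property",
            "http://schema.org/domainIncludes": [
                {"@id": "http://bioschemas.org/samples/SampleDataRecord"}
            ],
        },
    ]
}

good = {
    "@type": ["SampleDataRecord"],
    "diagnosisAvailable": ["http://purl.bioontology.org/ontology/ICD10/C00-C97.9"],
}
misspelt = {
    "@type": ["SampleDataRecord"],
    "diagnosisAvailble": ["http://purl.bioontology.org/ontology/ICD10/C00-C97.9"],
}

print(check_record(good, vocabulary))
print(check_record(misspelt, vocabulary))
```

With additionalProperty, a misspelt "name" string is just another key-value pair, so this kind of check has nothing to compare it against.
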
>>
>> Thoughts? Would especially like Leyla (though I know she's on
>> holiday), Rafa, Alasdair, Dan, etc. to weigh in.
>>
>> [1]
>> https://lists.w3.org/Archives/Public/public-bioschemas/2017Nov/thread.html
>> [2] https://schema.org/version/latest/schema.jsonld
>>
>> --
>>
>> Justin Clark-Casey, http://justincc.org
>>
>> Research Software Engineer, Intermine, Cambridge
>>
>> ELIXIR UK Node technical co-ordinator
>>
>> On Mon, Mar 19, 2018 at 11:21 AM, Philippe <proccaserra@gmail.com>
>> wrote:
>>
>>> Hi Luca,
>>>
>>> I am including a snippet from the notes so people can get a feel
>>> for how things could look:
>>>
>>> {
>>>   "@context": "http://schema.org",
>>>   "@type": ["DataRecord"],
>>>   "additionalProperty": [
>>>     {
>>>       "@type": "PropertyValue",
>>>       "name": "diagnosis_available",
>>>       "value": "urn:miriam:icd:C00-C97",
>>>       "valueReference": [
>>>         {
>>>           "@type": "CategoryCode",
>>>           "name": "Malignant neoplasms",
>>>           "url": "http://purl.bioontology.org/ontology/ICD10/C00-C97.9",
>>>           "codeValue": "C00-C97.9"
>>>         }
>>>       ]
>>>     },
>>>     {
>>>       "@type": "PropertyValue",
>>>       "name": "diagnosis_available",
>>>       "value": "urn:miriam:icd:D00-D09",
>>>       "valueReference": [
>>>         {
>>>           "@type": "CategoryCode",
>>>           "name": "In situ neoplasms",
>>>           "url": "http://purl.bioontology.org/ontology/ICD10/D00-D09.9",
>>>           "codeValue": "D00-D09.9"
>>>         }
>>>       ]
>>>     }
>>>   ]
>>> }
>>>
>>> I am also including the link to the schema.org [1] CategoryCode
>>> type, https://pending.schema.org/CategoryCode [4], and their
>>> JSON-LD snippet:
>>>
>>> {
>>>   "@context": "http://schema.org/",
>>>   "@type": "CategoryCode",
>>>   "codeValue": "Man",
>>>   "inCodeSet": "http://id.loc.gov/vocabulary/resourceTypes"
>>> }
>>>
>>> Question: Should the 'inCodeSet' attribute be used instead?
>>>
>>> Best
>>>
>>> Philippe
>>>
>>> On 19/03/2018 11:10, Luca Cherubin wrote:
>>>
>>>> Hi everybody,
>>>>
>>>> During the Hackathon event last week with various biobank
>>>> representatives, we had the chance to use Bioschemas profiles and
>>>> types to support biobank use cases for metadata sharing.
>>>>
>>>> As you may know, in the Sample profile we proposed a solution for
>>>> linking ontology terms to a PropertyValue using CategoryCode as
>>>> a valid type for the valueReference field. Note that CategoryCode
>>>> is already a proposed schema.org [1] type, but in the
>>>> bioschemas/samples specification we propose that it should be an
>>>> acceptable value for valueReference.
>>>>
>>>> To support biobank use cases, we are using DataRecord, and the
>>>> biobanks need to use the same CategoryCode strategy to describe
>>>> all the PropertyValues associated with a DataRecord.
>>>>
>>>> In our opinion this is a very strong use case for supporting the
>>>> use of CategoryCode as a valid type for valueReference for any
>>>> PropertyValue in Bioschemas/schema.org [1], not only for the
>>>> Sample profile. We can see this being very useful in other areas
>>>> where there is a need for flexible linking of ontology terms to
>>>> values.
>>>>
>>>> We would like to get your feedback on this.
>>>>
>>>> Best regards,
>>>>
>>>> Luca and Matt
>>
>>
>>
>> Links:
>> ------
>> [1] http://schema.org
>> [2] http://purl.bioontology.org/ontology/ICD10/C00-C97.9
>> [3] http://purl.bioontology.org/ontology/ICD10/D00-D09.9
>> [4] https://pending.schema.org/CategoryCode
>> [5] http://schema.org/
>> [6] http://id.loc.gov/vocabulary/resourceTypes
>> [7] http://bioschemas.org
>> [8] http://purl.org
>
Received on Wednesday, 21 March 2018 17:59:47 UTC