Re: Preferred ontology for PhysicalEntity.additionalType?

Hello,

I'm currently building a validator which I will present in October. I've 
fully dealt with EDAM ontology terms but wonder how to validate terms 
from other ontologies too. This is what I did:

The specification is loaded as a JSON file containing the field 
constraints [1]. Several fields are under ontology constraints, which 
can either be static (a few terms from a list) or dynamic (several 
tables, etc.).

These constraints are implemented using the json-schema "enum" property. 
It can contain either a static vocabulary or a list of allowed 
ontologies and the table the terms should refer to (example: 
"enum" : ["EDAM/FORMAT", "OTHER_ONTOLOGY/TARGET_TYPE"]) [2].
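
As a rough illustration of how such an "enum" could be interpreted, here 
is a minimal Python sketch. The function name split_enum and the rule 
"an entry with exactly one slash is an ONTOLOGY/TABLE reference" are 
assumptions made for this example; the actual validator may tell the two 
cases apart differently.

```python
from typing import List, Tuple


def split_enum(enum_values: List[str]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Separate static vocabulary terms from ONTOLOGY/TABLE references.

    Assumption (not from the specification): an entry made of two
    non-empty parts joined by a single "/" is an ontology reference;
    anything else is treated as a plain vocabulary term.
    """
    static_terms: List[str] = []
    ontology_refs: List[Tuple[str, str]] = []
    for value in enum_values:
        parts = value.split("/")
        if len(parts) == 2 and all(parts):
            ontology_refs.append((parts[0], parts[1]))
        else:
            static_terms.append(value)
    return static_terms, ontology_refs
```

With the example above, split_enum(["EDAM/FORMAT", 
"OTHER_ONTOLOGY/TARGET_TYPE"]) yields no static terms and two 
(ontology, table) pairs.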

Inputs and outputs should always point to an ontology term. To do 
this we use the potentialAction field of SoftwareApplication (which 
expects a ControlAction input). It can take up to three object 
variables, two of which are under vocabulary constraints. The *result* 
and *object* fields require a Dataset object, to which we added the 
*additionalType* field. This field expects either a URL or a string 
which should indicate the selected ontology, table and term. The 
validator decomposes this string/URL and verifies that the ontology is 
supported, that the target table/type is the expected one and finally 
that the term exists.

Example:

"potentialAction": {
    "@type": "ControlAction",
    "object": {
        "@type": "Dataset",
        "additionalType": "http://edamontology.org/format_3749"
    },
    "result": [
        {
            "@type": "Dataset",
            "additionalType": "http://edamontology.org/format_2331"
        },
        {
            "@type": "Dataset",
            "additionalType": "http://target_ontology.com/target_table/target_ID"
        }
    ]
}
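
To make the decomposition step concrete, here is a minimal Python sketch 
covering the two URL shapes used in the example. The names TermRef and 
decompose_additional_type are hypothetical, and using the host name as 
the ontology identifier is an assumption of this sketch, not necessarily 
what the validator does.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class TermRef(NamedTuple):
    ontology: str  # host name of the ontology, e.g. "edamontology.org"
    table: str     # table/type of the term, e.g. "format"
    term_id: str   # identifier inside that table, e.g. "3749"


def decompose_additional_type(url: str) -> TermRef:
    """Split an additionalType URL into (ontology, table, term_id).

    Handles the two shapes shown in the example:
      http://edamontology.org/format_3749    (table and ID joined by "_")
      http://target_ontology.com/table/ID    (table and ID as path segments)
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) == 1 and "_" in segments[0]:
        # EDAM style: the last "_" separates the table from the ID.
        table, term_id = segments[0].rsplit("_", 1)
    elif len(segments) == 2:
        table, term_id = segments
    else:
        raise ValueError(f"cannot find table and ID in: {url!r}")
    return TermRef(parsed.netloc.lower(), table, term_id)
```

The resulting triple can then be checked in order: is the ontology 
supported, is the table the expected one, does the term exist.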

To do that for EDAM, we use the Biosphere API [3], which allows us to 
retrieve a term's name from its ID, or the ID from the term's name, as 
JSON. This way, we can decompose a given EDAM URL, pass the table and 
ID variables to the API and check that the term actually exists.
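
A minimal Python sketch of that lookup, built only from the URL pattern 
given in [3]. The function names are hypothetical, and the shape of the 
JSON response is not described here, so the sketch returns the decoded 
payload as-is and leaves its interpretation to the caller.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base of the URL pattern given in footnote [3].
BIOSPHERE_EDAM = "https://biosphere.france-bioinformatique.fr/edamontology"


def biosphere_url(table: str, term_id: str) -> str:
    """Build the lookup URL for one EDAM table/ID pair."""
    return (
        f"{BIOSPHERE_EDAM}/{quote(table, safe='')}/"
        f"{quote(term_id, safe='')}/?media=json"
    )


def fetch_term(table: str, term_id: str, timeout: float = 10.0):
    """Fetch and decode the JSON payload for a term.

    Network or HTTP errors propagate as urllib exceptions; how a
    missing term is reported by the API is not assumed here.
    """
    with urlopen(biosphere_url(table, term_id), timeout=timeout) as response:
        return json.load(response)
```

For the first example above, biosphere_url("format", "3749") gives 
https://biosphere.france-bioinformatique.fr/edamontology/format/3749/?media=json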

So, whichever ontology users wish to use, we need, at some point, to be 
able to programmatically validate the terms (is the ontology allowed, 
does the term exist and is it of the expected type?). This is a 
downside because we will need a new term validator for every single 
ontology we wish to be compatible with. So the question is: should we be 
compatible with every ontology out there (in which case I need to find 
new solutions to deal with this constraint), or should we only have 
predefined sets of compatible ontologies (in which case I will put 
effort into writing new term validators)?
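
To show what "one term validator per ontology" could look like with a 
predefined set, here is a minimal Python sketch of a validator registry. 
All names are hypothetical, and the EDAM check is a stand-in that only 
tests the shape of the term; a real one would query the Biosphere API.

```python
from typing import Callable, Dict

# A term validator takes (table, term_id) and says whether the term exists.
TermValidator = Callable[[str, str], bool]

# Registry of supported ontologies, keyed by host name (an assumption).
VALIDATORS: Dict[str, TermValidator] = {}


def register(ontology_host: str):
    """Decorator that adds a validator to the registry."""
    def decorator(func: TermValidator) -> TermValidator:
        VALIDATORS[ontology_host] = func
        return func
    return decorator


@register("edamontology.org")
def validate_edam(table: str, term_id: str) -> bool:
    # Placeholder check only: known EDAM branch and a numeric ID.
    return table in {"format", "data", "operation", "topic"} and term_id.isdigit()


def validate_term(ontology_host: str, table: str, term_id: str,
                  expected_table: str) -> str:
    """Run the three checks in order and report the first failure."""
    validator = VALIDATORS.get(ontology_host)
    if validator is None:
        return "ontology not supported"
    if table != expected_table:
        return "unexpected table"
    if not validator(table, term_id):
        return "term not found"
    return "ok"
```

Supporting another ontology then means registering one more function, 
which is exactly the per-ontology cost described above.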

Regarding the issue of how to interpret the same concept across 
different ontologies, it probably won't impact the validator as long as 
a term can be programmatically validated. However, crawlers will be 
deeply impacted, as their databases need to know that terms from 
different ontologies may represent the same concept. All data consumers 
will be affected by this choice.


[1] Following the http://json-schema.org/ standards.

[2] For the complete Tool specification: 
https://github.com/terazus/bioschemas_validator_drupal7/blob/master/specs/default/softwareapplication/softwareapplication.json

[3] 
https://biosphere.france-bioinformatique.fr/edamontology/table/id/?media=json 
(replace table and id with your input).


Hope this can help,

Best regards,



Dominique Batista

Web Engineer IFB

Website: https://www.france-bioinformatique.fr


On 19/09/17 14:39, Leyla Garcia wrote:
> Hi,
>
> This is an important question that goes beyond the specification. Does 
> maybe the governance/tools groups have a suggestion here? Whatever is 
> decided, will impact how the validation and other tools interpret the 
> specification.
>
> PhysicalEntity profiles, such as Protein, could, I guess, 
> suggest/require a particular type. How should that be modeled? And, 
> should profiles restrict this type to only one predefined value?
>
> Regards,
>
> On 19/09/2017 12:31, Justin Clark-Casey wrote:
>> Hi all,
>>
>> From [1], I see that we are now proposing that 
>> PhysicalEntity.additionalType (via Thing) show the type of entity in 
>> Bioschemas by pointing to a URL for an ontology term.  This replaces 
>> BiologicalEntity.biologicalType, which used a controlled vocabulary 
>> of strings "gene", "phenotype", etc. Leyla gives an example from the 
>> semantic science ontology [2].
>>
>> Are we planning to recommend particular ontologies for 
>> additionalType?  Or are we expecting search engines to use ontology 
>> mappings (e.g. [3]) to handle cases where people use different 
>> ontologies for the same concept?
>>
>> In InterMine we have the particular use case that any user can extend 
>> our provided data model with new classes.  Since we need to make term 
>> selection as easy as possible (and preferably consistent with the 
>> rest of the model) we'll probably end up guiding them in some fashion.
>>
>> [1] 
>> https://lists.w3.org/Archives/Public/public-bioschemas/2017Sep/0013.html
>> [2] http://semanticscience.org/resource/SIO_010043.rdf
>> [3] http://www.pistoiaalliance.org/projects/ontologies-mapping/
>>
>> -- Justin Clark-Casey
>
>
>

Received on Tuesday, 19 September 2017 13:43:53 UTC