- From: Dominique Batista <dominique.batista@france-bioinformatique.fr>
- Date: Tue, 19 Sep 2017 15:43:29 +0200
- To: <public-bioschemas@w3.org>
- Message-ID: <028a5201-87b9-46ef-48ee-31c32d10864e@france-bioinformatique.fr>
Hello,

I'm currently building a validator which I will present in October. I've fully dealt with EDAM ontology terms, but I wonder how to validate terms from other ontologies too.

This is what I did:

The specification is loaded as a JSON file containing the field constraints [1]. Several fields are under ontology constraints, which can be either static (a few terms from a list) or dynamic (several tables, etc.). These constraints are implemented using the json-schema "enum" property. It can contain either a static vocabulary or a list of allowed ontologies together with the table the terms should refer to (example: "enum" : ["EDAM/FORMAT", "OTHER_ONTOLOGY/TARGET_TYPE"]) [2].

Inputs and outputs should always point to an ontology term. To do this we use the potentialAction field of SoftwareApplication (which expects a ControlAction input). It can take up to three object variables, of which two are under vocabulary constraints. The *result* and *object* fields require a Dataset object, to which we added the *additionalType* field. This field expects either a URL or a string which should indicate the selected ontology, table and term. The validator decomposes this string/URL and verifies that the ontology is supported, that the target table/type is the expected one and, finally, that the term exists.

Example:

    "potentialAction": {
        "@type": "ControlAction",
        "object": {
            "@type": "Dataset",
            "additionalType": "http://edamontology.org/format_3749"
        },
        "result": [
            {
                "@type": "Dataset",
                "additionalType": "http://edamontology.org/format_2331"
            },
            {
                "@type": "Dataset",
                "additionalType": "http://target_ontology.com/target_table/target_ID"
            }
        ]
    }

To do that for EDAM, we use the Biosphere API [3], which lets us retrieve a term's name from its ID, or the ID from the term's name, as JSON. This way, we can decompose a given EDAM URL, pass the table and ID to the API and check that the term actually exists.
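The check described above could be sketched roughly as follows in Python. This is only an illustration, not the validator's actual code: the names `KNOWN_HOSTS`, `decompose`, `biosphere_url` and `validate` are hypothetical, the term lookup is passed in as a function because the shape of the API response is not described here, and passing the bare number (rather than the full "format_3749") as the ID is an assumption.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URL host to the ontology name used in "enum".
KNOWN_HOSTS = {"edamontology.org": "EDAM"}


def decompose(additional_type):
    """Split an EDAM-style term URL into (ontology, table, term_id).

    Returns None when the host is unknown or the path is not "<table>_<id>".
    """
    parsed = urlparse(additional_type)
    ontology = KNOWN_HOSTS.get(parsed.netloc)
    table, sep, term_id = parsed.path.strip("/").partition("_")
    if ontology is None or not sep or not table or not term_id:
        return None
    return ontology, table, term_id


def biosphere_url(table, term_id):
    """Build a lookup URL following the pattern given in footnote [3]."""
    return ("https://biosphere.france-bioinformatique.fr/edamontology/"
            f"{table}/{term_id}/?media=json")


def validate(additional_type, allowed, term_exists):
    """Return (ok, reason) for one additionalType value.

    allowed: entries such as "EDAM/FORMAT", as in the "enum" example.
    term_exists: callable(table, term_id) -> bool, injected by the caller
    (for instance a function that queries biosphere_url(table, term_id)).
    """
    parts = decompose(additional_type)
    if parts is None:
        return False, "unsupported ontology or malformed term"
    ontology, table, term_id = parts
    if f"{ontology}/{table.upper()}" not in allowed:
        return False, "unexpected table/type"
    if not term_exists(table, term_id):
        return False, "term not found"
    return True, "ok"
```

Injecting `term_exists` keeps the three checks (ontology, table, term) separate from the network call, so supporting another ontology would mostly mean adding a host entry and a lookup function for it.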
So, whichever ontology users wish to use, we need, at some point, to be able to validate the terms programmatically (is the ontology allowed, does the term exist, and is it of the expected type?). This is a downside, because we will need a new term validator for every single ontology we wish to be compatible with.

So the question is: should we be compatible with every ontology out there (in which case I need to find new solutions to deal with this constraint), or should we only have predefined sets of compatible ontologies (in which case I will put effort into writing new term validators)?

As for the issue of how to interpret the same concept across different ontologies, it probably won't impact the validator as long as a term can be validated programmatically. However, crawlers will be deeply impacted, as their databases need to be aware somehow that terms from different ontologies may represent the same concept. All data consumers will be impacted by this choice in some way.

[1] Following the http://json-schema.org/ standards.
[2] For the complete Tool specification: https://github.com/terazus/bioschemas_validator_drupal7/blob/master/specs/default/softwareapplication/softwareapplication.json
[3] https://biosphere.france-bioinformatique.fr/edamontology/table/id/?media=json (replace table and id with your input).

Hope this can help,
Best regards,

Dominique Batista
Web Engineer
IFB
Website: https://www.france-bioinformatique.fr

On 19/09/17 14:39, Leyla Garcia wrote:
> Hi,
>
> This is an important question that goes beyond the specification. Does
> maybe the governance/tools groups have a suggestion here? Whatever is
> decided, will impact how the validation and other tools interpret the
> specification.
>
> PhysicalEntity profiles, such as Protein, could, I guess,
> suggest/require a particular type. How should that be modeled? And,
> should profiles restrict this type to only one predefined value?
>
> Regards,
>
> On 19/09/2017 12:31, Justin Clark-Casey wrote:
>> Hi all,
>>
>> From [1], I see that we are now proposing that
>> PhysicalEntity.additionalType (via Thing) show the type of entity in
>> Bioschemas by pointing to a URL for an ontology term. This replaces
>> BiologicalEntity.biologicalType, which used a controlled vocabulary
>> of strings "gene", "phenotype", etc. Leyla gives an example from the
>> semantic science ontology [2].
>>
>> Are we planning to recommend particular ontologies for
>> additionalType? Or are we expecting search engines to use ontology
>> mappings (e.g. [3]) to handle cases where people use different
>> ontologies for the same concept?
>>
>> In InterMine we have the particular use case that any user can extend
>> our provided data model with new classes. Since we need to make term
>> selection as easy as possible (and preferably consistent with the
>> rest of the model) we'll probably end up guiding them in some fashion.
>>
>> [1] https://lists.w3.org/Archives/Public/public-bioschemas/2017Sep/0013.html
>> [2] http://semanticscience.org/resource/SIO_010043.rdf
>> [3] http://www.pistoiaalliance.org/projects/ontologies-mapping/
>>
>> -- Justin Clark-Casey
Received on Tuesday, 19 September 2017 13:43:53 UTC