Fundamental changes on Bioschemas specifications for Life Sciences entities

Dear all,

We presented our BiologicalEntity idea in a poster at ISMB last July 
and we got some mixed feedback. Although the idea was a good one, the 
specification had some problems. In particular, the specification was not 
flexible enough and the properties seemed somewhat arbitrary. Flexible and 
extensible schemata are key for Bioschemas, so during the past week four 
of us have been working on changes to BiologicalEntity. We have already 
shared these changes with the BioHackathon attendees and with this mailing 
list, and we have received some good comments. This email expands on 
what we already shared (so it is a long one).

* BiologicalEntity no longer exists; it has been replaced by 
PhysicalEntity and Record
* PhysicalEntity follows the flexibility initially proposed by the 
Samples group by reusing the additionalProperty property
* LabProtocol is simplified thanks to a change proposed for CreativeWork
* PhysicalEntity and Record would be customized by profiles such as 
Protein, Sample and so on. Properties such as additionalProperty, 
isContainedIn and contains are expected to be customized by profiles

The graph below shows the key points but if you want more detail you can 
keep reading.

*Summary of PhysicalEntity*

* PhysicalEntity extends Thing and reuses
     ** additionalType to specify whether it is a protein, sample, 
phenotype, etc. Ontology terms should be used to point to the 
corresponding concept (minimum)
     ** identifier (minimum)
     ** mainEntityOfPage to link to the corresponding Record on a Dataset
     ** sameAs to point to any webpage defining this entity
     ** url to point to the official webpage
     ** alternateName, description, image are used as described by 
schema.org.
* PhysicalEntity has its own properties
     ** additionalProperty so any other property can be added. 
additionalType of the property should be used to better specify the 
nature of the property, name/description should be used as a label 
for the property, and value should be used to hold the actual value of 
the property
     ** isContainedIn
     ** contains
     ** location
     ** hasRepresentation to point to representations other than a 
Record or an image, for instance it could be a text corresponding to a 
sequence
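
As a rough illustration of the additionalProperty guideline above, the 
Python sketch below builds one property as a plain dictionary and prints 
it as JSON. It assumes the property would be serialized as a schema.org 
PropertyValue; the helper name make_additional_property is hypothetical 
and the exact serialization is not fixed by this proposal. The SIO terms 
are the ones suggested for the gene property in the Protein example 
further down.

```python
import json


def make_additional_property(property_type, name, value):
    """Build one additionalProperty entry (hypothetical helper)."""
    return {
        # Assumption: the property is serialized as a schema.org PropertyValue.
        "@type": "PropertyValue",
        # additionalType states the nature of the property (an ontology term).
        "additionalType": property_type,
        # name acts as a human-readable label for the property.
        "name": name,
        # value holds the actual value, which may itself be structured.
        "value": value,
    }


gene_property = make_additional_property(
    "http://semanticscience.org/resource/SIO_010081",
    "gene",
    {
        "@type": "PhysicalEntity",  # proposed type, not part of schema.org
        "additionalType": "http://semanticscience.org/resource/SIO_010035",
    },
)

print(json.dumps(gene_property, indent=2))
```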

*Summary of Record*

* Record extends from Dataset and reuses
     ** distribution so we can point to a downloadable version of the Record
* Record has its own properties
     ** additionalProperty that follows the same guidelines as for 
PhysicalEntity
     ** seeAlso to link to any related Thing whenever the relation is 
not so clear but we know it exists (usually cross-references)

*Example of PhysicalEntity customization done for the Protein case*

* additionalType. minimum, many. Recommended type will probably be 
"http://semanticscience.org/resource/SIO_010043"
* alternateName. optional. For UniProt it would look like ["ABL", "ABL1"]
* description. recommended. For UniProt it would be the protein function
* identifier. minimum. For UniProt it would look like "P00519"
* image. optional. Probably not used yet by UniProt
* mainEntityOfPage. optional. Probably not used by UniProt, we would 
link from the Record to the PhysicalEntity as it works better for us
* name. recommended. For UniProt it would look like "Tyrosine-protein 
kinase ABL1"
* sameAs. optional.
* url. recommended. For UniProt it would look like 
"http://www.uniprot.org/uniprot/P00519"
* additionalProperty. optional
* additionalProperty/disease-association. recommended.
     ** additionalType for property probably 
"http://semanticscience.org/resource/SIO_000983",
     ** name for property "disease association",
     ** value types StructuredValue and MedicalCondition,
     ** additionalType for value probably 
"http://semanticscience.org/resource/SIO_010299"
     ** the rest of the properties depend on what the source can 
actually provide, for instance disease name, disease url, medical code, etc.
* additionalProperty/transcribed-gene. minimum.
     ** additionalType for property probably 
"http://semanticscience.org/resource/SIO_010081",
     ** name for property "gene",
     ** value types StructuredValue and PhysicalEntity,
     ** additionalType for value probably 
"http://semanticscience.org/resource/SIO_010035"
     ** the rest of the properties depend on what the source can 
actually provide, for instance gene name
* isContainedIn. optional
* isContainedIn/organism. minimum.
     ** type would be PhysicalEntity
     ** additionalType would probably be 
"http://semanticscience.org/resource/SIO_010000"
     ** identifier would be taxon ID
     ** url could be a link to NCBI taxon
     ** sameAs could be a link to UniProt taxonomy
* location. optional. Probably not used for proteins but for protein 
annotations it could be a FALDO position
* hasRepresentation. optional. For instance the protein sequence
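
To make the Protein customization above more concrete, here is a minimal 
sketch of how the UniProt entry P00519 might look, written as a Python 
dictionary and serialized to JSON-LD. PhysicalEntity is a proposed type, 
so this is only an illustration of the idea and not a final 
serialization. The use of PropertyValue for additionalProperty, the 
REQUIRED_KEYS list and the taxon ID 9606 (human, which is not stated in 
this email) are assumptions added for the example.

```python
import json

SIO = "http://semanticscience.org/resource/"

protein = {
    "@context": "http://schema.org",
    "@type": "PhysicalEntity",  # proposed type, not part of schema.org
    "additionalType": SIO + "SIO_010043",  # minimum
    "identifier": "P00519",  # minimum
    "name": "Tyrosine-protein kinase ABL1",  # recommended
    "alternateName": ["ABL", "ABL1"],  # optional
    "url": "http://www.uniprot.org/uniprot/P00519",  # recommended
    "additionalProperty": [
        {
            # additionalProperty/transcribed-gene (minimum)
            "@type": "PropertyValue",  # assumption: PropertyValue serialization
            "additionalType": SIO + "SIO_010081",
            "name": "gene",
            "value": {
                "@type": "PhysicalEntity",
                "additionalType": SIO + "SIO_010035",
            },
        }
    ],
    "isContainedIn": {
        # isContainedIn/organism (minimum)
        "@type": "PhysicalEntity",
        "additionalType": SIO + "SIO_010000",
        "identifier": "9606",  # assumption: NCBI taxon ID for human
    },
}

# Top-level keys needed to carry the items marked "minimum" in the profile.
REQUIRED_KEYS = ["additionalType", "identifier", "additionalProperty", "isContainedIn"]
missing = [key for key in REQUIRED_KEYS if key not in protein]

print(json.dumps(protein, indent=2))
print("Missing minimum properties:", missing)
```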

*Example of Record customization done for the Protein case*

* distribution. optional. For UniProt links to FASTA, text, XML, RDF files
* additionalType. optional. For UniProt probably 
"http://purl.uniprot.org/core/Protein"
* seeAlso. optional
* identifier. minimum. For UniProt it would be like "P00519"
* url. recommended/optional? For UniProt it would look like 
"http://www.uniprot.org/uniprot/P00519"
* mainEntity. recommended. For UniProt, all the PhysicalEntity/Protein 
information will go here
* citation, dateCreated, dateModified, datePublished, hasPart, 
isBasedOn, isBasisFor, isPartOf, keywords, license. optional. Used as 
needed and depending on the information actually provided by the Dataset 
containing this record.
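
A similar sketch for the Record side, again as a Python dictionary 
serialized to JSON-LD, shows the Record linking to the protein through 
mainEntity. Record is a proposed type extending Dataset, so the shape 
below is only indicative. The DataDownload wrapper for distribution and 
the FASTA URL are assumptions made for illustration.

```python
import json

SIO = "http://semanticscience.org/resource/"

record = {
    "@context": "http://schema.org",
    "@type": "Record",  # proposed type extending Dataset
    "additionalType": "http://purl.uniprot.org/core/Protein",  # optional
    "identifier": "P00519",  # minimum
    "url": "http://www.uniprot.org/uniprot/P00519",
    "distribution": [
        {
            "@type": "DataDownload",
            # Assumption: illustrative download link for the FASTA file.
            "contentUrl": "http://www.uniprot.org/uniprot/P00519.fasta",
        }
    ],
    # mainEntity carries the PhysicalEntity/Protein information (recommended).
    "mainEntity": {
        "@type": "PhysicalEntity",
        "additionalType": SIO + "SIO_010043",
        "identifier": "P00519",
        "name": "Tyrosine-protein kinase ABL1",
    },
}

print(json.dumps(record, indent=2))
```

Linking from the Record to the PhysicalEntity through mainEntity, rather 
than the other way round through mainEntityOfPage, matches the direction 
described above as working better for UniProt.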

Regards,

Received on Friday, 22 September 2017 15:39:56 UTC