Re: Fundamental changes on Bioschemas specifications for Life Sciences entities

Looks good, Leyla. It really reflects the conversations we had at the
BioHackathon. My outstanding concern is the addition of structuralElement
instead of just reusing hasPart.

m.

On Fri, Sep 22, 2017 at 5:39 PM, Leyla Garcia <ljgarcia@ebi.ac.uk> wrote:

> Dear all,
>
> We presented our BiologicalEntity idea in a poster at ISMB last July
> and we got some mixed feedback. Although the idea was a good one, the
> specification had some problems. In particular, the specification was not
> flexible enough and the properties seemed a bit random. Flexible and
> extensible schemata are key for Bioschemas, so during the past week four of
> us have been working on changes to BiologicalEntity. We have already shared
> these changes with attendees of the BioHackathon and with this mailing
> list, and we have received some good comments. In this email I just expand
> on what we already shared (meaning this is a long mail).
>
> * BiologicalEntity does not exist anymore; it has been replaced by
> PhysicalEntity and Record
> * PhysicalEntity follows the flexibility initially proposed by the Samples
> group by reusing the additionalProperty property
> * LabProtocol is simplified thanks to a change proposed for CreativeWork
> * PhysicalEntity and Record would be customized by profiles such as
> Protein, Sample and so on. Properties such as additionalProperty,
> isContainedIn and contains are expected to be customized by profiles
>
> The graph below shows the key points but if you want more detail you can
> keep reading.
>
> *Summary of PhysicalEntity*
>
> * PhysicalEntity extends Thing and reuses
>     ** additionalType to specify whether it is a protein, sample,
> phenotype, etc. Ontology terms should be used to point to the corresponding
> concept (minimum)
>     ** identifier (minimum)
>     ** mainEntityOfPage to link to the corresponding Record in a Dataset
>     ** sameAs to point to any webpage defining this entity
>     ** url to point to the official webpage
>     ** alternateName, description and image, used as described by
> schema.org
> * PhysicalEntity has its own properties
>     ** additionalProperty so that any other property can be added. The
> additionalType of the property should be used to better specify the nature
> of the property, name/description should be used as a label for the
> property, and value should be used to point to the actual value (range)
> of the property
>     ** isContainedIn
>     ** contains
>     ** location
>     ** hasRepresentation to point to representations other than a Record
> or an image, for instance a text corresponding to a sequence
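
To make the additionalProperty guidance above concrete, here is a minimal
Python sketch that builds one such property as JSON-LD. It is only an
illustration under assumptions: the helper name additional_property is
hypothetical, and wrapping the property in schema.org's PropertyValue is my
choice rather than part of the proposal. The field names (additionalType,
name, value) follow the description above; the gene label "ABL1" is an
illustrative value.

```python
import json


def additional_property(name, property_type, value):
    """Build one additionalProperty entry as a JSON-LD dict.

    Hypothetical helper. The PropertyValue wrapper is an assumption,
    chosen because schema.org uses it as the range of additionalProperty.
    """
    return {
        "@type": "PropertyValue",
        "additionalType": property_type,  # ontology term describing the property
        "name": name,                     # human-readable label for the property
        "value": value,                   # the actual value of the property
    }


# Illustrative use: a "gene" property whose value is itself a typed entity.
gene_property = additional_property(
    name="gene",
    property_type="http://semanticscience.org/resource/SIO_010081",
    value={
        "@type": ["StructuredValue", "PhysicalEntity"],
        "additionalType": "http://semanticscience.org/resource/SIO_010035",
        "name": "ABL1",  # illustrative label, not taken from the proposal
    },
)

if __name__ == "__main__":
    print(json.dumps(gene_property, indent=2))
```

Keeping the three fields in one helper makes it harder for a profile to
emit a property without its ontology term.
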
>
> *Summary of Record*
>
> * Record extends Dataset and reuses
>     ** distribution so we can point to a downloadable version of the Record
> * Record has its own properties
>     ** additionalProperty, which follows the same guidelines as for
> PhysicalEntity
>     ** seeAlso to link to any related Thing whenever the relation is not
> so clear but we know it exists (usually cross-references)
>
>
> *Example of PhysicalEntity customization done for the Protein case*
>
> * additionalType. minimum, many. Recommended type will probably be
> "http://semanticscience.org/resource/SIO_010043"
> * alternateName. optional. For UniProt it would look like ["ABL", "ABL1"]
> * description. recommended. For UniProt it would be the protein function
> * identifier. minimum. For UniProt it would look like "P00519"
> * image. optional. Probably not used yet by UniProt
> * mainEntityOfPage. optional. Probably not used by UniProt; we would link
> from the Record to the PhysicalEntity as it works better for us
> * name. recommended. For UniProt it would look like "Tyrosine-protein
> kinase ABL1"
> * sameAs. optional.
> * url. recommended. For UniProt it would look like
> "http://www.uniprot.org/uniprot/P00519"
> * additionalProperty. optional
> * additionalProperty/disease-association. recommended.
>     ** additionalType for property probably
> "http://semanticscience.org/resource/SIO_000983",
>     ** name for property "disease association",
>     ** value types StructuredValue and MedicalCondition,
>     ** additionalType for value probably
> "http://semanticscience.org/resource/SIO_010299"
>     ** the rest of the properties depend on what the source can actually
> provide, for instance disease name, disease url, medical code, etc.
> * additionalProperty/transcribed-gene. minimum.
>     ** additionalType for property probably
> "http://semanticscience.org/resource/SIO_010081",
>     ** name for property "gene",
>     ** value types StructuredValue and PhysicalEntity,
>     ** additionalType for value probably
> "http://semanticscience.org/resource/SIO_010035"
>     ** the rest of the properties depend on what the source can actually
> provide, for instance gene name
> * isContainedIn. optional
> * isContainedIn/organism. minimum.
>     ** type would be PhysicalEntity
>     ** additionalType would probably be
> "http://semanticscience.org/resource/SIO_010000"
>     ** identifier would be taxon ID
>     ** url could be a link to NCBI taxon
>     ** sameAs could be a link to UniProt taxonomy
> * location. optional. Probably not used for proteins but for protein
> annotations it could be a FALDO position
> * hasRepresentation. optional. For instance the protein sequence
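
As a rough illustration of the Protein customization listed above, the
following Python sketch assembles the UniProt example as a JSON-LD dict and
checks a few of the "minimum" items. Several things here are assumptions
and not part of the proposal: PhysicalEntity and isContainedIn are the
proposed Bioschemas terms and are not defined by schema.org itself, the
PropertyValue wrapper is my choice, the taxon identifier "9606" (human) is
an illustrative value, and missing_minimum is a hypothetical helper that
only looks at top-level keys.

```python
import json

SIO = "http://semanticscience.org/resource/"

protein = {
    "@context": "http://schema.org",
    "@type": "PhysicalEntity",  # proposed Bioschemas type (assumption)
    "additionalType": [SIO + "SIO_010043"],
    "identifier": "P00519",
    "name": "Tyrosine-protein kinase ABL1",
    "alternateName": ["ABL", "ABL1"],
    "url": "http://www.uniprot.org/uniprot/P00519",
    "additionalProperty": [
        {
            "@type": "PropertyValue",  # assumed wrapper type
            "additionalType": SIO + "SIO_010081",
            "name": "gene",
            "value": {
                "@type": ["StructuredValue", "PhysicalEntity"],
                "additionalType": SIO + "SIO_010035",
            },
        }
    ],
    "isContainedIn": {
        "@type": "PhysicalEntity",
        "additionalType": SIO + "SIO_010000",
        "identifier": "9606",  # illustrative taxon ID
    },
}

# Top-level keys that the "minimum" items in the list above rely on.
MINIMUM_KEYS = ("additionalType", "identifier", "additionalProperty", "isContainedIn")


def missing_minimum(entity):
    """Return the minimum top-level keys that are absent from the entity."""
    return [key for key in MINIMUM_KEYS if key not in entity]


if __name__ == "__main__":
    print(json.dumps(protein, indent=2))
    print("missing:", missing_minimum(protein))
```

A real profile check would also need to look inside additionalProperty for
the gene entry; this sketch deliberately stops at the top level.
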
>
> *Example of Record customization done for the Protein case*
> * distribution. optional. For UniProt, links to FASTA, text, XML and RDF
> files
> * additionalType. optional. For UniProt probably
> "http://purl.uniprot.org/core/Protein"
> * seeAlso. optional
> * identifier. minimum. For UniProt it would look like "P00519"
> * url. recommended/optional? For UniProt it would look like
> "http://www.uniprot.org/uniprot/P00519"
> * mainEntity. recommended. For UniProt, all the PhysicalEntity/Protein
> information will go here
> * citation, dateCreated, dateModified, datePublished, hasPart, isBasedOn,
> isBasisFor, isPartOf, keywords, license. optional. Used as needed and
> depending on the information actually provided by the Dataset containing
> this record.
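
The Record side can be sketched the same way. The Python snippet below
wraps a protein description in a Record via mainEntity, as the list above
suggests for UniProt. It is a sketch under assumptions: Record is the
proposed type, make_record is a hypothetical helper, the use of
schema.org's DataDownload for distribution entries is my choice, and the
FASTA download URL is an assumed example rather than something stated in
the proposal.

```python
import json


def make_record(identifier, url, main_entity, downloads):
    """Build a Record (proposed type extending Dataset) as a JSON-LD dict.

    Hypothetical helper: `downloads` maps a format label to a download URL.
    """
    return {
        "@context": "http://schema.org",
        "@type": "Record",  # proposed Bioschemas type (assumption)
        "additionalType": "http://purl.uniprot.org/core/Protein",
        "identifier": identifier,
        "url": url,
        "distribution": [
            {"@type": "DataDownload", "encodingFormat": fmt, "contentUrl": link}
            for fmt, link in sorted(downloads.items())
        ],
        # All PhysicalEntity/Protein information goes under mainEntity.
        "mainEntity": main_entity,
    }


record = make_record(
    identifier="P00519",
    url="http://www.uniprot.org/uniprot/P00519",
    main_entity={
        "@type": "PhysicalEntity",
        "identifier": "P00519",
        "name": "Tyrosine-protein kinase ABL1",
    },
    # Assumed download URL, for illustration only.
    downloads={"FASTA": "http://www.uniprot.org/uniprot/P00519.fasta"},
)

if __name__ == "__main__":
    print(json.dumps(record, indent=2))
```

Linking from the Record to the entity (rather than the reverse via
mainEntityOfPage) mirrors the direction described above as working better
for UniProt.
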
>
> Regards,
>



-- 
Michel Dumontier
Distinguished Professor of Data Science
Maastricht University
http://dumontierlab.com

Received on Friday, 22 September 2017 15:51:50 UTC