- From: Leyla Garcia <ljgarcia@ebi.ac.uk>
- Date: Fri, 22 Sep 2017 16:39:30 +0100
- To: "public-bioschemas@w3.org" <public-bioschemas@w3.org>
- Message-ID: <0d8b6f3c-2d09-8a75-10a8-1bccaf1a7e2d@ebi.ac.uk>
Dear all,

We presented our BiologicalEntity idea in a poster at ISMB last July and got some mixed feedback. Although the idea was a good one, the specification had some problems. In particular, the specification was not flexible enough and the properties seemed to be a bit random. Flexible and extensible schemata are key for Bioschemas, so during the past week four of us have been working on changes to BiologicalEntity. We have already shared these changes with attendees of the BioHackathon and with this mailing list, and we have received some good comments. In this email I just extend what we already shared (meaning this is a long mail).

* BiologicalEntity does not exist anymore; it has been replaced by PhysicalEntity and Record
* PhysicalEntity follows the flexibility initially proposed by the Samples group by reusing the additionalProperty property
* LabProtocol is simplified thanks to a change proposed for CreativeWork
* PhysicalEntity and Record would be customized by profiles such as Protein, Sample and so on. Properties such as additionalProperty, isContainedIn and contains are expected to be customized by profiles

The graph below shows the key points, but if you want more detail you can keep reading.

*Summary of PhysicalEntity*

* PhysicalEntity extends from Thing and reuses
** additionalType in order to specify whether it is a protein, sample, phenotype, etc. Ontology terms should be used to point to the corresponding concept (minimum)
** identifier (minimum)
** mainEntityOfPage to link to the corresponding Record on a Dataset
** sameAs to point to any webpage defining this entity
** url to point to the official webpage
** alternateName, description, image are used as described by schema.org.
* PhysicalEntity has as own properties
** additionalProperty so any other property can be added.
additionalType of the property should be used to better specify the nature of the property, name/description should be used as a label for the property, and value should be used to point to the actual range of this property
** isContainedIn
** contains
** location
** hasRepresentation to point to representations other than a Record or an image, for instance it could be a text corresponding to a sequence

*Summary of Record*

* Record extends from Dataset and reuses
** distribution so we can point to a downloadable version of the Record
* Record has as own properties
** additionalProperty that follows the same guidelines as for PhysicalEntity
** seeAlso to link to any related Thing whenever the relation is not so clear but we know it exists (usually cross-references)

*Example of PhysicalEntity customization done for the Protein case*

* additionalType. minimum, many. Recommended type will probably be "http://semanticscience.org/resource/SIO_010043"
* alternateName. optional. For UniProt it would look like ["ABL", "ABL1"]
* description. recommended. For UniProt it would be the protein function
* identifier. minimum. For UniProt it would look like "P00519"
* image. optional. Probably not used yet by UniProt
* mainEntityOfPage. optional. Probably not used by UniProt; we would link from the Record to the PhysicalEntity as it works better for us
* name. recommended. For UniProt it would look like "Tyrosine-protein kinase ABL1"
* sameAs. optional.
* url. recommended. For UniProt it would look like "http://www.uniprot.org/uniprot/P00519"
* additionalProperty. optional
* additionalProperty/disease-association. recommended.
** additionalType for property probably "http://semanticscience.org/resource/SIO_000983"
** name for property "disease association"
** value types StructuredValue and MedicalCondition
** additionalType for value probably "http://semanticscience.org/resource/SIO_010299"
** the rest of the properties depend on what the source can actually provide, for instance disease name, disease url, medical code, etc.
* additionalProperty/transcribed-gene. minimum.
** additionalType for property probably "http://semanticscience.org/resource/SIO_010081"
** name for property "gene"
** value types StructuredValue and PhysicalEntity
** additionalType for value probably "http://semanticscience.org/resource/SIO_010035"
** the rest of the properties depend on what the source can actually provide, for instance disease name
* isContainedIn. optional
* isContainedIn/organism. minimum.
** type would be PhysicalEntity
** additionalType would probably be "http://semanticscience.org/resource/SIO_010000"
** identifier would be taxon ID
** url could be a link to NCBI taxon
** sameAs could be a link to UniProt taxonomy
* location. optional. Probably not used for proteins, but for protein annotations it could be a FALDO position
* hasRepresentation. optional. For instance the protein sequence

*Example of Record customization done for the Protein case*

* distribution. optional. For UniProt, links to FASTA, text, XML and RDF files
* additionalType. optional. For UniProt probably "http://purl.uniprot.org/core/Protein"
* seeAlso. optional
* identifier. minimum. For UniProt it would be like "P00519"
* url. recommended/optional? For UniProt it would look like "http://www.uniprot.org/uniprot/P00519"
* mainEntity. recommended. For UniProt, all the PhysicalEntity/Protein information will go here
* citation, dateCreated, dateModified, datePublished, hasPart, isBasedOn, isBasisFor, isPartOf, keywords, license. optional.
Used as needed and depending on the information actually provided by the Dataset containing this record.

Regards,
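
As a rough illustration of the Protein customization of PhysicalEntity described above, here is a minimal Python sketch that assembles the example values from this mail into a JSON-LD-style structure. Several things in it are assumptions rather than part of the proposal: the "http://schema.org" context (PhysicalEntity, isContainedIn and hasRepresentation are proposed terms and are not defined there), the use of PropertyValue as the type of each additionalProperty entry, the placeholder taxon identifier, and the hypothetical helpers `protein_entity` and `missing_minimum`. The minimum check only looks at top-level keys, which is a simplification of what a profile would require.

```python
import json

SIO = "http://semanticscience.org/resource/"


def protein_entity(identifier, name, alternate_names):
    """Build a JSON-LD-style dict for the proposed PhysicalEntity/Protein profile."""
    return {
        # Assumption: the proposed terms would be resolvable in this context.
        "@context": "http://schema.org",
        # Proposed Bioschemas type; not part of schema.org itself.
        "@type": "PhysicalEntity",
        "additionalType": [SIO + "SIO_010043"],
        "identifier": identifier,
        "name": name,
        "alternateName": alternate_names,
        "url": "http://www.uniprot.org/uniprot/" + identifier,
        "additionalProperty": [
            {
                # Assumption: entries are typed as schema.org PropertyValue.
                "@type": "PropertyValue",
                "additionalType": SIO + "SIO_010081",
                "name": "gene",
                "value": {
                    "@type": ["StructuredValue", "PhysicalEntity"],
                    "additionalType": SIO + "SIO_010035",
                },
            }
        ],
        "isContainedIn": {
            "@type": "PhysicalEntity",
            "additionalType": SIO + "SIO_010000",
            # Placeholder only; a real record would carry the taxon ID here.
            "identifier": "TAXON_ID_PLACEHOLDER",
        },
    }


def missing_minimum(entity):
    """Return the top-level 'minimum' properties that are absent or empty."""
    required = ("additionalType", "identifier", "additionalProperty", "isContainedIn")
    return [key for key in required if not entity.get(key)]


entity = protein_entity("P00519", "Tyrosine-protein kinase ABL1", ["ABL", "ABL1"])
print(json.dumps(entity, indent=2))
print("missing minimum properties:", missing_minimum(entity))
```

A Record for the same protein could be built the same way, with the structure above placed under mainEntity.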
Received on Friday, 22 September 2017 15:39:56 UTC