
ShEx for proteins

From: Leyla Garcia <ljgarcia@ebi.ac.uk>
Date: Mon, 23 Oct 2017 16:52:16 +0100
To: "public-bioschemas@w3.org" <public-bioschemas@w3.org>
Message-ID: <d0e40ff4-682c-6108-0823-a71c99b559db@ebi.ac.uk>
Hi all,

I attach here a ShEx schema to validate a protein entity. This is what I 
want a protein to have:

* preferredLabel: "Protein",
* additionalType: at least one URL
* identifier: at least one Text or URL or PropertyValue
* name: at least one Text
* isContainedIn: at least one URL or BioChemEntity
* additionalProperty: at least one for the transcribed gene
* additionalProperty: zero or more of any other kind

A compliant protein is available in ProteinEntity.ttl. So far, with ShEx 
I have managed to require:

* preferredLabel: "Protein",
* additionalType: at least one IRI
* identifier: at least one Text or URL or PropertyValue or string or IRI
* name: at least one Text or string
* isContainedIn: at least one IRI or blank node
* additionalProperty: at least one of transcribed gene or IRI or blank node

Some constraints are still missing and I would appreciate any help with 
them. Not sure if they can be expressed in ShEx:

* Any object of isContainedIn should be a BioChemEntity (but if all I 
have is an IRI, could/should I add that restriction?)
* At least one additionalProperty for transcribed gene is mandatory, any 
other is optional. Something like
    ( schema:additionalProperty @my:TranscribedFromGene)+ |
    ( schema:additionalProperty IRI |
     schema:additionalProperty BNODE)*
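
One possible way to write the two missing constraints, as an untested sketch in ShExC. The shape names my:ProteinSketch and my:ContainerSketch are hypothetical; my:TranscribedFromGene and my:Gene are taken from the schema in this message. It assumes that two triple constraints on the same predicate inside one shape are matched by partitioning the triples between them, so a gene property can count towards the first constraint while every other additionalProperty falls under the second.

```shex
PREFIX schema: <http://schema.org/>
PREFIX my: <http://my.example/#>

# Hypothetical shape showing only the two constraints in question.
my:ProteinSketch {
  # Every object of isContainedIn must conform to my:ContainerSketch.
  schema:isContainedIn @my:ContainerSketch+ ;

  # At least one additionalProperty conforming to the gene shape ...
  schema:additionalProperty @my:TranscribedFromGene+ ;
  # ... and zero or more others, each an IRI or a blank node.
  schema:additionalProperty NONLITERAL*
}

# EXTRA a allows further rdf:type values besides schema:BioChemEntity.
my:ContainerSketch EXTRA a {
  a [schema:BioChemEntity]
}

my:TranscribedFromGene {
  schema:additionalType IRI+ ;
  schema:name ["gene"] ;
  schema:value @my:Gene
}

my:Gene {
  schema:preferredLabel ["Gene"] ;
  schema:additionalType IRI+
}
```

With a shape reference on isContainedIn, an IRI that has no rdf:type triple in the data being validated would not conform, so this restriction probably only makes sense when the container is described in the same graph.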

By the way, what I have so far is valid according to 
http://rawgit.com/shexSpec/shex.js/master/doc/shex-simple.html. I just 
tried it for one node [1] (sorry, could not make it shorter).


schema%3A <http%3A%2F%2Fschema.org%2F>%0APREFIX xsd%3A 
<http%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23>%0APREFIX my%3A 
<http%3A%2F%2Fmy.example%2F%23>%0A%0Aschema%3ABioChemEntity {%0A 
schema%3ApreferredLabel ["Protein"] %3B%0A schema%3AadditionalType 
IRI%2B %3B%0A%0A (%0A schema%3Aidentifier xsd%3Astring |%0A 
schema%3Aidentifier IRI |%0A schema%3Aidentifier schema%3APropertyValue 
|%0A schema%3Aidentifier schema%3AText |%0A schema%3Aidentifier 
schema%3AURL%0A )%2B %3B%0A (%0A schema%3Aname xsd%3Astring |%0A 
schema%3Aname schema%3AText%0A )%2B %3B%0A%0A (%0A 
schema%3AisContainedIn IRI |%0A schema%3AisContainedIn BNODE%0A )%2B 
%3B%0A%0A (%0A schema%3AadditionalProperty %40my%3ATranscribedFromGene 
|%0A schema%3AadditionalProperty IRI |%0A schema%3AadditionalProperty 
BNODE%0A )%2B%0A}%0A%0Amy%3ATranscribedFromGene {%0A 
schema%3AadditionalType IRI%2B %3B%0A schema%3Aname ["gene"] %3B%0A 
schema%3Avalue %40my%3AGene%0A}%0A%0Amy%3AGene {%0A 
schema%3ApreferredLabel ["Gene"] %3B%0A schema%3AadditionalType IRI%2B 
%3B%0A}&data=%40prefix rdf%3A 
.%0A%40prefix schema%3A <http%3A%2F%2Fschema.org%2F> .%0A%40prefix 
xsd%3A <http%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23> 
.%0A%0A<http%3A%2F%2Fwww.uniprot.org%2Funiprot%2FP00519>%0A a 
schema%3ABioChemEntity %3B%0A schema%3ApreferredLabel "Protein" %3B%0A 
<http%3A%2F%2Fsemanticscience.org%2Fresource%2FSIO_010043> %3B%0A%0A 
schema%3Aidentifier "P00519" %3B%0A schema%3Aname "ABL1" %3B%0A%0A 
schema%3AisContainedIn <http%3A%2F%2Fwww.identifiers.org%2Ftaxon%3A9606> 
%3B%0A%0A schema%3AadditionalProperty [%0A a schema%3APropertyValue 
%3B%0A schema%3AadditionalType 
<http%3A%2F%2Fsemanticscience.org%2Fresource%2FSIO_010081> %3B%0A 
schema%3Aname "gene" %3B%0A schema%3Avalue [%0A a 
schema%3AStructuredValue%2C schema%3ABioChemEntity %3B%0A 
schema%3ApreferredLabel "Gene" %3B%0A schema%3AadditionalType 
<http%3A%2F%2Fsemanticscience.org%2Fresource%2FSIO_010035> %3B%0A 
schema%3Aidentifier "ABL1" %3B%0A schema%3Aname "ABL1"%0A ]%0A ]%2C [%0A 
a schema%3APropertyValue %3B%0A schema%3AadditionalType 
<http%3A%2F%2Fsemanticscience.org%2Fresource%2FSIO_000983> %3B%0A 
schema%3Aname "disease association" %3B%0A schema%3Avalue [%0A a 
schema%3AStructuredValue%2C schema%3AMedicalCondition %3B%0A 
<http%3A%2F%2Fsemanticscience.org%2Fresource%2FSIO_010299> %3B%0A 
schema%3Acode [%0A a schema%3AMedicalCode %3B%0A schema%3Acode "608232" 
%3B%0A schema%3AcodingSystem "OMIM"%0A ] %3B%0A schema%3Aname 
"Leukemia%2C chronic myeloid (CML)" %3B%0A schema%3AsameAs 
<http%3A%2F%2Fwww.uniprot.org%2Fdiseases%2FDI-03735>%0A ]%0A 
]%0A.%0A%0A<http%3A%2F%2Fwww.identifiers.org%2Ftaxon%3A9606>%0A a 
schema%3ABioChemEntity %3B%0A schema%3Aidentifier "9606" %3B%0A 
schema%3Aname "Homo sapiens" %3B%0A schema%3AsameAs 
<http%3A%2F%2Fpurl.uniprot.org%2Ftaxonomy%2F9606> %3B%0A schema%3Aurl 

Received on Monday, 23 October 2017 15:52:46 UTC
