
Re: Issue arising from validating ChEMBL18 VoID

From: Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk>
Date: Mon, 10 Mar 2014 11:56:54 +0000
To: Michel Dumontier <michel.dumontier@gmail.com>
CC: "public-semweb-lifesci@w3.org" <public-semweb-lifesci@w3.org>, Mark Davies <mdavies@ebi.ac.uk>, Simon Jupp <jupp@ebi.ac.uk>
Message-ID: <F7091455-AD0D-4C54-9A35-EDA96BD6E6B3@hw.ac.uk>

On 6 Mar 2014, at 21:10, Michel Dumontier <michel.dumontier@gmail.com> wrote:



On Thu, Mar 6, 2014 at 3:59 AM, Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk> wrote:
Hi All,

In trying to validate the new ChEMBL 17 dataset description I came across the following issues.

  1.  The validator I stood up does not accept standard Turtle. This is going to require some effort to fix.

which toolkit are you using to parse it in the first place? maybe you could call a web service like http://www.w3.org/RDF/Validator/

I’m using Eric Prud’hommeaux’s ShapeExpression scripts
http://www.w3.org/2013/ShEx/Primer


  2.  Should Summary Level Descriptions have a created date indicating when the dataset was originally created?

i believe that the consensus was that it would not, and that create date would be associated with version level descriptions (e.g. version 1 = first create date).

This of course assumes that folk are going to create descriptions for all historic versions. For instance, i would be surprised if ChEMBL create descriptions for all 17 (soon to be 18) versions, but rather only provide those for the versions that have RDF, i.e. version 14 onwards, and then the earlier ones probably won’t be updated to the new standard.


  3.  We do not have any statement of requirements for dcat:theme and dcat:keyword for version and distribution level descriptions.

chembl actually has a void file where they identify their vocabs. i haven't checked whether they associate keywords.

Vocabularies are down at the distribution level, which I did not validate. I suspect that they have also done themes at the subset level so that they can say that the molecules subset has a theme of chemical molecules.

The bigger question here is how should subsets interact with version and distribution level descriptions?

  4.  We say that a summary level distribution may have a dcat:accessURL. I’m not sure this is correct.

 i think they should too.

I’m slightly confused: do you think the summary level should have a dcat:accessURL, or that it shouldn’t?

  5.  Should distribution level descriptions also be typed as dcat:Distribution?

yes. It would also be good to think about tagging our version level description as :VersionedDatasetDescription - can we find a vocabulary home for this?

Tough question. Not sure that any of VoID or DCAT would adopt this as it doesn’t fit with their data models. However, we should ask.

  6.  We’ve not got any details of the SPARQL endpoint in the table of properties.

right.  the problem is that the representation is complex (not just a predicate-object pair) and really doesn't fit well in that table - unless we enable the specification of void:sparqlEndpoint, which we ruled out because of its time-dependent nature (that version of the data may not be in the sparql endpoint in the future).

Regardless of the fact that the representation of the object is complex, we should have a row in the table that ensures that folk don’t miss it. When generating a description, most developers will use the table as a checklist. The same applies when creating the validator.
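
To make the checklist point concrete, here is a minimal sketch in Python of a validator driven by table rows. The rows, the `services` key, and the nested shape of the endpoint entry are all assumptions made for illustration; they are not the actual table of properties. The only point is that a complex row can sit in the same checklist as the simple predicate-object rows.

```python
# Illustrative checklist only: the rows below are assumptions, not the
# real table of properties from the dataset description guidelines.

def has_property(prop):
    """Build a check for a simple predicate-object row."""
    return lambda desc: bool(desc.get(prop))

def has_endpoint(desc):
    """Check a complex row: a SPARQL endpoint nested inside a service entry.

    The 'services' list of dicts is a hypothetical shape for this sketch.
    """
    return any(svc.get("endpoint") for svc in desc.get("services", []))

# Each row pairs a human-readable label with the function that checks it.
CHECKLIST = [
    ("dct:title", has_property("dct:title")),
    ("dct:created", has_property("dct:created")),
    ("SPARQL endpoint", has_endpoint),
]

def missing_rows(desc):
    """Return the labels of all checklist rows the description fails."""
    return [label for label, check in CHECKLIST if not check(desc)]

# Placeholder description: it has a title but no created date, and its
# only service entry carries no endpoint.
example = {"dct:title": ["Example dataset"], "services": [{}]}
print(missing_rows(example))
```

With the endpoint as a row in its own right, a description that omits it is reported alongside the simple omissions rather than being silently skipped.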

I have attached an updated version of the ChEMBL description. My question is, how should the distribution level description be validated since it is split into several subsets?

i think that <http://rdf.ebi.ac.uk/distribution/rdf/chembl/17.0> is a subset of the versioned dataset
<http://rdf.ebi.ac.uk/dataset/chembl/17.0>, and this then points to the various file-based distributions.

Are ChEMBL Molecules, ChEMBL Targets, etc. subsets of a distribution or of a version? We could of course argue that they are subsets of the dataset, but then we would have to repeat the versioning information across each subset. I think it makes most sense to split at the version level first, but then I’m wondering if we should do the subsets ahead of the distributions. However, this approach might not make sense for other datasets. Does anyone have a counterexample?
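
As a rough sketch of why splitting at the version level first avoids repeating versioning information, the Python below stores the version once and recovers it for any subset by walking void:subset links upwards. The two ChEMBL 17.0 URIs are the ones quoted above; the molecules URI, the use of pav:version to carry the version, and the placement of the molecules subset under the distribution are assumptions for illustration only.

```python
# Triples as (subject, predicate, object) strings.
VERSION = "http://rdf.ebi.ac.uk/dataset/chembl/17.0"
DISTRIBUTION = "http://rdf.ebi.ac.uk/distribution/rdf/chembl/17.0"
MOLECULES = "http://example.org/chembl/17.0/molecules"  # hypothetical URI

TRIPLES = [
    (VERSION, "pav:version", "17.0"),           # assumed version property
    (VERSION, "void:subset", DISTRIBUTION),
    (DISTRIBUTION, "void:subset", MOLECULES),   # one possible arrangement
]

def version_of(node, triples):
    """Walk void:subset links upwards until a node with pav:version is found.

    Returns None when no ancestor carries a version.
    """
    seen = set()
    while node is not None and node not in seen:
        seen.add(node)
        for s, p, o in triples:
            if s == node and p == "pav:version":
                return o
        # Follow the first parent that lists this node as a subset.
        parents = [s for s, p, o in triples if p == "void:subset" and o == node]
        node = parents[0] if parents else None
    return None

print(version_of(MOLECULES, TRIPLES))
```

The lookup behaves the same if the subsets are placed ahead of the distributions instead, since it only follows the subset links, so the sketch does not settle which ordering is preferable.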

Alasdair

m.

Cheers,

Alasdair


Alasdair J G Gray
Lecturer in Computer Science, Heriot-Watt University, UK.
Email: A.J.G.Gray@hw.ac.uk
Web: http://www.macs.hw.ac.uk/~ajg33
ORCID: http://orcid.org/0000-0002-5711-4872
Telephone: +44 131 451 3429
Twitter: @gray_alasdair
Arrange a Meeting: http://doodle.com/agray

--

PLEASE NOTE: There may be a delay in me dealing with your email as I am participating in UCU industrial action by ‘working to contract’ in support of the union’s campaign for fair pay in higher education.
For more info go here: http://www.ucu.org.uk/hepay13





________________________________

Sunday Times Scottish University of the Year 2011-2013
Top in the UK for student experience
Fourth university in the UK and top in Scotland (National Student Survey 2012)

We invite research leaders and ambitious early career researchers to join us in leading and driving research in key inter-disciplinary themes. Please see http://www.hw.ac.uk/researchleaders for further information and how to apply.

Heriot-Watt University is a Scottish charity registered under charity number SC000278.


Received on Monday, 10 March 2014 12:09:21 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 7 January 2015 14:53:08 UTC