Re: Issue arising from validating ChEMBL18 VoID

From: Michel Dumontier <michel.dumontier@gmail.com>
Date: Thu, 6 Mar 2014 13:10:39 -0800
Message-ID: <CALcEXf4PBumc9AifM=wOv9mZGZbtEASZ8wtXeJAYyspWVDwzjQ@mail.gmail.com>
To: "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>
Cc: HCLS IG <public-semweb-lifesci@w3.org>, Mark Davies <mdavies@ebi.ac.uk>, Simon Jupp <jupp@ebi.ac.uk>
On Thu, Mar 6, 2014 at 3:59 AM, Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk> wrote:

>  Hi All,
>
>  In trying to validate the new ChEMBL 17 dataset description I came
> across the following issues.
>
>    1. The validator I stood up does not accept standard turtle. This is
>    going to require some effort to fix.
>
which toolkit are you using to parse it in the first place? maybe you
could call a web service like http://www.w3.org/RDF/Validator/


>
>    1. Should Summary Level Descriptions have a created date indicating
>    when the dataset was originally created?
>
i believe that the consensus was that it would not, and that the creation
date would be associated with version level descriptions (e.g. version 1 =
first creation date).
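
A minimal sketch of how a validator might encode that rule. It assumes the created date is expressed as dct:created (the thread only says "created date") and that a missing date on a version level description counts as a problem; the level names and the function name are hypothetical.

```python
# Each description is checked as a (level, properties) pair, where properties
# is the set of prefixed property names used in that description.
CREATED = "dct:created"  # assumed property; the thread only says "created date"


def check_created(level, properties):
    """Return a list of problems concerning the created date for one description."""
    problems = []
    has_created = CREATED in properties
    if level == "summary" and has_created:
        # Summary level descriptions are version-independent, so no created date.
        problems.append("summary level description should not carry " + CREATED)
    if level == "version" and not has_created:
        # Assumption: the version level is where the created date belongs.
        problems.append("version level description is missing " + CREATED)
    return problems
```

An empty list means the description passes this one check; other levels are left unconstrained here.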


>
>    1. We do not have any statement of requirements for dcat:theme and
>    dcat:keyword for version and distribution level descriptions.
>
ChEMBL actually has a VoID file where they identify their vocabs. i
haven't checked whether they associate keywords.

>
>    1. We say that a summary level distribution may have a dcat:accessURL.
>    I'm not sure this is correct.
>
i think they should too.

>
>    1. Should distribution level descriptions also be typed as
>    dcat:Distribution?
>
yes. It would also be good to think about tagging our version level
descriptions as :VersionedDatasetDescription - can we find a vocabulary home
for this?

>
>    1. We've not got any details of the sparql endpoint in the table of
>    properties.
>
>
right. the problem is that the representation is complex (not just a
predicate-object pair) and really doesn't fit well in that table - unless
we enable the specification of void:sparqlEndpoint, which we ruled out
because of its time-dependent nature (that version of the data may not be in
the SPARQL endpoint in the future).


>  I have attached an updated version of the ChEMBL description. My
> question is, how should the distribution level description be validated
> since it is split into several subsets?
>
i think that <http://rdf.ebi.ac.uk/distribution/rdf/chembl/17.0> is a
subset of the versioned dataset
<http://rdf.ebi.ac.uk/dataset/chembl/17.0>, and this then points to the
various file-based distributions.
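
One way a validator could handle the split is to walk from the versioned dataset down to the file-level descriptions and validate each of those. The sketch below is only an illustration: it assumes both links are expressed with void:subset (the thread does not say which predicate is used), and the two file URIs are invented for the example.

```python
# Triples as (subject, predicate, object) tuples; prefixed names are plain strings.
VERSION = "http://rdf.ebi.ac.uk/dataset/chembl/17.0"
RDF_DIST = "http://rdf.ebi.ac.uk/distribution/rdf/chembl/17.0"

# Hypothetical file-level URIs, invented purely for illustration.
FILE_A = RDF_DIST + "/example-molecule.ttl"
FILE_B = RDF_DIST + "/example-target.ttl"

TRIPLES = [
    (VERSION, "void:subset", RDF_DIST),   # assumed predicate
    (RDF_DIST, "void:subset", FILE_A),    # assumed predicate
    (RDF_DIST, "void:subset", FILE_B),    # assumed predicate
]


def descendants(root, triples, predicate="void:subset"):
    """Return every node reachable from root via predicate, breadth-first."""
    found, queue = [], [root]
    while queue:
        node = queue.pop(0)
        for s, p, o in triples:
            if s == node and p == predicate and o not in found:
                found.append(o)
                queue.append(o)
    return found


def leaves(root, triples, predicate="void:subset"):
    """Return reachable nodes that have no further subsets of their own."""
    parents = {s for s, p, _ in triples if p == predicate}
    return [n for n in descendants(root, triples, predicate) if n not in parents]
```

Here leaves(VERSION, TRIPLES) yields the file-level nodes, which are the ones a distribution level check would be run against.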

m.


>  Cheers,
>
>  Alasdair
>
>
>  Alasdair J G Gray
>  Lecturer in Computer Science, Heriot-Watt University, UK.
> Email: A.J.G.Gray@hw.ac.uk
> Web: http://www.macs.hw.ac.uk/~ajg33
> ORCID: http://orcid.org/0000-0002-5711-4872
> Telephone: +44 131 451 3429
> Twitter: @gray_alasdair
> Arrange a Meeting: http://doodle.com/agray
>
>  --
>
>  PLEASE NOTE: There may be a delay in me dealing with your email as I
> am participating in UCU industrial action by 'working to contract' in
> support of the union's campaign for fair pay in higher education.
> For more info go here www.ucu.org.uk/hepay13
>
Received on Thursday, 6 March 2014 21:11:34 UTC