Re: Support for the Data Hub LOD Datasets

From: Ahmad Assaf <ahmad.a.assaf@gmail.com>
Date: Thu, 9 Feb 2017 03:30:02 -0800
Message-ID: <CAAxij-JQxjN12-Vi21D=+TAmKFkuaGxr92w0OhK+UPp5RudwAg@mail.gmail.com>
To: public-lod@w3.org, Franck Michel <franck.michel@cnrs.fr>
Dear Franck,

As part of my PhD thesis, I developed a command-line tool that assesses
datasets on a data portal and measures their objective quality and their
alignment with LOD publishing guidelines by assessing the quality of the
attached metadata.

The tool works by plugging into an online data portal, but I am sure you can
work around that and make it assess a locally available dataset.
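
As a rough illustration only (this is not the OpenData-Checker code), the
kind of per-dataset check that the report below describes could be sketched
in Python as follows. It assumes the metadata of a locally available dataset
has been loaded into a plain dict; the field list, the e-mail pattern and the
function name check_metadata are hypothetical choices made for this sketch.

```python
import re

# Hypothetical field list, taken from field names that appear in the sample report.
FIELDS = ["maintainer", "maintainer_email", "author", "author_email",
          "version", "notes", "url"]
EMAIL_FIELDS = ("maintainer_email", "author_email")
# Deliberately loose pattern; an assumption, not the rule the tool applies.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_metadata(record):
    """Return a list of issue strings for one dataset metadata record (a dict)."""
    issues = []
    for field in FIELDS:
        if field not in record:
            # The key is absent altogether.
            issues.append(f"{field} field is missing")
        elif record[field] is None or str(record[field]).strip() == "":
            # The key is present but carries no usable value.
            issues.append(f"{field} field exists but there is no value defined")
        elif field in EMAIL_FIELDS and not EMAIL_RE.match(str(record[field])):
            issues.append(f"{field} is not a valid e-mail address")
    return issues
```

Calling check_metadata on each record and counting identical messages would
give per-issue totals similar in shape to the bracketed counts in the report.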

The tool can be found here: https://github.com/ahmadassaf/OpenData-Checker and
the relevant publication is here:
http://ceur-ws.org/Vol-1362/PROFILES2015_paper1.pdf

For example, here is a sample of the report generated when running the tool on
the lodcloud group on datahub.io:

================================================================================
                              Metadata Report
================================================================================
[259] group information is missing. Check organization information as they
can be mixed sometimes
[133] maintainer field exists but there is no value defined
[143] maintainer_email field exists but there is no value defined
[6] author field exists but there is no value defined
[39] author_email field exists but there is no value defined
[156] version field exists but there is no value defined
[44] The url defined for this dataset is not reachable !
[28] organization_image_url field exists but there is no value defined
[34] notes field exists but there is no value defined
[1] Tags information [Tags, Vocabularies] is missing
[224] resources information (API endpoints, downloadable dumpds, etc.) is
missing
[11] author_email is not a valid e-mail address !
[6] maintainer_email is not a valid e-mail address !
[3] organization_description field exists but there is no value defined
[3] url field exists but there is no value defined
[2] The organization image_url defined for this dataset is not reachable !
================================================================================
                              Dataset Statistics
================================================================================
There is a total of: 259 [missing] group fields  100.00%
There is one [missing] tag field  0.39%
There is a total of: 224 [missing] resources fields  86.49%
There is a total of: 133 [undefined] maintainer fields  51.35%
There is a total of: 143 [undefined] maintainer_email fields  55.21%
There is a total of: 6 [undefined] author fields  2.32%
There is a total of: 39 [undefined] author_email fields  15.06%
There is a total of: 156 [undefined] version fields  60.23%
There is a total of: 28 [undefined] organization_image_url fields  10.81%
There is a total of: 34 [undefined] notes fields  13.13%
There is a total of: 3 [undefined] organization_description fields  1.16%
There is a total of: 3 [undefined] url fields  1.16%
================================================================================
                              Dataset Connectivity Issues
================================================================================
There are 44 connectivity issues with the following URLs:
   - http://id.loc.gov/authorities/
   - http://libris.kb.se
   - http://www.london-gazette.co.uk/mashup/gazettesdata.htm
.......
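
The percentages in the Dataset Statistics block appear to be each count
divided by the number of datasets in the group. A minimal Python sketch of
that arithmetic, assuming the total is 259 (inferred from the line that
reports 259 missing group fields as 100.00%); the names TOTAL_DATASETS,
counts and percentage are illustrative and not taken from the tool:

```python
# Assumed total, inferred from "259 [missing] group fields  100.00%".
TOTAL_DATASETS = 259

# Counts copied from the Dataset Statistics block of the sample report.
counts = {
    "group": 259,
    "tag": 1,
    "resources": 224,
    "maintainer": 133,
    "maintainer_email": 143,
    "author": 6,
    "author_email": 39,
    "version": 156,
    "organization_image_url": 28,
    "notes": 34,
    "organization_description": 3,
    "url": 3,
}


def percentage(count, total=TOTAL_DATASETS):
    """Format count/total as a percentage with two decimals."""
    return f"{100 * count / total:.2f}%"


for field, count in counts.items():
    print(f"{field}: {count} of {TOTAL_DATASETS} ({percentage(count)})")
```

Under that assumption the computed values match the report, for instance
133 undefined maintainer fields out of 259 gives 51.35%.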

I have not maintained the tool in a while, but I will be more than glad to
help if you decide to use it.

I hope this helps.


On 9 February 2017 at 10:24:58 am, Franck Michel (franck.michel@cnrs.fr)
wrote:

Dear list members,

I'm in the process of compiling a dataset on datahub.io and I'd like to
assess its compliance with LOD guidelines
<https://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation>.

It is advised to use the dataset validator
<http://validator.lod-cloud.net/index.php>; however, I'm not sure whether
this portal is maintained: the search form keeps returning the same page
about compliance levels, but does not give data about any specific
dataset, including those already listed.

The validator does not mention any contact information; would you know
to whom I should report this issue?
Does anyone have experience in using it, and more generally in
publishing a dataset in the LOD?

Thanks for your help,
Franck.

Best Regards

*AHMAD ASSAF, PhD*
Senior Data Scientist, Beamery
ahmad.a.assaf@gmail.com   | ahmad@beamery.com

http://ahmadassaf.com | http://ahmadassaf.com/blog
Twitter <http://twitter.com/ahmadaassaf> | Google+
<http://plus.google.com/112890166770582228940> | Linkedin
<http://www.linkedin.com/in/ahmadassaf> | Facebook
<https://www.facebook.com/simplytech> | Github
<http://github.com/ahmadassaf>
144 Octavia House
213 Townmead Road
SW6 2FJ London, UK
Mobile: +4 (0) 7429204600

Received on Thursday, 9 February 2017 14:56:28 UTC

This archive was generated by hypermail 2.3.1 : Thursday, 23 February 2017 16:49:10 UTC