- From: Ahmad Assaf <ahmad.a.assaf@gmail.com>
- Date: Thu, 9 Feb 2017 03:30:02 -0800
- To: public-lod@w3.org, Franck Michel <franck.michel@cnrs.fr>
- Message-ID: <CAAxij-JQxjN12-Vi21D=+TAmKFkuaGxr92w0OhK+UPp5RudwAg@mail.gmail.com>
Dear Franck,

As part of my PhD thesis, I developed a command-line tool that assesses datasets on data portals and measures their objective quality and their alignment with LOD publishing guidelines by assessing the quality of the attached metadata. The tool works by plugging into an online data portal, but I am sure you can work around that and make it assess a locally available dataset.

The tool can be found here: https://github.com/ahmadassaf/OpenData-Checker
and the relevant publication is at: http://ceur-ws.org/Vol-1362/PROFILES2015_paper1.pdf

For example, running it on the lodcloud group on datahub.io generates a report like the following:

================================================================================
Metadata Report
================================================================================
[259] group information is missing. Check organization information as they can be mixed sometimes
[133] maintainer field exists but there is no value defined
[143] maintainer_email field exists but there is no value defined
[6] author field exists but there is no value defined
[39] author_email field exists but there is no value defined
[156] version field exists but there is no value defined
[44] The url defined for this dataset is not reachable !
[28] organization_image_url field exists but there is no value defined
[34] notes field exists but there is no value defined
[1] Tags information [Tags, Vocabularies] is missing
[224] resources information (API endpoints, downloadable dumpds, etc.) is missing
[11] author_email is not a valid e-mail address !
[6] maintainer_email is not a valid e-mail address !
[3] organization_description field exists but there is no value defined
[3] url field exists but there is no value defined
[2] The organization image_url defined for this dataset is not reachable !
================================================================================
Dataset Statistics
================================================================================
There is a total of: 259 [missing] group fields 100.00%
There is one [missing] tag field 0.39%
There is a total of: 224 [missing] resources fields 86.49%
There is a total of: 133 [undefined] maintainer fields 51.35%
There is a total of: 143 [undefined] maintainer_email fields 55.21%
There is a total of: 6 [undefined] author fields 2.32%
There is a total of: 39 [undefined] author_email fields 15.06%
There is a total of: 156 [undefined] version fields 60.23%
There is a total of: 28 [undefined] organization_image_url fields 10.81%
There is a total of: 34 [undefined] notes fields 13.13%
There is a total of: 3 [undefined] organization_description fields 1.16%
There is a total of: 3 [undefined] url fields 1.16%
================================================================================
Dataset Connectivity Issues
================================================================================
There are 44 connectivity issues with the following URLs:
- http://id.loc.gov/authorities/
- http://libris.kb.se
- http://www.london-gazette.co.uk/mashup/gazettesdata.htm
.......

I have not maintained the tool in a while, but I will be more than glad to help if you decide to use it.

I hope this helps.

On 9 February 2017 at 10:24:58 am, Franck Michel (franck.michel@cnrs.fr) wrote:

Dear list members,

I'm in the process of compiling a dataset on datahub.io and I'd like to assess its compliance with the LOD guidelines <https://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation>. It is advised to use the dataset validator <http://validator.lod-cloud.net/index.php>; however, I'm not sure whether this portal is maintained: the search form keeps returning the same page about compliance levels, but does not give data about any specific dataset, including those already listed.
The validator does not mention any contact information; would you know to whom I should report this issue? Does anyone have experience in using it, and more generally in publishing a dataset in the LOD?

Thanks for your help,
Franck.

Best Regards

*AHMAD ASSAF, PhD*
Senior Data Scientist, Beamery
ahmad.a.assaf@gmail.com | ahmad@beamery.com
http://ahmadassaf.com | http://ahmadassaf.com/blog
Twitter <http://twitter.com/ahmadaassaf> | Google+ <http://plus.google.com/112890166770582228940> | Linkedin <http://www.linkedin.com/in/ahmadassaf> | Facebook <https://www.facebook.com/simplytech> | Github <http://github.com/ahmadassaf>
144 Octavia House
213 Townmead Road
SW6 2FJ London, UK
Mobile: +4 (0) 7429204600
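
For readers who want to run a similar check on a locally available dataset, the distinction the report draws between [missing] fields (the key is absent) and [undefined] fields (the key exists but has no value) can be sketched in a few lines of Python. This is only an illustration and not the OpenData-Checker implementation: the field names are taken from the sample report, while `check_dataset`, `summarize`, the loose e-mail pattern, and the sample records are assumptions made up for the example.

```python
import re
from collections import Counter

# Field names come from the sample report; which ones to check is an assumption.
CHECKED_FIELDS = ["maintainer", "maintainer_email", "author",
                  "author_email", "version", "notes", "url"]
EMAIL_FIELDS = ["author_email", "maintainer_email"]
# Deliberately loose e-mail pattern (assumption, not the tool's actual rule).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_dataset(metadata):
    """Return a list of (kind, field) issues for one metadata record."""
    issues = []
    for field in CHECKED_FIELDS:
        if field not in metadata:
            issues.append(("missing", field))      # key is absent
        elif metadata[field] in (None, "", [], {}):
            issues.append(("undefined", field))    # key exists, no value
    for field in EMAIL_FIELDS:
        value = metadata.get(field)
        if value and not EMAIL_RE.match(value):
            issues.append(("invalid", field))      # value is not e-mail shaped
    return issues


def summarize(datasets):
    """Map each (kind, field) issue to (count, percentage of all datasets)."""
    counts = Counter()
    for metadata in datasets:
        counts.update(check_dataset(metadata))
    total = len(datasets)
    return {issue: (n, 100.0 * n / total) for issue, n in counts.items()}


# Hypothetical records, shaped like CKAN dataset metadata.
SAMPLE = [
    {"author": "A", "author_email": "a@example.org", "maintainer": "",
     "maintainer_email": "", "version": "1.0", "notes": "x",
     "url": "http://example.org"},
    {"author": "", "author_email": "not-an-email", "maintainer": "M",
     "maintainer_email": "m@example.org", "version": "", "notes": "y"},
]

if __name__ == "__main__":
    for (kind, field), (n, pct) in sorted(summarize(SAMPLE).items()):
        print(f"There is a total of: {n} [{kind}] {field} fields {pct:.2f}%")
```

The percentages are computed against the number of datasets in the group, which is consistent with the statistics in the report (for example, 224 of 259 datasets gives 86.49%). Reachability checks on URLs are left out here because they depend on network access.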
Received on Thursday, 9 February 2017 14:56:28 UTC