- From: Michel Dumontier <michel.dumontier@gmail.com>
- Date: Thu, 22 Jun 2017 15:32:52 +0200
- To: John McCrae <john@mccr.ae>
- Cc: "public-lod@w3.org" <public-lod@w3.org>, Andrejs Abele <andrejs.abele@insight-centre.org>
- Message-ID: <CALcEXf4LDbgkJPphagTx7GrPhcnyj1RnMfs4MDdb0EO1xxD0DQ@mail.gmail.com>
Can somebody fix the validation tool? That would be very helpful.

m.

On Thu, Jun 22, 2017 at 3:18 PM, John McCrae <john@mccr.ae> wrote:

> Hi all,
>
> Thank you for your suggestions, and apologies that we have taken so long to
> reply; we have been quite busy <http://ldk2017.org>. Firstly, due to
> unforeseen circumstances, we have had to delay the next diagram until
> later in July.
>
> While Datahub is certainly not an ideal tool for collecting metadata about
> LOD datasets, it functions well enough for most people and has a big
> advantage in that most of the data we need is already there.
>
> We are currently working on improving the metadata procedure behind the
> LOD Cloud Diagram, and in fact our goal is to eliminate the need for data
> providers to inform us about their datasets. Instead, we are planning to
> extract the topic, size and links in a dataset from the available data and
> discover new datasets by crawling (this sounds easier than it is!). As
> such, our plans for the short term still involve using Datahub as our UI
> and data collection point, but hopefully to an increasingly lesser extent
> as we automate more of the process of generating the metadata required for
> the diagram.
>
> Regards,
> John P. McCrae
>
> On Fri, Jun 16, 2017 at 12:57 PM, Sarven Capadisli <info@csarven.ca>
> wrote:
>
>> On 2017-06-16 13:05, Andrejs Abele wrote:
>> > Hi everyone,
>> >
>> > We want to address some of the comments and suggestions mentioned in
>> > this thread.
>> >
>> > While we are keen to use metadata from resource publishers wherever
>> > possible, we still require that your resource is listed in Datahub as we
>> > do not have the capability to crawl the Web looking for VoID
>> > descriptions at this moment.
>> >
>> > * In DataHub there is an option to upload VoID descriptions. Please
>> >   upload here and we will attempt to extract the metadata from the
>> >   VoID file.
>> >
>> > * For a VoID file to be useful, it would have to contain a
>> >   "dcterms:subject" property that describes the topics contained in
>> >   the dataset and a "void:Linkset" describing links to other datasets
>> >   and the number of triples linking to each such dataset. The target of
>> >   any linkset must correspond to a VoID file listed in Datahub; e.g., to
>> >   link to European Nature Information System (EUNIS) you should link
>> >   to the VoID description listed at Datahub:
>> >   http://eunis.eea.europa.eu/void.rdf.
>> >
>> > * For now, if you have a VoID file that contains this information, and
>> >   for any reason you don't want to publish it on DataHub, you can send
>> >   it to us and we will add it to our system.
>> >
>> > We are aware that there is a dataset validation tool
>> > (http://validator.lod-cloud.net/) available, but we are not currently
>> > maintaining it or using it to check the validity of a dataset. If you
>> > are unsure as to whether your Datahub record is suitable, please email us
>> > and we will check that it appears correctly in the diagram.
>> >
>> > As there have been multiple requests, we will postpone the generation
>> > of the diagram until next weekend (24.06.17).
>> >
>> > Kind regards,
>> >
>> > Andrejs Abele and John P. McCrae
>> >
>> > Unit for Natural Language Processing
>> > Insight Centre for Data Analytics
>> > National University of Ireland Galway
>> > https://nuig.insight-centre.org/unlp/people/members/andrejs-abele/
>> > http://john.mccr.ae
>> >
>> > On 14/06/17 10:45, Victor de Boer @ VU wrote:
>> >> Dear Andrejs & John,
>> >>
>> >> Indeed, we would really like to see our GTAA thesaurus
>> >> (https://datahub.io/dataset/gemeenschappelijke-thesaurus-audiovisuele-archieven)
>> >> in the cloud diagram. The validator tool gives a number of
>> >> non-compliance messages.
>> >> Missing URL. Please provide an URL for the data set.
>> >> Missing authorship. Please provide the name of publishing org and/or
>> >> person using the CKAN field Author. It is important to know who
>> >> created this data set.
>> >> Missing lod tag. Please tag the data set with lod.
>> >> Missing contact email. Please provide a contact email using the CKAN
>> >> field Author email or Maintainer email. It is important to know who to
>> >> contact if there are errors or missing dataset descriptions.
>> >>
>> >> However, looking at the Datahub metadata, as far as I can see, all the
>> >> required fields are there (URL, Author, links, etc.).
>> >>
>> >> Any idea how we can make sure that the GTAA (and other datasets) will
>> >> appear in the new diagram?
>> >>
>> >> thanks!
>> >> --victor
>> >>
>> >> RE:
>> >> Dear Andrejs & John,
>> >>
>> >> Great that you guys are committed to this task.
>> >>
>> >> Just a remark: before a dataset is considered in the LOD cloud, it
>> >> must comply with the guidelines described at
>> >> https://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation.
>> >> The document refers to the dataset validation tool
>> >> (http://validator.lod-cloud.net/) to figure out the "completeness
>> >> level", that is, why a given dataset published on datahub.io does or
>> >> does not comply with those guidelines.
>> >>
>> >> As I already mentioned on this list
>> >> (https://lists.w3.org/Archives/Public/public-lod/2017Feb/0001.html),
>> >> this tool seems buggy: the search form keeps returning the same page
>> >> about compliance levels, but does not give data about any specific
>> >> dataset, including those already listed.
>> >>
>> >> As a result, I would assume that existing datasets are not included in
>> >> the LOD cloud just because publishers don't know what needs to be
>> >> fixed.
>> >>
>> >> Are you aware of this issue? Do you know whom to contact?
>> >>
>> >> Thx,
>> >> Franck.
>> >>
>> >> On 12/06/2017 at 18:19, Andrejs Abele wrote:
>> >>> Hi everyone,
>> >>>
>> >>> The Linked Open Data Cloud Diagram (http://lod-cloud.net) is one of
>> >>> the most visible tools in our community and we at the Insight Centre
>> >>> for Data Analytics have committed to providing regular updates to
>> >>> this diagram.
>> >>>
>> >>> We are planning to generate the next version of the LOD cloud diagram
>> >>> at the end of this week (17.06.17).
>> >>>
>> >>> In order to help us best reflect the true state of the Linked Open
>> >>> Data Cloud, please update your resource description in DataHub.io
>> >>> (https://datahub.io) based on the guidelines below by this Friday.
>> >>>
>> >>> * Provide tags describing your dataset
>> >>> * Provide number of triples
>> >>> * Provide information about links to other datasets in the format:
>> >>>     o links:<resource id in DataHub>
>> >>>     o E.g., links:dbpedia
>> >>>
>> >>> For more details please see the LOD Cloud Diagram Page or the
>> >>> detailed description here:
>> >>> https://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation
>> >>>
>> >>> Kind regards,
>> >>>
>> >>> Andrejs Abele and John P. McCrae
>> >>>
>> >>> Unit for Natural Language Processing
>> >>> Insight Centre for Data Analytics
>> >>> National University of Ireland Galway
>> >>> https://nuig.insight-centre.org/unlp/people/members/andrejs-abele/
>> >>> http://john.mccr.ae
>> >
>> > --
>> > Unit for Natural Language Processing
>> > Insight Centre for Data Analytics
>> > National University of Ireland Galway
>> > https://nuig.insight-centre.org/unlp/people/members/andrejs-abele/
>>
>> Just to put this on the table. There is a "follow your nose" approach
>> that can be incorporated here that I believe can address a bunch of
>> technical and social hurdles for both dataset owners and the consumers
>> (like for the preparation of the LOD cloud).
>>
>> Have a discoverable relation to receive notifications about dataset
>> updates. You can decide on the subject and object URL:
>>
>>   <http://lod-cloud.net/> <http://www.w3.org/ns/ldp#inbox> <http://lod-cloud.net/inbox/> .
>>
>> Allow POST requests with JSON-LD (and other RDF syntaxes if you'd like)
>> on the Inbox URL.
>>
>> Dataset owners can send a payload indicating where to discover their
>> datasets, and provenance level data that you'd be interested in knowing.
>> You can kindly ask what to include in the payload (eg on the homepage)
>> or set constraints on the Inbox etc.
>>
>> In this way, you are not bound to a 3rd party service (no account
>> creations, or information which might go stale if not updated). There is
>> also no need to have a call where people should suddenly update their
>> metadata on some third party service by the end of week. People can
>> *notify* you with "hey, I just updated my stuff over here, come and
>> check it out!" You have the benefit of also keeping an eye on what's
>> actively maintained.
>>
>> For dataset owners, notifying you can be automated in their tooling or
>> done manually with a simple curl -X POST.
>>
>> You can take the notifications, process and manage them as you like. In
>> fact, you can probably programmatically update the LOD cloud (SVG)
>> through these notifications. One other benefit here is that other
>> applications (from the community) can consume these notifications as
>> well if you are inclined to make the inbox/notifications with public
>> read access. You can serve the notifications as JSON-LD or if you allow
>> content negotiation, have other RDF serialisations.
>>
>> That would be the Linked Data Notifications [1] approach for this.
>> Compare this to the amount of manual intervention that's required of
>> everyone (publisher and consumer) via datahub.io.
>>
>> I'd love to see this sort of "Webby" notifications, discovery, reuse
>> going forward.
>>
>> If this way of working interests you and the community, let's do it.
>> Happy to help out where necessary. See also [2] for existing LDN
>> implementations where you might want to reuse code from.
>>
>> There we have it. It is all very simple.
>>
>> [1] https://www.w3.org/TR/ldn/
>> [2] https://linkedresearch.org/ldn/tests/summary
>>
>> -Sarven
>> http://csarven.ca/#i

--
Michel Dumontier
Distinguished Professor of Data Science
Maastricht University
http://dumontierlab.com
Received on Thursday, 22 June 2017 13:33:47 UTC
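
Editorial sketch (not part of the original thread): the Linked Data Notifications flow that Sarven describes, where a dataset owner POSTs a JSON-LD payload to an inbox, could look roughly like the following in Python. The inbox URL is the illustrative one from his example triple and is not known to be a live endpoint. The payload shape (an ActivityStreams 2.0 "Update" with a "url" pointing at a VoID file), the example.org addresses, and the function names are all assumptions made for illustration; LDN itself does not prescribe a vocabulary, and the receiver would decide what it wants in the payload.

```python
import json
import urllib.request

# Illustrative inbox URL from the example triple in the thread;
# not known to be a live endpoint.
INBOX_URL = "http://lod-cloud.net/inbox/"


def build_notification(dataset_url, void_url, actor_url):
    """Build a JSON-LD payload announcing a dataset update.

    Assumption: the receiver accepts ActivityStreams 2.0 terms.
    LDN does not mandate any particular vocabulary.
    """
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Update",
        "actor": actor_url,
        "object": dataset_url,
        "url": void_url,  # hypothetical: where the VoID description lives
    }


def build_request(inbox_url, payload):
    """Prepare (but do not send) the HTTP POST for an LDN inbox."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        inbox_url,
        data=body,
        headers={"Content-Type": "application/ld+json"},
        method="POST",
    )


payload = build_notification(
    dataset_url="http://example.org/dataset",
    void_url="http://example.org/void.ttl",
    actor_url="http://example.org/#me",
)
request = build_request(INBOX_URL, payload)
# To actually send the notification:
#   urllib.request.urlopen(request)
```

This is the programmatic counterpart of the "simple curl -X POST" mentioned in the thread: the same body sent with a Content-Type of application/ld+json. The request is only constructed here, not sent, so the sketch can be run without network access.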