Re: Use the void and schema.org [Was: Re: LODAtlas 1.0 release - Browsing Linked Data Catalogs]

Hi Emmanuel,

I will try to answer your questions:

> - Has VoID achieved good coverage yet? (I remember reading that few datasets were exposing VoID descriptions, but that may have been several years ago and things might have changed)

Around December 2015, we wanted to evaluate the quality of the datasets in the LOD cloud. Back then we thought that one of the main entry points to a dataset would be its voID metadata. Of the 569 datasets, only around 98 had a voID description (not necessarily referenced in their datahub metadata, though). Furthermore, we found that most voID descriptions had broken links to their datasets (one in particular is DBpedia, whose maintainers I recently asked to fix this; I am not sure whether it has been fixed yet). We discuss further experiments in our paper [1].
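
The broken-link check mentioned above is easy to reproduce. Below is a minimal Python sketch. It is not the code we used for [1]. It assumes that the dump and endpoint URLs (the objects of void:dataDump or void:sparqlEndpoint) have already been extracted from each voID file. The function names are made up for illustration. Some servers reject HEAD requests, so in practice a 405 may deserve a retry with GET.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Optional


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status of a HEAD request, or None if nothing answered."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; keep their status code.
        return err.code
    except (OSError, ValueError):
        # DNS failures, refused connections and timeouts (OSError/URLError),
        # or a string that is not a usable URL (ValueError).
        return None


def find_broken_links(
    urls: Iterable[str],
    probe: Callable[[str], Optional[int]] = http_status,
) -> Dict[str, str]:
    """Map each broken URL to a short reason; working URLs are left out."""
    broken: Dict[str, str] = {}
    for url in urls:
        status = probe(url)
        if status is None:
            broken[url] = "unreachable"
        elif status >= 400:
            broken[url] = f"HTTP {status}"
    return broken
```

Passing a different `probe`, such as a dictionary lookup, lets you test the classification logic without network access.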

> - How do we get those decentralized VoID descriptions? Is there a crawling service somewhere that would give us access to such descriptions?

When voID was introduced, the editors recommended as a best practice publishing the metadata at the /.well-known/void path of the fully qualified domain name [2]. We did check that location, but I don’t remember whether we had any success with it.
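
In practice, the convention in [2] means that a crawler can derive a candidate voID location from any URI of a dataset. It keeps only the scheme and the host (plus an explicit port, if any). Below is a minimal Python sketch. The function name is hypothetical, and the sketch does not fetch anything.

```python
from urllib.parse import urlsplit, urlunsplit


def well_known_void_url(uri: str) -> str:
    """Return the /.well-known/void location on the host that serves `uri`."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute HTTP(S) URI: {uri!r}")
    # Drop any "user:password@" prefix but keep the host and an explicit port;
    # path, query and fragment of the original URI are discarded.
    host = parts.netloc.rsplit("@", 1)[-1]
    return urlunsplit((parts.scheme, host, "/.well-known/void", "", ""))
```

For example, `well_known_void_url("http://dbpedia.org/resource/Berlin")` returns `"http://dbpedia.org/.well-known/void"`. Whether anything is actually published there is up to the dataset provider.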

Fast-forward to 2018: we have restarted our quality assessment, this time on a monthly basis. Getting access to these supposedly open datasets is still painful, IMO. I thought of doing what Alasdair suggested; however, I think it would only be beneficial if whoever publishes the datasets also keeps this metadata up to date (voID, schema.org, DCAT... whichever suits them best). Furthermore, if we need some sort of crawling service, I would prefer that the /.well-known mechanism be used for such metadata.
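
On the schema.org side, dataset descriptions are usually embedded in landing pages as JSON-LD. The following minimal Python sketch uses only the standard library to pull schema.org Dataset objects out of such a page. It assumes the markup sits in `<script type="application/ld+json">` elements; microdata and RDFa are ignored. The function and class names are illustrative and are not part of any existing tool.

```python
import json
from html.parser import HTMLParser
from typing import List

# Spellings of the schema.org Dataset type commonly found in JSON-LD.
DATASET_TYPES = {"Dataset", "schema:Dataset",
                 "http://schema.org/Dataset", "https://schema.org/Dataset"}


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json">."""

    def __init__(self) -> None:
        super().__init__()
        self.in_jsonld = False
        self.blocks: List[str] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self.in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld:
            self.blocks[-1] += data  # data may arrive in several chunks


def schema_org_datasets(html: str) -> List[dict]:
    """Return the JSON-LD nodes on a page whose @type includes Dataset."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    found: List[dict] = []
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed markup instead of failing the whole page
        # A block may hold one node, a list of nodes, or an @graph of nodes.
        nodes = data.get("@graph", [data]) if isinstance(data, dict) else data
        if not isinstance(nodes, list):
            nodes = [nodes]
        for node in nodes:
            if not isinstance(node, dict):
                continue
            types = node.get("@type", [])
            if not isinstance(types, list):
                types = [types]
            if any(isinstance(t, str) and t in DATASET_TYPES for t in types):
                found.append(node)
    return found
```

Each returned node is a plain dictionary. A harvester could then read fields such as `name` or `distribution` from it.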

Best Regards,
Jeremy

[1] Debattista et al. - Evaluating the Quality of the LOD Cloud: An Empirical Investigation (http://www.semantic-web-journal.net/system/files/swj1757.pdf)
[2] https://www.w3.org/TR/void/#discovery


> On 4 Oct 2018, at 12:00, emmanuel.pietriga@inria.fr wrote:
> 
> Hello Jerven,
> 
> This is definitely a valid point.
> 
> We feature basic VoID output for the selected datasets, but indeed we do not support direct VoID import so far. We have been focusing on the architecture and the user interface, and again, old.datahub seemed like a reasonable way of bootstrapping (we chose it because that’s what was used for generating the LOD cloud in the first place).
> 
> I’m pretty sure it would be straightforward to support VoID import. What’s less clear to me, however, is:
> - Has VoID achieved good coverage yet? (I remember reading that few datasets were exposing VoID descriptions, but that may have been several years ago and things might have changed)
> - How do we get those decentralized VoID descriptions? Is there a crawling service somewhere that would give us access to such descriptions?
> 
> regs,
> Emmanuel
> --
> Emmanuel Pietriga
> INRIA - ILDA
> http://pages.saclay.inria.fr/emmanuel.pietriga
> 
> 
> 
>> On 4 Oct 2018, at 09:33, Jerven Bolleman <me@jerven.eu> wrote:
>> 
>> Dear Emmanuel, Community,
>> 
>> This is a lovely UI, but it is let down by its data. So much good data is missing or misrepresented because of the bad, wrong and obsolete data in datahub.io. Datahub approaches do not scale and are not maintainable for data providers.
>> 
>> Like the LOD diagram, it misses a lot of key datasets produced by professionals every month, as these professionals don't have time to update ten forms in different places. However, these people do produce schema.org markup and/or extensive voID files.
>> 
>> Why are you not parsing these? How come I need to fill in a form instead of just submitting a URI where there is schema.org markup or a voID file? Why do I need to update your database by hand, again and again?
>> 
>> Why does the LOD community that produces tools not actually consume LOD to drive them?
>> 
>> Regards,
>> Jerven
>> 
>> On Wed, Oct 3, 2018 at 8:34 PM emmanuel.pietriga@inria.fr <emmanuel.pietriga@inria.fr> wrote:
>> [Continuing this thread on the public-lod list only]
>> 
>> This is a good point.
>> 
>> The short answer is: because we bootstrapped LODAtlas with content from old.datahub.io (and others), using datasets that were tagged as “lod”. Wikidata is in there, but is not tagged as “lod”. Actually, it isn’t tagged with anything we had considered in the first place. See [1].
>> 
>> [1] https://old.datahub.io/dataset/wikidata
>> 
>> Longer answer: of course, you are right. It should definitely be in there, like many others, probably. Regarding Wikidata, we’ll add it ourselves. For other datasets that belong there, you can use the submission form to add them, or point us to them and we’ll see what we can do. We have limited resources to handle such requests for now, but we’ll do the best we can.
>> 
>> best regards,
>> Emmanuel
>> --
>> Emmanuel Pietriga
>> INRIA - ILDA
>> http://pages.saclay.inria.fr/emmanuel.pietriga
>> 
>> 
>> 
>>> On 3 Oct 2018, at 19:07, Ettore RIZZA <ettorerizza@gmail.com> wrote:
>>> 
>>> Hello,
>>> 
>>> Thank you very much, this kind of tool interests me a lot. Just a note: I see that Wikidata is not there. Aren't you afraid that an atlas of the LOD without such a huge dataset would be like an atlas of the world without China or Russia?
>>> 
>>> Best regards,
>>> 
>>> Ettore Rizza
>>> 
>>> On Wed, 3 Oct 2018 at 16:45, Hande Gözükan <hande.gozukan@inria.fr> wrote:
>>> Dear All, 
>>> 
>>> We are happy to announce the release of LODAtlas version 1.0 [1] developed by team Ilda at Inria [2].
>>> 
>>> LODAtlas is a Web tool that helps users find linked datasets of interest through faceted browsing + keyword & URI search on the datasets' metadata and their schema-level content. The tool provides a set of interactive visualization widgets that help compare datasets along different criteria (number of triples, last update, interlinking with other datasets in the LOD cloud, etc.). Users can also get an idea of the contents of a given dataset thanks to a visual summary of the statements it contains. LODAtlas also offers a REST API that gives programmatic access to most of the data that can be visualized [3].
>>> 
>>> A talk about LODAtlas will be given at ISWC next week in Monterey, CA, USA [4a] (full paper available from [4b]).
>>> 
>>> [1] http://lodatlas.lri.fr
>>> [2] https://ilda.saclay.inria.fr
>>> [3] http://lodatlas.lri.fr/api
>>> [4a] http://iswc2018.semanticweb.org/sessions/browsing-linked-data-catalogs-with-lodatlas/
>>> [4b] https://hal.inria.fr/hal-01827766/document
>>> 
>>> Short description:
>>> 
>>> LODAtlas takes as input CKAN [5] dataset descriptions. The LODAtlas instance at [1] gives access to the entire (old) DataHub catalogue [6], that of data.gov [7], and partial access to the EU data portal [8] (data processing is still ongoing).
>>> 
>>> [5] https://docs.ckan.org/en/2.8/contents.html
>>> [6] https://old.datahub.io
>>> [7] https://www.data.gov
>>> [8] https://www.europeandataportal.eu
>>> 
>>> Data processing comprises the following main steps:
>>> - Download the metadata describing linked datasets from the CKAN repository.
>>> - Download the associated RDF dump files (when available).
>>> - Process the dump files using LODStats [9] to extract classes, properties and vocabularies.
>>> - Process dump files together with schema/ontology files using the RDF Quotients framework [10] to generate visual summaries of the dumps’ contents.
>>> 
>>> LODAtlas is developed as an open source project under GNU General Public License v3.0. The source code is hosted on GitLab [11].
>>> 
>>> [9] https://github.com/AKSW/LODStats
>>> [10] https://hal.inria.fr/hal-01325900
>>> [11] https://gitlab.inria.fr/epietrig/LODAtlas
>>> 
>>> LODAtlas instances can be set up by anyone, using CKAN-compliant linked dataset descriptions. LODAtlas is available as a Docker image, or can be compiled locally.
>>> 
>>> Any feedback is appreciated!
>>> 
>>> Best regards,
>>> The LODAtlas team.
>> 
>> 
>> 
>> 
>> -- 
>> Jerven Bolleman
>> me@jerven.eu
> 
> 

Received on Thursday, 4 October 2018 12:55:46 UTC