Re: Quality Criteria for Linked Data sources

From: Muriel Foulonneau <muriel.foulonneau@gmail.com>
Date: Thu, 16 Dec 2010 15:35:32 +0100
Message-ID: <AANLkTikY_oaBMkO8k14A-JqiwFC6z_fhtSSa7ggr2tZF@mail.gmail.com>
To: Chris Bizer <chris@bizer.de>
Cc: Annika Flemming <annika.flemming@gmx.de>, public-lod@w3.org, christian.bizer@fu-berlin.de

Hi Annika,

Here is some quick feedback. This is an interesting summary.

I had a similar comment: you do not seem to focus on the reuse of classes
and properties from other models, or on the creation of links between local
properties and classes and the properties and classes defined in other
models.
Another aspect is the connection of the dataset to the different Semantic
Web tools (e.g., a local ontology recorded in an ontology repository, or a
dataset declared or findable in existing tools such as Sindice, which is the
equivalent of search engine optimization).
The number of internal and external links is a very important quality
criterion. I would also include the number of links to the dataset (i.e.,
other datasets that reuse URIs defined in this dataset or define specific
relations to them).
Finally, you seem to be addressing different things, such as the timeliness
of the data, but without reusing a complete data quality framework that
would also include accuracy, for instance (see Diane Hillmann's work on this:
http://www.ecommons.cornell.edu/handle/1813/7895). So if you are looking for
criteria that are specific to Linked Data only, I am not certain that
timeliness is so specific. Otherwise, you would probably need other criteria
from traditional data quality metrics.
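
To make the link-count criterion above a bit more concrete, here is a
minimal sketch of how internal, external and incoming links might be
counted. It is only an illustration under simplifying assumptions: triples
are plain string tuples, a "link" is any triple whose object is an http(s)
URI, and the namespace, the function names and the sample data are all
hypothetical.

```python
from typing import Iterable, NamedTuple, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings


class LinkCounts(NamedTuple):
    internal: int  # local subject -> local object
    external: int  # local subject -> URI outside the namespace
    incoming: int  # foreign subject -> local object


def is_uri(term: str) -> bool:
    # Simplification: anything starting with http(s):// counts as a URI,
    # everything else is treated as a literal.
    return term.startswith(("http://", "https://"))


def count_links(triples: Iterable[Triple], namespace: str) -> LinkCounts:
    internal = external = incoming = 0
    for subject, _predicate, obj in triples:
        if not is_uri(obj):
            continue  # literals are not links
        local_subject = subject.startswith(namespace)
        local_object = obj.startswith(namespace)
        if local_subject and local_object:
            internal += 1
        elif local_subject:
            external += 1
        elif local_object:
            incoming += 1
    return LinkCounts(internal, external, incoming)


# Hypothetical sample data: one literal, one internal link, one external
# link, and one link coming in from another dataset.
NS = "http://example.org/data/"
triples = [
    (NS + "berlin", "http://www.w3.org/2000/01/rdf-schema#label", "Berlin"),
    (NS + "berlin", "http://example.org/vocab/partOf", NS + "germany"),
    (NS + "berlin", "http://www.w3.org/2002/07/owl#sameAs",
     "http://dbpedia.org/resource/Berlin"),
    ("http://other.example/city/1", "http://www.w3.org/2002/07/owl#sameAs",
     NS + "berlin"),
]
counts = count_links(triples, NS)
print(counts)
```

Note that incoming links can only be counted if triples from other datasets
are available, so in practice this figure would have to come from a crawl
or an index rather than from the dataset itself.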

I hope that helps. Please keep us posted.

Muriel Foulonneau
Tudor Research Centre

On Thu, Dec 16, 2010 at 10:19 AM, Chris Bizer <chris@bizer.de> wrote:

> Dear Annika,
> great work and a really nice fusion of the classic data quality criteria
> that one finds in the literature from databases with Linked Data specific
> aspects.
> Three comments:
> 1. Your criteria seem to focus mainly on the publication of instance data
> and do not say too much about the schema level. The overall goal of Linked
> Data is to publish data in a self-descriptive way [1], which means that you
> should not only set links on instance level, but you should also set links
> on schema level relating terms from different vocabularies to each other.
> This especially applies when you use proprietary terms, which cannot always
> be avoided. Thus, maybe you still want to add some criteria about providing
> definitions for proprietary vocabulary terms and setting links between
> different vocabularies to your list.
> 2. Your criteria in the category content are only a subset of the usual
> content-oriented criteria in literature (for summaries see for instance
> [2][3]). I guess you had reasons not to include all, but maybe you want to
> check against these lists again.
> 3. If you want to talk in your thesis about the compliance of existing data
> sources on the Web with the quality criteria, the statistics about the
> compliance with different publishing best practices in the State of the LOD
> Cloud document [4] could be a good starting point.
> Please also circulate a link to your thesis on this list once you have
> finished it. It appears like this is going to be an interesting read :-)
> Cheers,
> Chris
> [1] http://www.w3.org/2001/tag/doc/selfDescribingDocuments.html
> [2] http://portal.acm.org/citation.cfm?id=1791545
> [3] http://www.diss.fu-berlin.de/diss/receive/FUDISS_thesis_000000002736
> [4] http://www4.wiwiss.fu-berlin.de/lodcloud/state/
> -----Original Message-----
> From: public-lod-request@w3.org [mailto:public-lod-request@w3.org] On
> Behalf Of Annika Flemming
> Sent: Wednesday, 15 December 2010 20:50
> To: public-lod@w3.org
> Subject: Quality Criteria for Linked Data sources
> Hi,
> I'm a student at the Humboldt University of Berlin and I'm currently
> writing my diploma thesis under the supervision of Olaf Hartig. The aim of
> my thesis is to draw up a set of criteria to assess the quality of Linked
> Data sources. My findings include eleven criteria grouped into four
> categories. Each criterion includes a set of so-called indicators. These
> indicators constitute a measurable aspect of a criterion and, thus, allow
> for the assessment of the quality of a data source w.r.t. the criteria.
> I've written a summary of my findings, which can be accessed here:
> http://sourceforge.net/apps/mediawiki/trdf/index.php?title=Quality_Criteria_for_Linked_Data_sources
> To evaluate my findings, I decided to post this summary hoping to receive
> some feedback about the criteria and indicators I suggested. Moreover, I'd
> like to initiate a discussion about my findings, and about their
> applicability to a quality assessment of data sources.
> Your comments might be included in my thesis, but I won't add any names.
> A further summary will follow shortly, describing a formalism based on
> these criteria and its application to several data sources.
> Thanks to everyone participating,
> Annika
Received on Thursday, 16 December 2010 18:31:29 UTC
