
Re: How To Deal with the Subjective Issue of Data Quality?

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Mon, 11 Apr 2011 07:39:43 -0400
Message-ID: <4DA2E87F.3090603@openlinksw.com>
To: Marco Fossati <fossati@fbk.eu>
CC: public-lod@w3.org, semantic-web@w3.org
On 4/11/11 6:51 AM, Marco Fossati wrote:
> Hi Christian and everyone,
> When working with strongly heterogeneous data, quality is fundamental: 
> it lets us exploit the full potential of the data.
> As I am working with data coming from very local and domain-specific 
> realities (e.g. the tourism portal of a small geographical area), I 
> would like to stress real-world usage and applications. With such a 
> focus in mind, in my opinion there are two ways to leverage quality:
>    1. Persuading the data publishers we are dealing with to expose
>       better quality data, because they can benefit from it;
>    2. Restructuring the data on our own, by trying to find the rules
>       that would fix the problems coming from the data sources (i.e.
>       the problems created by the data publishers).
> At the moment, I am discarding the first option, as it seems to demand 
> much more effort than the second one. In practice, telling someone who 
> is not initiated into Semantic Web technologies that they have to 
> expose their data in RDF because they can earn some money in the short 
> term is quite a complex task.

If you put "RDF" in the conversation you only increase the mental mirage 
quotient of your quest re. the monetary value of published data. It's 
much easier if you talk about Linked Data as hyperlinks between 
disparate data items across the Web that deliver perpetual enrichment 
via the network effects of the InterWeb. There are many simple examples 
with regard to lookups that can simplify the value proposition of Linked 
Data, especially since you have "Wikipedia as a Database" as an 
easy-to-demonstrate option via DBpedia, for instance.
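To make the "Wikipedia as a Database" point concrete, here is a minimal sketch of such a lookup. The endpoint URL and the SPARQL query are illustrative assumptions (they are not taken from this thread); the code only builds the request URL rather than sending it, so it works without network access.

```python
from urllib.parse import urlencode

# DBpedia's public SPARQL endpoint (an assumption: adjust if it moves).
ENDPOINT = "http://dbpedia.org/sparql"

# Illustrative query: fetch the English abstract for the "Berlin" topic.
QUERY = """\
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Berlin> dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}"""

def lookup_url(sparql: str) -> str:
    """Build a GET URL that asks the endpoint for JSON results."""
    params = urlencode({"query": sparql,
                        "format": "application/sparql-results+json"})
    return f"{ENDPOINT}?{params}"

print(lookup_url(QUERY))
```

Pasting the printed URL into a browser returns the abstract as JSON, which is usually enough to show a non-initiated publisher what "Wikipedia as a Database" means in practice.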

> Therefore, the creation of a data quality management ontology is very 
> interesting, even if I fear that it could add complexity to an already 
> complex issue.

If we connect rather than disconnect with our target audiences by 
understanding their terminology first, we'll be better positioned to 
make links between our terminology and theirs. This is a zillion times 
better than reciting mantras that boil down to: our terminology or 
nothing.

> In conclusion, the main question is: how could we write data quality 
> constraints (via an implementation of 
> http://semwebquality.org/documentation/primer/20101124/index.html for 
> example) for transforming data generally coming from non-RDF formats 
> (CSV, XML, microformats-annotated web pages)?

Like most things amongst cognitive entities, we ultimately have to put 
conversations about data into the data itself. Beyond that, it just 
boils down to the subjective needs of the data beholder or consumer.
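As a sketch of both points above, here is one way to lift CSV into RDF under explicit quality constraints while recording each violation *in the data itself* as quality metadata. The namespaces, predicates, and constraints are hypothetical illustrations, not taken from the SemWebQuality primer; only the Python standard library is used.

```python
import csv
import io
import re

# Hypothetical namespaces for a small tourism-portal dataset.
BASE  = "http://example.org/hotel/"           # subject namespace
NAME  = "http://example.org/vocab#name"
STARS = "http://example.org/vocab#stars"
ISSUE = "http://example.org/quality#issue"    # quality-annotation predicate

def lift(csv_text: str) -> list[str]:
    """Turn CSV rows into N-Triples lines, emitting a quality-issue
    triple (rather than silently dropping the row) on each violation."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        s = f"<{BASE}{row['id']}>"
        # Constraint 1: a name must be present and non-empty.
        name = row.get("name", "").strip()
        if name:
            triples.append(f'{s} <{NAME}> "{name}" .')
        else:
            triples.append(f'{s} <{ISSUE}> "missing name" .')
        # Constraint 2: the star rating must be an integer from 1 to 5.
        if re.fullmatch(r"[1-5]", row.get("stars", "")):
            triples.append(f'{s} <{STARS}> "{row["stars"]}" .')
        else:
            triples.append(f'{s} <{ISSUE}> "invalid star rating" .')
    return triples

data = "id,name,stars\n1,Hotel Adler,4\n2,,9\n"
for t in lift(data):
    print(t)
```

Because the violations are themselves triples, a consumer can query for them with the same machinery as for the payload data, and decide against their own subjective requirements what to keep or discard.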

> Cheers,
> Marco
> FBK Web of Data unit
> http://fbk.eu
> http://wed.fbk.eu/
> On 4/7/11 10:34 PM, Christian Fuerber wrote:
>> Hi Kingsley,
>> IMO, data quality is the degree to which data fulfills quality requirements.
>> As you said, quality requirements are subjective and, therefore, can be
>> very heterogeneous and contradictory, even in closed settings. In my eyes,
>> the most effective way to get a hand on data quality is to explicitly
>> represent, manage, and share quality requirements. This way, we can agree
>> and disagree about them while we can always view each other's quality
>> assumptions. This is particularly important when making statements about
>> the quality of a data source or ontology.
>> Therefore, we have started to create a data quality management ontology
>> which shall facilitate the representation and publication of quality
>> requirements in RDF [1]. An overview presentation of what we could do with
>> such an ontology is available at [2]. As soon as we have a stable version,
>> we plan to publish it at http://semwebquality.org for the public.
>> However, no matter how hard we try to establish a high level of data
>> quality, I believe that it is almost impossible to achieve 100%, especially
>> due to the heterogeneous requirements. But we should try to approximate it
>> and maintain a high level.
>> Please, let me know what you think about our approach.
>> [1] http://www.heppnetz.de/files/dataquality-vocab-lwdm2011.pdf
>> [2] http://www.slideshare.net/cfuerber/towards-a-vocabulary-for-data-quality-management-in-semantic-web-architectures
>> Cheers,
>> Christian
>> ------------------------------------------
>> Dipl.-Kfm. Christian Fürber
>> Professur für Allgemeine BWL, insbesondere E-Business
>> E-Business & Web Science Research Group, Universität der Bundeswehr München
>> e-mail: c.fuerber@unibw.de
>> www: http://www.unibw.de/ebusiness/
>> skype: c.fuerber
>> twitter: cfuerber



Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Monday, 11 April 2011 11:40:10 UTC
