- From: Michael Schneider <schneid@fzi.de>
- Date: Mon, 19 Apr 2010 13:23:49 +0200
- To: "Polleres, Axel" <axel.polleres@deri.org>
- Cc: <semantic-web@w3.org>, <paoladimaio10@googlemail.com>
Hi!

The quality-of-data question is not an easy one, and it is very vague what "good quality" means for data. What you are doing on pedantic-web.org [1] seems to be an effort to obtain some sort of "minimum practically achievable quality" for the data existing on the web. This is very important IMO, but other people probably won't be satisfied by it, because (amongst other things) this minimum standard won't match their tools' requirements.

So, from the perspective of trying to make as many SemWeb tools as possible happy, an alternative quality criterion could be OWL 2 DL compliance. An OWL DL tool (parser, ontology management framework, editor, or reasoner) requires, in principle, that the data fed into it meets all syntactic restrictions defined in the OWL (2) DL specification [2][3]. But this is a much higher bar than what pedantic-web asks for, and IMHO it is unlikely ever to be met by the majority of "real web data".

Some OWL DL tools relax some of these strict requirements. For example, Pellet and, I think, the new OWL API as well apply some heuristics in order to "repair" input data [4]. But, while useful, these approaches will likely have their limits, in particular when applied to the "chaotic" data available on the web, and may in some cases even lead to unintended results. In any case, these approaches are strictly tool-specific, so if someone authors data following these tool-specific relaxations, other standard-compliant OWL DL tools cannot be expected to cope with this data. So, after all, if someone defines "data quality" in terms of OWL DL compliance, then this should really mean /full/ OWL 2 DL compliance, which is, as said, tough to achieve on the web.

As an aside: Point 5 about "Reasoning" in the pedantic-web FAQ [5] discusses problems with bogus values of inverse-functional data properties (IFDPs), such as foaf:mbox_sha1sum, and how to cope with them.
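To make the IFDP problem concrete: a reasoner that treats foaf:mbox_sha1sum as inverse-functional concludes that any two resources sharing a value denote the same individual (a merge often called "smushing"), so a single bogus value emitted by many publishers can collapse unrelated people into one. The Python sketch below is illustrative only and is not taken from the pedantic-web FAQ [5] or from any tool. The toy triples, the hypothetical helpers `sha1_mbox` and `smush`, and the choice of the SHA-1 of the bare string "mailto:" (an empty mailbox) as the blacklisted bogus value are all assumptions. Ignoring blacklisted values is one plausible way to cope, not necessarily the approach the FAQ recommends.

```python
import hashlib
from collections import defaultdict

# Full URI of the FOAF property discussed above.
FOAF_MBOX_SHA1 = "http://xmlns.com/foaf/0.1/mbox_sha1sum"


def sha1_mbox(address):
    """Hypothetical helper: SHA-1 of a mailto: URI, the form that
    foaf:mbox_sha1sum values take."""
    return hashlib.sha1(("mailto:" + address).encode("ascii")).hexdigest()


# Assumption: the hash of an empty address stands in for a known bogus
# value; a real blacklist would be compiled from observed web data.
BOGUS_VALUES = frozenset({sha1_mbox("")})


def smush(triples, ifps, blacklist=frozenset()):
    """Hypothetical helper: group subjects sharing a value for any property
    in `ifps` (inverse-functional semantics: same value implies same
    individual). Values in `blacklist` never trigger a merge."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_subject = {}  # (property, value) -> first subject seen with it
    for s, p, o in triples:
        find(s)  # register every subject, merged or not
        if p not in ifps or o in blacklist:
            continue
        if (p, o) in first_subject:
            root_a, root_b = find(first_subject[(p, o)]), find(s)
            if root_a != root_b:
                parent[root_b] = root_a
        else:
            first_subject[(p, o)] = s

    groups = defaultdict(set)
    for s in list(parent):
        groups[find(s)].add(s)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    triples = [
        ("ex:alice1", FOAF_MBOX_SHA1, sha1_mbox("alice@example.org")),
        ("ex:alice2", FOAF_MBOX_SHA1, sha1_mbox("alice@example.org")),
        ("ex:bob", FOAF_MBOX_SHA1, sha1_mbox("")),    # empty address
        ("ex:carol", FOAF_MBOX_SHA1, sha1_mbox("")),  # empty address
    ]
    print(smush(triples, {FOAF_MBOX_SHA1}))
    # -> [['ex:alice1', 'ex:alice2'], ['ex:bob', 'ex:carol']]
    print(smush(triples, {FOAF_MBOX_SHA1}, BOGUS_VALUES))
    # -> [['ex:alice1', 'ex:alice2'], ['ex:bob'], ['ex:carol']]
```

Without the blacklist, ex:bob and ex:carol become one individual purely because both published an empty mailbox; with it, they stay separate while the genuine ex:alice1/ex:alice2 match survives.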
From an "OWL DL data quality" perspective, this discussion would be largely redundant: the IFDPs used in FOAF (and some other vocabularies as well) make use of owl:InverseFunctionalProperty, and this is not even allowed in OWL DL, where inverse functional properties must not be data properties, i.e. must not be used with literals. There is an alternative in OWL 2 DL called "Keys", but I don't know of any vocabulary used on the web that applies this very new feature (for the RDF encoding of OWL 2 Keys, look up the column for the term "owl:hasKey" in Table 16 of [3]).

Michael

[1] Pedantic-Web.org <http://pedantic-web.org>
[2] OWL 2 Structural Specification <http://www.w3.org/TR/2009/REC-owl2-syntax-20091027/>
[3] OWL 2 Mapping to RDF Graphs <http://www.w3.org/TR/2009/REC-owl2-mapping-to-rdf-20091027/>
[4] Pellet's relaxations <http://clarkparsia.com/pellet/faq/owl-full/>
[5] IFDP discussion <http://pedantic-web.org/fops.html#ifps>

From: semantic-web-request@w3.org [mailto:semantic-web-request@w3.org] On Behalf Of Polleres, Axel
Sent: Monday, April 19, 2010 12:10 PM
To: paoladimaio10@googlemail.com; adam.saltiel@gmail.com; uk-government-data-developers@googlegroups.com
Cc: semantic-web@w3.org
Subject: Re: data quality

Paola,

You may want to check http://www.pedantic-web.org/ for our efforts to improve data quality. We also have a paper on our findings so far at LDOW [1].

Cheers,
Axel

1. Aidan Hogan, Andreas Harth, Alexandre Passant, Stefan Decker, and Axel Polleres. Weaving the pedantic web. In 3rd International Workshop on Linked Data on the Web (LDOW2010) at WWW2010, Raleigh, USA, April 2010.
________________________________________
From: semantic-web-request@w3.org
To: adasal; uk-government-data-developers@googlegroups.com
Cc: Semantic Web
Sent: Mon Apr 19 10:51:14 2010
Subject: data quality

Something else I wanted to add but forgot, as it was a late post:

One of the issues coming up in relation to the discussion below is the quality of data (which came up on the gov data list a while back, hence the cc).

A question then is: why (in some cases) is the data 'not fit for purpose'?

Again, several possible hypotheses may need to be tested in each case:

- is the data inconsistent because the real world is inconsistent (the world seems to hang together even when it does not make sense to us, while data models don't)? In that case, maybe there is not much we can do other than keep trying to create plausible models of the world.
- is the data of any use before it is opened and RDFized? Or does something happen in the RDFization process?

Let's not forget that obtaining meaningful output from databases takes a lot of work: I am thinking of schema normalisation, but also of data cleaning, which accounts for the majority of the effort in data mining.

I don't think the fact that data is expressed in RDF automatically makes it good.

Again, a good dig through a significant set of examples where 'data is not fit for purpose' could yield some clues as to what kind of work needs to be done.

So when something doesn't work, I would be inclined not just to throw it away, but to study it systematically. After all, most of what we know in medicine has come from dissecting corpses.

PDM

-- 
Dipl.-Inform. Michael Schneider
Research Scientist, Information Process Engineering (IPE)
Tel  : +49-721-9654-726
Fax  : +49-721-9654-727
Email: michael.schneider@fzi.de
WWW  : http://www.fzi.de/michael.schneider
=======================================================================
FZI Forschungszentrum Informatik an der Universität Karlsruhe
Haid-und-Neu-Str. 10-14, D-76131 Karlsruhe
Tel.: +49-721-9654-0, Fax: +49-721-9654-959
Foundation under civil law, Az 14-0563.1, RP Karlsruhe
Executive Board: Prof. Dr.-Ing. Rüdiger Dillmann, Dipl. Wi.-Ing. Michael Flor, Prof. Dr. Dr. h.c. Wolffried Stucky, Prof. Dr. Rudi Studer
Chairman of the Board of Trustees: Ministerialdirigent Günther Leßnerkraus
=======================================================================
Received on Monday, 19 April 2010 11:24:25 UTC