- From: Michael Schneider <schneid@fzi.de>
- Date: Wed, 21 Apr 2010 00:18:59 +0200
- To: <paoladimaio10@googlemail.com>
- Cc: <semantic-web@w3.org>, "Polleres, Axel" <axel.polleres@deri.org>
Hi Paola!

The list of validators [1] that you mention is very heterogeneous, ranging from basic RDF syntax checkers, through special-purpose validators that look for certain flaws (such as Eyeball), up to a full-fledged OWL DL syntax-and-semantics validator (btw, the Pellet-based validator happens to be largely outdated). Each of these validators can be said to represent a certain notion of "data quality", but all these notions are pretty different from each other.

So, does this help to answer the question of what "good data quality" is? Or why data is sometimes "not fit for purpose"? No single one of those validators can claim to represent /the/ answer. An RDF parser may tell you that your data is fine, but Eyeball doesn't agree. Your data may even be OWL DL compliant, but some URIs are not dereferenceable, or there is trouble with the FOAF pragmatics being used, or the prefixes used look weird to some other validator, etc.

Someone may define "good data quality" to be data that passes *all* these validators (and possibly additional ones not listed). But this would be an even higher bar for web data to reach than the one I discussed as an example in another post, namely using full OWL DL compliance as the criterion for data quality, because OWL DL compliance would then be only one part of this definition (represented by the Pellet validator). And I already claimed that hoping "only" for full OWL DL compliance of "real web" data (e.g. the LOD stuff) is pretty unrealistic for the majority of existing (and upcoming) web data.

So, what is "good quality data" in the end? Why is data sometimes not fit for purpose? From this discussion, I can't really tell! As I stated in my first mail: The "data quality" question is a tricky one. :)

Another point: I just asked you what you mean by "valid RDF", since I use this term for syntactically correct RDF (according to the RDF spec).
Now, if you really meant this, then this would rather be the /weakest/ possible criterion for data quality: if a document that claims to be an RDF document turns out not to be parseable at all, then it's actually not RDF, and I would say that it doesn't really count as data (it's just an undefined soup of characters). So I would not really want to understand "valid RDF" as a criterion for "good quality data", just as I would not understand knowing the alphabet as a criterion for good writing. :-)

In fact, it looks to me as if the pedantic web people do not discuss the topic of invalid RDF documents at all. All the items in the FAQ at the pedantic-web page [2] already assume syntactic correctness of the investigated RDF data. The data problems discussed there are on a higher level, typically of a kind that is covered by one of the validators that look for specific flaws, such as literals whose datatype does not match the range of the property used, etc.

Sure, there will certainly be quite a bunch of broken RDF documents on the web. But it should be obvious to their authors that they need to be fixed, since otherwise they are simply invisible to tools that want to exploit existing data on the web. And fixing (only syntactically) broken RDF isn't that difficult anyway, provided that one uses an appropriate RDF authoring tool. Hence, I see no necessity for the pedantic-web folks to put this issue on their list.

Cheers,
Michael

[1] <http://pedantic-web.org/tools.html>
[2] <http://pedantic-web.org/fops.html>

From: paoladimaio10@googlemail.com [mailto:paoladimaio10@googlemail.com] On Behalf Of Paola Di Maio
Sent: Monday, April 19, 2010 8:43 PM
To: Michael Schneider
Cc: semantic-web@w3.org; Polleres, Axel
Subject: Re: data quality

Hi Michael

> May I ask what you mean by "valid RDF" here?

any RDF which does not validate

> You refer to "many validators"? Which? There are, indeed, many, for
> different languages. Do you only mean the RDF validators?
sorry, maybe that was incorrect, I took the word "validators" from the third tab on this page: http://pedantic-web.org/

> Maybe you can provide a serious example for what you mean by /invalid/
> RDF? By "serious" I mean something that could really be found in some
> document on the web, where people believed that it would be valid, but
> it isn't (no typos).

I personally have limited experience with RDF, but I remember that once one of the RDF elements (fields? properties?) was supposed to be a URI, but the RDF generator we used did not specify that it had to be a URI, so we entered a word (literal?) and validation failed. When a valid URI was entered, the RDF validated.

I am sure the pedantic people will have compiled a catalogue of reasons why validation fails?

hope I address your questions

P

--
Dipl.-Inform. Michael Schneider
Research Scientist, Information Process Engineering (IPE)
Tel  : +49-721-9654-726
Fax  : +49-721-9654-727
Email: michael.schneider@fzi.de
WWW  : http://www.fzi.de/michael.schneider
=======================================================================
FZI Forschungszentrum Informatik an der Universität Karlsruhe
Haid-und-Neu-Str. 10-14, D-76131 Karlsruhe
Tel.: +49-721-9654-0, Fax: +49-721-9654-959
Foundation under civil law, file no. 14-0563.1, RP Karlsruhe
Executive Board: Prof. Dr.-Ing. Rüdiger Dillmann, Dipl. Wi.-Ing. Michael Flor,
Prof. Dr. Dr. h.c. Wolffried Stucky, Prof. Dr. Rudi Studer
Chairman of the Board of Trustees: Ministerialdirigent Günther Leßnerkraus
=======================================================================
Received on Tuesday, 20 April 2010 22:19:35 UTC