Re: data quality from Peter Ansell on 2010-04-19 (semantic-web@w3.org from April 2010)

From: Peter Ansell <ansell.peter@gmail.com>
Date: Tue, 20 Apr 2010 08:40:48 +1000
To: adasal <adam.saltiel@gmail.com>
Cc: paoladimaio10@googlemail.com, Michael Schneider <schneid@fzi.de>, semantic-web@w3.org, "Polleres, Axel" <axel.polleres@deri.org>
Message-ID: <m2ta1be7e0e1004191540l687e38fcmed5359afc51aaea7@mail.gmail.com>

I can see the relevance from my point of view. If you are using
multiple datasources that are published and accessed in different
locations, you may have a legitimate need to modify the RDF based on
some set of rules purely to increase the quality of the data, and it
would be useful to have a place to accumulate this information. It
would also be useful to have a way of giving opinions about datasets
in terms of their syntactic quality.

Data validation is another step altogether. You need quality data
before semantic validation can occur.

You could highlight nonsense and plainly false statements at either
the syntax or the semantic level, but they are easier to fix at
the syntax level, before using OWL to validate the data, if there is a
pattern that you can look for and remove using a simple rule.
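
As a rough sketch of what such a simple rule might look like, the
Python below drops any triple whose object is a literal where a URI
is expected, which is the kind of error Paola describes further down.
The predicate chosen, the example.org data, the clean() helper and the
simplified one-triple-per-line parsing are all assumptions made for
illustration, not part of any existing tool.

```python
import re

# Assumption for illustration: this predicate is expected to point at a URI,
# so a triple that gives it a plain literal is treated as a syntax-level error.
URI_ONLY_PREDICATE = "<http://xmlns.com/foaf/0.1/homepage>"

# Matches a simplified N-Triples line: subject, predicate, object, final dot.
# This is not a full N-Triples parser (no blank nodes, escapes or comments).
TRIPLE = re.compile(r'^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$')


def clean(lines):
    """Split lines into (kept, dropped) according to the single rule above."""
    kept, dropped = [], []
    for line in lines:
        match = TRIPLE.match(line)
        if match is None:
            # Lines that do not even look like a triple are set aside too.
            dropped.append(line)
            continue
        _subject, predicate, obj = match.groups()
        is_uri = obj.startswith("<") and obj.endswith(">")
        if predicate == URI_ONLY_PREDICATE and not is_uri:
            dropped.append(line)
        else:
            kept.append(line)
    return kept, dropped


data = [
    '<http://example.org/a> <http://xmlns.com/foaf/0.1/homepage> <http://example.org/home> .',
    '<http://example.org/b> <http://xmlns.com/foaf/0.1/homepage> "homepage" .',
    '<http://example.org/b> <http://xmlns.com/foaf/0.1/name> "Bob Smith" .',
]
kept, dropped = clean(data)
```

Only the second triple ends up in dropped. The dropped list is the
sort of thing that could be accumulated in a shared place, along with
the rule that produced it.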

Cheers,

Peter

On 20 April 2010 08:26, adasal <adam.saltiel@gmail.com> wrote:
> I can't see the relevance of this. Validators evolve for different reasons
> and in different ways depending on what is being validated.
> But we aren't interested in whether the RDF is valid; we assume it is, that
> is, that the tools to create it and the tools to validate it are in step.
> We should assume the same of any OWL too.
> What we are interested in is data validity, which, when it comes to triples,
> means whether the referenced datum is as described by what references it,
> or correctly describes what it references, or both.
> But within the confines of RDF, or OWL, it is possible to create correct
> nonsense. It is quite possible to create a pizza with additional toppings of
> pizza dough, or, say, water.
> Pizza Maker may want to pass off a tomato topping as 'made from newly
> grown tomatoes from the Garfagnana valley' when it is really from a
> mixed-source can of unknown vintage.
> So far the concern has been mainly for the correctness of the metadata, some
> misclassification or interim change in what is referenced.
> But all of that can be correctly in place and the results can still be
> either false or falsified.
>
> Adam
>
>
> On 19 April 2010 19:43, Paola Di Maio <paola.dimaio@gmail.com> wrote:
>>
>> Hi Michael
>>
>>
>>
>>
>>>
>>> May I ask what you mean by "valid RDF" here?
>>
>> any RDF which does not validate
>>>
>>> You refer to "many validators"? Which? There are, indeed, many, for
>>> different languages. Do you only mean the RDF validators?
>>
>>
>> sorry, maybe that was incorrect, I took the word validators from the third
>> tab on this
>> page
>>
>>  http://pedantic-web.org/
>>>
>>> Maybe you can provide a serious example for what you mean by /invalid/
>>> RDF?
>>> By "serious" I mean something that could really be found in some document
>>> on
>>> the web, where people believed that it would be valid, but it isn't (no
>>> typos).
>>
>>
>> I personally have limited experience with RDF,
>> but I remember that once one of the RDF elements (fields? properties?) was
>> supposed to be a URI,
>> but the RDF generator we used did not specify that it had to be a URI, so we
>> entered a word (literal?)
>> and validation failed. When a valid URI was entered, the RDF validated.
>> I am sure the pedantic people will have compiled a catalogue of reasons
>> why validation fails?
>>
>>
>> hope I address your questions
>>
>> P
>>>
>>>
>>> --
>
>

Received on Monday, 19 April 2010 22:41:21 UTC