Re: Data quality and requirements - discussion for F2F? from Andrea Perego on 2014-10-30 (public-dwbp-wg@w3.org from October 2014)

From: Andrea Perego <andrea.perego@jrc.ec.europa.eu>
Date: Thu, 30 Oct 2014 17:29:06 +0100
To: Eric Stephan <ericphb@gmail.com>
Cc: Steven Adler <adler1@us.ibm.com>, Riccardo Albertoni <riccardo.albertoni@ge.imati.cnr.it>, Antoine Isaac <aisaac@few.vu.nl>, Bart van Leeuwen <bart_van_leeuwen@netage.nl>, "Debattista, Jeremy" <Jeremy.Debattista@iais-extern.fraunhofer.de>, Makx Dekkers <mail@makxdekkers.com>, Public DWBP WG <public-dwbp-wg@w3.org>, Riccardo Albertoni <riccardo.imati@gmail.com>
Message-id: <CAHzfgWAfnDBxOa6QC4-YW7Qwia_uaYcPZF0_aB8cshEtPcOL4A@mail.gmail.com>
Sorry to jump into this discussion, but I would like to contribute a
possibly relevant use case concerning INSPIRE metadata and the work
under way on its RDF-based representation.

(I see that INSPIRE is already mentioned in the use cases contributed
by Ghislain, Deirdre & Phil, and Riccardo [1,2,3], so I'll skip the
intro).

INSPIRE metadata include a section on conformity, indicating whether
the corresponding data (or service) has been tested against a given
specification and, if so, whether it is conformant or not.

Although in INSPIRE this specifically concerns data and service
interoperability, the same approach could probably be generalised to
other data quality issues, as a generic way to indicate whether given
quality criteria are met (or not).

The solution currently proposed in the RDF representation of INSPIRE
metadata is to use the Evaluation and Report Language (EARL) [4], e.g.:


@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix earl:    <http://www.w3.org/ns/earl#> .
@prefix wdrs:    <http://www.w3.org/2007/05/powder-s#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

[] a dcat:Dataset ;
    wdrs:describedby [ a earl:Assertion ;
        earl:result [ a earl:TestResult ;
            earl:outcome <http://inspire.ec.europa.eu/codelist/DegreeOfConformity/notConformant>
        ] ;
        earl:test <http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32010R1089:EN:NOT>
    ] .

<http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32010R1089:EN:NOT>
    dcterms:issued "2010-12-08"^^xsd:date ;
    dcterms:title "COMMISSION REGULATION (EU) No 1089/2010 of 23 November 2010 implementing Directive 2007/2/EC of the European Parliament and of the Council as regards interoperability of spatial data sets and services"@en .


Of course, dcterms:conformsTo could also be used for this purpose, but
only to say that the criteria are met. EARL also covers the general
case, including negative results such as the one above.
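For comparison, a minimal sketch of the dcterms:conformsTo alternative,
reusing the regulation URI from the EARL example. Typing the regulation
as dcterms:Standard is an assumption based on the declared range of
dcterms:conformsTo, not part of the INSPIRE proposal.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# Sketch only: dcterms:conformsTo can state that the dataset meets the
# specification, but has no slot for a negative or "not evaluated" outcome.
[] a dcat:Dataset ;
    dcterms:conformsTo <http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32010R1089:EN:NOT> .

# Assumption: the regulation is typed with the declared range of
# dcterms:conformsTo.
<http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32010R1089:EN:NOT>
    a dcterms:Standard .
```

Recording the notConformant outcome would need something beyond this
single property, which is the gap EARL's earl:outcome fills.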

For more details:

https://ies-svn.jrc.ec.europa.eu/projects/metadata/wiki/INSPIRE_profile_of_DCAT-AP_-_Reference#Conformity

I'm keen to know whether you have any comment on this approach.

Thanks!

Andrea

---
[1] http://www.w3.org/TR/2014/WD-dwbp-ucr-20141014/#UC-ISOGEOStory
[2] http://www.w3.org/TR/2014/WD-dwbp-ucr-20141014/#UC-TrackingofDataUsage
[3] http://www.w3.org/2013/dwbp/wiki/Second-Round_Use_Cases#LuSTRE:_Linked_Thesaurus_fRamework_for_Environment
[4] http://www.w3.org/TR/EARL10-Schema/


On Thu, Oct 30, 2014 at 3:10 PM, Eric Stephan <ericphb@gmail.com> wrote:
> +1
>
>>> I recommend focusing on the details of data quality vocabularies and letting
>>> vendors and community groups determine how they are tabulated into metrics.
>
>
>
> On Thu, Oct 30, 2014 at 6:50 AM, Steven Adler <adler1@us.ibm.com> wrote:
>>
>> Metrics change human behavior, encouraging a superficial focus on attaining
>> the desired factors instead of a deeper understanding of the underlying
>> issues. We all saw how this played out in banks prior to the Credit Crisis,
>> as CEOs became obsessed with managing VAR (Value at Risk), even though most
>> did not understand how VAR was calculated.
>>
>> I recommend focusing on the details of data quality vocabularies and letting
>> vendors and community groups determine how they are tabulated into metrics.
>>
>>
>> Best Regards,
>>
>> Steve
>>
>> Motto: "Do First, Think, Do it Again"
>>
>>
>>
>> From: Riccardo Albertoni <riccardo.albertoni@ge.imati.cnr.it>
>> To: Makx Dekkers <mail@makxdekkers.com>
>> Cc: "Debattista, Jeremy" <Jeremy.Debattista@iais-extern.fraunhofer.de>, Bart
>> van Leeuwen <bart_van_leeuwen@netage.nl>, Public DWBP WG
>> <public-dwbp-wg@w3.org>, Antoine Isaac <aisaac@few.vu.nl>
>> Date: 10/30/2014 06:37 AM
>> Subject: Re: Data quality and requirements - discussion for F2F?
>> ________________________________
>>
>>
>>
>> Hi All,
>> I basically agree with Jeremy: I think we should define how quality
>> metadata can be represented at an abstract level in a metadata model (e.g.
>> an ontology). In my opinion, both human-focused information and
>> metrics-based quality should be represented in the model, provided that
>> there are use cases grounding these needs.
>>
>> To make the quality of datasets comparable and objective, I think it
>> would be great to have a set of recommended metrics and quality dimensions,
>> even if I am not sure such a set can be easily identified.
>>
>> Anyway, if a set of metrics is going to be defined and "recommended", I
>> think it should be extensible, as I tried to stress when proposing the
>> LuSTRE use case and the Q-MetricExtensibility requirement in my e-mail last
>> week (see "Quality requirements and a new use case for UCR" [1]).
>>
>>
>> Regards,
>> Riccardo
>>
>> [1] http://lists.w3.org/Archives/Public/public-dwbp-comments/2014Oct/0002.html
>>
>>
>> On 30 October 2014 12:58, Makx Dekkers <mail@makxdekkers.com> wrote:
>>
>> As I am following this discussion, it occurred to me that maybe we could
>> also look at who will use any statements about quality, and what for.
>>
>> On one hand, there is quality-related information that is for human
>> consumption, e.g. things like the information provided at
>> http://www.legislation.gov.uk/help#aboutChangesToLeg and other FAQ items on
>> that page. Such information can be used by humans to take decisions about
>> whether they want to use the data.
>>
>> On the other hand, precise metrics may be used by programs to pre-select
>> collections of data, but in that case we may need to understand a little
>> better what kinds of programs or applications would consume the metrics,
>> and for what purpose.
>>
>> It seems to me that the human-focused information may be a little
>> easier to define (e.g. using legislation.gov.uk as a starting point). We
>> could start by defining a small set of properties for it (either as text or
>> using a controlled vocabulary) and look at the metrics later, on the basis
>> of existing applications that use quality metrics in practice. I agree that
>> metrics are not that easy to define, and are probably also complex to use.
>>
>> Makx
>>
>> From: Debattista, Jeremy
>> [mailto:Jeremy.Debattista@iais-extern.fraunhofer.de]
>> Sent: Thursday, 30 October 2014 11:11
>> To: Bart van Leeuwen
>> Cc: Public DWBP WG; Antoine Isaac
>> Subject: Re: Data quality and requirements - discussion for F2F?
>>
>> Hi Bart, Antoine
>>
>> I agree with both of you that defining a vocabulary based on metrics is
>> hard. From my work on data quality, I realised that different domains,
>> use cases, etc. might require different metrics. Of course, some metrics
>> would be suitable for most use cases. What I found useful was to define
>> how quality metadata should be represented at an abstract level [1].
>> Then, based on this abstract ontology, we defined a number of quality
>> metrics [2], some of which might be similar to those extracted from the
>> DWBP use cases. On the whole, my opinion is that we have to provide a
>> pragmatic solution that suits everyone within the community, i.e. in the
>> future other interested parties should be able to define quality metrics
>> that interoperate easily with the quality metrics already defined.
>>
>> I would gladly join the F2F discussion remotely, as long as it isn't after
>> 10pm (CET) :).
>>
>> Cheers,
>>
>> Jer
>>
>> [1] https://raw.githubusercontent.com/EIS-Bonn/Luzzu/master/luzzu-semantics/src/main/resources/vocabularies/daq/daq.trig
>> [2] https://raw.githubusercontent.com/diachron/quality/luzzu-integration/src/main/resources/vocabularies/dqm/dqm.trig
>>
>> On 29 Oct 2014, at 17:17, Bart van Leeuwen <bart_van_leeuwen@netage.nl>
>> wrote:
>>
>>
>> Hi Antoine,
>>
>> Last night I had a conversation with Bernadette on this topic, which
>> turned into a nice discussion.
>> I'm on the same page as you: I think the quality vocabulary will be
>> rather hard to define if we focus on metrics.
>>
>> I hope we will have a good amount of time during the F2F to discuss it.
>>
>> Met Vriendelijke Groet / With Kind Regards
>> Bart van Leeuwen
>>
>> ##############################################################
>> # twitter: @semanticfire
>> # netage.nl
>> # http://netage.nl
>> # Enschedepad 76
>> # 1324 GJ Almere
>> # The Netherlands
>> # tel. +31(0)36-5347479
>> ##############################################################
>>
>>
>>
>> From: Antoine Isaac <aisaac@few.vu.nl>
>> To: Public DWBP WG <public-dwbp-wg@w3.org>
>> Date: 29-10-2014 17:07
>> Subject: Data quality and requirements - discussion for F2F?
>>
>> ________________________________
>>
>>
>>
>>
>> Dear all,
>>
>> In preparation for the F2F discussions on vocabularies, I have checked
>> the latest version of the UCR document [1]. The progress that has been made
>> on describing use cases and identifying requirements is impressive.
>> In particular, the categorization of requirements is great: it identifies
>> the requirements most important for our vocabulary work, including those on
>> quality and granularity [2].
>>
>> Yet I am still not sure about the scope of the quality vocabulary. I've
>> looked at all the requirements, and one could say that many of them could
>> impact the scope of a vocabulary for documenting quality. Some thoughts are
>> on a new wiki page [3]. I admittedly played the devil's advocate there, i.e.
>> I was very liberal when judging whether a requirement could impact quality
>> and granularity. But in fact, looking at what the various UCs have to say
>> about quality, I wonder whether I am the only one confused! I have compiled
>> a list of quotes from the UC descriptions [3], which shows that, considering
>> all contributors, a very wide definition of quality is still in order.
>>
>> My wish for the F2F discussion would be that the group spend some time
>> going through the requirements and discuss whether they should be in the
>> scope of the vocabulary.
>> In other words, decide whether the vocabulary should include elements for
>> documenting whether a dataset meets the considered requirements, i.e.,
>> whether there is metadata that lets data re-users understand how the
>> dataset performs against the requirements the group has identified.
>>
>> As a reminder, all kinds of pointers for the quality work are gathered at
>> [4], including a first vocabulary design by Phil.
>>
>> Best regards,
>>
>> Antoine
>>
>> [1] http://www.w3.org/TR/2014/WD-dwbp-ucr-20141014/
>> [2]
>> http://www.w3.org/TR/dwbp-ucr/#requirements-for-quality-and-granularity-description-vocabulary
>> [3] https://www.w3.org/2013/dwbp/wiki/UCRs_and_Quality
>> [4] https://www.w3.org/2013/dwbp/wiki/Data_quality_notes
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> ----------------------------------------------------------------------------
>> Riccardo Albertoni
>> Istituto per la Matematica Applicata e Tecnologie Informatiche "Enrico
>> Magenes"
>> Consiglio Nazionale delle Ricerche
>> via de Marini 6 - 16149 GENOVA - ITALIA
>> tel. +39-010-6475624 - fax +39-010-6475660
>> e-mail: Riccardo.Albertoni@ge.imati.cnr.it
>> Skype: callto://riccardoalbertoni/
>> LinkedIn: http://www.linkedin.com/in/riccardoalbertoni
>> www: http://www.ge.imati.cnr.it/Albertoni
>> http://purl.oclc.org/NET/riccardoAlbertoni
>> FOAF:http://purl.oclc.org/NET/RiccardoAlbertoni/foaf
>>
>> ----------------------------------------------------------------------------
>>
>



-- 
Andrea Perego, Ph.D.
European Commission DG JRC
Institute for Environment & Sustainability
Unit H06 - Digital Earth & Reference Data
Via E. Fermi, 2749 - TP 262
21027 Ispra VA, Italy

https://ec.europa.eu/jrc/

----
The views expressed are purely those of the writer and may
not in any circumstances be regarded as stating an official
position of the European Commission.
Received on Thursday, 30 October 2014 16:29:53 UTC