Re: Data quality and requirements - discussion for F2F? from Steven Adler on 2014-10-30 (public-dwbp-wg@w3.org from October 2014)

From: Steven Adler <adler1@us.ibm.com>
Date: Thu, 30 Oct 2014 06:50:53 -0700
To: Riccardo Albertoni <riccardo.albertoni@ge.imati.cnr.it>
Cc: Antoine Isaac <aisaac@few.vu.nl>, Bart van Leeuwen <bart_van_leeuwen@netage.nl>, "Debattista, Jeremy" <Jeremy.Debattista@iais-extern.fraunhofer.de>, Makx Dekkers <mail@makxdekkers.com>, Public DWBP WG <public-dwbp-wg@w3.org>, riccardo.imati@gmail.com
Message-ID: <OFE8984CB9.94704135-ON88257D81.004BB443-88257D81.004C1200@us.ibm.com>
Metrics change human behavior by encouraging a superficial focus on
attaining desired values instead of a deeper understanding of the
underlying issues.  We all saw how this played out in banks prior to the
Credit Crisis, as CEOs became obsessed with managing VAR (Value at Risk),
even if most did not understand how VAR was calculated.

I recommend focusing on the details of data quality vocabularies and
letting vendors and community groups determine how they are tabulated into
metrics.


Best Regards,

Steve

Motto: "Do First, Think, Do it Again"


From: Riccardo Albertoni <riccardo.albertoni@ge.imati.cnr.it>
To: Makx Dekkers <mail@makxdekkers.com>
Cc: "Debattista, Jeremy" <Jeremy.Debattista@iais-extern.fraunhofer.de>, Bart van Leeuwen <bart_van_leeuwen@netage.nl>, Public DWBP WG <public-dwbp-wg@w3.org>, Antoine Isaac <aisaac@few.vu.nl>
Date: 10/30/2014 06:37 AM
Subject: Re: Data quality and requirements - discussion for F2F?

Hi All,
I basically agree with Jeremy: I think we should define how quality
metadata can be represented at an abstract level in a metadata model (e.g.
an ontology). In my opinion, both human-focused information and
metrics-based quality should be represented in the model, provided that
there are use cases grounding these needs.

In order to make the quality of datasets comparable and objective, I think
it would be great to have a set of recommended metrics and quality
dimensions, even if I am not sure such a set can be easily identified.

Anyway, if a set of metrics is going to be defined and "recommended", I
think that set should be extensible, as I tried to stress when proposing
the LuSTRE use case and the Q-MetricExtensibility requirement in my e-mail
last week (see "Quality requirements and a new use case for UCR" [1]).


Regards,
Riccardo

[1]
http://lists.w3.org/Archives/Public/public-dwbp-comments/2014Oct/0002.html



On 30 October 2014 12:58, Makx Dekkers <mail@makxdekkers.com> wrote:
  As I am following this discussion, it occurred to me that maybe we could
  also look at who will use any statements about quality and what for.


  On one hand, there is quality-related information that is for human
  consumption, e.g. things like the information provided at
  http://www.legislation.gov.uk/help#aboutChangesToLeg and other FAQ items
  on that page. Such information can be used by humans to take decisions
  about whether they want to use the data.





  On the other hand, precise metrics may be used by programs to pre-select
  collections of data, but in that case we need to understand maybe a
  little bit more what kind of programs or applications would consume the
  metrics and for what purpose.





  It seems to me that maybe the human-focused information is a little
  easier to define (e.g. using legislation.gov.uk as a starting point).
  We could start by defining a small set of properties for it (either as
  text or using some controlled vocabulary) and look at the metrics later,
  on the basis of existing applications that use quality metrics in
  practice. I agree that metrics are not that easy to define, and probably
  also complex to use.





  Makx





  From: Debattista, Jeremy [mailto:
  Jeremy.Debattista@iais-extern.fraunhofer.de]
  Sent: Thursday, 30 October 2014 11:11
  To: Bart van Leeuwen
  Cc: Public DWBP WG; Antoine Isaac
  Subject: Re: Data quality and requirements - discussion for F2F?





  Hi Bart, Antoine





  I agree with both of you that defining a vocabulary based on metrics is
  hard. From my work on data quality, I realised that different domains,
  use cases, etc. might require different metrics. Of course, there are
  metrics that would be suitable for most of the use cases. What I found
  useful was to define how quality metadata should be represented at an
  abstract level [1]. Then based on this abstract ontology, we defined a
  number of quality metrics [2], some of which might be similar to those
  extracted from the DWBP use cases. On the whole, my opinion is that we
  have to provide a pragmatic solution that would be suitable for everyone
  within the community, i.e. in the future other interested parties should
  be able to define quality metrics that can be easily interoperable with
  other defined quality metrics.





  I would gladly join the F2F discussion remotely, if it won’t be after
  10pm (CET) :).





  Cheers,


  Jer








  [1]
  https://raw.githubusercontent.com/EIS-Bonn/Luzzu/master/luzzu-semantics/src/main/resources/vocabularies/daq/daq.trig



  [2]
  https://raw.githubusercontent.com/diachron/quality/luzzu-integration/src/main/resources/vocabularies/dqm/dqm.trig






  On 29 Oct 2014, at 17:17, Bart van Leeuwen <bart_van_leeuwen@netage.nl>
  wrote:




        Hi Antoine,

        Last night I had a conversation with Bernadette on this topic which
        ended up in a nice discussion.
        I agree with you that the Quality vocabulary is rather hard to
        define if we focus on metrics.

        I hope we have a good amount of time during the F2F to discuss
        it.

        Met Vriendelijke Groet / With Kind Regards
        Bart van Leeuwen

        ##############################################################
        # twitter: @semanticfire
        # netage.nl
        # http://netage.nl

        # Enschedepad 76
        # 1324 GJ Almere
        # The Netherlands
        # tel. +31(0)36-5347479
        ##############################################################



        From:        Antoine Isaac <aisaac@few.vu.nl>
        To:        Public DWBP WG <public-dwbp-wg@w3.org>
        Date:        29-10-2014 17:07
        Subject:        Data quality and requirements - discussion for F2F?








        Dear all,

        As a preparation to the F2F discussions on vocabularies, I have
        checked the latest version of the UCR document [1]. The progress
        that has been made on describing use cases and identifying
        requirements is impressive.
        In particular, the categorization of requirements is great for
        identifying the requirements most important for our vocabulary
        work, including the ones on quality and granularity [2].

        Yet, I am still not sure of the scoping of the quality vocabulary.
        I've looked at all the requirements, and one could say that many
        could impact the scope of a vocabulary to be used to document
        quality. Some thoughts are on a new wiki page [3]. I admittedly
        played the devil's advocate there, i.e. I was very liberal when
        judging whether a requirement could impact quality and granularity.
        But in fact, when looking at what various UCs have to say about
        quality, I am wondering whether I am the only one confused! I have
        compiled a list of quotes from the UC descriptions [3], which shows
        that, considering all contributors, a very wide definition of
        quality is still in play.

        My wish for the F2F discussion would be that the group spend some
        time going through the requirements, and discuss whether they
        should be in scope of the vocabulary.
        Or, to put it in other words, decide whether the vocabulary should
        include elements for documenting whether a dataset meets the
        considered requirements, i.e., there is metadata for data re-users
        to understand the performance of the dataset against the
        requirements the group has identified.

        As a reminder, all kinds of pointers for the quality work are
        gathered at [4], including a first vocabulary design by Phil.

        Best regards,

        Antoine

        [1] http://www.w3.org/TR/2014/WD-dwbp-ucr-20141014/

        [2]
        http://www.w3.org/TR/dwbp-ucr/#requirements-for-quality-and-granularity-description-vocabulary


        [3] https://www.w3.org/2013/dwbp/wiki/UCRs_and_Quality

        [4] https://www.w3.org/2013/dwbp/wiki/Data_quality_notes











--
----------------------------------------------------------------------------

Riccardo Albertoni
Istituto per la Matematica Applicata e Tecnologie Informatiche "Enrico
Magenes"
Consiglio Nazionale delle Ricerche
via de Marini 6 - 16149 GENOVA - ITALIA
tel. +39-010-6475624 - fax +39-010-6475660
e-mail: Riccardo.Albertoni@ge.imati.cnr.it
Skype: callto://riccardoalbertoni/
LinkedIn: http://www.linkedin.com/in/riccardoalbertoni

www: http://www.ge.imati.cnr.it/Albertoni

http://purl.oclc.org/NET/riccardoAlbertoni

FOAF:http://purl.oclc.org/NET/RiccardoAlbertoni/foaf

----------------------------------------------------------------------------
Received on Thursday, 30 October 2014 13:52:40 UTC