Quality requirements and a new use case for UCR from Riccardo Albertoni on 2014-10-20 (public-dwbp-comments@w3.org from October 2014)

From: Riccardo Albertoni <albertoni@ge.imati.cnr.it>
Date: Mon, 20 Oct 2014 16:15:01 +0200
To: public-dwbp-comments@w3.org
Message-ID: <CAOHhXmShxSEACXOOeH5xqyBy2RbuaNnbziC=On82QyM4iPdYpg@mail.gmail.com>
Dear All,
In reply to the “Invitation to Review Use-Cases and Requirements (UCR) of
the W3C Data on the Web Best Practices Working Group (DWBP)”, I have
proposed a new scenario, named “LuSTRE: Linked Thesaurus fRamework for
Environment”, which you can find among the Second Round Use Cases:
https://www.w3.org/2013/dwbp/wiki/index.php?title=Second-Round_Use_Cases

After participating in the last vocabulary call, I tried to match the
proposed use case with the requirements currently included in the UCR.
Following a suggestion from Phil Archer, I also started wondering whether
the “quality requirements” expressed in the latest version of the UCR fully
cover the LuSTRE scenario or whether, instead, some rephrased or additional
requirements should be discussed.

I have to say that I have found the collection of requirements included in
the UCR so far extremely interesting. However, I have the impression that
the requirement “define general quality metrics, but allow for inclusion of
additional domain-specific metrics”, which was already mentioned in the
quality notes [1], is only partially represented in the current UCR, and I
am wondering whether it should be stated more explicitly.

Quality is usually defined as “fitness for use”. There are notions of
quality that are general enough to apply to almost every use (let us say
domain-, application- and technology-neutral quality), but at some point,
when people consider data for concrete applications and technologies, less
neutral quality dimensions and metrics are needed. So, in my opinion, the
extensibility of quality dimensions and metrics to potentially include
these more specific quality measures should be carefully considered when
designing the Quality Vocabulary.

In this direction, the LuSTRE use case can ground the need for "quality
dimensions/metrics extensibility" in at least two ways:
a) It deals with a specific kind of “open” data: thesauri and controlled
vocabularies encoded in SKOS, which require specific quality metrics (e.g.,
the criteria and metrics suggested in qSKOS [2]).
b) It deals with the quality of datasets as well as the quality of
linksets. It stresses that linksets are as important as datasets when it
comes to the joint exploitation of independently served datasets in linked
data. And when we focus on linkset quality, specific quality metrics can
come into play, especially if we focus on specific linkset exploitation
purposes such as dataset complementation [3].

The aforementioned metrics are just two examples of specific metrics that
may be needed when dealing with use cases. Depending on the applications
and domains considered by open data publishers and consumers, I guess that
plenty of other specific metrics might be required.

How, and to what extent, should the Quality Vocabulary represent these and
other examples of domain- or technology-specific metrics?
Well, I don’t know... Perhaps they should be “directly” representable, or
some application profile of the quality vocabulary should be foreseen for
those quality dimensions and metrics that are considered too specific.
The how and to what extent is probably a technicality that should be
addressed when designing the quality vocabulary. However, no matter how the
working group is going to pursue "extensibility", the extensibility
requirement for quality dimensions and metrics is still there, and the
LuSTRE scenario can help to point this requirement out.

Coming to how the requirement “extensibility of quality dimensions and
metrics” could be incorporated into the current UCR, two alternatives come
to mind:

Alternative a) Let’s slightly rephrase the current R-QualityMetrics
requirement. The R-QualityMetrics requirement is currently stated as

"R-QualityMetrics, Data should be associated with a set of standardized,
objective quality metrics"

It could be rephrased as

"R-QualityMetrics, Data should be associated with a set of standardized,
objective quality metrics. This set of standardized quality metrics can be
extended with further well-documented domain-specific metrics."

Perhaps the adjective “standardized” should also be replaced with
“well-known” and/or “well-documented”.

I am not sure that, when it comes to open data, there is an effective,
well-established set of standardized metrics. Probably there is a set of
fairly well-known quality measures that have been developed in the
scientific literature and by practitioners. However, are these metrics the
object of an actual standardization process? As far as I know, quality is
still quite an open issue, especially when it concerns data included in the
LOD.

If there isn't any standardization process, I guess we should think about
rephrasing the requirement, for example as

"R-QualityMetrics, Data should be associated with a set of well-known and
documented objective quality metrics. This set of quality metrics can
include user-defined/domain-specific metrics."


Alternative b) Instead of rephrasing R-QualityMetrics, let's add a
brand-new quality requirement. What about “Q-MetricExtensibility”?
It could be defined as

Q-MetricExtensibility: the set of metrics considered in order to determine
and document open data quality can be extended with well-documented
domain-specific metrics.
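
To make the idea a little more concrete, here is a minimal sketch in Python
of what "a general set of metrics that can be extended with well-documented
domain-specific metrics" might look like. It is purely illustrative and not
a proposal for the Quality Vocabulary itself: the names Metric,
MetricRegistry, completeness and orphan_ratio, as well as the record
layout, are assumptions of mine, and orphan_ratio is only a toy stand-in
for a SKOS-specific check, not the qSKOS definition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Records = List[dict]


@dataclass(frozen=True)
class Metric:
    """A quality metric; all field names here are hypothetical."""
    name: str
    description: str  # required, so every metric is "well-documented"
    domain: str       # "general", or a specific domain such as "skos"
    compute: Callable[[Records], float]


class MetricRegistry:
    """Holds the general metrics plus any domain-specific extensions."""

    def __init__(self) -> None:
        self._metrics: Dict[str, Metric] = {}

    def register(self, metric: Metric) -> None:
        # Reject extensions that carry no documentation.
        if not metric.description.strip():
            raise ValueError(f"metric {metric.name!r} must be documented")
        if metric.name in self._metrics:
            raise ValueError(f"metric {metric.name!r} already registered")
        self._metrics[metric.name] = metric

    def assess(self, records: Records) -> Dict[str, float]:
        # Apply every registered metric to the same data.
        return {n: m.compute(records) for n, m in self._metrics.items()}


def completeness(records: Records) -> float:
    """General metric: share of records with a non-empty 'label'."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get("label")) / len(records)


def orphan_ratio(records: Records) -> float:
    """Toy domain-specific metric: share of concepts with neither
    'broader' nor 'narrower' links (illustrative only)."""
    if not records:
        return 0.0
    orphans = sum(
        1 for r in records if not r.get("broader") and not r.get("narrower")
    )
    return orphans / len(records)


registry = MetricRegistry()
registry.register(Metric("completeness", "Share of labelled records.",
                         "general", completeness))
# A domain-specific extension is added without touching the general set.
registry.register(Metric("orphan_ratio", "Share of unlinked concepts.",
                         "skos", orphan_ratio))

records = [
    {"label": "water", "broader": ["environment"]},
    {"label": "", "broader": []},
    {"label": "soil"},
    {"label": "air", "narrower": ["ozone"]},
]
report = registry.assess(records)
```

The only point of the sketch is that the general metrics and the
domain-specific ones share one documented interface, so adding a new metric
does not require changing the existing ones.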

What do you think? Would it be worth considering metric extensibility in
a more explicit fashion?

Regards,
Riccardo Albertoni


References:
[1] https://www.w3.org/2013/dwbp/wiki/Data_quality_notes
[2] Christian Mader, Bernhard Haslhofer, Antoine Isaac: Finding Quality
Issues in SKOS Vocabularies. TPDL 2012: 222-233
[3] Riccardo Albertoni, Asunción Gómez-Pérez: Assessing linkset quality for
complementing third-party datasets. EDBT/ICDT Workshops 2013: 52-59
<http://edbt.org/Proceedings/2013-Genova/papers/workshops/a8-albertoni.pdf>
Received on Monday, 20 October 2014 14:15:29 UTC