- From: Riccardo Albertoni <albertoni@ge.imati.cnr.it>
- Date: Mon, 20 Oct 2014 16:15:01 +0200
- To: public-dwbp-comments@w3.org
- Message-ID: <CAOHhXmShxSEACXOOeH5xqyBy2RbuaNnbziC=On82QyM4iPdYpg@mail.gmail.com>
Dear All,

In reply to the "Invitation to Review Use-Cases and Requirements (UCR) of the W3C Data on the Web Best Practices Working Group (DWBP)", I have proposed a new scenario, named "LuSTRE: Linked Thesaurus fRamework for Environment", which has been added to the Second-Round Use Cases:

https://www.w3.org/2013/dwbp/wiki/index.php?title=Second-Round_Use_Cases

After taking part in the last vocabulary call, I tried to match the proposed use case with the requirements currently included in the UCR. Following a suggestion from Phil Archer, I also started wondering whether the "quality requirements" expressed in the latest version of the UCR fully cover the LuSTRE scenario, or whether some rephrasing or additional requirements should be discussed.

I have to say that I find the collection of requirements included in the UCR so far extremely interesting. However, I have the impression that the requirement "define general quality metrics, but allow for inclusion of additional domain-specific metrics", which was already mentioned in the quality note [1], is only partially represented in the current UCR, and I am wondering if it should be stated more explicitly.

Quality is usually defined as "fitness for use". There are notions of quality that are general enough to apply to almost every use (let's say domain-, application- and technology-neutral quality), but at some point, when people consider data for concrete applications and technologies, less neutral quality dimensions and metrics are needed. So, in my opinion, the extensibility of quality dimensions and metrics to include these more specific quality measures should be carefully considered when designing the Quality Vocabulary.

In this direction, the LuSTRE use case can ground the need for "quality dimensions/metrics extensibility" in at least two ways:

a) It deals with a specific kind of "open" data: thesauri and controlled vocabularies encoded in SKOS, which require specific quality metrics (e.g., the criteria and metrics suggested in qSKOS [2]).

b) It deals with the quality of datasets as well as the quality of linksets. It stresses that linksets are as important as datasets when it comes to the joint exploitation of independently served datasets in linked data. When we focus on linkset quality, specific quality metrics can come into play, especially for specific linkset exploitation purposes such as dataset complementation [3].

The aforementioned metrics are just two examples of specific metrics that can be needed when dealing with use cases. Depending on the applications and domains considered by open data publishers and consumers, I guess that plenty of other specific metrics might be required.

How and to what extent should the Quality Vocabulary represent these and other examples of domain- or technology-specific metrics? Well, I don't know... Perhaps they should be "directly" representable, or some application profile of the Quality Vocabulary should be foreseen for those quality dimensions and metrics that are considered too specific. The how and to what extent is probably a technicality that should be addressed when designing the Quality Vocabulary. However, no matter how the working group pursues "extensibility", the extensibility requirement for quality dimensions and metrics is still there, and the LuSTRE scenario can help to point this requirement out.

Coming to how the requirement "extensibility of quality dimensions and metrics" could be incorporated in the current UCR, two alternatives come to my mind:

Alternative a) Let's slightly rephrase the current R-QualityMetrics requirement.
The R-QualityMetrics requirement is currently stated as

"R-QualityMetrics, Data should be associated with a set of standardized, objective quality metrics"

It could be rephrased as

"R-QualityMetrics, Data should be associated with a set of standardized, objective quality metrics. This set of standardized quality metrics can be extended with further well-documented domain-specific metrics."

Perhaps the adjective "standardized" should also be replaced with "well-known" and/or "well-documented". I am not sure that, when it comes to open data, there is an effective, well-established set of standardized metrics. Probably there is a set of fairly well-known quality measures that have been developed in the scientific literature and by practitioners. However, are these metrics the subject of an actual standardization process? As far as I know, quality is still quite an open issue, especially when it refers to data included in the LOD. If there isn't any standardization process, I guess we should think about rephrasing the requirement, for example as

"R-QualityMetrics, Data should be associated with a set of well-known and documented objective quality metrics. This set of quality metrics can include user-defined/domain-specific metrics."

Alternative b) Instead of rephrasing R-QualityMetrics, let's add a brand new quality requirement. What about "Q-MetricExtensibility"? It could be defined as

Q-MetricExtensibility: the set of metrics considered in order to determine and document open data quality can be extended with well-documented domain-specific metrics.

What do you think? Would it be worth considering metrics extensibility in a more explicit fashion?

Regards,
Riccardo Albertoni

References:

[1] https://www.w3.org/2013/dwbp/wiki/Data_quality_notes
[2] Christian Mader, Bernhard Haslhofer, Antoine Isaac: Finding Quality Issues in SKOS Vocabularies. TPDL 2012: 222-233
[3] Riccardo Albertoni, Asunción Gómez-Pérez: Assessing linkset quality for complementing third-party datasets. EDBT/ICDT Workshops 2013: 52-59 <http://edbt.org/Proceedings/2013-Genova/papers/workshops/a8-albertoni.pdf>

Received on Monday, 20 October 2014 14:15:29 UTC