Re: More feedback on DQV

Hi Riccardo, all

First of all, apologies for my very late reply. I was first busy with my ISWC paper (which I can share with you all [1]), and right afterwards I was on holiday. I just returned to the office today.

I was looking into [2], and I think the draft already looks great. I have some questions and comments:
 (1) What is the difference between dqv:CollectionOfMetricResults and dqv:Feedback? Does the former relate to "machine computed metrics" and the latter to "human assessed metrics"? Or will the Feedback concept be used for user reporting of issues, like GitHub issues?
 (2) If we are interested in the reporting of issues, another kind of dqv:QualityData (issue #2 in dqv:QualityInfo and its specialisation section) could be the report of problems found in a dataset, when a dataset is assessed by machine computed metrics - see [3].
 (3) "is DQV connected to DAQ in the correct way?" -> Not sure I've understood it right, but to me the dqv:hasDimension property is redundant there. None of its subclasses (i.e. ServiceLevelAgreement, Feedback and Standard) would require the dimension aspect, whilst dqv:CollectionOfMetricResults seems to be a generalised daQ metadata, which contains the Category-Dimension-Metric relationship.
 (4) Wrt issue 164 [4], I think it is important that for example a consumer has an idea of how a dataset changed over time, therefore I guess that some statistics would not hurt - anyway we will get that for free with daQ.
 (5) Wrt issue #2 in the dqv:CollectionOfMetricResults, the idea behind daq:computedOn is that each observation has the dataset URI or base URI (which might be the RDF file, but not ideal) attached to the observation. Although in theory I agree with forcing a computed observation to be connected to a dcat:Dataset, I think it would not be a practical approach. As I said in a previous email, we have to consider those datasets that have no DCAT metadata (or have other metadata such as VoID). Though if we plan to have the data quality vocabulary as an extension to DCAT, then this is a natural choice.
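
To make point (5) concrete, here is a minimal Python sketch, not actual daQ or DQV code. It assumes an observation simply carries the URI it was computed on (in the spirit of daq:computedOn) and shows what happens when that URI has no DCAT description. The Observation class, the resolve_dataset helper and all example.org URIs are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Observation:
    """Hypothetical stand-in for a quality observation."""
    metric: str       # URI of the metric that was computed
    value: float      # computed value
    computed_on: str  # URI of the assessed resource (cf. daq:computedOn)


def resolve_dataset(obs: Observation, dcat_datasets: Set[str]) -> Optional[str]:
    """Return the dcat:Dataset URI the observation refers to,
    or None when the assessed resource has no DCAT description."""
    return obs.computed_on if obs.computed_on in dcat_datasets else None


# Assumption: only one resource is described as a dcat:Dataset.
dcat_datasets = {"http://example.org/dataset/A"}

with_dcat = Observation(
    "http://example.org/metric/m1", 0.9, "http://example.org/dataset/A")
without_dcat = Observation(
    "http://example.org/metric/m1", 0.4, "http://example.org/dump.rdf")

print(resolve_dataset(with_dcat, dcat_datasets))
print(resolve_dataset(without_dcat, dcat_datasets))
```

The second observation is still perfectly usable, but a mandatory link to a dcat:Dataset would have no target for it, which is the practical concern raised above.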

Minor Corrections (not typos or spelling):
 - The link in bullet 2 of Requirements section (i.e. see related action) seems to link to the meeting minutes and not to the action itself.
 - Constraint -> y a dqv:Standard . ('a' is missing)

I hope this helps a little bit.

Best Regards,
Jeremy

[1] http://arxiv.org/pdf/1412.3750v2.pdf
[2] https://www.w3.org/2013/dwbp/wiki/Data_Quality_Vocabulary_(DQV)
[3] http://butterbur04.iai.uni-bonn.de/ontologies/qpro/qpro
[4] https://www.w3.org/2013/dwbp/track/issues/164

On 04 May 2015, at 19:01, Riccardo Albertoni <albertoni@ge.imati.cnr.it> wrote:

Dear Data Quality Vocabulary editors,

you can find a new version of the data quality concept schema in the wiki [1]. It includes the changes I made in response to Christophe's feedback:

- renamed the property "dqv:refersToDcatDataset" to "dqv:refersToDataset";
- daq:qualityGraph is now a direct specialisation of our dqv:CollectionOfMetricsResults (I have opened an issue [2] to remind us that we should discuss whether dataset metrics are a kind of quality info and how to include them in the DQV);
- added dqv:Feedback as a possible kind of dqv:QualityInfo (I have opened an issue to figure out the relation between dqv:Feedback and duv:Feedback [3]).

The conceptual schema is available as a Google Drawing [4]. I guess there is still a lot to discuss.
I have reported part of the discussion we had with Christophe in the wiki, hoping this might encourage other people to chime in :-)

Regards,
Riccardo

[1] https://www.w3.org/2013/dwbp/wiki/Data_Quality_Vocabulary_(DQV)
[2] https://www.w3.org/2013/dwbp/track/issues/164
[3] https://www.w3.org/2013/dwbp/track/issues/165
[4] http://goo.gl/277o2R

On 25 April 2015 at 13:22, Riccardo Albertoni <albertoni@ge.imati.cnr.it> wrote:
Hi Christophe,
Thanks a lot for your feedback; I am happy that the discussion is progressing.

I think Jeremy is going to provide some feedback soon, so it is probably better to modify the schema [2] "all at once" after we get Jeremy's feedback.
The DUV vocabulary is using Google Drawings, which we might consider as a valid alternative to Creately in the next DQV schema release.

Regarding the comments you made in [3]:

- I agree on changing "dqv:refersToDcatDataset" to "dqv:refersToDataset";

- regarding your <<I think the class inheritance arrows for dqv:CollectionOfMetricsResults and dqv:Standard should be inversed.>>
I am not sure I understand the rationale behind this suggestion; could you explain more?

- regarding your <<Besides, I don't see why the collection of metrics has to be a qb:Dataset>>
Ok, I think we can have daq:qualityGraph as a direct specialisation of our dqv:CollectionOfMetricsResults.
However, I suggest opening an issue, which we can probably address after the DQV FPWD is released, saying "assuming statistics about datasets are considered a kind of quality information, as indicated in the UC document (see use case Bio2RDF [4]), are the statistics mentioned there properly modelled in the DQV?"
Do you agree?

A couple of quick inline responses to the comments from your very last email:


# dqv:QualityInfo and its specialisation

* is dqv:QualityInfo the most appropriate name? would dqv:QualityMetadata be more appropriate?
Considering that in the BPs document we speak about meta-data, maybe Metadata is the best. But on the other hand I'm not keen on hardcoding the notion of metadata in a vocabulary. Going for "QualityData" would be a middle-way solution.
Mmm... I have to admit that I am not very comfortable calling it QualityData either. I mean, QualityData is a possible middle-way solution, but dqv:QualityInfo/QualityData is currently a superclass of dqv:ServiceLevelAgreement and dqv:Standard, which are not proper data. For this reason, I would prefer dqv:QualityInfo rather than QualityData.


* are there other kinds of quality info that are worth considering? e.g., opinions, reports of known issues, ...
That would be a good opportunity to link to DUV if we assume that usage correlates positively with dataset quality.

Sure, I have noticed that in DUV [5] there is a class duv:Feedback which seems to be a good candidate to refer to.
In the current version of DUV, duv:Feedback is related to dqg:QualityCriteria. I am not sure what exactly dqg:QualityCriteria stands for, but I guess it is supposed to be a class in our DQV, and it might roughly correspond to our dqv:QualityInfo (or whatever dqv:QualityInfo will be called at the end of our discussion :) )

So once we have a proper name for our dqv:QualityInfo, and we have completed our first internal revision of the DQV, we can open an issue, "let's find an agreement on references between DQV and DUV", which should be discussed with the rest of the group, and we might suggest solving the issue by:
- putting duv:Feedback as one of the possible specialisations of dqv:QualityInfo in both the DQV and DUV schemas;
- deleting dqv:QualityInfo and the relation "describes" in DUV.

Does this sound reasonable?


* how can the service level agreement be represented? Is it a document on the web to refer to, or do we want to refer to something more structured? Is there any specific property we should add to dqv:ServiceLevelAgreement?
According to [1] "An SLA is best described as a collection of promises". It is also a document which just lists a couple of things. We could either focus on the document aspect as a whole or try to model the list of promises and the list of related concepts. My gut feeling is that this could lead to writing an elaborate vocabulary that would go beyond our scope, so I'd say we should rather not do that. But we should nonetheless anticipate that someone may some day want to work on that.
What about then either not indicating any range or setting Resource as the range? Then everyone is free to model an SLA as he wants. And in the BPs we could hint that structured data is better and a PDF is also OK.

Very good point! I agree on leaving it open to further modelling of promises by explicitly mentioning this possibility; it sounds like a very good idea. Concerning your proposal to not indicate any range or to set Resource as a range, I think we should at least suggest one concrete way to include a human-readable SLA description, so that people don't have the chance to be too creative in attaching their HTML, PDF or whatever to the quality-related metadata.

* how are standards represented under dqv:Standard? Is the class dqv:Standard suitable to include ODI certificates, to represent that a DCAT dataset has a certain compliance with the 5 LOD stars, and other kinds of best practices? Should we explicitly provide a list/taxonomy of standards/certificates to consider? Is the class dqv:Standard really necessary, or can we rely directly on dcterms:Standard?
We should be flexible here. If we impose a specific class then ODI certificates and 5-star models will have to subclass from it, so we should make it generic enough conceptually so that this works. In that respect I think dcterms:Standard is quite nice, so we may want to re-use it. Or subclass from it our own Standard, which is a verbatim copy of it, in case some day the meaning of dcterms:Standard changes in a way that breaks our vocabulary.

I see your point. I agree we have to be flexible and generic enough, but I am not sure I understand what you are suggesting here.

* can we assume the following constraint ?
  x a dcat:Dataset. x dcterms:conformsTo y '''imply'''  x hasQualityInfo y. y dqv:Standard
Think so. We should have more of these BTW :-)

Sure, I think other constraints will come out quite naturally during the design process. I am not very concerned about it; by the way, I'd like anyone working on DQV to feel free to suggest some ;)
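
As an illustration of how the constraint above could be read as an inference rule, here is a minimal Python sketch over plain (subject, predicate, object) tuples. It is only one possible reading: it assumes the last triple is meant as "y a dqv:Standard", keeps the property name hasQualityInfo exactly as written in the constraint, and uses hypothetical ex: resources. The function name apply_conformance_rule is made up for this example.

```python
RDF_TYPE = "a"  # shorthand for rdf:type, as in the constraint text


def apply_conformance_rule(triples):
    """Apply the rule
        x a dcat:Dataset . x dcterms:conformsTo y
        =>  x hasQualityInfo y . y a dqv:Standard
    and return the input triples plus the inferred ones."""
    datasets = {s for (s, p, o) in triples
                if p == RDF_TYPE and o == "dcat:Dataset"}
    inferred = set()
    for (s, p, o) in triples:
        # The rule only fires for subjects typed as dcat:Dataset.
        if p == "dcterms:conformsTo" and s in datasets:
            inferred.add((s, "hasQualityInfo", o))
            inferred.add((o, RDF_TYPE, "dqv:Standard"))
    return set(triples) | inferred


# Hypothetical data: ex:doc1 is not a dcat:Dataset, so nothing is
# inferred from its dcterms:conformsTo statement.
graph = {
    ("ex:dataset1", RDF_TYPE, "dcat:Dataset"),
    ("ex:dataset1", "dcterms:conformsTo", "ex:fiveStar"),
    ("ex:doc1", "dcterms:conformsTo", "ex:otherSpec"),
}

closed = apply_conformance_rule(graph)
for triple in sorted(closed):
    print(triple)
```

With this reading, anything a dataset conforms to is automatically treated as a dqv:Standard, which is worth keeping in mind when deciding how generic that class should be.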

Have a nice weekend,
Riccardo


Cheers,
Christophe




[1] http://www.knowledgetransfer.net/dictionary/ITIL/en/Service_Level_Agreement.htm


--
Christophe Guéret


[2] https://creately.com/diagram/i8lgl90p1/AXwUzXKQOHvEwUw9eJmxrndw%3D
[3] https://lists.w3.org/Archives/Public/public-dwbp-wg/2015Apr/0209.html
[4] http://w3c.github.io/dwbp/usecasesv1.html#UC-Bio2RDF
[5] https://docs.google.com/drawings/d/1aq3vPcoj0SPs5BispD6umQNejrBTwkhsSYu6Y1adUjw/edit

--
----------------------------------------------------------------------------
Riccardo Albertoni
Istituto per la Matematica Applicata e Tecnologie Informatiche "Enrico Magenes"
Consiglio Nazionale delle Ricerche
via de Marini 6 - 16149 GENOVA - ITALIA
tel. +39-010-6475624 - fax +39-010-6475660
e-mail: Riccardo.Albertoni@ge.imati.cnr.it
Skype: callto://riccardoalbertoni/
LinkedIn: http://www.linkedin.com/in/riccardoalbertoni
www: http://www.ge.imati.cnr.it/Albertoni
http://purl.oclc.org/NET/riccardoAlbertoni
FOAF:http://purl.oclc.org/NET/RiccardoAlbertoni/foaf




Received on Wednesday, 13 May 2015 10:18:42 UTC