- From: Debattista, Jeremy <Jeremy.Debattista@iais.fraunhofer.de>
- Date: Wed, 13 May 2015 10:18:04 +0000
- To: Riccardo Albertoni <albertoni@ge.imati.cnr.it>
- CC: Christophe Guéret <christophe.gueret@dans.knaw.nl>, Antoine Isaac <aisaac@few.vu.nl>, Phil Archer <phila@w3.org>, "Deirdre Lee (Derilinx)" <deirdre@derilinx.com>, DWBP Public List <public-dwbp-wg@w3.org>
- Message-ID: <CF22C506-2FFD-4FA4-9560-CFFF828AC941@iais.fraunhofer.de>
Hi Riccardo, all,

First of all, apologies for my very late reply. I was first busy with my ISWC paper (which I can share with you all [1]), and right afterwards I was on holiday. I just returned to the office today.

I was looking into [2], and I think the draft already looks great. I have some questions and comments:

(1) What is the difference between dqv:CollectionOfMetricResults and dqv:Feedback? Does the former relate to "machine-computed metrics" and the latter to "human-assessed metrics"? Or will the Feedback concept be used for users reporting issues, like GitHub issues?

(2) If we are interested in the reporting of issues, another kind of dqv:QualityData (issue #2 in the "dqv:QualityInfo and its specialisation" section) could be the report of problems found in a dataset when it is assessed by machine-computed metrics - see [3].

(3) "Is DQV connected to DAQ in the correct way?" -> I'm not sure I've understood it right, but to me the dqv:hasDimension property is redundant there. None of its subclasses (i.e. ServiceLevelAgreement, Feedback and Standard) would require the dimension aspect, whilst dqv:CollectionOfMetricResults seems to be generalised daQ metadata, which contains the Category-Dimension-Metric relationship.

(4) With regard to issue 164 [4], I think it is important that, for example, a consumer has an idea of how a dataset changed over time, so I guess some statistics would not hurt. In any case, we will get that for free with daQ.

(5) With regard to issue #2 in the dqv:CollectionOfMetricResults section, the idea behind daq:computedOn is that each observation has the dataset URI or base URI (which might be the RDF file, but that is not ideal) attached to it. Although in theory I agree with forcing a computed observation to be connected to a dcat:Dataset, I think it would not be a practical approach. As I said in a previous email, we have to consider those datasets that have no DCAT metadata (or have other metadata such as VoID).
Though if we plan to have the data quality vocabulary as an extension to DCAT, then this is a natural choice.

Minor corrections (not typos or spelling):
- The link in bullet 2 of the Requirements section (i.e. "see related action") seems to link to the meeting minutes and not to the action itself.
- Constraint -> y a dqv:Standard . ('a' is missing)

I hope this helps a little bit.

Best Regards,
Jeremy

[1] http://arxiv.org/pdf/1412.3750v2.pdf
[2] https://www.w3.org/2013/dwbp/wiki/Data_Quality_Vocabulary_(DQV)
[3] http://butterbur04.iai.uni-bonn.de/ontologies/qpro/qpro
[4] https://www.w3.org/2013/dwbp/track/issues/164

On 04 May 2015, at 19:01, Riccardo Albertoni <albertoni@ge.imati.cnr.it> wrote:

Dear Data Quality Vocabulary editors,

you can find a new version of the data quality conceptual schema in the wiki [1]. It includes the changes I have made taking Christophe's feedback into account:
- renaming of the property "dqv:refersToDcatDataset" to "dqv:refersToDataset";
- daq:qualityGraph is now a direct specialisation of our dqv:CollectionOfMetricsResults (I have opened an issue [2] to remind us that we should discuss whether dataset metrics are a kind of quality info and how to include them in the DQV);
- added dqv:Feedback as a possible kind of dqv:QualityInfo (I have opened an issue to figure out the relation between dqv:Feedback and duv:Feedback [3]).

The conceptual schema is available as a Google Drawing [4]. I guess there is still a lot to discuss.
I have reported part of the discussion we had with Christophe in the wiki, hoping this might encourage other people to chime in :-)

Regards,
Riccardo

[1] https://www.w3.org/2013/dwbp/wiki/Data_Quality_Vocabulary_(DQV)
[2] https://www.w3.org/2013/dwbp/track/issues/164
[3] https://www.w3.org/2013/dwbp/track/issues/165
[4] http://goo.gl/277o2R

On 25 April 2015 at 13:22, Riccardo Albertoni <albertoni@ge.imati.cnr.it> wrote:

Hi Christophe,

Thanks a lot for your feedback, I am happy that the discussion is progressing. I think Jeremy is going to provide some feedback soon, so it is probably better to modify the schema [2] "all at once" after we get Jeremy's feedback. The DUV vocabulary is using Google Drawings, which we might consider as a valid alternative to Creately for the next DQV schema release.

Regarding the comments you made in [3]:
- I agree on changing "dqv:refersToDcatDataset" to "dqv:refersToDataset";
- regarding your <<I think the class inheritance arrows for dqv:CollectionOfMetricsResults and dqv:Standard should be inverted.>>, I am not sure I understand the rationale behind this suggestion; could you explain more?
- regarding your <<Besides, I don't see why the collection of metrics has to be a qb:DataSet>>: OK, I think we can have daq:qualityGraph as a direct specialisation of our dqv:CollectionOfMetricsResults. However, I suggest opening an issue that we can probably address after the DQV FPWD is released, saying "assuming statistics about datasets are considered a kind of quality information, as indicated in the UC document (see use case Bio2RDF [4]), are the statistics mentioned there properly modelled in the DQV?" Do you agree?

A couple of quick inline responses to the comments from your very last email:

# dqv:QualityInfo and its specialisation

* is dqv:QualityInfo the most appropriate name? Would dqv:QualityMetadata be more appropriate?

Considering that in the BPs document we speak about metadata, maybe Metadata is the best.
But on the other hand I'm not keen on hardcoding the notion of metadata in a vocabulary. Going for "QualityData" would be a middle-way solution.

Mmm... I have to admit that I am not very comfortable with calling it QualityData either. I mean, QualityData is a possible middle-way solution, but dqv:QualityInfo/QualityData is currently a superclass of dqv:ServiceLevelAgreement and dqv:Standard, which are not proper data. For this reason, I would prefer dqv:QualityInfo rather than QualityData.

* are there other kinds of quality info that are worth considering? e.g., opinions, reports of known issues, ...

That would be a good opportunity to link to DUV if we assume that usage correlates positively with dataset quality.

Sure, I have noticed that in DUV [5] there is a class duv:Feedback which seems to be a good candidate to refer to. In the current version of DUV, duv:Feedback is related to dqg:QualityCriteria. I am not sure what exactly dqg:QualityCriteria stands for, but I guess it is supposed to be a class in our DQV, and it might roughly correspond to our dqv:QualityInfo (or whatever dqv:QualityInfo will be called at the end of our discussion :) ). So once we have a proper name for our dqv:QualityInfo and have completed our first internal revision of the DQV, we can open an issue "let's find an agreement on references between DQV and DUV", which should be discussed with the rest of the group, and we might suggest solving the issue by:
- putting duv:Feedback as one of the possible specialisations of dqv:QualityInfo in both the DQV and DUV schemas;
- deleting dqv:QualityInfo and the relation "describes" in DUV.
Does this sound reasonable?

* how can the service level agreement be represented? Is it a document on the web to refer to, or do we want to refer to something more structured? Is there any specific property we should add to dqv:ServiceLevelAgreement?

According to [1], "An SLA is best described as a collection of promises".
It is also a document which just lists a couple of things. We could either focus on the document aspect as a whole or try to model the list of promises and the list of related concepts. My gut feeling is that the latter could lead to writing an elaborate vocabulary that would extend beyond our scope, so I'd say we should rather not do that. But we should nonetheless anticipate that someone may some day want to work on that. What about either not indicating any range or setting Resource as the range? Then everyone is free to model an SLA as they want. And in the BPs we could hint that structured data is better, but a PDF is also OK.

Very good point! I agree on leaving it open to further modelling of promises by explicitly mentioning this possibility; it sounds like a very good idea. Concerning your proposal not to indicate any range or to set Resource as the range, I think we should at least suggest one concrete way to include human-readable SLA descriptions, so that people don't get too creative attaching their HTML, PDF or whatever in the quality-related metadata.

* how are standards represented under dqv:Standard? Is the class dqv:Standard suitable to include ODI certificates, to represent that a DCAT dataset has a certain level of compliance with the 5-star LOD scheme, and other kinds of best practices? Should we explicitly provide a list/taxonomy of standards/certificates to consider? Is the class dqv:Standard really necessary, or can we rely directly on dcterms:Standard?

We should be flexible here. If we impose a specific class, then ODI certificates and 5-star models will have to subclass from it, so we should make it conceptually generic enough for this to work. In that respect I think dcterms:Standard is quite nice, so we may want to re-use it. Or subclass from it our own Standard, which is a verbatim copy of it, in case some day the meaning of dcterms:Standard changes in a way that breaks our vocabulary.
I see your point. I agree we have to be flexible and generic enough, but I am not sure I understand what you are suggesting here.

* can we assume the following constraint? x a dcat:Dataset . x dcterms:conformsTo y implies x hasQualityInfo y . y dqv:Standard

Think so. We should have more of these BTW :-)

Sure, I think other constraints will come out quite naturally during the design process. I am not very concerned about it; by the way, I'd like anyone working on DQV to feel free to suggest some ;)

Have a nice weekend,
Riccardo

Cheers,
Christophe

[1] http://www.knowledgetransfer.net/dictionary/ITIL/en/Service_Level_Agreement.htm

--
Christophe Guéret

[2] https://creately.com/diagram/i8lgl90p1/AXwUzXKQOHvEwUw9eJmxrndw%3D
[3] https://lists.w3.org/Archives/Public/public-dwbp-wg/2015Apr/0209.html
[4] http://w3c.github.io/dwbp/usecasesv1.html#UC-Bio2RDF
[5] https://docs.google.com/drawings/d/1aq3vPcoj0SPs5BispD6umQNejrBTwkhsSYu6Y1adUjw/edit

--
----------------------------------------------------------------------------
Riccardo Albertoni
Istituto per la Matematica Applicata e Tecnologie Informatiche "Enrico Magenes"
Consiglio Nazionale delle Ricerche
via de Marini 6 - 16149 GENOVA - ITALIA
tel. +39-010-6475624 - fax +39-010-6475660
e-mail: Riccardo.Albertoni@ge.imati.cnr.it
Skype: callto://riccardoalbertoni/
LinkedIn: http://www.linkedin.com/in/riccardoalbertoni
www: http://www.ge.imati.cnr.it/Albertoni
http://purl.oclc.org/NET/riccardoAlbertoni
FOAF: http://purl.oclc.org/NET/RiccardoAlbertoni/foaf
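The constraint proposed in this thread (x a dcat:Dataset . x dcterms:conformsTo y implies x hasQualityInfo y . y a dqv:Standard) amounts to a simple forward-chaining rule. The following is a minimal sketch in Python, not part of any DQV draft: it assumes triples are held as plain (subject, predicate, object) string tuples with prefixed names, and it assumes the property is called dqv:hasQualityInfo (the thread only writes hasQualityInfo). The resources ex:ds1, ex:doc, ex:5StarLOD and ex:ISO19115 are hypothetical.

```python
# Sketch of the proposed DQV constraint as a forward-chaining rule over plain
# (subject, predicate, object) tuples. Prefixed names are ordinary strings.
# ASSUMPTION: "dqv:hasQualityInfo" is a hypothetical name for the
# "hasQualityInfo" property mentioned in the thread.

Triple = tuple[str, str, str]

RDF_TYPE = "rdf:type"
DCAT_DATASET = "dcat:Dataset"
CONFORMS_TO = "dcterms:conformsTo"
HAS_QUALITY_INFO = "dqv:hasQualityInfo"  # hypothetical property name
DQV_STANDARD = "dqv:Standard"


def apply_standard_rule(triples: set[Triple]) -> set[Triple]:
    """Return the new triples entailed by the rule
        x a dcat:Dataset . x dcterms:conformsTo y
          => x dqv:hasQualityInfo y . y a dqv:Standard
    Triples already present in the input are not returned.
    """
    # Every subject explicitly typed as dcat:Dataset.
    datasets = {s for (s, p, o) in triples if p == RDF_TYPE and o == DCAT_DATASET}
    inferred: set[Triple] = set()
    for (s, p, o) in triples:
        # The rule only fires for dcterms:conformsTo statements about datasets.
        if p == CONFORMS_TO and s in datasets:
            inferred.add((s, HAS_QUALITY_INFO, o))
            inferred.add((o, RDF_TYPE, DQV_STANDARD))
    return inferred - triples


if __name__ == "__main__":
    graph = {
        ("ex:ds1", RDF_TYPE, DCAT_DATASET),
        ("ex:ds1", CONFORMS_TO, "ex:5StarLOD"),
        # ex:doc is not typed as dcat:Dataset, so nothing is inferred for it.
        ("ex:doc", CONFORMS_TO, "ex:ISO19115"),
    }
    for triple in sorted(apply_standard_rule(graph)):
        print(triple)
```

Running it prints the two triples inferred for ex:ds1, while ex:doc yields nothing because it is not typed as dcat:Dataset. In practice such a rule would more likely be written as a SPARQL CONSTRUCT query or in a rule language; the plain-set version only makes the logic of the constraint explicit.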
Received on Wednesday, 13 May 2015 10:18:42 UTC