- From: Riccardo Albertoni <albertoni@ge.imati.cnr.it>
- Date: Thu, 14 May 2015 16:47:20 +0200
- To: "Debattista, Jeremy" <Jeremy.Debattista@iais.fraunhofer.de>
- Cc: Christophe Guéret <christophe.gueret@dans.knaw.nl>, Antoine Isaac <aisaac@few.vu.nl>, Phil Archer <phila@w3.org>, "Deirdre Lee (Derilinx)" <deirdre@derilinx.com>, DWBP Public List <public-dwbp-wg@w3.org>
- Message-ID: <CAOHhXmRBC27TSOu6qRUu4Dst7k3aWyo5AZs=OgJ1kvZvzP4Agg@mail.gmail.com>
Hi Jeremy,

Many thanks for your feedback! I think your comments provide interesting food for thought.

As far as I know, Christophe and Antoine are currently producing an updated version of DQV. I guess most of your comments will still be relevant for their updated version; in any case, I plan to discuss your comments in more detail after reading the latest version.

Best,
Riccardo

On 13 May 2015 at 12:18, Debattista, Jeremy <Jeremy.Debattista@iais.fraunhofer.de> wrote:

> Hi Riccardo, all
>
> First of all, apologies for my very late reply. I was first busy with my
> ISWC paper (which I can share with you all [1]), and right afterwards I
> was on holiday. I just returned to the office today.
>
> I was looking into [2], and I think the draft already looks great. I
> have some questions and comments:
>
> (1) What is the difference between dqv:CollectionOfMetricResults and
> dqv:Feedback? Does the former relate to "machine computed metrics" and the
> latter to "human assessed metrics"? Or will the Feedback concept be used
> for user reporting of issues, like GitHub issues?
>
> (2) If we are interested in the reporting of issues, another kind of
> dqv:QualityData (issue #2 in the "dqv:QualityInfo and its specialisation"
> section) could be the report of problems found in a dataset when the
> dataset is assessed by machine computed metrics; see [3].
>
> (3) "is DQV connected to DAQ in the correct way?" -> Not sure I have
> understood it right, but to me the dqv:hasDimension property is redundant
> there. None of its subclasses (i.e. ServiceLevelAgreement, Feedback and
> Standard) would require the dimension aspect, whilst
> dqv:CollectionOfMetricResults seems to be generalised daQ metadata, which
> contains the Category-Dimension-Metric relationship.
>
> (4) Wrt issue 164 [4], I think it is important that, for example, a
> consumer has an idea of how a dataset changed over time; therefore I guess
> that some statistics would not hurt. Anyway, we will get that for free with
> daQ.
> (5) Wrt issue #2 in dqv:CollectionOfMetricResults, the idea behind
> daq:computedOn is that each observation has the dataset URI or base URI
> (which might be the RDF file, though that is not ideal) attached to it.
> Although in theory I agree with forcing a computed observation to be
> connected to a dcat:Dataset, I think it would not be a practical approach.
> As I said in a previous email, we have to consider those datasets that
> have no DCAT metadata (or have other metadata such as VoID). Though if we
> plan to have the data quality vocabulary as an extension to DCAT, then
> this is a natural choice.
>
> Minor corrections (not typos or spelling):
> - The link in bullet 2 of the Requirements section (i.e. "see related
>   action") seems to link to the meeting minutes and not to the action
>   itself.
> - Constraint -> y a dqv:Standard . ('a' is missing)
>
> I hope this helps a little bit.
>
> Best Regards,
> Jeremy
>
> [1] http://arxiv.org/pdf/1412.3750v2.pdf
> [2] https://www.w3.org/2013/dwbp/wiki/Data_Quality_Vocabulary_(DQV)
> [3] http://butterbur04.iai.uni-bonn.de/ontologies/qpro/qpro
> [4] https://www.w3.org/2013/dwbp/track/issues/164
>
> On 04 May 2015, at 19:01, Riccardo Albertoni <albertoni@ge.imati.cnr.it> wrote:
>
> Dear Data Quality Vocabulary editors,
>
> you can find a new version of the data quality concept schema in the wiki
> [1]. It includes the changes I have made in response to Christophe's
> feedback:
>
> - renaming of property "dqv:refersToDcatDataset" to
>   "dqv:refersToDataset";
> - daq:qualityGraph is now a direct specialisation of our
>   dqv:CollectionOfMetricsResults (I have opened an issue [2] to remind us
>   that we should discuss whether dataset metrics are a kind of quality
>   info and how to include them in the DQV);
> - added dqv:Feedback as a possible kind of dqv:QualityInfo (I have
>   opened an issue to figure out the relation between dqv:Feedback and
>   duv:Feedback [3]).
>
> The conceptual schema is available as a Google Drawing [4].
> I guess there is still a lot to discuss.
> I have reported in the wiki part of the discussion we had with
> Christophe, hoping this might encourage other people to chime in :-)
>
> Regards,
> Riccardo
>
> [1] https://www.w3.org/2013/dwbp/wiki/Data_Quality_Vocabulary_(DQV)
> [2] https://www.w3.org/2013/dwbp/track/issues/164
> [3] https://www.w3.org/2013/dwbp/track/issues/165
> [4] http://goo.gl/277o2R
>
> On 25 April 2015 at 13:22, Riccardo Albertoni <albertoni@ge.imati.cnr.it> wrote:
>
>> Hi Christophe,
>> Thanks a lot for your feedback, I am happy that the discussion is
>> progressing.
>>
>> I think Jeremy is going to provide some feedback soon, so it is
>> probably better to modify the schema [2] "all at once" after we get
>> Jeremy's feedback.
>> The DUV vocabulary is using Google graph, which we might consider as a
>> valid alternative to Creately in the next DQV schema release.
>>
>> Regarding the comments you made in [3]:
>>
>> - I agree on changing "dqv:refersToDcatDataset" to
>>   "dqv:refersToDataset";
>>
>> - regarding your <<I think the class inheritance arrows for
>>   dqv:CollectionOfMetricsResults and dqv:Standard should be inversed.>>
>>   I am not sure I understand the rationale behind this suggestion, could
>>   you explain more?
>>
>> - regarding your <<Besides, I don't see why the collection of metrics
>>   has to be a qb:Dataset>>
>>   OK, I think we can have daq:qualityGraph as a direct specialisation of
>>   our dqv:CollectionOfMetricsResults.
>>   However, I suggest opening an issue, which we can probably address
>>   after the DQV FPWD is released, saying "assuming statistics about
>>   datasets are considered a kind of quality information, as indicated in
>>   the UC document, see use case bio2rdf [4], are the statistics mentioned
>>   there properly modelled in the DQV?"
>>   Do you agree?
>>
>> A couple of quick inline responses to the comments from your very
>> last email:
>>
>>> # dqv:QualityInfo and its specialisation
>>>
>>> * is dqv:QualityInfo the most appropriate name? would
>>> dqv:QualityMetadata be more appropriate?
>>> Considering that in the BPs document we speak about meta-data, maybe
>>> Metadata is best. But on the other hand I'm not keen on hardcoding the
>>> notion of metadata in a vocabulary. Going for "QualityData" would be a
>>> middle-way solution.
>>>
>> Hmm... I have to admit that I am not very comfortable calling it
>> QualityData either. I mean, QualityData is a possible middle-way
>> solution, but dqv:QualityInfo/QualityData is currently a superclass of
>> dqv:ServiceLevelAgreement and dqv:Standard, which are not proper data.
>> For this reason, I would prefer dqv:QualityInfo over QualityData.
>>
>>> * are there other kinds of quality info that are worth considering?
>>> e.g., opinions, reports of known issues, ...
>>> That would be a good opportunity to link to DUV if we assume that
>>> usage correlates positively with dataset quality.
>>>
>>
>> Sure, I have noticed that in DUV [5] there is a class duv:Feedback
>> which seems to be a good candidate to refer to.
>> In the current version of DUV, duv:Feedback is related
>> to dqg:QualityCriteria. I am not sure what exactly dqg:QualityCriteria
>> stands for, but I guess it is supposed to be a class in our DQV, and it
>> might roughly correspond to our dqv:QualityInfo (or whatever
>> dqv:QualityInfo will be called at the end of our discussion :) ).
>>
>> So once we have a proper name for our dqv:QualityInfo, and we have
>> completed our first internal revision of the DQV, we can open an
>> issue "let's find an agreement on references between DQV and DUV", which
>> should be discussed with the rest of the group, and we might suggest
>> resolving the issue by:
>> - putting duv:Feedback as one of the possible specialisations of
>>   dqv:QualityInfo in both the DQV and DUV schemas;
>> - deleting dqv:QualityInfo and the relation "describes" in DUV.
>>
>> Does this sound reasonable?
>>
>>> * how can the service level agreement be represented? is it a document
>>> on the web to refer to, or do we want to refer to something more
>>> structured? is there any specific property we should add to
>>> dqv:ServiceLevelAgreement?
>>> According to [1], "An SLA is best described as a collection of
>>> promises". It is also a document which just lists a couple of things. We
>>> could either focus on the document aspect as a whole or try to model the
>>> list of promises and the list of related concepts. My gut feeling is that
>>> this could lead to writing an elaborate vocabulary that would fall
>>> outside our scope, so I'd say we should rather not do that. But we should
>>> nonetheless anticipate that someone may some day want to work on that.
>>>
>>> What about then either not indicating any range or setting Resource as
>>> the range? Then everyone is free to model an SLA as they want. And in the
>>> BPs we could hint that structured data is better and a PDF is also OK.
>>>
>> Very good point!
>> I agree on leaving it open to further modelling of
>> promises by explicitly mentioning this possibility; it sounds like a very
>> good idea. Concerning your proposal to either not indicate any range or
>> set Resource as the range, I think we should at least suggest one
>> concrete way to include human-readable SLA descriptions, so that people
>> don't have the chance to be too creative in attaching their HTML, PDF or
>> whatever to the quality-related metadata.
>>
>>> * how are standards represented under dqv:Standard? is the class
>>> dqv:Standard suitable to include ODI certificates, to represent that a
>>> DCAT dataset has a certain compliance with the 5 LOD stars, and other
>>> kinds of best practices? should we explicitly provide a list/taxonomy of
>>> standards/certificates to consider? is the class dqv:Standard really
>>> necessary, or can we rely directly on dcterm:Standard?
>>> We should be flexible here. If we impose a specific class then ODI
>>> certificates and 5-star models will have to subclass from it, so we
>>> should make it generic enough conceptually so that this works. In that
>>> respect I think dcterm:Standard is quite nice, so we may want to re-use
>>> it. Or subclass from it our own Standard, which is a verbatim copy of it,
>>> in case some day the meaning of dcterm:Standard changes in a way that
>>> breaks our vocabulary.
>>>
>> I see your point, I agree we have to be flexible and generic enough,
>> but I am not sure I understand what you are suggesting here.
>>
>>> * can we assume the following constraint?
>>> x a dcat:Dataset. x dcterms:conformsTo y '''imply''' x hasQualityInfo
>>> y. y dqv:Standard
>>> Think so. We should have more of these BTW :-)
>>>
>>
>> Sure, I think other constraints will come out quite naturally during
>> the design process.
>> I am not very concerned about it; by the way, I'd like
>> anyone working on DQV to feel free to suggest some ;)
>>
>> Have a nice weekend,
>> Riccardo
>>
>>> Cheers,
>>> Christophe
>>>
>>> [1]
>>> http://www.knowledgetransfer.net/dictionary/ITIL/en/Service_Level_Agreement.htm
>>>
>>> --
>>> Christophe Guéret
>>
>> [2] https://creately.com/diagram/i8lgl90p1/AXwUzXKQOHvEwUw9eJmxrndw%3D
>> [3] https://lists.w3.org/Archives/Public/public-dwbp-wg/2015Apr/0209.html
>> [4] http://w3c.github.io/dwbp/usecasesv1.html#UC-Bio2RDF
>> [5] https://docs.google.com/drawings/d/1aq3vPcoj0SPs5BispD6umQNejrBTwkhsSYu6Y1adUjw/edit
>>
>> --
>> ----------------------------------------------------------------------------
>> Riccardo Albertoni
>> Istituto per la Matematica Applicata e Tecnologie Informatiche "Enrico Magenes"
>> Consiglio Nazionale delle Ricerche
>> via de Marini 6 - 16149 GENOVA - ITALIA
>> tel. +39-010-6475624 - fax +39-010-6475660
>> e-mail: Riccardo.Albertoni@ge.imati.cnr.it
>> Skype: callto://riccardoalbertoni/
>> LinkedIn: http://www.linkedin.com/in/riccardoalbertoni
>> www: http://www.ge.imati.cnr.it/Albertoni
>> http://purl.oclc.org/NET/riccardoAlbertoni
>> FOAF:http://purl.oclc.org/NET/RiccardoAlbertoni/foaf

--
----------------------------------------------------------------------------
Riccardo Albertoni
Istituto per la Matematica Applicata e Tecnologie Informatiche "Enrico Magenes"
Consiglio Nazionale delle Ricerche
via de Marini 6 - 16149 GENOVA - ITALIA
tel. +39-010-6475624 - fax +39-010-6475660
e-mail: Riccardo.Albertoni@ge.imati.cnr.it
Skype: callto://riccardoalbertoni/
LinkedIn: http://www.linkedin.com/in/riccardoalbertoni
www: http://www.ge.imati.cnr.it/Albertoni
http://purl.oclc.org/NET/riccardoAlbertoni
FOAF:http://purl.oclc.org/NET/RiccardoAlbertoni/foaf
Received on Thursday, 14 May 2015 14:47:46 UTC