- From: Riccardo Albertoni <albertoni@ge.imati.cnr.it>
- Date: Thu, 14 May 2015 16:47:20 +0200
- To: "Debattista, Jeremy" <Jeremy.Debattista@iais.fraunhofer.de>
- Cc: Christophe Guéret <christophe.gueret@dans.knaw.nl>, Antoine Isaac <aisaac@few.vu.nl>, Phil Archer <phila@w3.org>, "Deirdre Lee (Derilinx)" <deirdre@derilinx.com>, DWBP Public List <public-dwbp-wg@w3.org>
- Message-ID: <CAOHhXmRBC27TSOu6qRUu4Dst7k3aWyo5AZs=OgJ1kvZvzP4Agg@mail.gmail.com>
Hi Jeremy,
Many thanks for your feedback! I think your comments provide interesting
food for thought.
As far as I know, Christophe and Antoine are currently producing an
updated version of DQV. I guess most of your comments will still be
relevant for their updated version; in any case, I plan to discuss your
comments in more detail after reading the latest version.
Best,
Riccardo
On 13 May 2015 at 12:18, Debattista, Jeremy <
Jeremy.Debattista@iais.fraunhofer.de> wrote:
> Hi Riccardo, all
>
> First of all, apologies for my very late reply. I was first busy with my
> ISWC paper (which I can share with you all [1]), and right afterwards I
> was on holiday. I just returned to the office today.
>
> I was looking into [2], and I think the draft already looks great. I
> have some questions and comments:
> (1) What is the difference between dqv:CollectionOfMetricResults and
> dqv:Feedback? Does the former relate to "machine computed metrics" and the
> latter to "human assessed metrics"? Or will the Feedback concept be used
> for user reporting of issues - like github issues?
> (2) If we are interested in the reporting of issues, another kind of
> dqv:QualityData (issue #2 in dqv:QualityInfo and its specialisation
> section) could be the report of problems found in a dataset, when a dataset
> is assessed by machine computed metrics - see [3].
> (3) "is DQV connected to DAQ in the correct way?" -> Not sure I've
> understood it right, but for me the dqv:hasDimension property is redundant
> there. None of its subclasses (i.e. ServiceLevelAgreement, Feedback and
> Standard) would require the dimension aspect, whilst
> dqv:CollectionOfMetricResults seems to be a generalised daQ metadata, which
> contains the Category-Dimension-Metric relationship.
> (4) Wrt issue 164 [4], I think it is important that for example a
> consumer has an idea of how a dataset changed over time, therefore I guess
> that some statistics would not hurt - anyway we will get that for free with
> daQ.
> (5) Wrt issue #2 in the dqv:CollectionOfMetricResults, the idea behind
> daq:computedOn is that each observation has the dataset URI or base URI
> (which might be the RDF file, but not ideal) attached to the observation.
> Although in theory I agree with forcing a computed observation to be
> connected to a dcat:Dataset, I think it would not be a practical approach. As
> I said in a previous email, we have to consider those datasets that have
> no DCAT metadata (or have other metadata such as voID). Though if we plan
> to have the data quality vocabulary as an extension to dcat, then this is a
> natural choice.
>
> Minor Corrections (not typos or spelling):
> - The link in bullet 2 of Requirements section (i.e. see related action)
> seems to link to the meeting minutes and not to the action itself.
> - Constraint -> y a dqv:Standard . ('a' is missing)
>
> I hope this helps a little bit.
>
> Best Regards,
> Jeremy
>
> [1] http://arxiv.org/pdf/1412.3750v2.pdf
> [2] https://www.w3.org/2013/dwbp/wiki/Data_Quality_Vocabulary_(DQV)
> [3] http://butterbur04.iai.uni-bonn.de/ontologies/qpro/qpro
> [4] https://www.w3.org/2013/dwbp/track/issues/164
>
> On 04 May 2015, at 19:01, Riccardo Albertoni <albertoni@ge.imati.cnr.it>
> wrote:
>
> Dear Data Quality Vocabulary editors,
>
> you can find a new version of the data quality conceptual schema in the wiki
> [1]. It includes the changes I made in response to Christophe's
> feedback:
>
> - renaming of the property "dqv:refersToDcatDataset" to
> "dqv:refersToDataset";
> - daq:qualityGraph is now a direct specialisation of our dqv:CollectionOfMetricsResults
> (I have opened an issue [2] to remind us that we should discuss whether dataset
> metrics are a kind of quality info and how to include them in the DQV);
> - added dqv:Feedback as a possible kind of dqv:QualityInfo. (I have
> opened an issue to figure out the relation between dqv:Feedback and duv:Feedback
> [3]).
>
> The conceptual schema is available as a Google Drawing [4]. I guess there
> is still a lot to discuss.
> I have reported part of the discussion we had with Christophe in the
> wiki, hoping this might encourage other people to chime in :-)
>
> Regards,
> Riccardo
>
> [1] https://www.w3.org/2013/dwbp/wiki/Data_Quality_Vocabulary_(DQV)
> [2] https://www.w3.org/2013/dwbp/track/issues/164
> [3] https://www.w3.org/2013/dwbp/track/issues/165
> [4] http://goo.gl/277o2R
>
> On 25 April 2015 at 13:22, Riccardo Albertoni <albertoni@ge.imati.cnr.it>
> wrote:
>
>> Hi Christophe,
>> Thanks a lot for your feedback, I am happy that the discussion is
>> progressing.
>>
>> I think Jeremy is going to provide some feedback soon, so it is
>> probably better to modify the schema [2] "all at once" after we get
>> Jeremy's feedback.
>> The DUV vocabulary is using google graph, which we might consider as a
>> valid alternative to creately in the next DQV schema release.
>>
>> Regarding the comments you made in [3]:
>>
>> - I agree on changing "dqv:refersToDcatDataset" to
>> "dqv:refersToDataset";
>>
>> - regarding your <<I think the class inheritance arrows for
>> dqv:CollectionOfMetricsResults and dqv:Standard should be inversed.>>
>> I am not sure I understand the rationale behind this suggestion, could
>> you explain more?
>>
>> - regarding your <<Besides, I don't see why the collection of metrics
>> has to be a qb:Dataset>>
>> Ok, I think we can have daq:qualityGraph as a direct specialisation of
>> our dqv:CollectionOfMetricsResults.
>> However, I suggest opening an issue that we can probably address after
>> the DQV FPWD is released, saying "assuming statistics about datasets are
>> considered a kind of quality information, as indicated in the UC document
>> (see use case bio2rdf [4]), are the statistics mentioned there properly
>> modelled in the DQV?"
>> Do you agree?
>>
>> A couple of quick and inline responses to the comments from your very
>> last email:
>>
>>
>>> # dqv:QualityInfo and its specialisation
>>>
>>> * is dqv:QualityInfo the most appropriate name? would
>>> dqv:QualityMetadata be more appropriate?
>>> Considering that in the BPs document we speak about meta-data, maybe
>>> Metadata is the best. But on the other hand I'm not keen on hardcoding the
>>> notion of metadata in a vocabulary. Going for "QualityData" would be a
>>> middle way solution
>>>
>> mmm... I have to admit that I am not very comfortable calling it
>> QualityData either. I mean, QualityData is a possible middle way solution,
>> but dqv:QualityInfo/QualityData is currently a superclass of dqv:ServiceLevelAgreement
>> and dqv:Standards, which are not proper data. For this reason, I would
>> prefer dqv:QualityInfo rather than QualityData.
>>
>>
>>> * are there other kinds of quality info that are worth considering?
>>> e.g., opinions, reports of known issues, ...
>>> That would be a good opportunity to link to DUV if we assume that
>>> usage correlates positively with dataset quality.
>>>
>>
>> Sure, I have noticed that in DUV [5] there is a class duv:Feedback
>> which seems to be a good candidate to refer to.
>> In the current version of DUV, duv:Feedback is related
>> to dqg:QualityCriteria. I am not sure what exactly dqg:QualityCriteria
>> stands for, but I guess it is supposed to be a class in our DQV, and it
>> might roughly correspond to our dqv:QualityInfo (or whatever
>> dqv:QualityInfo will be called at the end of our discussion :) )
>>
>> So once we have a proper name for our dqv:QualityInfo, and we have
>> completed our first internal revision of the DQV, we can open an
>> issue, "let's find an agreement on references between DQV and DUV", which
>> should be discussed with the rest of the group, and we might suggest
>> solving the issue by:
>> - putting duv:Feedback as one of the possible specializations of
>> dqv:QualityInfo in both DQV and DUV schemas;
>> - deleting dqv:QualityInfo and the relation "describes" in DUV.
>>
>> Does this sound reasonable?
>>
>>
>>> * how can the service level agreement be represented? is it a document
>>> on the web to refer to, or do we want to refer to something more structured? is
>>> there any specific property we should add to dqv:ServiceLevelAgreement?
>>> According to [1] "An SLA is best described as a collection of
>>> promises". It is also a document which just lists a couple of things. We
>>> could either focus on the document aspect as a whole or try to model the
>>> list of promises and the list of related concepts. My gut feeling is that
>>> this could lead to writing an elaborate vocabulary that would fall outside
>>> our scope, so I'd say we should rather not do that. But we should
>>> nonetheless anticipate that someone may some day want to work on that.
>>>
>>> What about then either not indicating any range or setting Resource as a
>>> range? Then everyone is free to model an SLA as he wants. And in the BPs
>>> we could hint that structured data is better and a PDF is also OK.
>>>
>> Very good point! I agree on leaving it open to further modelling of
>> promises by explicitly mentioning this possibility; it sounds like a very
>> good idea. Concerning your proposal to either not indicate any range or set
>> Resource as a range, I think we should at least suggest a concrete, unique
>> way to include human-readable SLA descriptions, so that people don't
>> have the chance to be too creative in attaching their HTML, PDF or
>> whatever to the quality-related metadata.
>>
>>
>>> * how are standards represented under dqv:Standard? is the class
>>> dqv:Standard suitable to include ODI certificates, to represent that a
>>> DCAT dataset has a certain compliance with the 5 LOD stars, and other kinds of
>>> best practices? should we explicitly provide a list/taxonomy of standards/
>>> certificates to consider? is the class dqv:Standard really necessary or can
>>> we rely directly on dcterm:Standard?
>>> We should be flexible here. If we impose a specific class then ODI
>>> certificates and 5star models will have to subclass from it, so we should
>>> make it generic enough conceptually so that this works. In that respect I
>>> think dcterm:Standard is quite nice so we may want to re-use it. Or
>>> subclass from it our own Standard which is a verbatim copy of it, in case
>>> some day the meaning of dcterm:Standard changes in a way that breaks our
>>> vocabulary.
>>>
>> I see your point. I agree we have to be flexible and generic enough,
>> but I am not sure I understand what you are suggesting here.
>>
>>
>>> * can we assume the following constraint ?
>>> x a dcat:Dataset. x dcterms:conformsTo y '''imply''' x hasQualityInfo
>>> y. y dqv:Standard
>>> Think so. We should have more of these BTW :-)
>>>
>>
>> Sure, I think other constraints will come out quite naturally during
>> the design process. I am not very concerned about it; by the way, I'd like
>> anyone working on DQV to feel free to suggest some ;)
>>
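A minimal sketch of how the constraint discussed above could be applied mechanically, reading it with the missing 'a' restored (y a dqv:Standard). Triples are plain Python tuples rather than a real RDF store, and the property name dqv:hasQualityInfo as well as all ex: resources are assumptions made up for illustration; the thread itself only writes "hasQualityInfo".

```python
# Triples are (subject, predicate, object) tuples of prefixed names.
RDF_TYPE = "rdf:type"  # what Turtle abbreviates as "a"
# Assumed property name: the thread writes only "hasQualityInfo".
HAS_QUALITY_INFO = "dqv:hasQualityInfo"


def apply_conforms_to_rule(triples):
    """Return the input triples plus those implied by the constraint:

    x a dcat:Dataset . x dcterms:conformsTo y
        implies  x hasQualityInfo y . y a dqv:Standard
    """
    triples = set(triples)
    datasets = {s for (s, p, o) in triples
                if p == RDF_TYPE and o == "dcat:Dataset"}
    inferred = set()
    for (s, p, o) in triples:
        # The rule fires only when the subject is typed as a dcat:Dataset.
        if p == "dcterms:conformsTo" and s in datasets:
            inferred.add((s, HAS_QUALITY_INFO, o))
            inferred.add((o, RDF_TYPE, "dqv:Standard"))
    return triples | inferred


# Hypothetical example data (all ex: names are invented).
example = {
    ("ex:myDataset", RDF_TYPE, "dcat:Dataset"),
    ("ex:myDataset", "dcterms:conformsTo", "ex:fiveStarLOD"),
    ("ex:notADataset", "dcterms:conformsTo", "ex:someSpec"),
}
closed = apply_conforms_to_rule(example)
```

Here ex:fiveStarLOD ends up typed as a dqv:Standard and linked from ex:myDataset, while ex:someSpec is left alone because ex:notADataset is not declared as a dcat:Dataset.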
>> Have a nice weekend,
>> Riccardo
>>
>>
>>>
>>> Cheers,
>>> Christophe
>>>
>>>
>>>
>>>
>>> [1]
>>> http://www.knowledgetransfer.net/dictionary/ITIL/en/Service_Level_Agreement.htm
>>>
>>>
>>> --
>>> Christophe Guéret
>>>
>>
>>
>>
>> [2] https://creately.com/diagram/i8lgl90p1/AXwUzXKQOHvEwUw9eJmxrndw%3D
>> [3] https://lists.w3.org/Archives/Public/public-dwbp-wg/2015Apr/0209.html
>> [4] http://w3c.github.io/dwbp/usecasesv1.html#UC-Bio2RDF
>> [5] https://docs.google.com/drawings/d/1aq3vPcoj0SPs5BispD6umQNejrBTwkhsSYu6Y1adUjw/edit
>>
>>
>> --
>>
>> ----------------------------------------------------------------------------
>> Riccardo Albertoni
>> Istituto per la Matematica Applicata e Tecnologie Informatiche "Enrico
>> Magenes"
>> Consiglio Nazionale delle Ricerche
>> via de Marini 6 - 16149 GENOVA - ITALIA
>> tel. +39-010-6475624 - fax +39-010-6475660
>> e-mail: Riccardo.Albertoni@ge.imati.cnr.it
>> Skype: callto://riccardoalbertoni/
>> LinkedIn: http://www.linkedin.com/in/riccardoalbertoni
>> www: http://www.ge.imati.cnr.it/Albertoni
>> http://purl.oclc.org/NET/riccardoAlbertoni
>> FOAF:http://purl.oclc.org/NET/RiccardoAlbertoni/foaf
>>
>
>
>
>
>
>
>
Received on Thursday, 14 May 2015 14:47:46 UTC