Re: More feedback on DQV from Riccardo Albertoni on 2015-05-14 (public-dwbp-wg@w3.org from May 2015)

From: Riccardo Albertoni <albertoni@ge.imati.cnr.it>
Date: Thu, 14 May 2015 16:47:20 +0200
To: "Debattista, Jeremy" <Jeremy.Debattista@iais.fraunhofer.de>
Cc: Christophe Guéret <christophe.gueret@dans.knaw.nl>, Antoine Isaac <aisaac@few.vu.nl>, Phil Archer <phila@w3.org>, "Deirdre Lee (Derilinx)" <deirdre@derilinx.com>, DWBP Public List <public-dwbp-wg@w3.org>
Message-ID: <CAOHhXmRBC27TSOu6qRUu4Dst7k3aWyo5AZs=OgJ1kvZvzP4Agg@mail.gmail.com>
Hi Jeremy,

Many thanks for your feedback! I think your comments provide interesting
food for thought.
As far as I know, Christophe and Antoine are currently producing an
updated version of DQV. I guess most of your comments will still be
relevant for their updated version; in any case, I plan to discuss your
comments in more detail after reading the latest version.

Best,
Riccardo

On 13 May 2015 at 12:18, Debattista, Jeremy <
Jeremy.Debattista@iais.fraunhofer.de> wrote:

>  Hi Riccardo, all
>
>  First of all apologies for my very late reply. I was first busy with my
> ISWC paper (which I can share with you all [1]), and right afterwards I
> was on holiday. I just returned to the office today.
>
>  I was looking into [2], and I think the draft already looks great. I
> have some questions and comments:
>  (1) What is the difference between dqv:CollectionOfMetricResults and
> dqv:Feedback? Does the former relate to "machine computed metrics" and the
> latter to "human assessed metrics"? Or will the Feedback concept be used
> for user reporting of issues - like github issues?
>  (2) If we are interested in the reporting of issues, another kind of
> dqv:QualityData (issue #2 in dqv:QualityInfo and its specialisation
> section) could be the report of problems found in a dataset, when a dataset
> is assessed by machine computed metrics - see [3].
>  (3) "is DQV connected to DAQ in the correct way?" -> Not sure I've
> understood it right but for me dqv:hasDimension property is redundant
> there. None of its subclasses (i.e. ServiceLevelAgreement, Feedback and
> Standard) would require the dimension aspect, whilst
> dqv:CollectionOfMetricResults seems to be a generalised daQ metadata, which
> contains the Category-Dimension-Metric relationship.
>  (4) Wrt issue 164 [4], I think it is important that for example a
> consumer has an idea of how a dataset changed over time, therefore I guess
> that some statistics would not hurt - anyway we will get that for free with
> daQ.
>  (5) Wrt issue #2 in the dqv:CollectionOfMetricResults, the idea behind
> daq:computedOn is that each observation has the dataset URI or base URI
> (which might be the RDF file, but not ideal) attached to the observation.
> Although in theory I agree with forcing a computed observation to be
> connected to a dcat:Dataset, I think it would not be a practical approach. As
> I said in some previous email, we have to consider those datasets that have
> no DCAT metadata (or have other metadata such as voID). Though if we plan
> to have the data quality vocabulary as an extension to dcat, then this is a
> natural choice.
>
>  Minor Corrections (not typos or spelling):
>  - The link in bullet 2 of Requirements section (i.e. see related action)
> seems to link to the meeting minutes and not to the action itself.
>  - Constraint -> y a dqv:Standard . ('a' is missing)
>
>  I hope this helps a little bit.
>
>  Best Regards,
> Jeremy
>
>  [1] http://arxiv.org/pdf/1412.3750v2.pdf
> [2] https://www.w3.org/2013/dwbp/wiki/Data_Quality_Vocabulary_(DQV)
> [3] http://butterbur04.iai.uni-bonn.de/ontologies/qpro/qpro
> [4] https://www.w3.org/2013/dwbp/track/issues/164
>
>  On 04 May 2015, at 19:01, Riccardo Albertoni <albertoni@ge.imati.cnr.it>
> wrote:
>
>  Dear Data Quality Vocabulary editors,
>
>  you can find a new version of the data quality concept schema in the wiki
>  [1]. It includes the changes I have made in response to Christophe's
>  feedback:
>
> - renamed the property "dqv:refersToDcatDataset" to
> "dqv:refersToDataset";
> - daq:qualityGraph is now a direct specialisation of our dqv:CollectionOfMetricsResults
> (I have opened an issue [2] as a reminder that we should discuss whether dataset
> metrics are a kind of quality info and how to include them in the DQV);
> - added dqv:Feedback as a possible kind of dqv:QualityInfo (I have
> opened an issue to figure out the relation between dqv:Feedback and duv:Feedback
> [3]).
>
>  The conceptual schema is available as a Google Drawing [4]. I guess there
> is still a lot to discuss.
>  I have reported part of the discussion we had with Christophe in the
> wiki, hoping this might encourage other people to chime in :-)
>
>  Regards,
> Riccardo
>
>  [1] https://www.w3.org/2013/dwbp/wiki/Data_Quality_Vocabulary_(DQV)
> [2] https://www.w3.org/2013/dwbp/track/issues/164
> [3] https://www.w3.org/2013/dwbp/track/issues/165
> [4] http://goo.gl/277o2R
>
> On 25 April 2015 at 13:22, Riccardo Albertoni <albertoni@ge.imati.cnr.it>
> wrote:
>
>> Hi Christophe,
>> Thanks a lot for your feedback, I am happy that the discussion is
>> progressing.
>>
>>  I think Jeremy is going to provide some feedback soon, so it is
>> probably better to modify the schema [2] "all at once" after we get
>> Jeremy's feedback.
>> The DUV vocabulary is using google graph, which we might consider as a
>> valid alternative to creately in the next DQV schema release.
>>
>>  Regarding the comments you made in [3]:
>>
>>  - I agree on changing "dqv:refersToDcatDataset" to
>> "dqv:refersToDataset";
>>
>>  - regarding your <<I think the class inheritance arrows for
>> dqv:CollectionOfMetricsResults and dqv:Standard should be inversed.>>
>> I am not sure I understand the rationale behind this suggestion, could
>> you explain more?
>>
>>  - regarding your <<Besides, I don't see why the collection of metrics
>> has to be a qb:Dataset>>
>>  Ok, I think we can have daq:qualityGraph as a direct specialisation of
>> our dqv:CollectionOfMetricsResults.
>>  However, I suggest opening an issue that we can probably address after
>> the DQV FPWD is released, saying "assuming statistics about datasets are
>> considered a kind of quality information, as indicated in the UC document,
>> see use case bio2rdf [4], are statistics mentioned in  properly modelled in
>> the DQV?"
>> Do you agree?
>>
>>  A couple of quick inline responses to the comments from your very
>> last email:
>>
>>
>>> # dqv:QualityInfo and its specialisation
>>>
>>> * is dqv:QualityInfo the most appropriate name? would
>>> dqv:QualityMetadata be more appropriate?
>>>  Considering that in the BPs document we speak about meta-data, maybe
>>> Metadata is the best. But on the other hand I'm not keen on hardcoding the
>>> notion of metadata in a vocabulary. Going for "QualityData" would be a
>>> middle way solution
>>>
>> mmm... I have to admit that I am not very comfortable calling it
>> QualityData either. I mean QualityData is a possible middle way solution,
>> but dqv:QualityInfo/QualityData is currently a superclass of dqv:ServiceLevelAgreement
>> and dqv:Standard, which are not proper data. For this reason, I would
>> prefer dqv:QualityInfo rather than QualityData.
>>
>>
>>> * are there other kinds of quality info that are worth considering?
>>> e.g., opinions, report of known issues, ...
>>>  That would be a good opportunity to link to DUV if we assume that
>>> usage correlates positively with dataset quality.
>>>
>>
>>  Sure, I have noticed that in DUV [5] there is a class duv:Feedback
>> which seems to be a good candidate to refer to.
>> In the current version of DUV, duv:Feedback is related
>> to dqg:QualityCriteria. I am not sure what exactly dqg:QualityCriteria
>> stands for, but I guess it is supposed to be a class in our DQV, and it
>> might roughly correspond to our dqv:QualityInfo (or whatever
>> dqv:QualityInfo will be called at the end of our discussion :) )
>>
>>  So once we have a proper name for our dqv:QualityInfo, and we have
>> completed our first internal revision of the DQV, we can open an
>> issue "let's find an agreement on references between DQV and DUV", which
>> should be discussed with the rest of the group, and we might suggest
>> solving the issue by:
>> - putting duv:Feedback as one of the possible specializations of
>> dqv:QualityInfo in both DQV and DUV schemas;
>> - deleting dqv:QualityInfo and the relation "describes" in DUV.
>>
>>  Does this sound reasonable?
>>
>>
>>> * how can the service level agreement be represented? is it a document
>>> on the web to refer to or do we want to refer to something more structured? is
>>> there any specific property we should add to dqv:ServiceLevelAgreement?
>>>  According to [1] "An SLA is best described as a collection of
>>> promises". It is also a document which just lists a couple of things. We
>>> could either focus on the document aspect as a whole or try to model the
>>> list of promises and the list of related concepts. My gut feeling is that
>>> this could lead to writing an elaborate vocabulary that would go beyond
>>> our scope so I'd say we should rather not do that. But we should
>>> nonetheless anticipate that someone may some day want to work on that.
>>>
>>>  What about then either not indicating any range or setting Resource as a
>>> range? Then everyone is free to model an SLA as he wants. And in the BPs
>>> we could hint that structured data is better and a PDF is also ok.
>>>
>>  Very good point! I agree on leaving it open to further modelling of
>> promises by explicitly mentioning this possibility; it sounds like a very
>> good idea. Concerning your proposal to either not indicate any range or set
>> Resource as a range, I think we should at least suggest one concrete
>> way to include human-readable SLA descriptions, so that people don't
>> have the chance to be too creative in attaching their HTML, PDF or
>> whatever to the quality-related metadata.
>>
>>
>>>  * how are standards represented under dqv:Standard? is the class
>>> dqv:Standard suitable to include ODI certificates, to represent that a
>>> DCAT dataset has a certain compliance with 5 LOD stars, and other kinds of
>>> best practices? should we explicitly provide a list/taxonomy of standards/
>>> certificates to consider? is the class dqv:Standard really necessary or can we
>>> rely directly on dcterms:Standard?
>>>  We should be flexible here. If we impose a specific class then ODI
>>> certificates and 5star models will have to subclass from it, so we should
>>> make it generic enough conceptually so that this works. In that respect I
>>> think dcterms:Standard is quite nice so we may want to re-use it. Or
>>> subclass from it our own Standard which is a verbatim copy of it in case
>>> some day the meaning of dcterms:Standard changes in a way that breaks our
>>> vocabulary.
>>>
>>  I see your point. I agree we have to be flexible and generic enough,
>> but I am not sure I understand what you are suggesting here.
>>
>>
>>>  * can we assume the following constraint ?
>>>   x a dcat:Dataset. x dcterms:conformsTo y '''imply'''  x hasQualityInfo
>>> y. y dqv:Standard
>>>  Think so. We should have more of these BTW :-)
>>>
>>
>>  Sure, I think other constraints will come out quite naturally during
>> the design process. I am not very concerned about it; by the way, I'd like
>> anyone working on DQV to feel free to suggest some ;)
>>
>>  Have a nice weekend,
>> Riccardo
>>
>>
>>>
>>>  Cheers,
>>>  Christophe
>>>
>>>
>>>
>>>
>>> [1]
>>> http://www.knowledgetransfer.net/dictionary/ITIL/en/Service_Level_Agreement.htm
>>>
>>>
>>> --
>>> Christophe Guéret
>>>
>>
>>
>>
>>  [2] https://creately.com/diagram/i8lgl90p1/AXwUzXKQOHvEwUw9eJmxrndw%3D
>> [3] https://lists.w3.org/Archives/Public/public-dwbp-wg/2015Apr/0209.html
>> [4] http://w3c.github.io/dwbp/usecasesv1.html#UC-Bio2RDF
>> [5] https://docs.google.com/drawings/d/1aq3vPcoj0SPs5BispD6umQNejrBTwkhsSYu6Y1adUjw/edit
>>
>>
>>  --
>>
>> ----------------------------------------------------------------------------
>> Riccardo Albertoni
>> Istituto per la Matematica Applicata e Tecnologie Informatiche "Enrico
>> Magenes"
>> Consiglio Nazionale delle Ricerche
>> via de Marini 6 - 16149 GENOVA - ITALIA
>> tel. +39-010-6475624 - fax +39-010-6475660
>> e-mail: Riccardo.Albertoni@ge.imati.cnr.it
>> Skype: callto://riccardoalbertoni/
>> LinkedIn: http://www.linkedin.com/in/riccardoalbertoni
>> www: http://www.ge.imati.cnr.it/Albertoni
>> http://purl.oclc.org/NET/riccardoAlbertoni
>> FOAF:http://purl.oclc.org/NET/RiccardoAlbertoni/foaf
>>
>
>
>
>
>
>
>
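
For reference, the constraint quoted in the thread above ("x a dcat:Dataset. x dcterms:conformsTo y" implies "x hasQualityInfo y. y a dqv:Standard") can be read as a simple inference rule. The following is only a minimal sketch under stated assumptions: triples are plain Python tuples of prefixed names rather than real RDF, the property name dqv:hasQualityInfo is an assumed spelling (the thread writes it without a prefix), and the function name apply_conformance_rule is hypothetical. It is not part of DQV or of any existing library.

```python
# Prefixed names are kept as plain strings; no RDF library is used.
RDF_TYPE = "rdf:type"
DCAT_DATASET = "dcat:Dataset"
CONFORMS_TO = "dcterms:conformsTo"
HAS_QUALITY_INFO = "dqv:hasQualityInfo"  # assumed name, unprefixed in the thread
DQV_STANDARD = "dqv:Standard"


def apply_conformance_rule(triples):
    """Return the input triples plus those implied by the constraint.

    Rule: if x is a dcat:Dataset and x dcterms:conformsTo y,
    then x hasQualityInfo y and y is a dqv:Standard.
    """
    triples = set(triples)
    # Subjects explicitly typed as dcat:Dataset.
    datasets = {s for (s, p, o) in triples if p == RDF_TYPE and o == DCAT_DATASET}
    inferred = set()
    for (s, p, o) in triples:
        if p == CONFORMS_TO and s in datasets:
            inferred.add((s, HAS_QUALITY_INFO, o))
            inferred.add((o, RDF_TYPE, DQV_STANDARD))
    return triples | inferred


if __name__ == "__main__":
    graph = {
        ("ex:myDataset", RDF_TYPE, DCAT_DATASET),
        ("ex:myDataset", CONFORMS_TO, "ex:someStandard"),
        # Not typed as dcat:Dataset, so the rule does not fire for it.
        ("ex:notADataset", CONFORMS_TO, "ex:otherStandard"),
    }
    for triple in sorted(apply_conformance_rule(graph) - graph):
        print(triple)
```

Note that the rule only fires for resources explicitly typed as dcat:Dataset, so datasets described only with other metadata (such as voID) would not get the inferred quality info in this sketch.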



-- 
----------------------------------------------------------------------------
Riccardo Albertoni
Istituto per la Matematica Applicata e Tecnologie Informatiche "Enrico
Magenes"
Consiglio Nazionale delle Ricerche
via de Marini 6 - 16149 GENOVA - ITALIA
tel. +39-010-6475624 - fax +39-010-6475660
e-mail: Riccardo.Albertoni@ge.imati.cnr.it
Skype: callto://riccardoalbertoni/
LinkedIn: http://www.linkedin.com/in/riccardoalbertoni
www: http://www.ge.imati.cnr.it/Albertoni
http://purl.oclc.org/NET/riccardoAlbertoni
FOAF:http://purl.oclc.org/NET/RiccardoAlbertoni/foaf
Received on Thursday, 14 May 2015 14:47:46 UTC