Re: New DQV editor's draft

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Tue, 2 Jun 2015 23:11:28 +0200
Message-ID: <556E1C00.3010405@few.vu.nl>
To: <public-dwbp-wg@w3.org>
Hi Makx,

Thanks for the comment!
I understand the concern. But I thought that requiring the publisher of SLA metadata to type the object of dcterms:conformsTo statements as an instance of dqv:ServiceLevelAgreement would be enough to distinguish SLAs from the other types of resources that may appear as objects of dcterms:conformsTo.
To me, adding a new sub-property of dcterms:conformsTo would even be slightly redundant.
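
To make the two options concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: triples are plain tuples rather than real RDF, the ex: resources are hypothetical, and dqv:hasSLA is the property proposed in the quoted message, not something defined in the draft.

```python
# Illustrative sketch only. Triples are plain (subject, predicate, object)
# tuples; the ex: resources are hypothetical, and dqv:hasSLA is the property
# proposed in the quoted message, not something defined in the draft.
RDF_TYPE = "rdf:type"
CONFORMS_TO = "dcterms:conformsTo"
SLA_CLASS = "dqv:ServiceLevelAgreement"
HAS_SLA = "dqv:hasSLA"

# Option 1: one generic property; the object's type identifies the SLA.
typed_graph = {
    ("ex:dataset", CONFORMS_TO, "ex:sla1"),
    ("ex:dataset", CONFORMS_TO, "ex:crs84"),
    ("ex:sla1", RDF_TYPE, SLA_CLASS),
    ("ex:crs84", RDF_TYPE, "dcterms:Standard"),
}

# Option 2: a dedicated sub-property points at the SLA directly.
subproperty_graph = {
    ("ex:dataset", HAS_SLA, "ex:sla1"),
    ("ex:dataset", CONFORMS_TO, "ex:crs84"),
}


def slas_by_type(graph, dataset):
    """Select dcterms:conformsTo objects that are explicitly typed as SLAs."""
    sla_nodes = {s for (s, p, o) in graph if p == RDF_TYPE and o == SLA_CLASS}
    return sorted(
        o for (s, p, o) in graph
        if s == dataset and p == CONFORMS_TO and o in sla_nodes
    )


def slas_by_property(graph, dataset):
    """Select the objects of the dedicated (hypothetical) dqv:hasSLA property."""
    return sorted(o for (s, p, o) in graph if s == dataset and p == HAS_SLA)


print(slas_by_type(typed_graph, "ex:dataset"))
print(slas_by_property(subproperty_graph, "ex:dataset"))
```

Both calls select only ex:sla1. The first needs two patterns and depends on the publisher actually supplying the rdf:type statement; the second needs one pattern but adds a term to the vocabulary.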

What do you think?
We have not discussed it with Christophe and Riccardo yet.
Of course, if you (and Ghislain) remain unconvinced, then it's probably a sign that the current proposal is less effective than I thought.

Best,

Antoine

On 5/29/15 1:57 PM, Makx Dekkers wrote:
> One comment: I see in the latest draft that the proposal is to use
> dct:conformsTo linking to dqv:ServiceLevelAgreement, subclass of
> dct:Standard.
>
> I'd like to warn against overloading dct:conformsTo. I have heard
> suggestions in various groups to use dct:conformsTo for linking to an
> ODI-style open data certificate, to a legal basis for the publication of a
> Dataset, to temporal and spatial reference systems and to other types of
> specifications that have relevance for the understanding of the Dataset. In
> addition, conformsTo has also been suggested to link CatalogRecord to
> implementation guidelines and to an Application Profile that the metadata is
> based on. I am afraid that the processing of this information will not be
> possible if all kinds of 'standards' are lumped together.
>
> Would it be better in this case to create a 'local' property dqv:hasSLA as a
> subproperty of dct:conformsTo with range dqv:SLA that is a subclass of
> dct:Standard? It is then clear what the relationship is and what it is used
> for.
>
> Makx.
>
>
>
>> -----Original Message-----
>> From: Antoine Isaac [mailto:aisaac@few.vu.nl]
>> Sent: 29 May 2015 00:02
>> To: Public DWBP WG
>> Subject: Re: New DQV editor's draft
>>
>> Hi Jeremy,
>>
>> Thanks a lot!
>>
>> I have produced a new diagram with the Dataset out of the MetadataQuality
>> graph (even though I'm still convinced that, in terms of RDF statements, it
>> doesn't make any difference). I have also added dqv:hasMetadataQuality, as
>> you suggested.
>> It's at http://w3c.github.io/dwbp/vocab-dqg.html
>> I hope I've captured your suggestions right.
>>
>> About dqv:QualityMetadata and daq:QualityGraph, you say:
>> [
>> That is why I suggested that dqv:QualityMetadata to be a subclass of the
>> daq:QualityGraph instead of rdfg:Graph, because QualityMetadata will
>> contain what a daq:QualityGraph should have + more information such as
>> having dcterms:Standard, dqv:Feedback etc..
>> ]
>> In fact here we might have a different interpretation of the semantics of
>> asserting sub-class links between types of graphs.
>> daq:QualityGraph is defined by "Defines a quality graph which will contain
>> all metadata about quality metrics on the dataset."
>> daq:QualityGraph is a subclass of qb:Dataset, itself defined by
>> "Represents a collection of observations, possibly organized into various
>> slices, conforming to some common dimensional structure".
>>
>> I'd understand from these definitions that it is *not* welcome that an
>> instance of daq:QualityGraph (or of a subclass of it) would contain data
>> that is not about metrics. Hence I was lukewarm on declaring
>> dqv:QualityMetadata a subclass of daq:QualityGraph!
>>
>> But if everyone thinks that a subclass of a class of graphs that contain
>> statements of a certain type may contain statements of other types than
>> the ones its super-class contains (in addition to these), then I'm alright
>> with your suggestion!
>>
>> (this assumes that everyone can parse the above sentence of course  :-)
>>
>> cheers,
>>
>> Antoine
>>
>> On 5/28/15 1:03 PM, Debattista, Jeremy wrote:
>>> Hi Antoine,
>>>
>>> Thanks for your replies. Things are clearer for me now. I will reply to
>>> some of your comments.
>>>
>>>> 1. I agree that the dcat:Dataset instance is not quality metadata per se.
>>>> We've got a representation problem... The idea was that it is the
>>>> statements (e.g. one dcterms:conformsTo) that are in the quality graph.
>>>> Actually I don't think an instance can be said to be in the graph. It's
>>>> only statements that are contained in the graph.
>>>> Now, I have to say that I'm not sure how best we could represent this,
>>>> graphically.
>>>> Has anyone got an idea?
>>>
>>> I gave it a go. See att1.jpg. Basically I've pushed out the orange box,
>>> but still left the conformsTo, hasQualityMeasure and hasFeedback
>>> statements inside the graph. Would this be clearer? I've added the
>>> dqv:hasQualityMetadata as well (this does not have to be in the quality
>>> metadata itself).
>>>
>>>> 2. "then the dcat:Dataset points to the quality metadata graph" calls for
>>>> introducing another property dqv:hasQualityMetadata or something like
>>>> this.
>>>> This is an interesting idea. If there's enough positive feedback, we
>>>> could add it. I'm adding a note right now on it.
>>>>
>>>> But I wouldn't be in favour of using it as a replacement for the direct
>>>> links between the dcat:Dataset and dcterms:Standard, dqv:QualityMeasure,
>>>> etc.
>>>> The idea is indeed to have a pattern that allows containment of all
>>>> quality statements (to allow for provenance tracking) while not putting
>>>> this containment as a hurdle for those who are less interested in it.
>>>> Say, if a Dataset comes with a SLA, I prefer to have a direct statement
>>>> between the instance of dcat:Dataset and the instance of dqv:SLA.
>>>> Otherwise one would have to retrieve and combine two statements:
>>>> - a link between a Dataset and a QualityGraph
>>>> - a statement that relates the QualityGraph with the SLA.
>>>> Not only is this a longer path, but one of the nodes is a graph, and
>>>> this could raise issues for those who are less comfortable with
>>>> graphs (including all of those who don't want to handle RDF syntaxes
>>>> for graphs!)
>>>
>>> I agree with your concerns - I was not viewing it from the provenance
>>> perspective. But on the other hand, in my opinion the extra property
>>> (hasQualityMetadata) wouldn't hurt either - even though it might be
>>> redundant at the end of the day.
>>>
>>>> C. There is a raised issue that says:
>>>> [
>>>> The label of daq:QualityGraph does not fit well with the current model.
>>>> DAQ graphs are meant to contain measures. In our context a "quality graph"
>>>> has a wider scope: actually the role of representing overall quality
>>>> graphs is currently played by dqv:QualityMetadata.
>>>> ]
>>>> I think the same initial argument applies to the suggestion of making
>>>> dqv:QualityMetadata a subclass of daq:QualityGraph. DAQ's quality graphs
>>>> contain metadata about quality metrics on the dataset. I believe that
>>>> there is quality metadata that is not metrics. At least that's how we have
>>>> started to approach the problem. I'd be very eager to hear whether you
>>>> think this is not right!
>>>
>>> To be honest, I prefer the dqv:QualityMetadata term and the idea behind it
>>> much more. Its intended use is more suitable in this case than the
>>> daq:QualityGraph. The daq:QualityGraph is just a "special" RDF graph which
>>> is also a cube dataset and as you rightly pointed out, it contains
>>> metadata about quality metrics. My understanding of subclasses is
>>> "inheriting from the parent class and more". That is why I suggested that
>>> dqv:QualityMetadata to be a subclass of the daq:QualityGraph instead of
>>> rdfg:Graph, because QualityMetadata will contain what a daq:QualityGraph
>>> should have + more information such as having dcterms:Standard,
>>> dqv:Feedback etc..  Am I right about this?
>>>
>>> Cheers,
>>> Jer
>>>
>>>
>>>
>>> On 25 May 2015, at 23:26, Antoine Isaac <aisaac@few.vu.nl> wrote:
>>>
>>>> Hi Jeremy,
>>>>
>>>>
>>>>
>>>> On 5/22/15 10:45 AM, Debattista, Jeremy wrote:
>>>>>
>>>>> This looks great already.
>>>>
>>>>
>>>> Thanks!
>>>>
>>>>
>>>>> I would like to point out two issues which are not clear to me as yet:
>>>>>
>>>>> 1) In the diagram, shouldn't the dcat:Dataset be "outside" of the
>>>>> quality metadata (and especially outside of the QualityGraph
>>>>> containment), and then the dcat:Dataset points to the quality metadata
>>>>> graph?
>>>>>
>>>>> I don't know if this was done on purpose there or should have been
>>>>> placed outside. If a dcat:Dataset (or distribution) is inside the quality
>>>>> metadata boundaries, then my understanding as a consumer (I might be a
>>>>> machine) would be that a dcat:Dataset instance is some kind of quality
>>>>> information.
>>>>
>>>>
>>>> This is a tricky issue, which calls on two answers:
>>>>
>>>> 1. I agree that the dcat:Dataset instance is not quality metadata per se.
>>>> We've got a representation problem... The idea was that it is the
>>>> statements (e.g. one dcterms:conformsTo) that are in the quality graph.
>>>> Actually I don't think an instance can be said to be in the graph. It's
>>>> only statements that are contained in the graph.
>>>> Now, I have to say that I'm not sure how best we could represent this,
>>>> graphically.
>>>> Has anyone got an idea?
>>>>
>>>>
>>>>
>>>> 2. "then the dcat:Dataset points to the quality metadata graph" calls for
>>>> introducing another property dqv:hasQualityMetadata or something like
>>>> this.
>>>> This is an interesting idea. If there's enough positive feedback, we
>>>> could add it. I'm adding a note right now on it.
>>>>
>>>> But I wouldn't be in favour of using it as a replacement for the direct
>>>> links between the dcat:Dataset and dcterms:Standard, dqv:QualityMeasure,
>>>> etc.
>>>> The idea is indeed to have a pattern that allows containment of all
>>>> quality statements (to allow for provenance tracking) while not putting
>>>> this containment as a hurdle for those who are less interested in it.
>>>> Say, if a Dataset comes with a SLA, I prefer to have a direct statement
>>>> between the instance of dcat:Dataset and the instance of dqv:SLA.
>>>> Otherwise one would have to retrieve and combine two statements:
>>>> - a link between a Dataset and a QualityGraph
>>>> - a statement that relates the QualityGraph with the SLA.
>>>> Not only is this a longer path, but one of the nodes is a graph, and
>>>> this could raise issues for those who are less comfortable with
>>>> graphs (including all of those who don't want to handle RDF syntaxes
>>>> for graphs!)
>>>>
>>>>
>>>>> 2) How about making dqv:QualityMetadata a subclass of
>>>>> daq:QualityGraph?
>>>>>
>>>>> There are a number of advantages of doing so. First of all we don't
>>>>> have to rely on multiple graphs. Although nothing is wrong with that,
>>>>> this might make querying a bit harder. The daq:QualityGraph is a
>>>>> specialisation of the rdf:Graph which is also a qb:Dataset. In this case
>>>>> the qb:dataset property can have dqv:QualityMeasure as domain and
>>>>> dqv:QualityMetadata as its range. This way we can move dcat:Dataset out
>>>>> of the graph containment, and remove the property
>>>>> "dqv:hasQualityMeasure" (this becomes redundant as it can be inferred,
>>>>> if there is some link between dcat:Dataset and dqv:QualityMetadata).
>>>>>
>>>>
>>>>
>>>> This is quite related to the previous issues. Interesting discussion!
>>>> I've tried to make a graph representing your proposal, in the attached
>>>> file.
>>>> It's quite hard to untangle though. I'll have a try, please tell me
>>>> if I'm making any sense :-)
>>>>
>>>> A. As said above the instance of dcat:Dataset is not meant to be
>>>> contained in the quality metadata graph. Maybe this alleviates some of
>>>> your concerns...
>>>>
>>>> B. I agree that if there was a link between the instance of dcat:Dataset
>>>> and the (merged) instance dqv:QualityMetadata/daq:QualityGraph, then
>>>> with a qb:dataset statement between instances of dqv:QualityMeasure and
>>>> the instance of dqv:QualityMetadata/daq:QualityGraph one could indeed
>>>> find a connection between the dcat:Dataset and the instances of
>>>> dqv:QualityMeasure.
>>>> But as said above I don't like the idea of removing the direct link. If a
>>>> dataset has some measure, say, a number of incorrect triples, why remove
>>>> the direct link?
>>>> This would put the provenance info in the way of the applications that
>>>> are less concerned about provenance.
>>>> As also said, I'm not against having the link between dcat:Dataset and
>>>> the (merged) instance dqv:QualityMetadata/daq:QualityGraph. But I wouldn't
>>>> want to use it as a motivation for removing dqv:hasQualityMeasure.
>>>>
>>>> C. There is a raised issue that says:
>>>> [
>>>> The label of daq:QualityGraph does not fit well with the current model.
>>>> DAQ graphs are meant to contain measures. In our context a "quality graph"
>>>> has a wider scope: actually the role of representing overall quality
>>>> graphs is currently played by dqv:QualityMetadata.
>>>> ]
>>>> I think the same initial argument applies to the suggestion of making
>>>> dqv:QualityMetadata a subclass of daq:QualityGraph. DAQ's quality graphs
>>>> contain metadata about quality metrics on the dataset. I believe that
>>>> there is quality metadata that is not metrics. At least that's how we have
>>>> started to approach the problem. I'd be very eager to hear whether you
>>>> think this is not right!
>>>>
>>>>
>>>>>
>>>>> Once we have a first draft of the RDF schema, I will be happy to support
>>>>> it in our Quality Assessment Framework.
>>>>
>>>>
>>>> This would be great!
>>>>
>>>> Thanks again for the comments - I hope I will not have discouraged
>>>> you by the length of the answers :)
>>>>
>>>> Cheers,
>>>>
>>>> Antoine
>>>>
>>>>>
>>>>> On 20 May 2015, at 23:39, Antoine Isaac <aisaac@few.vu.nl> wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> We've created a new editor's draft of the Data Quality Vocabulary on
>>>>>> Github [1].
>>>>>>
>>>>>> Most of it is in the diagram in section 3. We have placeholders for
>>>>>> material in other sections, but this is still work in progress.
>>>>>>
>>>>>> As you can see the diagram and the doc still have a lot of open
>>>>>> issues and questions. But we believe it's a positive evolution from the
>>>>>> previous version [2]. The patterns that we would like to use are
>>>>>> stabilizing. Actually I'm curious to see how much of Jeremy's last
>>>>>> comments [3] would still apply!
>>>>>>
>>>>>> Needless to say, everyone else's feedback is highly welcome!
>>>>>>
>>>>>> Please excuse the discussion notes in the diagram itself. We thought of
>>>>>> creating a wiki page as we had done previously [2]. But I lacked the time
>>>>>> to do it. Maybe in the coming days, depending on how the discussion
>>>>>> evolves...
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Antoine, on behalf of co-editors Riccardo and Christophe
>>>>>>
>>>>>> [1] http://w3c.github.io/dwbp/vocab-dqg.html
>>>>>> [2] https://www.w3.org/2013/dwbp/wiki/Data_Quality_Vocabulary_%28DQV%29
>>>>>> [3] http://lists.w3.org/Archives/Public/public-dwbp-wg/2015May/0037.html
>>>>>>
>>>>>
>>>>>
>>>>>
>>>> <JeremysProposal-150525.png>
>>>
>
>
>
Received on Tuesday, 2 June 2015 21:12:02 UTC
