Re: New DQV editor's draft from Antoine Isaac on 2015-05-28 (public-dwbp-wg@w3.org from May 2015)

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Fri, 29 May 2015 00:02:02 +0200
To: Public DWBP WG <public-dwbp-wg@w3.org>
Message-ID: <5567905A.3010809@few.vu.nl>

Hi Jeremy,

Thanks a lot!

I have produced a new diagram with the Dataset out of the QualityMetadata graph (even though I'm still convinced that, in terms of RDF statements, it makes no difference), and added dqv:hasQualityMetadata as you suggested. It's at
http://w3c.github.io/dwbp/vocab-dqg.html
I hope I've captured your suggestions right.
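To illustrate why moving the Dataset box makes no difference at the level of RDF statements, here is a minimal sketch. It represents quads as plain Python tuples rather than using an RDF library, and every ex: IRI is a hypothetical placeholder. The dataset is not "in" or "out of" the quality graph. Some statements about it are simply asserted in that graph. The dqv:hasQualityMetadata link you suggested sits in the default graph.

```python
# Minimal sketch: quads as (subject, predicate, object, graph) tuples, with
# graph=None standing for the default graph. All ex: IRIs are hypothetical
# placeholders; no RDF library is used.
QUADS = [
    # Default graph: the dataset, its type, and the suggested shortcut link.
    ("ex:dataset1", "rdf:type", "dcat:Dataset", None),
    ("ex:dataset1", "dqv:hasQualityMetadata", "ex:qualityGraph1", None),
    ("ex:qualityGraph1", "rdf:type", "dqv:QualityMetadata", None),
    # Quality graph: statements whose subject is the dataset.
    ("ex:dataset1", "dcterms:conformsTo", "ex:standard1", "ex:qualityGraph1"),
    ("ex:dataset1", "dqv:hasQualityMeasure", "ex:measure1", "ex:qualityGraph1"),
]


def statements_in(graph, quads=QUADS):
    """Triples contained in a named graph: containment is per statement."""
    return [(s, p, o) for s, p, o, g in quads if g == graph]


def values(subject, predicate, quads=QUADS):
    """Direct one-hop lookup that ignores which graph holds the statement."""
    return {o for s, p, o, _ in quads if s == subject and p == predicate}


def asserted_in(subject, predicate, obj, quads=QUADS):
    """Provenance view: the graph(s) in which a given statement is asserted."""
    return {g for s, p, o, g in quads if (s, p, o) == (subject, predicate, obj)}
```

Here values("ex:dataset1", "dcterms:conformsTo") returns {"ex:standard1"} without the caller ever touching the graph structure. For consumers who care about provenance, asserted_in("ex:dataset1", "dcterms:conformsTo", "ex:standard1") returns {"ex:qualityGraph1"}.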

About dqv:QualityMetadata and daq:QualityGraph, you say:
[
That is why I suggested that dqv:QualityMetadata be a subclass of daq:QualityGraph instead of rdfg:Graph: QualityMetadata will contain what a daq:QualityGraph should have, plus more information such as dcterms:Standard, dqv:Feedback, etc.
]
In fact, we might have different interpretations here of the semantics of asserting subclass links between types of graphs.
daq:QualityGraph is documented as: "Defines a quality graph which will contain all metadata about quality metrics on the dataset."
daq:QualityGraph is a subclass of qb:DataSet, itself documented as: "Represents a collection of observations, possibly organized into various slices, conforming to some common dimensional structure".

From these definitions, I'd understand that an instance of daq:QualityGraph (or of a subclass of it) is *not* expected to contain data that is not about metrics. Hence I was lukewarm about declaring dqv:QualityMetadata a subclass of daq:QualityGraph!
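To make the concern concrete, here is a minimal sketch of the RDFS subclass entailments (rules rdfs11 for transitivity and rdfs9 for type propagation) applied to the proposed hierarchy. It uses plain Python sets. The hierarchy is taken from the discussion in this thread and is not read from the published DAQ schema. The sketch shows only that every dqv:QualityMetadata graph would also be typed as daq:QualityGraph, qb:DataSet and rdfg:Graph. RDFS itself says nothing about which triples such a graph may hold, so the "only metrics" reading comes from the textual definitions above, not from the formal semantics.

```python
# Proposed hierarchy as (subclass, superclass) pairs, i.e. rdfs:subClassOf.
# The first pair is the link under discussion; the others follow this thread.
SUBCLASS_OF = {
    ("dqv:QualityMetadata", "daq:QualityGraph"),
    ("daq:QualityGraph", "qb:DataSet"),
    ("daq:QualityGraph", "rdfg:Graph"),
}


def superclasses(cls, edges=SUBCLASS_OF):
    """All classes reachable from cls via rdfs:subClassOf (rule rdfs11)."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for sub, sup in edges:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def inferred_types(declared):
    """Rule rdfs9: an instance of a class is also an instance of its superclasses."""
    types = set(declared)
    for cls in declared:
        types |= superclasses(cls)
    return types


# A hypothetical graph declared only as dqv:QualityMetadata...
types = inferred_types({"dqv:QualityMetadata"})
# ...is entailed to be a daq:QualityGraph, a qb:DataSet and an rdfg:Graph too.
```

Whether dcterms:conformsTo or dqv:Feedback statements are acceptable in something that is thereby also a qb:DataSet is a question about the intended meaning of the class definitions. The entailment rules cannot settle it.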

But if everyone thinks that a subclass of a class of graphs that contain statements of a certain type may contain statements of types other than the ones its superclass contains (in addition to these), then I'm all right with your suggestion!

(This assumes, of course, that everyone can parse the above sentence :-)

cheers,

Antoine

On 5/28/15 1:03 PM, Debattista, Jeremy wrote:
> Hi Antoine,
>
> Thanks for your replies. Things are more clear for me now. I will reply to some of your comments.
>
>> 1. I agree that the dcat:Dataset instance is not quality metadata per se. We've got a representation problem... The idea was that it is the statements (e.g. one dcterms:conformsTo) that are in the quality graph.
>> Actually I don't think an instance can be said to be in the graph. It's only statements that are contained in the graph.
>> Now, I have to say that I'm not sure how best we could represent this, graphically.
>> Has anyone got an idea?
>
> I gave it a go. See att1.jpg. Basically, I've pushed out the orange box, but still left the conformsTo, hasQualityMeasure and hasFeedback statements inside the graph. Would this be clearer? I've added dqv:hasQualityMetadata as well (this does not have to be in the quality metadata itself).
>
>> 2. "then the dcat:Dataset points to the quality metadata graph" calls for introducing another property dqv:hasQualityMetadata or something like this.
>> This is an interesting idea. If there's enough positive feedback, we could add it. I'm adding a note right now on it.
>>
>> But I wouldn't be in favour of using it as a replacement for the direct links between the dcat:Dataset and dcterms:Standard, dqv:QualityMeasure, etc.
> >> The idea is indeed to have a pattern that allows containment of all quality statements (to allow for provenance tracking) while not putting this containment as a hurdle for those who are less interested in it.
> >> Say, if a Dataset comes with an SLA, I prefer to have a direct statement between the instance of dcat:Dataset and the instance of dqv:SLA. Otherwise one would have to retrieve and combine two statements:
> >> - a link between a Dataset and a QualityGraph
> >> - a statement that relates the QualityGraph with the SLA.
> >> Not only is this a longer path, but one of the nodes is a graph, and this could raise issues for those who are less comfortable with graphs (including all those who don't want to handle RDF syntaxes for graphs!)
>
> I agree with your concerns; I was not viewing it from the provenance perspective. But on the other hand, in my opinion the extra property (hasQualityMetadata) wouldn't hurt either, even though it might be redundant at the end of the day.
>
>> C. There is a raised issue that says:
>> [
>> The label of daq:QualityGraph does not fit well with the current model. DAQ graphs are meant to contain measures. In our context a "quality graph" has a wider scope: actually the role of representing overall quality graphs is currently played by dqv:QualityMetadata.
>> ]
> >> I think the same initial argument applies to the suggestion of making dqv:QualityMetadata a subclass of daq:QualityGraph. DAQ quality graphs contain metadata about quality metrics on the dataset. I believe that there is quality metadata that is not about metrics. At least that's how we have started to approach the problem. I'd be very eager to hear whether you think this is not right!
>
> To be honest, I prefer the dqv:QualityMetadata term and the idea behind it much more. Its intended use is more suitable in this case than that of daq:QualityGraph. The daq:QualityGraph is just a “special” RDF graph which is also a cube dataset and, as you rightly pointed out, it contains metadata about quality metrics. My understanding of subclasses is “inheriting from the parent class and more”. That is why I suggested that dqv:QualityMetadata be a subclass of daq:QualityGraph instead of rdfg:Graph: QualityMetadata will contain what a daq:QualityGraph should have, plus more information such as dcterms:Standard, dqv:Feedback, etc. Am I right about this?
>
> Cheers,
> Jer
>
>
>
> On 25 May 2015, at 23:26, Antoine Isaac <aisaac@few.vu.nl> wrote:
>
>> Hi Jeremy,
>>
>>
>>
>> On 5/22/15 10:45 AM, Debattista, Jeremy wrote:
>>>
>>> This looks great already.
>>
>>
>> Thanks!
>>
>>
> >>> I would like to point out two issues which are not yet clear to me:
>>>
>>> 1) In the diagram, shouldn't the dcat:Dataset be "outside" of the quality metadata (and especially outside of the QualityGraph containment), and then the dcat:Dataset points to the quality metadata graph?
>>>
> >>> I don't know whether this was done on purpose or whether it should have been placed outside. If a dcat:Dataset (or distribution) is inside the quality metadata boundaries, then my understanding as a consumer (I might be a machine) would be that a dcat:Dataset instance is some kind of quality information.
>>
>>
> >> This is a tricky issue, which calls for two answers:
>>
>> 1. I agree that the dcat:Dataset instance is not quality metadata per se. We've got a representation problem... The idea was that it is the statements (e.g. one dcterms:conformsTo) that are in the quality graph.
>> Actually I don't think an instance can be said to be in the graph. It's only statements that are contained in the graph.
>> Now, I have to say that I'm not sure how best we could represent this, graphically.
>> Has anyone got an idea?
>>
>>
>>
>> 2. "then the dcat:Dataset points to the quality metadata graph" calls for introducing another property dqv:hasQualityMetadata or something like this.
>> This is an interesting idea. If there's enough positive feedback, we could add it. I'm adding a note right now on it.
>>
>> But I wouldn't be in favour of using it as a replacement for the direct links between the dcat:Dataset and dcterms:Standard, dqv:QualityMeasure, etc.
> >> The idea is indeed to have a pattern that allows containment of all quality statements (to allow for provenance tracking) while not putting this containment as a hurdle for those who are less interested in it.
> >> Say, if a Dataset comes with an SLA, I prefer to have a direct statement between the instance of dcat:Dataset and the instance of dqv:SLA. Otherwise one would have to retrieve and combine two statements:
> >> - a link between a Dataset and a QualityGraph
> >> - a statement that relates the QualityGraph with the SLA.
> >> Not only is this a longer path, but one of the nodes is a graph, and this could raise issues for those who are less comfortable with graphs (including all those who don't want to handle RDF syntaxes for graphs!)
>>
>>
>>> 2) How about doing dqv:QualityMetadata as a subclass of daq:QualityGraph?
>>>
> >>> There are a number of advantages to doing so. First of all, we don't have to rely on multiple graphs. Although nothing is wrong with that, it might make querying a bit harder. The daq:QualityGraph is a specialisation of rdfg:Graph which is also a qb:DataSet. In this case the qb:dataSet property can have dqv:QualityMeasure as domain and dqv:QualityMetadata as its range. This way we can move dcat:Dataset out of the graph containment and remove the property "dqv:hasQualityMeasure" (it becomes redundant, as it can be inferred if there is some link between dcat:Dataset and dqv:QualityMetadata).
>>>
>>
>>
>> This is quite related to the previous issues. Interesting discussion!
>> I've tried to make a graph representing your proposal, in the attached file.
> >> It's quite hard to untangle, though. I'll give it a try; please tell me if I'm making any sense :-)
>>
>> A. As said above the instance of dcat:Dataset is not meant to be contained in the quality metadata graph. Maybe this alleviates some of your concerns...
>>
> >> B. I agree that if there were a link between the instance of dcat:Dataset and the (merged) instance dqv:QualityMetadata/daq:QualityGraph, then with a qb:dataSet statement between instances of dqv:QualityMeasure and the instance of dqv:QualityMetadata/daq:QualityGraph one could indeed find a connection between the dcat:Dataset and the instances of dqv:QualityMeasure.
>> But as said above I don't like the idea of removing the direct link. If a dataset has some measure, say, a number of incorrect triples, why remove the direct link?
>> This would put the provenance info in the way of the applications that are less concerned about provenance.
>> As also said, I'm not against having the link between dcat:Dataset and the (merged) instance dqv:QualityMetadata/daq:QualityGraph. But I wouldn't want to use it as a motivation for removing dqv:hasQualityMeasure.
>>
>> C. There is a raised issue that says:
>> [
>> The label of daq:QualityGraph does not fit well with the current model. DAQ graphs are meant to contain measures. In our context a "quality graph" has a wider scope: actually the role of representing overall quality graphs is currently played by dqv:QualityMetadata.
>> ]
> >> I think the same initial argument applies to the suggestion of making dqv:QualityMetadata a subclass of daq:QualityGraph. DAQ quality graphs contain metadata about quality metrics on the dataset. I believe that there is quality metadata that is not about metrics. At least that's how we have started to approach the problem. I'd be very eager to hear whether you think this is not right!
>>
>>
>>>
>>> Once we have a first draft of the RDF schema, I will be happy to support it in our Quality Assessment Framework.
>>
>>
>> This would be great!
>>
>> Thanks again for the comments - I hope I will not have discouraged you by the length of the answers :)
>>
>> Cheers,
>>
>> Antoine
>>
>>>
> >>> On 20 May 2015, at 23:39, Antoine Isaac <aisaac@few.vu.nl> wrote:
>>>
>>>> Dear all,
>>>>
>>>> We've created a new editor's draft of the Data Quality Vocabulary on Github [1].
>>>>
> >>>> Most of it is in the diagram in section 3. We have placeholders for material in other sections, but this is still work in progress.
>>>>
> >>>> As you can see, the diagram and the doc still have a lot of open issues and questions. But we believe it's a positive evolution from the previous version [2]. The patterns that we would like to use are stabilizing.
> >>>> Actually I'm curious to see how many of Jeremy's last comments [3] would still apply!
>>>>
>>>> Needless to say, everyone else's feedback is highly welcome!
>>>>
>>>> Please excuse the discussion notes in the diagram itself. We thought of creating a wiki page as we had done previously [2]. But I lacked the time to do it. Maybe in the coming days, depending on how the discussion evolves...
>>>>
>>>> Cheers,
>>>>
>>>> Antoine, on behalf of co-editors Riccardo and Christophe
>>>>
>>>> [1] http://w3c.github.io/dwbp/vocab-dqg.html
>>>> [2] https://www.w3.org/2013/dwbp/wiki/Data_Quality_Vocabulary_%28DQV%29
>>>> [3] http://lists.w3.org/Archives/Public/public-dwbp-wg/2015May/0037.html
>>>>
>>>
>>>
>>>
>> <JeremysProposal-150525.png>
>
Received on Thursday, 28 May 2015 22:02:32 UTC