Re: New DQV editor's draft from Antoine Isaac on 2015-05-25 (public-dwbp-wg@w3.org from May 2015)

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Mon, 25 May 2015 23:26:19 +0200
To: "Debattista, Jeremy" <Jeremy.Debattista@iais.fraunhofer.de>, "Public DWBP WG" <public-dwbp-wg@w3.org>
Message-ID: <5563937B.9000601@few.vu.nl>
Hi Jeremy,



On 5/22/15 10:45 AM, Debattista, Jeremy wrote:
>
> This looks great already.


Thanks!


> I would like to point out two issues which are not clear to me as yet:
>
> 1) In the diagram, shouldn't the dcat:Dataset be "outside" of the quality metadata (and especially outside of the QualityGraph containment), and then the dcat:Dataset points to the quality metadata graph?
>
> I don't know whether this was done on purpose or whether it should have been placed outside. If a dcat:Dataset (or distribution) is inside the quality metadata boundaries, then my understanding as a consumer (I might be a machine) would be that a dcat:Dataset instance is some kind of quality information.


This is a tricky issue, which calls for two answers:

1. I agree that the dcat:Dataset instance is not quality metadata per se. We've got a representation problem... The idea was that it is the statements (e.g., a dcterms:conformsTo statement) that are in the quality graph.
Actually, I don't think an instance can be said to be in a graph; only statements are contained in a graph.
Now, I have to say that I'm not sure how best to represent this graphically.
Has anyone got an idea?



2. "then the dcat:Dataset points to the quality metadata graph" calls for introducing another property, dqv:hasQualityMetadata or something similar.
This is an interesting idea. If there's enough positive feedback, we could add it. I'm adding a note right now on it.

But I wouldn't be in favour of using it as a replacement for the direct links between the dcat:Dataset and dcterms:Standard, dqv:QualityMeasure, etc.
The idea is indeed to have a pattern that allows containment of all quality statements (to allow for provenance tracking) without making this containment a hurdle for those who are less interested in it.
Say, if a Dataset comes with an SLA, I would prefer to have a direct statement between the instance of dcat:Dataset and the instance of dqv:SLA. Otherwise one would have to retrieve and combine two statements:
- a link between a Dataset and a QualityGraph
- a statement that relates the QualityGraph with the SLA.
Not only is this a longer path, but one of the nodes is a graph, and this could raise issues for those who are less comfortable with graphs (including all those who don't want to handle RDF syntaxes for graphs!)
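To make the trade-off concrete, here is a minimal sketch of what a consumer has to do in each case, with triples as plain Python tuples rather than a real RDF store. It is an illustration, not part of the draft: the property names `ex:hasSLA` and `ex:relatesToSLA` are hypothetical placeholders (the draft does not name these properties), and dqv:hasQualityMetadata is only the property floated above.

```python
# Triples are modelled as (subject, predicate, object) tuples of prefixed names.
# All ex: terms are hypothetical placeholders, not DQV vocabulary;
# dqv:hasQualityMetadata is only the property floated in this thread.
Triple = tuple[str, str, str]


def objects(triples: set[Triple], subject: str, predicate: str) -> set[str]:
    """Return every o such that (subject, predicate, o) is in the data."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


# Option 1: a direct statement between the dataset and its SLA.
direct = {
    ("ex:myDataset", "ex:hasSLA", "ex:mySLA"),
    ("ex:mySLA", "rdf:type", "dqv:SLA"),
}


def sla_direct(triples: set[Triple], dataset: str) -> set[str]:
    # One lookup: dataset -> SLA.
    return objects(triples, dataset, "ex:hasSLA")


# Option 2: the dataset only points to a quality graph, and a second
# statement relates that graph to the SLA.
via_graph = {
    ("ex:myDataset", "dqv:hasQualityMetadata", "ex:myQualityGraph"),
    ("ex:myQualityGraph", "ex:relatesToSLA", "ex:mySLA"),
    ("ex:mySLA", "rdf:type", "dqv:SLA"),
}


def sla_via_graph(triples: set[Triple], dataset: str) -> set[str]:
    # Two lookups: dataset -> quality graph, then quality graph -> SLA.
    result: set[str] = set()
    for graph in objects(triples, dataset, "dqv:hasQualityMetadata"):
        result |= objects(triples, graph, "ex:relatesToSLA")
    return result


print(sla_direct(direct, "ex:myDataset"))        # {'ex:mySLA'}
print(sla_via_graph(via_graph, "ex:myDataset"))  # {'ex:mySLA'}
```

Both calls reach the same SLA, but the second requires the consumer to know about, and traverse, the intermediate graph node; in real data that node would be a graph, which brings in the RDF-syntax burden mentioned above.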


> 2) How about making dqv:QualityMetadata a subclass of daq:QualityGraph?
>
> There are a number of advantages to doing so. First of all, we don't have to rely on multiple graphs. Although nothing is wrong with that, it might make querying a bit harder. The daq:QualityGraph is a specialisation of rdf:Graph and is also a qb:DataSet. In this case the qb:dataSet property can have dqv:QualityMeasure as its domain and dqv:QualityMetadata as its range. This way we can move dcat:Dataset out of the graph containment and remove the property "dqv:hasQualityMeasure" (this becomes redundant, as it can be inferred if there is some link between dcat:Dataset and dqv:QualityMetadata).
>


This is closely related to the previous issue. Interesting discussion!
I've tried to draw a diagram representing your proposal, in the attached file.
It's quite hard to untangle, though. I'll give it a try; please tell me if I'm making any sense :-)

A. As said above, the instance of dcat:Dataset is not meant to be contained in the quality metadata graph. Maybe this alleviates some of your concerns...

B. I agree that if there were a link between the instance of dcat:Dataset and the (merged) dqv:QualityMetadata/daq:QualityGraph instance, then, with qb:dataSet statements between the dqv:QualityMeasure instances and that dqv:QualityMetadata/daq:QualityGraph instance, one could indeed find a connection between the dcat:Dataset and the dqv:QualityMeasure instances.
But as said above, I don't like the idea of removing the direct link. If a dataset has some measure, say, a number of incorrect triples, why remove the direct link?
This would put the provenance info in the way of the applications that are less concerned about provenance.
As also said, I'm not against having the link between dcat:Dataset and the (merged) instance dqv:QualityMetadata/daq:QualityGraph. But I wouldn't want to use it as a motivation for removing dqv:hasQualityMeasure.
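For reference, the inference described in point B amounts to a single join. The following is a minimal Python sketch, not part of the draft: triples are plain tuples, the `ex:` names are hypothetical, dqv:hasQualityMetadata is the property floated above, and the RDF Data Cube qb:dataSet property is used with the domain and range from Jeremy's proposal.

```python
# Triples are (subject, predicate, object) tuples of prefixed names.
# ex: terms are hypothetical; dqv:hasQualityMetadata is the property floated
# in this thread, and qb:dataSet follows Jeremy's proposed domain/range.
Triple = tuple[str, str, str]


def infer_has_quality_measure(triples: set[Triple]) -> set[Triple]:
    """Derive (dataset, dqv:hasQualityMeasure, measure) from the path
    dataset --dqv:hasQualityMetadata--> graph <--qb:dataSet-- measure."""
    # Index each quality metadata graph by the datasets that point to it.
    datasets_by_graph: dict[str, set[str]] = {}
    for s, p, o in triples:
        if p == "dqv:hasQualityMetadata":
            datasets_by_graph.setdefault(o, set()).add(s)

    # Join: every measure attached to a graph belongs to that graph's datasets.
    inferred: set[Triple] = set()
    for s, p, o in triples:
        if p == "qb:dataSet":
            for dataset in datasets_by_graph.get(o, set()):
                inferred.add((dataset, "dqv:hasQualityMeasure", s))
    return inferred


data = {
    ("ex:myDataset", "dqv:hasQualityMetadata", "ex:myQualityGraph"),
    ("ex:incorrectTriplesMeasure", "rdf:type", "dqv:QualityMeasure"),
    ("ex:incorrectTriplesMeasure", "qb:dataSet", "ex:myQualityGraph"),
}

print(infer_has_quality_measure(data))
# {('ex:myDataset', 'dqv:hasQualityMeasure', 'ex:incorrectTriplesMeasure')}
```

The link is recoverable, but only by running this join; keeping dqv:hasQualityMeasure as an explicit statement spares applications that are not concerned with provenance from having to do so.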

C. There is an open issue that says:
[
The label of daq:QualityGraph does not fit well with the current model. DAQ graphs are meant to contain measures. In our context a "quality graph" has a wider scope: actually the role of representing overall quality graphs is currently played by dqv:QualityMetadata.
]
I think the same initial argument applies to the suggestion of making dqv:QualityMetadata a subclass of daq:QualityGraph. DAQ's quality graphs contain metadata about quality metrics on the dataset. I believe that there is quality metadata that is not about metrics. At least that's how we have started to approach the problem. I'd be very eager to hear if you think this is wrong!


>
> Once we have a first draft of the RDF schema, I will be happy to support it in our Quality Assessment Framework.


This would be great!

Thanks again for the comments. I hope the length of my answers hasn't discouraged you :)

Cheers,

Antoine

>
> On 20 May 2015, at 23:39, Antoine Isaac <aisaac@few.vu.nl> wrote:
>
>> Dear all,
>>
>> We've created a new editor's draft of the Data Quality Vocabulary on Github [1].
>>
>> Most of it is in the diagram in section 3. We have placeholders for material in other sections, but this is still a work in progress.
>>
>> As you can see, the diagram and the doc still have a lot of open issues and questions. But we believe it's a positive evolution from the previous version [2]. The patterns that we would like to use are stabilizing.
>> Actually I'm curious to see how much of Jeremy's last comments [3] would still apply!
>>
>> Needless to say, everyone else's feedback is highly welcome!
>>
>> Please excuse the discussion notes in the diagram itself. We thought of creating a wiki page as we had done previously [2], but I lacked the time to do it. Maybe in the coming days, depending on how the discussion evolves...
>>
>> Cheers,
>>
>> Antoine, on behalf of co-editors Riccardo and Christophe
>>
>> [1]  http://w3c.github.io/dwbp/vocab-dqg.html
>> [2]  https://www.w3.org/2013/dwbp/wiki/Data_Quality_Vocabulary_%28DQV%29
>> [3]  http://lists.w3.org/Archives/Public/public-dwbp-wg/2015May/0037.html
>>
>
>
>
Attachments

image/png attachment: JeremysProposal-150525.png
Received on Monday, 25 May 2015 21:26:54 UTC