RE: New DQV editor's draft

One comment: I see in the latest draft that the proposal is to use
dct:conformsTo linking to dqv:ServiceLevelAgreement, subclass of
dct:Standard.

I'd like to warn against overloading dct:conformsTo. I have heard
suggestions in various groups to use dct:conformsTo for linking to an
ODI-style open data certificate, to a legal basis for the publication of a
Dataset, to temporal and spatial reference systems and to other types of
specifications that have relevance for the understanding of the Dataset. In
addition, conformsTo has also been suggested to link CatalogRecord to
implementation guidelines and to an Application Profile that the metadata is
based on. I am afraid that the processing of this information will not be
possible if all kinds of 'standards' are lumped together.

Would it be better in this case to create a 'local' property dqv:hasSLA as a
subproperty of dct:conformsTo with range dqv:SLA that is a subclass of
dct:Standard? It is then clear what the relationship is and what it is used
for.
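
To illustrate what I mean, here is a minimal sketch in Python. It uses plain
tuples instead of an RDF library, and everything in it is an assumption made
for illustration: dqv:hasSLA is only the property proposed above, and the ex:
resources are hypothetical. It shows that a consumer can pick out the SLA
directly, while a generic client still sees it as a dct:conformsTo value
through the subproperty link.

```python
# Triples are plain (subject, predicate, object) tuples.
# dqv:hasSLA is the *proposed* property; all ex: names are hypothetical.
DCT_CONFORMS_TO = "dct:conformsTo"
DQV_HAS_SLA = "dqv:hasSLA"
RDFS_SUBPROPERTY_OF = "rdfs:subPropertyOf"

schema = {
    (DQV_HAS_SLA, RDFS_SUBPROPERTY_OF, DCT_CONFORMS_TO),
}

data = {
    ("ex:dataset1", DQV_HAS_SLA, "ex:sla-2015"),
    ("ex:dataset1", DCT_CONFORMS_TO, "ex:odi-certificate"),
    ("ex:dataset1", DCT_CONFORMS_TO, "ex:crs-wgs84"),
}


def infer_superproperties(data, schema):
    """RDFS subproperty rule: (s p o) and (p subPropertyOf q) entail (s q o)."""
    inferred = set(data)
    changed = True
    while changed:  # repeat until no new triple is added
        changed = False
        for s, p, o in list(inferred):
            for sub, rel, sup in schema:
                if rel == RDFS_SUBPROPERTY_OF and sub == p:
                    triple = (s, sup, o)
                    if triple not in inferred:
                        inferred.add(triple)
                        changed = True
    return inferred


def objects(triples, subject, predicate):
    """Sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


closure = infer_superproperties(data, schema)

# The dedicated property isolates the SLA without inspecting each target.
slas = objects(closure, "ex:dataset1", DQV_HAS_SLA)

# Generic clients still get every 'standard', the SLA included.
all_standards = objects(closure, "ex:dataset1", DCT_CONFORMS_TO)
```

With only dct:conformsTo, the same consumer would have to dereference or
type-check each of the three targets to find out which one is the SLA.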

Makx.



> -----Original Message-----
> From: Antoine Isaac [mailto:aisaac@few.vu.nl]
> Sent: 29 May 2015 00:02
> To: Public DWBP WG
> Subject: Re: New DQV editor's draft
> 
> Hi Jeremy,
> 
> Thanks a lot!
> 
> I have produced a new diagram with the Dataset out of the MetadataQuality
> graph (even though I'm still convinced in terms of RDF statements it
> doesn't make any difference). And added dqv:hasMetadataQuality as you
> suggested.
> It's at http://w3c.github.io/dwbp/vocab-dqg.html
> I hope I've captured your suggestions right.
> 
> About dqv:QualityMetadata and daq:QualityGraph, you say:
> [
> That is why I suggested that dqv:QualityMetadata to be a subclass of the
> daq:QualityGraph instead of rdfg:Graph, because QualityMetadata will
> contain what a daq:QualityGraph should have + more information such as
> having dcterms:Standard, dqv:Feedback etc..
> ]
> In fact here we might have a different interpretation of the semantics of
> asserting sub-class links between types of graphs.
> daq:QualityGraph is defined by "Defines a quality graph which will contain
> all metadata about quality metrics on the dataset. "
> daq:QualityGraph is a subclass of qb:Dataset, itself defined by
> "Represents a collection of observations, possibly organized into various
> slices, conforming to some common dimensional structure"
> 
> I'd understand from these definitions that it is *not* welcome that an
> instance of daq:QualityGraph (or a subclass of it) would contain data that
> is not about metrics. Hence I was lukewarm on declaring dqv:QualityMetadata
> a subclass of daq:QualityGraph!
> 
> But if everyone thinks that a subclass of a class of graphs that contain
> statements of a certain type may contain statements of other types than
> the ones its super-class contains (in addition to these), then I'm alright
> with your suggestion!
> 
> (this assumes that everyone can parse the above sentence of course  :-)
> 
> cheers,
> 
> Antoine
> 
> On 5/28/15 1:03 PM, Debattista, Jeremy wrote:
> > Hi Antoine,
> >
> > Thanks for your replies. Things are more clear for me now. I will reply
> > to some of your comments.
> >
> >> 1. I agree that the dcat:Dataset instance is not quality metadata per
> >> se. We've got a representation problem... The idea was that it is the
> >> statements (e.g. one dcterms:conformsTo) that are in the quality graph.
> >> Actually I don't think an instance can be said to be in the graph. It's
> >> only statements that are contained in the graph.
> >> Now, I have to say that I'm not sure how best we could represent this,
> >> graphically.
> >> Has anyone got an idea?
> >
> > I gave it a go. See att1.jpg. Basically I've pushed out the orange box,
> > but still left the conformsTo, hasQualityMeasure, hasFeedback statements
> > inside the graph. Would this be more clear? I've added the
> > dqv:hasQualityMetadata as well (this does not have to be in the quality
> > metadata itself).
> >
> >> 2. "then the dcat:Dataset points to the quality metadata graph" calls
> >> for introducing another property dqv:hasQualityMetadata or something
> >> like this.
> >> This is an interesting idea. If there's enough positive feedback, we
> >> could add it. I'm adding a note right now on it.
> >>
> >> But I wouldn't be in favour of using it as a replacement for the direct
> >> links between the dcat:Dataset and dcterms:Standard, dqv:QualityMeasure,
> >> etc.
> >> The idea is indeed to have a pattern that allows containment of all
> >> quality statements (to allow for provenance tracking) while not putting
> >> this containment as a hurdle for these who are less interested in it.
> >> Say, if a Dataset comes with a SLA, I prefer to have a direct statement
> >> between the instance of dcat:Dataset and the instance of dqv:SLA.
> >> Otherwise one would have to retrieve and combine two statements:
> >> - a link between a Dataset and a QualityGraph
> >> - a statement that relates the QualityGraph with the SLA.
> >> Not only this is a longer path, but one of the nodes is a graph, and
> >> this could raise issues for these who are less comfortable with
> >> graphs (including all of these who don't want to handle RDF syntaxes
> >> for graphs!)
> >
> > I agree with your concerns - I was not viewing it from the provenance
> > perspective. But on the other hand, in my opinion the extra property
> > (hasQualityMetadata) wouldn't hurt either - even though it might be
> > redundant at the end of the day.
> >
> >> C. There is a raised issue that says:
> >> [
> >> The label of daq:QualityGraph does not fit well with the current model.
> >> DAQ graphs are meant to contain measures. In our context a "quality
> >> graph" has a wider scope: actually the role of representing overall
> >> quality graphs is currently played by dqv:QualityMetadata.
> >> ]
> >> I think the same initial argument applies to the suggestion of making
> >> dqv:QualityMetadata a subclass of daq:QualityGraph. DAQ's quality graphs
> >> contain metadata about quality metrics on the dataset. I believe that
> >> there is quality metadata that is not metrics. At least that's how we
> >> have started to approach the problem. I'd be very eager to hear whether
> >> you think this is not right!
> >
> > To be honest, I prefer the dqv:QualityMetadata term and the idea behind
> > it much more. Its intended use is more suitable in this case than the
> > daq:QualityGraph. The daq:QualityGraph is just a "special" RDF graph
> > which is also a cube dataset and as you rightly pointed out, it contains
> > metadata about quality metrics. My understanding of subclasses is
> > "inheriting from the parent class and more". That is why I suggested that
> > dqv:QualityMetadata to be a subclass of the daq:QualityGraph instead of
> > rdfg:Graph, because QualityMetadata will contain what a daq:QualityGraph
> > should have + more information such as having dcterms:Standard,
> > dqv:Feedback etc..  Am I right about this?
> >
> > Cheers,
> > Jer
> >
> >
> >
> > On 25 May 2015, at 23:26, Antoine Isaac <aisaac@few.vu.nl
> > <mailto:aisaac@few.vu.nl>> wrote:
> >
> >> Hi Jeremy,
> >>
> >>
> >>
> >> On 5/22/15 10:45 AM, Debattista, Jeremy wrote:
> >>>
> >>> This looks great already.
> >>
> >>
> >> Thanks!
> >>
> >>
> >>> I would like to point out two issues which are not clear to me as yet:
> >>>
> >>> 1) In the diagram, shouldn't the dcat:Dataset be "outside" of the
> >>> quality metadata (and especially outside of the QualityGraph
> >>> containment), and then the dcat:Dataset points to the quality metadata
> >>> graph?
> >>>
> >>> I don't know if this was done on purpose there or should have been
> >>> placed outside. If a dcat:Dataset (or distribution) is inside the
> >>> quality metadata boundaries, then my understanding as a consumer (I
> >>> might be a machine) would be that a dcat:Dataset instance is some kind
> >>> of quality information.
> >>
> >>
> >> This is a tricky issue, which calls on two answers:
> >>
> >> 1. I agree that the dcat:Dataset instance is not quality metadata per
> >> se. We've got a representation problem... The idea was that it is the
> >> statements (e.g. one dcterms:conformsTo) that are in the quality graph.
> >> Actually I don't think an instance can be said to be in the graph. It's
> >> only statements that are contained in the graph.
> >> Now, I have to say that I'm not sure how best we could represent this,
> >> graphically.
> >> Has anyone got an idea?
> >>
> >>
> >>
> >> 2. "then the dcat:Dataset points to the quality metadata graph" calls
> >> for introducing another property dqv:hasQualityMetadata or something
> >> like this.
> >> This is an interesting idea. If there's enough positive feedback, we
> >> could add it. I'm adding a note right now on it.
> >>
> >> But I wouldn't be in favour of using it as a replacement for the direct
> >> links between the dcat:Dataset and dcterms:Standard, dqv:QualityMeasure,
> >> etc.
> >> The idea is indeed to have a pattern that allows containment of all
> >> quality statements (to allow for provenance tracking) while not putting
> >> this containment as a hurdle for these who are less interested in it.
> >> Say, if a Dataset comes with a SLA, I prefer to have a direct statement
> >> between the instance of dcat:Dataset and the instance of dqv:SLA.
> >> Otherwise one would have to retrieve and combine two statements:
> >> - a link between a Dataset and a QualityGraph
> >> - a statement that relates the QualityGraph with the SLA.
> >> Not only this is a longer path, but one of the nodes is a graph, and
> >> this could raise issues for these who are less comfortable with
> >> graphs (including all of these who don't want to handle RDF syntaxes
> >> for graphs!)
> >>
> >>
> >>> 2) How about doing dqv:QualityMetadata as a subclass of
> >>> daq:QualityGraph?
> >>>
> >>> There are a number of advantages of doing so. First of all we don't
> >>> have to rely on multiple graphs. Although nothing is wrong with that,
> >>> this might make querying a bit harder. The daq:QualityGraph is a
> >>> specialisation of the rdf:Graph which is also a qb:Dataset. In this case
> >>> the qb:dataset property can have dqv:QualityMeasure as domain and
> >>> dqv:QualityMetadata as its range. This way we can move dcat:Dataset from
> >>> the graph containment, and removing the property
> >>> "dqv:hasQualityMeasure" (this becomes redundant as it can be inferred,
> >>> if there is some link between dcat:Dataset and dqv:QualityMetadata).
> >>>
> >>
> >>
> >> This is quite related to the previous issues. Interesting discussion!
> >> I've tried to make a graph representing your proposal, in the attached
> >> file.
> >> It's quite hard to untangle though. I'll have a try, please tell me
> >> if I'm making any sense :-)
> >>
> >> A. As said above the instance of dcat:Dataset is not meant to be
> >> contained in the quality metadata graph. Maybe this alleviates some of
> >> your concerns...
> >>
> >> B. I agree that if there was a link between the instance of
> >> dcat:Dataset and the (merged) instance
> >> dqv:QualityMetadata/daq:QualityGraph, then with a qb:dataset statement
> >> between instances of dqv:QualityMeasure and the instance of
> >> dqv:QualityMetadata/daq:QualityGraph one could indeed find a connection
> >> between the dcat:Dataset and the instances of dqv:QualityMeasure.
> >> But as said above I don't like the idea of removing the direct link. If
> >> a dataset has some measure, say, a number of incorrect triples, why
> >> remove the direct link?
> >> This would put the provenance info in the way of the applications that
> >> are less concerned about provenance.
> >> As also said, I'm not against having the link between dcat:Dataset and
> >> the (merged) instance dqv:QualityMetadata/daq:QualityGraph. But I
> >> wouldn't want to use it as a motivation for removing
> >> dqv:hasQualityMeasure.
> >>
> >> C. There is a raised issue that says:
> >> [
> >> The label of daq:QualityGraph does not fit well with the current model.
> >> DAQ graphs are meant to contain measures. In our context a "quality
> >> graph" has a wider scope: actually the role of representing overall
> >> quality graphs is currently played by dqv:QualityMetadata.
> >> ]
> >> I think the same initial argument applies to the suggestion of making
> >> dqv:QualityMetadata a subclass of daq:QualityGraph. DAQ's quality graphs
> >> contain metadata about quality metrics on the dataset. I believe that
> >> there is quality metadata that is not metrics. At least that's how we
> >> have started to approach the problem. I'd be very eager to hear whether
> >> you think this is not right!
> >>
> >>
> >>>
> >>> Once we have a first draft of the RDF schema, I will be happy to
> >>> support it in our Quality Assessment Framework.
> >>
> >>
> >> This would be great!
> >>
> >> Thanks again for the comments - I hope I will not have discouraged
> >> you by the length of the answers :)
> >>
> >> Cheers,
> >>
> >> Antoine
> >>
> >>>
> >>> On 20 May 2015, at 23:39, Antoine Isaac <aisaac@few.vu.nl
> >>> <mailto:aisaac@few.vu.nl>> wrote:
> >>>
> >>>> Dear all,
> >>>>
> >>>> We've created a new editor's draft of the Data Quality Vocabulary on
> >>>> Github [1].
> >>>>
> >>>> Most of it is in the diagram in section 3. We have placeholders for
> >>>> material in other sections, but this is still work in progress.
> >>>>
> >>>> As you can see the diagram and the doc still have a lot of open
> >>>> issues and questions. But we believe it's a positive evolution from
> >>>> the previous version [2]. The patterns that we would like to use are
> >>>> stabilizing. Actually I'm curious to see how much of Jeremy's last
> >>>> comments [3] would still apply!
> >>>>
> >>>> Needless to say, everyone else's feedback is highly welcome!
> >>>>
> >>>> Please excuse the discussion notes in the diagram itself. We thought
> >>>> of creating a wiki page as we had done previously [2]. But I lacked
> >>>> the time to do it. Maybe in the coming days, depending on how the
> >>>> discussion evolves...
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Antoine, on behalf of co-editors Riccardo and Christophe
> >>>>
> >>>> [1] http://w3c.github.io/dwbp/vocab-dqg.html
> >>>> [2] https://www.w3.org/2013/dwbp/wiki/Data_Quality_Vocabulary_%28DQV%29
> >>>> [3] http://lists.w3.org/Archives/Public/public-dwbp-wg/2015May/0037.html
> >>>>
> >>>
> >>>
> >>>
> >> <JeremysProposal-150525.png>
> >

Received on Friday, 29 May 2015 11:57:35 UTC