Re: [publishing-statistical-data] W3C Data Cube Last Call from Dave Reynolds on 2013-04-12 (public-gld-comments@w3.org from April 2013)

From: Dave Reynolds <dave.e.reynolds@gmail.com>
Date: Fri, 12 Apr 2013 08:23:28 +0100
To: Cotton Franck <franck.cotton@insee.fr>
CC: "public-gld-comments@w3.org" <public-gld-comments@w3.org>
Message-ID: <5167B670.9050803@gmail.com>

Dear Franck,

Thank you again for your valuable feedback on the Last Call Working 
Draft of the Data Cube specification.

The bulk of your comments were editorial in nature; we will take care of 
these and follow up once we have completed the changes.

One of your comments, that on qb:HierarchicalCodeList, would require a 
substantive change to the specification so this has been recorded and 
tracked as ISSUE-59 [1].

The working group has examined this issue and I'd like to report back to 
you on behalf of the working group.

We understand your hesitation over qb:HierarchicalCodeList. We 
appreciate that data publishers must be careful to distinguish between 
real world things and codes in code lists. However, on balance we feel 
that qb:HierarchicalCodeList is a valuable feature which addresses a 
genuine requirement and would prefer to retain it. We'd like to outline 
our arguments and ask if you would be prepared to accept this outcome.

1. This feature was added as a result of implementation experience and 
requests from users. We have received one Last Call comment which 
specifically endorsed the addition of this feature [2]. Removing it, and 
thus triggering another Last Call, might attract feedback asking that it 
be reinstated.

2. We feel it is important best practice to reuse existing identifiers 
where possible and that qb:HierarchicalCodeList makes it possible to 
reuse identifiers such as admin-geographic datasets that are 
authoritatively maintained. While it might be technically possible to 
duplicate such existing identifiers into a SKOS hierarchy, doing so can 
lead to technical problems in maintaining the duplicate and to 
social/political problems because the duplicate would not come from the 
same authoritative source.

3. There are genuine cases where simple SKOS broader / narrower 
relationships are not sufficient to express the relationships. For 
example, there can be multiple hierarchies that apply over the same set 
of concepts (geographic containment v. administrative authority for 
admin-geographic hierarchies). We have feedback from other Data Cube 
users that "SKOS is not enough" and that they find they have to express 
more nuanced relationships using other vocabularies. The 
qb:HierarchicalCodeList provides a mechanism to connect such beyond-SKOS 
representations to Data Cube publications.

4. We do understand that there is a risk that some publishers may treat 
this facility as a simple way to get something published where it would 
have been more technically correct to publish a new SKOS hierarchy as 
well and use that. On balance we think this simplification is useful 
even if it is not always strictly appropriate and that the gains 
outweigh the risks.

So our proposal is to retain qb:HierarchicalCodeList but expand the 
editorial text to clarify that this construct should not be used in 
cases where a suitable SKOS concept scheme exists or could reasonably be 
created.
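
As an illustration of point 3 only, and not part of the specification,
here is a minimal Python sketch of two hierarchies defined over the same
set of codes. Each hierarchy carries its own roots and its own
parent-to-child relation, which are the roles that qb:hierarchyRoot and
qb:parentChildProperty play for a qb:HierarchicalCodeList. All "ex:"
names and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical identifiers throughout: the "ex:" names are illustrative only.
# Each entry stands for one hierarchy: its roots plus a parent-to-child
# relation over a shared set of codes.
HIERARCHIES: Dict[str, dict] = {
    "ex:geographicContainment": {
        "roots": ["ex:country"],
        "children": {
            "ex:country": ["ex:regionA", "ex:regionB"],
            "ex:regionA": ["ex:districtX"],
            "ex:regionB": ["ex:districtY"],
        },
    },
    "ex:administrativeAuthority": {
        "roots": ["ex:country"],
        "children": {
            "ex:country": ["ex:regionA", "ex:regionB"],
            # districtX lies inside regionA but is administered by regionB
            "ex:regionB": ["ex:districtX", "ex:districtY"],
        },
    },
}


def walk(hierarchy: dict) -> List[Tuple[int, str]]:
    """Return (depth, code) pairs in depth-first order, starting at the roots."""
    out: List[Tuple[int, str]] = []

    def visit(code: str, depth: int) -> None:
        out.append((depth, code))
        for child in hierarchy["children"].get(code, []):
            visit(child, depth + 1)

    for root in hierarchy["roots"]:
        visit(root, 0)
    return out


def parent_of(hierarchy: dict, code: str) -> Optional[str]:
    """Return the parent of a code in the given hierarchy, or None for a root."""
    for parent, kids in hierarchy["children"].items():
        if code in kids:
            return parent
    return None
```

In this sketch the same code, ex:districtX, has a different parent in
each hierarchy, which a single broader/narrower relation over the codes
cannot express without duplicating them.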

Given our arguments above, would you be prepared to accept the Data Cube 
vocabulary progressing to the next stage in the standards process in 
this form?

Best wishes,
Dave Reynolds, GLD Working group

[1] http://www.w3.org/2011/gld/track/issues/59

[2]
http://lists.w3.org/Archives/Public/public-gld-comments/2013Apr/0021.html

On 08/04/13 10:02, Cotton Franck wrote:
> Hi all
>
> I'm happy to see the Data Cube spec progress on the standard track,
> thank you for the good job. Here are some comments on the document. If
> they are not clear, don't hesitate to get back to me.
>
> Cheers
>
> Franck
>
> ------------------------------------------
>
> *Figure* picturing the vocabulary should have at least a title. It
> should also make apparent that qb:Slice is a sub-class of
> qb:ObservationGroup.
>
> Vocabulary *index* : I don't think that qb:Attachable and
> qb:ComponentSet are mentioned in the specification (except in the
> vocabulary reference). For example, the end of the third bullet in 6.4
> would be a good place to mention Attachable.
>
> *2.2* SDMX and related standards
>
>
> - First line: "organisations" -> "organizations" (sorry)
>
>
> - It is said that Data Cube builds on SDMX 2.0. Could it be instead
> related to version 2.1?
>
> - [end of the section] "RDF versions of [the COG] terms are available
> separately": where?
>
> *5 *Data Cubes
>
>
> - It is redundant to mark all sub-sections as normative since the whole
> section is marked so.
>
> - End of section 5.2: "sub classes", wouldn't you rather write
> "subclass" or "sub-class"?
>
> *6* Creating DSDs
>
>
> - second bullet in introduction: UI construction is just one example, so
> maybe it would be more appropriate to say something like "...simplifies
> data consumption, for example for UI construction"
>
>
> - third bullet in introduction: SDMX data flow is a new notion not
> introduced before, and it is explained in the next paragraph, so maybe a
> reference to this paragraph should be made.
>
> - Penultimate paragraph of section 6.1: "... information can encoded...
> " -> "be" is missing.
>
> - Section 6.3, first paragraph: maybe "... sex of the population units"
> rather than "... sex of the population".
>
> - 6.3 Example. I don't see where it is expressed that the measure is a
> mean over the three-year period.
>
> - 6.4, first bullet, remove closing chevron at the end of second
> sentence. Also, in the Turtle example 4, I think that the bracket
> contents should start with a space (same is true for example 11 in
> section 7.2).
>
> - 6.5 (Handling multiple measures)
>
> . I find the expression a bit sloppy in this section. In particular,
> third sentence of second paragraph is not clear to me.
> . Third paragraph: "data cube" -> "Data Cube".
> . Fourth paragraph (just before 6.5.1): lots of "then" that don't help
> to clarify.
>
> - 6.5.2 (Measure dimension). There is a mention of a "SDMX-in-RDF
> extension vocabulary". I'm not sure of how this is linked to what is
> described at the end of section 2.2 or in section 6.2. Where is this
> extension defined or available?
>
> *7 *Expressing data sets
>
>
> - The definition of "Observations" talks about numbers. Observations are
> not necessarily numbers, so I would replace the two occurrences of
> "numbers" by "values".
>
> - 7.1, Example 10. The example is described as an improved version of
> Example 9, but as you say normalized datasets have advantages and
> inconveniences, so maybe "shorter" would be a better adjective than
> "improved".
>
> - 7.2
> . Second sentence. "This not intended" -> "is" is missing
> . Last paragraph before Example 11, first sentence. First "which" is
> probably "with". There are a lot of "which" in this paragraph.
>
> - 8.2
> Two points on the third sentence ("Hierarchical code lists..."):
> . are you saying that skos:broader should not be used?
> . maybe explicitly say that sub-properties of skos:narrower can be used
> (for Richard: I'm thinking XKOS here)
>
> - 8.3
> . I'm not sure I understand the case in the third bullet (exhaustivity
> and mutual exclusion). In XKOS, these notions are expressed at the
> skos:ConceptScheme (or xkos:ClassificationLevel) level, so it's just a
> qualifier of SKOS concept schemes. You don't really need "non-SKOS
> hierarchies" here.
>
> . On the whole, I am very hesitant about the introduction of the
> qb:HierarchicalCodeList class and associated properties. This raises the
> more general problem of how to derive SKOS concept schemes from sets of
> resources that have some kind of "real world" hierarchical relations
> that are not hierarchies between codes in a code list, but hierarchies
> in some specific sense between objects. The example of geographic
> territories is a good one: here you have the territorial inclusion
> relation that induces a broader/narrower relation between associated
> items in a code list, but of course we do not want to make a confusion
> between a region, for example, and an item in a code scheme, nor between
> territorial inclusion and "broader concept". It seems to me that the
> approach you describe fuels a bit the confusion.
>
> A different approach would be to explicitly generate a SKOS concept
> scheme parallel to the "real world" hierarchy and to use a property like
> foaf:focus to link the concepts to the things they represent (see
> http://lists.w3.org/Archives/Public/public-esw-thes/2010Aug/0002.html).
>
> I think that this is a very general problem that should be addressed in
> an ad hoc W3C or other group aimed at developing a recommended practice,
> rather than treated here and there in different fashions.
>
> ----------------------
>
> Also, some comments on the Turtle resources (from Laurent Bihanic, Atos
> and Datalift project). I include the comment on sdmx-subject.ttl
> although I don't see it explicitly referred to in the spec.
>
> *1.* Some URIs in sdmx-subject.ttl are not valid Turtle QNames. In
> Turtle, the name part of a QName can't start with a number (see
> http://www.w3.org/TeamSubmission/turtle/#name).
> For these, full URIs should be used, e.g.
> <http://purl.org/linked-data/sdmx/2009/subject#1> instead of sdmx-subject:1
>
> *2.* The URNs of some objects include a space character:
>
> sdmx-concept.ttl: skos:notation
> "urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=SDMX:CROSS_DOMAIN_CONCEPTS[1.0].
> ADV_NOTICE";
> sdmx-concept.ttl: skos:notation
> "urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=SDMX:CROSS_DOMAIN_CONCEPTS[1.0].
> OBS_VALUE";
> sdmx-concept.ttl: skos:notation
> "urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=SDMX:CROSS_DOMAIN_CONCEPTS[1.0].
> STAT_POP";
> sdmx-concept.ttl: skos:notation
> "urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=SDMX:CROSS_DOMAIN_CONCEPTS[1.0].
> TIMELINESS";
> sdmx-concept.ttl: skos:notation
> "urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=SDMX:CROSS_DOMAIN_CONCEPTS[1.0].
> TIME_OUTPUT";
>
> ------------------------------------------------------------------------
> *From:* Guillaume Duffes [guillaume.duffes@gmail.com]
> *Sent:* Friday 5 April 2013 14:48
> *To:* Dave Reynolds
> *Cc:* public-gld-comments@w3.org; Cotton Franck
> *Subject:* Re: [publishing-statistical-data] W3C Data Cube Last Call
>
> Hi,
>
> Yes, the additional paragraph addresses this issue.
>
> Thank you for that.
>
> Guillaume
>
>
> 2013/4/5 Dave Reynolds <dave.e.reynolds@gmail.com>
>
>     Dear Guillaume,
>
>     Thank you very much for your helpful comments on the Data Cube last
>     call.
>
>     We will give a formal response to the various issues you raise in
>     due course.
>
>     In the meantime I wonder if I could ask a clarifying question.
>
>         _*6.4*_ : “/In a data set with multiple observations
>         *[measures ??]* then we add an additional dimension whose
>         value
>
>         indicates the measure. This is appropriate for applications
>         where the
>         measures are separate aggregate statistics/” → I do not
>         completely agree
>
>         with that.
>
>         First, I guess you meant multiple measures instead of observations.
>
>         The above-mentioned “ /additional dimension/ “, that is the measure
>         dimension is defined in SDMX 2.1 as “/ is a special type of
>         dimension
>
>         which defines multiple measures in a data structure definition.
>         [..].
>         Note that it is necessary that these representations are
>         compliant (the
>         same or derived from) with that of the primary measure.” /The
>         primary
>
>         measure which represents the value of the phenomenon to be
>         measured via
>         a reference to a concept, is mandatory and can take its semantic
>         from
>         any concept, although it is provided as a fixed identifier
>         (OBS_VALUE).
>
>         The SDMX MeasureDimension is above all a dimension, admittedly of a
>         particular type, whereas it seems to me that the RDF Data Cube
>         MeasureDimension, declared as a qb:MeasureType is primarily a
>         measure.
>         In my mind it is exemplified by the fact that the qb:MeasureType
>         component is a dimension property with an implicit code list whereas
>         SDMX requires a reference to an explicit ConceptScheme whether its
>         representation be made explicit or not. I think it would be worth
>         mentioning this slight difference.
>
>
>     I do agree that qb:MeasureType is unusual in this respect of having
>     an implicit code list, despite being a qb:DimensionProperty.
>
>     This is called out in section 6.5.2 [1] third paragraph.
>
>     Is that explanatory paragraph sufficient if we clarify that this
>     notion of an implicit code list for qb:MeasureType is a small
>     divergence from SDMX?
>
>     Thanks,
>     Dave
>
>     [1]
>     http://www.w3.org/TR/2013/WD-vocab-data-cube-20130312/#dsd-mm-dim
>
>
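
The two Turtle resource comments quoted above (local names starting with
a digit in sdmx-subject.ttl, and whitespace inside skos:notation URNs)
lend themselves to a mechanical check. The following is a minimal Python
sketch under stated assumptions: the name pattern is a simplified,
ASCII-only approximation of the Turtle "name" production, the helper
names are hypothetical, and the position of the space in the sample URN
is as I read the listing above.

```python
import re

# Simplified ASCII-only approximation of the Turtle "name" production:
# a letter or underscore, then letters, digits, underscores or hyphens.
# The real grammar also admits many non-ASCII characters.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")


def turtle_term(namespace: str, prefix: str, local: str) -> str:
    """Return a prefixed name if the local part is valid, else a full URI."""
    if _NAME.fullmatch(local):
        return f"{prefix}:{local}"
    return f"<{namespace}{local}>"


def has_whitespace(notation: str) -> bool:
    """Report whether a notation string contains any whitespace character."""
    return any(ch.isspace() for ch in notation)


SUBJECT_NS = "http://purl.org/linked-data/sdmx/2009/subject#"

# A local name starting with a digit falls back to the full URI form.
print(turtle_term(SUBJECT_NS, "sdmx-subject", "1"))

# One of the reported notations, with the space after "[1.0]." (assumed).
urn = ("urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept="
       "SDMX:CROSS_DOMAIN_CONCEPTS[1.0]. OBS_VALUE")
print(has_whitespace(urn))
```

The first call prints the full-URI form suggested in comment 1, and the
second reports the whitespace that comment 2 points out.
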
Received on Friday, 12 April 2013 07:24:00 UTC