RE: [publishing-statistical-data] W3C Data Cube Last Call from Cotton Franck on 2013-04-15 (public-gld-comments@w3.org from April 2013)

From: Cotton Franck <franck.cotton@insee.fr>
Date: Mon, 15 Apr 2013 06:37:31 +0000
To: Dave Reynolds <dave.e.reynolds@gmail.com>
CC: "public-gld-comments@w3.org" <public-gld-comments@w3.org>
Message-ID: <6BCE2054632CC547A08449591889EC5C4888065B@S90X3BAL3.ad.insee.intra>
Dear Dave

Thank you for your detailed answer.

I certainly don't want to complicate the progress of the Data Cube specification, so I do accept your proposal to retain qb:HierarchicalCodeList but expand the editorial text.

Nevertheless, I think that there is still room for debate on the subject after our first exchange, and I am still convinced that this is an area where guidelines and good practice would be useful, especially for governments, which possess many reference codes and classifications.

Sincerely
Franck  

-----Original Message-----
From: Dave Reynolds [mailto:dave.e.reynolds@gmail.com]
Sent: Friday 12 April 2013 09:23
To: Cotton Franck
Cc: public-gld-comments@w3.org
Subject: Re: [publishing-statistical-data] W3C Data Cube Last Call

Dear Franck,

Thank you again for your valuable feedback on the Last Call Working Draft of the Data Cube specification.

The bulk of your comments were editorial in nature; we will take care of these and follow up shortly once we have completed the changes.

One of your comments, that on qb:HierarchicalCodeList, would require a substantive change to the specification so this has been recorded and tracked as ISSUE-59 [1].

The working group has examined this issue and I'd like to report back to you on behalf of the working group.

We understand your hesitation over qb:HierarchicalCodeList. We appreciate that data publishers must be careful to distinguish between real world things and codes in code lists. However, on balance we feel that qb:HierarchicalCodeList is a valuable feature which addresses a genuine requirement and would prefer to retain it. We'd like to outline our arguments and ask if you would be prepared to accept this outcome.

1. This feature was added as a result of implementation experience and requests from users. We have received one Last Call comment which specifically endorsed the addition of this feature [2]. Removing it, and thus triggering another Last Call, might attract feedback asking that it be reinstated.

2. We feel it is important best practice to reuse existing identifiers where possible, and qb:HierarchicalCodeList makes it possible to reuse identifiers, such as those in admin-geographic datasets, that are authoritatively maintained. While it might be technically possible to duplicate such existing identifiers into a SKOS hierarchy, that can lead to technical problems in maintaining the duplicate and to social/political problems because the duplicate would not come from the same authoritative source.

3. There are genuine cases where simple SKOS broader / narrower relationships are not sufficient to express the relationships. For example, there can be multiple hierarchies that apply over the same set of concepts (geographic containment v. administrative authority for admin-geographic hierarchies). We have feedback from other Data Cube users that "SKOS is not enough" and that they find they have to express more nuanced relationships using other vocabularies. The qb:HierarchicalCodeList provides a mechanism to connect such beyond-SKOS representations to Data Cube publications.

4. We do understand that there is a risk that some publishers may treat this facility as a simple way to get something published where it would have been more technically correct to publish a new SKOS hierarchy as well and use that. On balance we think this simplification is useful even if it is not always strictly appropriate and that the gains outweigh the risks.

So our proposal is to retain qb:HierarchicalCodeList but expand the editorial text to clarify that this construct should not be used in cases where a suitable SKOS concept scheme exists or could reasonably be created.
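
As an illustration only, a code list of the kind discussed in points 2 and 3 might be sketched in Turtle as follows. The ex: namespace, the ex:contains property and all resource names are hypothetical placeholders rather than identifiers from any real dataset, and the qb: terms are used as they are understood from the Data Cube vocabulary.

```turtle
@prefix qb:   <http://purl.org/linked-data/cube#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/geo/> .    # hypothetical namespace

# Hypothetical code list that reuses existing admin-geographic
# identifiers directly as codes, without a parallel SKOS scheme.
ex:adminHierarchy a qb:HierarchicalCodeList ;
    rdfs:label "Administrative geography (illustrative)"@en ;
    qb:hierarchyRoot ex:country ;            # top of the hierarchy
    qb:parentChildProperty ex:contains .     # property linking parent to child

# The hierarchy itself is expressed with the dataset's own property
# (assumed here to be ex:contains), not with skos:narrower.
ex:country ex:contains ex:regionA , ex:regionB .
```

A second qb:HierarchicalCodeList over the same resources, with a different qb:parentChildProperty, could then describe an alternative hierarchy such as administrative authority.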

Given our arguments above, would you be prepared to accept the Data Cube vocabulary progressing to the next stage in the standards process in this form?

Best wishes,
Dave Reynolds, GLD Working group

[1] http://www.w3.org/2011/gld/track/issues/59


[2] http://lists.w3.org/Archives/Public/public-gld-comments/2013Apr/0021.html


On 08/04/13 10:02, Cotton Franck wrote:
> Hi all
>
> I'm happy to see the Data Cube spec progress on the standard track, 
> thank you for the good job. Here are some comments on the document. If 
> they are not clear, don't hesitate to get back to me.
>
> Cheers
>
> Franck
>
> ------------------------------------------
>
> *Figure* picturing the vocabulary should have at least a title. It 
> should also make apparent that qb:Slice is a sub-class of 
> qb:ObservationGroup.
>
> Vocabulary *index* : I don't think that qb:Attachable and 
> qb:ComponentSet are mentioned in the specification (except in the 
> vocabulary reference). For example, the end of the third bullet in 6.4 
> would be a good place to mention Attachable.
>
> *2.2* SDMX and related standards
>
>
> - First line: "organisations" -> "organizations" (sorry)
>
>
> - It is said that Data Cube builds on SDMX 2.0. Could it be instead 
> related to version 2.1?
>
> - [end of the section] "RDF versions of [the COG] terms are available
> separately": where?
>
> *5* Data Cubes
>
>
> - It is redundant to mark all sub-sections as normative since the 
> whole section is marked so.
>
> - End of section 5.2: "sub classes", wouldn't you rather write 
> "subclass" or "sub-class"?
>
> *6* Creating DSDs
>
>
> - second bullet in introduction: UI construction is just one example, 
> so maybe it would be more appropriate to say something like 
> "...simplifies data consumption, for example for UI construction"
>
>
> - third bullet in introduction: SDMX data flow is a new notion not 
> introduced before, and it is explained in the next paragraph, so maybe 
> a reference to this paragraph should be made.
>
> - Penultimate paragraph of section 6.1: "... information can encoded...
> " -> "be" is missing.
>
> - Section 6.3, first paragraph: maybe "... sex of the population units"
> rather than "... sex of the population".
>
> - 6.3 Example. I don't see where it is expressed that the measure is a 
> mean over the three-year period.
>
> - 6.4, first bullet, remove closing chevron at the end of second 
> sentence. Also, in the Turtle example 4, I think that the bracket 
> contents should start with a space (same is true for example 11 in 
> section 7.2).
>
> - 6.5 (Handling multiple measures)
>
> . I find the expression a bit sloppy in this section. In particular, 
> third sentence of second paragraph is not clear to me.
> . Third paragraph: "data cube" -> "Data Cube".
> . Fourth paragraph (just before 6.5.1): lots of "then" that don't help 
> to clarify.
>
> - 6.5.2 (Measure dimension). There is a mention of a "SDMX-in-RDF 
> extension vocabulary". I'm not sure of how this is linked to what is 
> described at the end of section 2.2 or in section 6.2. Where is this 
> extension defined or available?
>
> *7* Expressing data sets
>
>
> - The definition of "Observations" talks about numbers. Observations 
> are not necessarily numbers, so I would replace the two occurrences of 
> "numbers" by "values".
>
> - 7.1, Example 10. The example is described as an improved version of 
> Example 9, but as you say normalized datasets have advantages and 
> inconveniences, so maybe "shorter" would be a better adjective than 
> "improved".
>
> - 7.2
> . Second sentence. "This not intended" -> "is" is missing.
> . Last paragraph before Example 11, first sentence. First "which" is 
> probably "with". There are a lot of "which" in this paragraph.
>
> - 8.2
> Two points on the third sentence ("Hierarchical code lists..."):
> . are you saying that skos:broader should not be used?
> . maybe explicitly say that sub-properties of skos:narrower can be 
> used (for Richard: I'm thinking XKOS here)
>
> - 8.3
> . I'm not sure I understand the case in the third bullet (exhaustivity 
> and mutual exclusion). In XKOS, these notions are expressed at the 
> skos:ConceptScheme (or xkos:ClassificationLevel) level, so it's just a 
> qualifier of SKOS concept schemes. You don't really need "non-SKOS 
> hierarchies" here.
>
> . On the whole, I am very hesitant about the introduction of the 
> qb:HierarchicalCodeList class and associated properties. This raises 
> the more general problem of how to derive SKOS concept schemes from 
> sets of resources that have some kind of "real world" hierarchical 
> relations that are not hierarchies between codes in a code list, but 
> hierarchies in some specific sense between objects. The example of 
> geographic territories is a good one: here you have the territorial 
> inclusion relation that induces a broader/narrower relation between 
> associated items in a code list, but of course we do not want to 
> confuse a region, for example, with an item in a code scheme, nor 
> territorial inclusion with "broader concept". It seems to me that the 
> approach you describe adds somewhat to the confusion.
>
> A different approach would be to explicitly generate a SKOS concept 
> scheme parallel to the "real world" hierarchy and to use a property 
> like foaf:focus to link the concepts to the things they represent (see 
> http://lists.w3.org/Archives/Public/public-esw-thes/2010Aug/0002.html).
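>
> As a rough illustration of this approach (a sketch only: the ex: 
> namespace, the ex:within property and all resource names are 
> hypothetical), the concepts form their own SKOS hierarchy and each 
> one points to the territory it represents through foaf:focus:
>
> ```turtle
> @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
> @prefix ex:   <http://example.org/> .    # hypothetical namespace
>
> # "Real world" territories and their inclusion relation.
> ex:placeRegionA ex:within ex:placeCountry .
>
> # Parallel code list: concepts, not territories.
> ex:codes a skos:ConceptScheme ;
>     skos:hasTopConcept ex:codeCountry .
>
> ex:codeCountry a skos:Concept ;
>     skos:inScheme ex:codes ;
>     foaf:focus ex:placeCountry .         # concept -> thing it stands for
>
> ex:codeRegionA a skos:Concept ;
>     skos:inScheme ex:codes ;
>     skos:broader ex:codeCountry ;        # hierarchy between codes only
>     foaf:focus ex:placeRegionA .
> ```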
>
> I think that this is a very general problem that should be addressed 
> in an ad hoc W3C or other group aimed at developing a recommended 
> practice, rather than treated here and there in different fashions.
>
> ----------------------
>
> Also, some comments on the Turtle resources (from Laurent Bihanic, Atos 
> and Datalift project). I include the comment on sdmx-subject.ttl 
> although I don't see it explicitly referred to in the spec.
>
> *1.* Some URIs are not valid Turtle QNames, in sdmx-subject.ttl. In 
> Turtle, the name part of a QName can't start with a number (see
> http://www.w3.org/TeamSubmission/turtle/#name)
> For these, full URIs should be used, e.g.
> <http://purl.org/linked-data/sdmx/2009/subject#1> instead of 
> sdmx-subject:1
>
> *2.* The URNs of some objects include a space character:
>
> sdmx-concept.ttl: skos:notation
> "urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=SDMX:CROSS_DOMAIN_CONCEPTS[1.0].
> ADV_NOTICE";
> sdmx-concept.ttl: skos:notation
> "urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=SDMX:CROSS_DOMAIN_CONCEPTS[1.0].
> OBS_VALUE";
> sdmx-concept.ttl: skos:notation
> "urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=SDMX:CROSS_DOMAIN_CONCEPTS[1.0].
> STAT_POP";
> sdmx-concept.ttl: skos:notation
> "urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=SDMX:CROSS_DOMAIN_CONCEPTS[1.0].
> TIMELINESS";
> sdmx-concept.ttl: skos:notation
> "urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=SDMX:CROSS_DOMAIN_CONCEPTS[1.0].
> TIME_OUTPUT";
>
> ----------------------------------------------------------------------
> From: Guillaume Duffes [guillaume.duffes@gmail.com]
> Sent: Friday 5 April 2013 14:48
> To: Dave Reynolds
> Cc: public-gld-comments@w3.org; Cotton Franck
> Subject: Re: [publishing-statistical-data] W3C Data Cube Last Call
>
> Hi,
>
> Yes, the additional paragraph addresses this issue.
>
> Thank you for that.
>
> Guillaume
>
>
> 2013/4/5 Dave Reynolds <dave.e.reynolds@gmail.com>
>
>     Dear Guillaume,
>
>     Thank you very much for your helpful comments on the Data Cube last
>     call.
>
>     We will give a formal response to the various issues you raise in
>     due course.
>
>     In the meantime I wonder if I could ask a clarifying question.
>
>         6.4: "In a data set with multiple observations [measures ??]
>         then we add an additional dimension whose value indicates the
>         measure. This is appropriate for applications where the
>         measures are separate aggregate statistics" → I do not
>         completely agree with that.
>
>         First, I guess you meant multiple measures instead of
>         observations.
>
>         The above-mentioned "additional dimension", that is the
>         measure dimension is defined in SDMX 2.1 as "is a special type
>         of dimension which defines multiple measures in a data
>         structure definition. [..]. Note that it is necessary that
>         these representations are compliant (the same or derived from)
>         with that of the primary measure." The primary measure which
>         represents the value of the phenomenon to be measured via a
>         reference to a concept, is mandatory and can take its semantic
>         from any concept, although it is provided as a fixed identifier
>         (OBS_VALUE).
>
>         The SDMX MeasureDimension is above all a dimension, admittedly
>         of a particular type, whereas it seems to me that the RDF Data
>         Cube MeasureDimension, declared as a qb:MeasureType is
>         primarily a measure. In my mind it is exemplified by the fact
>         that the qb:MeasureType component is a dimension property with
>         an implicit code list whereas SDMX requires a reference to an
>         explicit ConceptScheme whether its representation be made
>         explicit or not. I think it would be worth mentioning this
>         slight difference.
>
>
>     I do agree that qb:MeasureType is unusual in this respect of having
>     an implicit code list, despite being a qb:DimensionProperty.
>
>     This is called out in section 6.5.2 [1] third paragraph.
>
>     Is that explanatory paragraph sufficient if we clarify that this
>     notion of an implicit code list for qb:MeasureType is a small
>     divergence from SDMX?
>
>     Thanks,
>     Dave
>
>     [1] http://www.w3.org/TR/2013/WD-vocab-data-cube-20130312/#dsd-mm-dim
>
>
Received on Monday, 15 April 2013 06:38:27 UTC