[publishing-statistical-data] W3C Data Cube Last Call

From: Cotton Franck <franck.cotton@insee.fr>
Date: Mon, 8 Apr 2013 09:02:33 +0000
To: "public-gld-comments@w3.org" <public-gld-comments@w3.org>, Dave Reynolds <dave.e.reynolds@gmail.com>
CC: Guillaume Duffes <guillaume.duffes@gmail.com>
Message-ID: <6BCE2054632CC547A08449591889EC5C4887AD5B@S90X3BAL3.ad.insee.intra>
Hi all

I'm happy to see the Data Cube spec progress on the standards track; thank you for the good work. Here are some comments on the document. If they are not clear, don't hesitate to get back to me.




The figure picturing the vocabulary should have at least a title. It should also make apparent that qb:Slice is a sub-class of qb:ObservationGroup.

Vocabulary index: I don't think that qb:Attachable and qb:ComponentSet are mentioned in the specification (except in the vocabulary reference). For example, the end of the third bullet in 6.4 would be a good place to mention Attachable.

2.2 SDMX and related standards

- First line: "organisations" -> "organizations" (sorry)

- It is said that Data Cube builds on SDMX 2.0. Could it be instead related to version 2.1?

- [end of the section] "RDF versions of [the COG] terms are available separately": where?

5 Data Cubes

- It is redundant to mark all sub-sections as normative since the whole section is marked so.

- End of section 5.2: "sub classes", wouldn't you rather write "subclass" or "sub-class"?

6 Creating DSDs

- second bullet in introduction: UI construction is just one example, so maybe it would be more appropriate to say something like "...simplifies data consumption, for example for UI construction"

- third bullet in introduction: SDMX data flow is a new notion not introduced before, and it is explained in the next paragraph, so maybe a reference to this paragraph should be made.

- Penultimate paragraph of section 6.1: "... information can encoded... " -> "be" is missing.

- Section 6.3, first paragraph: maybe "... sex of the population units" rather than "... sex of the population".

- 6.3 Example. I don't see where it is expressed that the measure is a mean over the three-year period.

- 6.4, first bullet, remove closing chevron at the end of second sentence. Also, in the Turtle example 4, I think that the bracket contents should start with a space (same is true for example 11 in section 7.2).

- 6.5 (Handling multiple measures)

. I find the expression a bit sloppy in this section. In particular, third sentence of second paragraph is not clear to me.
. Third paragraph: "data cube" -> "Data Cube".
. Fourth paragraph (just before 6.5.1): lots of "then" that don't help to clarify.

- 6.5.2 (Measure dimension). There is a mention of a "SDMX-in-RDF extension vocabulary". I'm not sure of how this is linked to what is described at the end of section 2.2 or in section 6.2. Where is this extension defined or available?

7 Expressing data sets

- The definition of "Observations" talks about numbers. Observations are not necessarily numbers, so I would replace the two occurrences of "numbers" by "values".

- 7.1, Example 10. The example is described as an improved version of Example 9, but as you say normalized datasets have advantages and drawbacks, so maybe "shorter" would be a better adjective than "improved".

- 7.2
. Second sentence. "This not intended" -> "is" is missing
. Last paragraph before Example 11, first sentence. First "which" is probably "with". There are a lot of "which" in this paragraph.

- 8.2
Two points on the third sentence ("Hierarchical code lists..."):
. are you saying that skos:broader should not be used?
. maybe explicitly say that sub-properties of skos:narrower can be used (for Richard: I'm thinking XKOS here)

- 8.3
. I'm not sure I understand the case in the third bullet (exhaustivity and mutual exclusion). In XKOS, these notions are expressed at the skos:ConceptScheme (or xkos:ClassificationLevel) level, so it's just a qualifier of SKOS concept schemes. You don't really need "non-SKOS hierarchies" here.

. On the whole, I am very hesitant about the introduction of the qb:HierarchicalCodeList class and associated properties. This raises the more general problem of how to derive SKOS concept schemes from sets of resources that have some kind of "real world" hierarchical relations that are not hierarchies between codes in a code list, but hierarchies in some specific sense between objects. The example of geographic territories is a good one: here you have the territorial inclusion relation that induces a broader/narrower relation between associated items in a code list, but of course we do not want to confuse a region, for example, with an item in a code scheme, nor territorial inclusion with "broader concept". It seems to me that the approach you describe fuels the confusion a bit.

A different approach would be to explicitly generate a SKOS concept scheme parallel to the "real world" hierarchy and to use a property like foaf:focus to link the concepts to the things they represent (see http://lists.w3.org/Archives/Public/public-esw-thes/2010Aug/0002.html).

I think that this is a very general problem that should be addressed in an ad hoc W3C or other group aimed at developing a recommended practice, rather than treated here and there in different fashions.


Also, some comments on the Turtle resources (from Laurent Bihanic, Atos and Datalift project). I include the comment on sdmx-subject.ttl although I don't see it explicitly referred to in the spec.

1. Some URIs in sdmx-subject.ttl are not valid Turtle QNames. In Turtle, the name part of a QName can't start with a number (see http://www.w3.org/TeamSubmission/turtle/#name).
For these, full URIs should be used, e.g. <http://purl.org/linked-data/sdmx/2009/subject#1> instead of sdmx-subject:1.

2. The URNs of some objects include a space character:

sdmx-concept.ttl: skos:notation "urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=SDMX:CROSS_DOMAIN_CONCEPTS[1.0]. ADV_NOTICE";
sdmx-concept.ttl: skos:notation "urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=SDMX:CROSS_DOMAIN_CONCEPTS[1.0]. OBS_VALUE";
sdmx-concept.ttl: skos:notation "urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=SDMX:CROSS_DOMAIN_CONCEPTS[1.0]. STAT_POP";
sdmx-concept.ttl: skos:notation "urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=SDMX:CROSS_DOMAIN_CONCEPTS[1.0]. TIMELINESS";
sdmx-concept.ttl: skos:notation "urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=SDMX:CROSS_DOMAIN_CONCEPTS[1.0]. TIME_OUTPUT";
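
Both issues above could be checked mechanically before publishing the Turtle files. What follows is only a minimal sketch in Python, not a Turtle parser: the helper names (turtle_term, notations_with_whitespace) are hypothetical, the QName test is a simplified ASCII-only approximation of the rule in the Turtle Team Submission, and the notation pattern assumes values without escaped quotes. The second sample value (without a space) is made up for contrast.

```python
import re
from typing import List

# Simplified, ASCII-only approximation of the rule cited above:
# the name part of a QName must not start with a digit.
_SAFE_LOCAL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")

# Matches: skos:notation "value"  (assumes no escaped quotes in the value).
_NOTATION = re.compile(r'skos:notation\s+"([^"]*)"')


def turtle_term(prefix: str, namespace: str, local_name: str) -> str:
    """Return prefix:local_name when it is safe, else a full URI reference."""
    if _SAFE_LOCAL_NAME.fullmatch(local_name):
        return f"{prefix}:{local_name}"
    return f"<{namespace}{local_name}>"


def notations_with_whitespace(turtle_text: str) -> List[str]:
    """Return every skos:notation value that contains whitespace."""
    values = _NOTATION.findall(turtle_text)
    return [value for value in values if re.search(r"\s", value)]


SDMX_SUBJECT = "http://purl.org/linked-data/sdmx/2009/subject#"

# The first line is copied from the list above; the second is a
# hypothetical well-formed entry, included only for contrast.
SAMPLE = '''
skos:notation "urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=SDMX:CROSS_DOMAIN_CONCEPTS[1.0]. OBS_VALUE";
skos:notation "urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=SDMX:CROSS_DOMAIN_CONCEPTS[1.0].EXAMPLE";
'''

if __name__ == "__main__":
    # A local name starting with a digit falls back to the full URI form.
    print(turtle_term("sdmx-subject", SDMX_SUBJECT, "1"))
    # Only the value containing a space is reported.
    for value in notations_with_whitespace(SAMPLE):
        print(value)
```

Falling back to the full URI form, rather than renaming the resource, keeps the published URIs unchanged, which is what comment 1 asks for.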

From: Guillaume Duffes [guillaume.duffes@gmail.com]
Sent: Friday, 5 April 2013 14:48
To: Dave Reynolds
Cc: public-gld-comments@w3.org; Cotton Franck
Subject: Re: [publishing-statistical-data] W3C Data Cube Last Call


Yes, the additional paragraph addresses this issue.

Thank you for that.


2013/4/5 Dave Reynolds <dave.e.reynolds@gmail.com>
Dear Guillaume,

Thank you very much for your helpful comments on the Data Cube last call.

We will give a formal response to the various issues you raise in due course.

In the meantime I wonder if I could ask a clarifying question.

6.4: "In a data set with multiple observations [measures ??] then we add
an additional dimension whose value indicates the measure. This is
appropriate for applications where the measures are separate aggregate
statistics" → I do not completely agree with that.

First, I guess you meant multiple measures instead of observations.

The above-mentioned "additional dimension", that is, the measure
dimension, is defined in SDMX 2.1 as "a special type of dimension
which defines multiple measures in a data structure definition. [..].
Note that it is necessary that these representations are compliant (the
same or derived from) with that of the primary measure." The primary
measure, which represents the value of the phenomenon to be measured via
a reference to a concept, is mandatory and can take its semantics from
any concept, although it is provided as a fixed identifier (OBS_VALUE).

The SDMX MeasureDimension is above all a dimension, admittedly of a
particular type, whereas it seems to me that the RDF Data Cube
MeasureDimension, declared as a qb:MeasureType, is primarily a measure.
In my mind it is exemplified by the fact that the qb:MeasureType
component is a dimension property with an implicit code list whereas
SDMX requires a reference to an explicit ConceptScheme whether its
representation be made explicit or not. I think it would be worth
mentioning this slight difference.

I do agree that qb:MeasureType is unusual in this respect of having an implicit code list, despite being a qb:DimensionProperty.

This is called out in section 6.5.2 [1] third paragraph.

Is that explanatory paragraph sufficient if we clarify that this notion of an implicit code list for qb:MeasureType is a small divergence from SDMX?


[1] http://www.w3.org/TR/2013/WD-vocab-data-cube-20130312/#dsd-mm-dim

Received on Monday, 8 April 2013 09:03:05 UTC

