Re: [QB-UCR] Review comments from Benedikt Kaempgen on 2013-05-28 (public-gld-wg@w3.org from May 2013)

From: Benedikt Kaempgen <kaempgen@fzi.de>
Date: Tue, 28 May 2013 11:49:38 +0000
To: Dave Reynolds <dave.e.reynolds@gmail.com>, "Government Linked Data Working Group" <public-gld-wg@w3.org>
Message-ID: <0D7BFFD7C415144DA75C3D49C46AC21512ACD121@ex-ms-1a.fzi.de>
Dear Dave (Dear Richard, Phil, all),

I have started to implement your feedback [1]. I have not finished yet, but before I continue I would like to check with you whether my changes make sense and would lead you to agree to publishing the UCR document as a WG Note.

See my (planned) changes to the document inline:

> # High level comments
> 
> 1. The requirements identified here are primarily those for
> additional
> work within GLD and not for Data Cube itself. That needs to be made
> clearer. At a minimum by improving the phrasing in the introduction.

You are right. In fact, I think we are actually describing "Use Cases and Lessons for the Data Cube Vocabulary".

I changed the title accordingly as well as the phrasing in the introduction:

    "The document describes use cases that would benefit from using the vocabulary.
    In particular, the document identifies possible benefits and challenges in using
    such a vocabulary for representing statistics. Also, it derives lessons that
    can be used for future work on the vocabulary as well as for useful
    tools complementing the vocabulary."

Consequently, I also turned the requirements into lessons:

* "4.1 Vocabulary should build upon the SDMX information model" changed to: "There is a putative requirement to update to SDMX 2.1 if there are specific use cases that demand it" <= Lesson: SDMX 2.1 is out, while the vocabulary so far builds upon Version 2.0. If specific use cases derived from Version 2.1 become available, working groups may consider extending the vocabulary based on Version 2.1. Add link to issue.
* "4.2 Vocabulary should clarify the use of subsets of observations" changed to "Publishers may need more guidance in creating and managing slices or arbitrary groups of observations" <= In the end, usage is most important. So consumers also need to know what to do.
* "4.3 Vocabulary should recommend a mechanism to support hierarchical code lists" changed to: "Publishers may need more guidance to decide which representation of hierarchies is most suitable for their use case"
* "4.4 Vocabulary should define relationship to ISO19156 - Observations & Measurements" changed to: "Modelers using ISO19156 - Observations & Measurements may need clarification regarding the relationship to the data cube vocabulary"
* "4.5 There should be a recommended mechanism to allow for publication of aggregates which cross multiple dimensions" changed to: "Publishers may need guidance in how to represent common analytical operations such as Slice, Dice, Rollup on data cubes"
* "4.6 There should be a recommended way of declaring relations between cubes" changed to: "Publishers may need guidance in making transparent the pre-processing of aggregate statistics"
* "4.7 There should be criteria for well-formedness and assumptions consumers can make about published data" changed to: "Publishers and consumers may need guidance in checking and making use of well-formedness of published data using data cube"
* "4.8 There should be mechanisms and recommendations regarding publication and consumption of large amounts of statistical data" changed to: "Publishers and consumers may need more guidance in efficiently processing data using the data cube vocabulary"
* "4.9 There should be a recommended way to communicate the availability of published statistical data to external parties and to allow automatic discovery of statistical data" changed to: "Publishers may need guidance in communicating the availability of published statistical data to external parties and to allow automatic discovery of statistical data"

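To make the operations named in the reworded 4.5 lesson (Slice, Dice, Rollup) concrete, here is a minimal sketch in plain Python. It is only an illustration and not taken from the UCR document; the observations, dimension names, and function names are all made up, and the roll-up simply sums the measure.

```python
from collections import defaultdict

# Hypothetical observations: two dimensions (year, region) and one measure (value).
observations = [
    {"year": 2012, "region": "North", "value": 10},
    {"year": 2012, "region": "South", "value": 20},
    {"year": 2013, "region": "North", "value": 30},
    {"year": 2013, "region": "South", "value": 40},
]


def slice_cube(obs, dimension, member):
    """Slice: fix one dimension to a single member."""
    return [o for o in obs if o[dimension] == member]


def dice_cube(obs, allowed):
    """Dice: keep observations whose dimension values lie in the allowed sets."""
    return [o for o in obs if all(o[d] in members for d, members in allowed.items())]


def rollup(obs, keep):
    """Roll-up: sum the measure over all dimensions that are not listed in keep."""
    totals = defaultdict(int)
    for o in obs:
        totals[tuple(o[d] for d in keep)] += o["value"]
    return dict(totals)


year_2012 = slice_cube(observations, "year", 2012)
north_2013 = dice_cube(observations, {"year": {2013}, "region": {"North"}})
per_year = rollup(observations, ["year"])
```

Here `per_year` holds one total per year, i.e. the region dimension has been aggregated away.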

> 
> 2. Some requirements here are not requirements for the data cube
> vocabulary but for associated tools/services which have not been
> considered as part of our work. They should either not be here or
> they
> need to be pulled out separately (specifically 4.8 and 4.9).

See changes above.

> 
> 3. In some of the use cases it's not clear how the use case motivates
> the requirement associated with it.

I have now started to explain the "lessons" directly in the "challenges" section of each use case, if that is OK.

> 
> 4. A number of the diagrams appear to have been copy from other
> publications. This may be a copyright violation. Publication can not
> proceed until this at least is resolved.

I removed those figures.

> 
> 5. Given the structure of the document it is not obvious that
> ACTION-92
> makes sense. Need to either add a new use case in which the O&M
> relationship can then be illustrated (e.g. the MetOffice usage) or
> revert to having the O&M discussion as a separate note or drop it.
> I'm
> inclined towards the first of these but will need to think more about
> it.


I renamed "Publisher Use Case: Publishing slices of data about UK Bathing Water Quality" to "Publisher Use Case: Publishing Observational Data Sets about UK Bathing Water Quality" to better fit the use case.

We now have a lesson "Modelers using ISO19156 - Observations & Measurements may need clarification regarding the relationship to the data cube vocabulary" instead of the requirement. It would be great if your additions fit in better here.

> 
> 6. Some of the "use cases" represent actual deployed systems. For
> those
> then it may be appropriate to highlight "Lessons" or "Insights"
> rather
> than just focus on unmet requirements.

We now describe more general lessons instead of requirements, see above.

> 
> # Detailed/minor comments (ordered and numbered by section)
> 
> ## 1
> 
> o The first paragraph needs to be clarified to make the nature of the
> requirements clearer (high level comment #1).

Done, see above.

> 
> ## 1.1
> 
> o Second bullet. s/"dimensions", e.g., the specific phenomenon
> "weight"/
> "dimensions", the specific phenomenon e.g. "weight"/  (otherwise it
> sounds like the measure is an example of a dimension)

Made descriptions clearer:

"To allow correct interpretation of the value, the observation needs to be further described by "dimensions" such as the specific phenomenon, e.g., "weight", the time for which the observation is valid, e.g., "January 2013", or the location where the observation was made, e.g., "New York"."

"To further improve interpretation of the value, attributes such as presentational information, e.g., a series title "COINS 2010 to 2013", or information critical to understanding the data, e.g., the unit of measure "miles", can be given to observations."


> o Figure. Not sure I see the value of this figure. Not all
> observations
> are made by a person. Aggregate statistics are not observations in
> this
> sense.

Removed the figure. I can add a figure illustrating the example described above, if useful.
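
In place of a figure, the example from the reworded description above (a value interpreted through dimensions and attributes) can be sketched as a small data structure. This is only an illustration in plain Python, not the vocabulary itself; the class name, the value 70.0, and the unit "kilograms" are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Observation:
    """One observed value plus the context needed to interpret it."""
    value: float
    dimensions: Dict[str, str]                                 # identify the observation
    attributes: Dict[str, str] = field(default_factory=dict)  # help to interpret it

    def describe(self) -> str:
        dims = ", ".join(f"{k}={v}" for k, v in self.dimensions.items())
        attrs = ", ".join(f"{k}={v}" for k, v in self.attributes.items())
        return f"{self.value} [{dims}] ({attrs})"


obs = Observation(
    value=70.0,  # made-up value
    dimensions={
        "phenomenon": "weight",
        "time": "January 2013",
        "location": "New York",
    },
    attributes={"unit": "kilograms"},  # made-up unit of measure
)
print(obs.describe())
```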

> ## 3.1
> 
> o The figure here (it would help if the figures were numbered) seems
> to
> be a direct copy of that in reference [SMDX 2.1]. I'm not sure what
> the
> W3C processes would be for obtaining and registering clearance for
> such
> reproduction but suggest that we simply not go there. Either drop the
> diagram or do a new one.  The flows in the current diagram are
> confusing
> anyway.

Removed figure. Use case should still be comprehensible.

> 
> ## 3.2
> 
> o The requirement listed is not a requirement of this use case. COINS
> works fine based on slices. I would be inclined to label this as "No
> additional requirements beyond Data Cube". I would instead add a
> subsection about "Lessons" and for COINS the lessen is that data cube
> can be successfully used for publishing financial data, not just
> statistics.

I added this as a (solved) challenge: "Although not originally intended, the data cube vocabulary could be successfully used for publishing financial data, not just statistics."

But I can still add it as a separate lesson, if more appropriate.

> 
> ## 3.3
> 
> o This use case seems to be a mix of a specific application (Dutch
> historical census) and a generic use case (publishing spreadsheets).
> It
> would be clearer if picked one or the other as the framing. I would
> suggest the more concrete one.

You are right, but for the Google Public Data Explorer it is similar. I think the more general Excel Spreadsheet use case for now makes sense as it is (although it surely can be improved).

> 
> o It is not clear from the write up why the first requirement emerges
> from this use case. Why does this data need non-SKOS hierarchies?

This will now simply refer to the lesson: "Publishers may need more guidance to decide which representation of hierarchies is most suitable for their use case".

> 
> ## 3.4
> 
> o In the Turtle example the URIs need "<" replacing by "&lt;" so they
> show up. The sdmx prefix probably should be defined, though maybe the
> claim that this is "pseduo-turtle" is sufficient to duck that.

Done.
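
For reference, the kind of escaping meant here can be reproduced with Python's standard `html` module. This is a minimal sketch; the Turtle line and the example.org URI are placeholders, not the actual example from the document.

```python
import html

# Placeholder pseudo-Turtle line; the URI is not taken from the UCR document.
turtle_line = "<http://example.org/dataset1> a qb:DataSet ."

# Replace "<", ">" and "&" by character references so that the URI
# shows up when the line is embedded in an HTML page.
escaped = html.escape(turtle_line, quote=False)
print(escaped)
```
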

> 
> ## 3.5
> 
> o This example does not really motivate the first requirement
> (relationship to O&M). This publisher does not use ISO19156 here and
> does not care about the relationship. In fact this is not directly a
> sensor network problem (the data is the result of lab-based analysis,
> validation and classification before publication).

Still, I think we are publishing statistics about sensor data, which may lead to the question of how other potential publishers of sensor data statistics may use the vocabulary. Since we have rephrased the requirements into lessons, this hopefully suits you better.

> 
> o If you do add "Lessons" sections for some of the examples then the
> lesson here is that "Data Cube can be successfully use for
> observation
> and measurement data, as well as statistical data".

Similar to COINS, I would add this as a (solved) challenge: "Although not originally intended, the data cube vocabulary could be successfully used for publishing observation and measurement data, not just statistics."

> 
> ## 3.6
> 
> o Trivial s/have to/have too/

Done.

> 
> o Qcrumb.com is not explained or linked to, and I couldn't find any
> useful information via Google.

Qcrumb.com is simply a web-based triple store into which RDF data can be loaded dynamically and then queried. It is not of high relevance to the use case. Rather, the file structure is relevant, so I would add: "Eurostat Linked Data Wrapper provides resolvable URIs to datasets that return all observations of the dataset. Also, every dataset serves the URI of its data structure definition (dsd). The dsd URI returns all RDF describing the dataset. Publishing the data in this way allows, for example, a client to first gather the dsd and to resolve the dataset (ds) URIs only for actual query execution."
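
The two-step access pattern described in that paragraph could look roughly as follows. This is a sketch under assumptions, not the wrapper's actual interface: the function names, the example.org URIs, and the in-memory `fake_web` standing in for HTTP lookups are all hypothetical.

```python
from typing import Callable, Dict, List


def gather_then_resolve(
    datasets: Dict[str, str],
    fetch: Callable[[str], str],
    is_relevant: Callable[[str], bool],
) -> Dict[str, str]:
    """datasets maps each dataset (ds) URI to the URI of its dsd."""
    # Step 1: resolve only the dsd URIs.
    structures = {ds_uri: fetch(dsd_uri) for ds_uri, dsd_uri in datasets.items()}
    # Step 2: resolve a ds URI (all observations) only if its dsd is relevant to the query.
    return {
        ds_uri: fetch(ds_uri)
        for ds_uri, dsd in structures.items()
        if is_relevant(dsd)
    }


# In-memory stand-in for the web; real code would dereference the URIs over HTTP.
fake_web = {
    "http://example.org/dsd/a": "dimensions: geo, time",
    "http://example.org/dsd/b": "dimensions: sex, age",
    "http://example.org/data/a": "observations of dataset a",
    "http://example.org/data/b": "observations of dataset b",
}
calls: List[str] = []


def fake_fetch(uri: str) -> str:
    calls.append(uri)
    return fake_web[uri]


result = gather_then_resolve(
    {
        "http://example.org/data/a": "http://example.org/dsd/a",
        "http://example.org/data/b": "http://example.org/dsd/b",
    },
    fake_fetch,
    lambda dsd: "geo" in dsd,
)
```

In this run the observations of dataset b are never requested, because its dsd shows that it cannot contribute to the query.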

> 
> o The first requirement seems to be about publishing advice or
> tooling
> requirements, not a requirement on the vocabulary design.

Requirements changed to lessons.

> 
> ## 3.7
> 
> o Is copyright OK on that diagram? Seems possible in that case.

Not fully sure, so I would remove it.

> 
> ## 3.8
> 
> o Is copyright OK on those diagrams? Seems possible in that case.

Here it should be OK; I created those screenshots and demos myself.

> 
> o Not quite clear how that requirement flows from the use case, but
> can
> believe it does.

Requirements changed to lessons.

> 
> ## 3.9
> 
> o The requirement does not flow from the use case. It seems like the
> requirement that would follow is "Develop a mapping between Data Cube
> as
> DSPL". That would be a reasonable requirement for the Data Cube
> eco-system but not one for the vocabulary itself.
> 

I could describe that either as a challenge (similar to the COINS and Bathing Water use cases) or as a lesson. Describing it as a challenge would require less work for now.

> ## 3.11
> 
> o The requirement is reasonable but it is not a requirement on Data
> Cube
> vocabulary.

Requirements changed to lessons.

> 
> ## 4
> 
> o Suggest separating this section into requirements on the Data Cube
> vocabulary update and other associated requirements (4.8, 4.9, maybe
> DSPL one from 3.9).

Requirements changed to lessons. I do not separate lessons about future work on the data cube vocabulary from lessons about future work on tools and services complementing the vocabulary.

> 
> ## 4.1
> 
> o This requirement is confusing. It is not a requirement of the GLD
> work
> to build upon SMDX, Data Cube was already built upon SDMX. If we list
> that they would would need to list all the other requirements that
> pre-GLD Data Cube had to meet.

Requirements changed to lessons. In the lesson, I focus on SDMX 2.1 now.

>    I understand you to be saying there is a putative requirement to
> update to SDMX 2.1 if there are specific use cases that demand it. If
> so
> then should be made clearer in the title and drop the link to COINS -
> there is no motivation to use SDMX 2.1 from the COINS use case.

Requirements changed to lessons.

Best,

Benedikt

[1] <https://dvcs.w3.org/hg/gld/raw-file/default/data-cube-ucr/index.html>

________________________________________
From: Dave Reynolds [dave.e.reynolds@gmail.com]
Sent: Thursday, 23 May 2013 15:14
To: Government Linked Data Working Group
Subject: [QB-UCR] Review comments

I'm sorry about this Benedikt but having finally cleared some time to
look at the document I don't think it is ready to go.

# High level comments

1. The requirements identified here are primarily those for additional
work within GLD and not for Data Cube itself. That needs to be made
clearer. At a minimum by improving the phrasing in the introduction.

2. Some requirements here are not requirements for the data cube
vocabulary but for associated tools/services which have not been
considered as part of our work. They should either not be here or they
need to be pulled out separately (specifically 4.8 and 4.9).

3. In some of the use cases it's not clear how the use case motivates
the requirement associated with it.

4. A number of the diagrams appear to have been copy from other
publications. This may be a copyright violation. Publication can not
proceed until this at least is resolved.

5. Given the structure of the document it is not obvious that ACTION-92
makes sense. Need to either add a new use case in which the O&M
relationship can then be illustrated (e.g. the MetOffice usage) or
revert to having the O&M discussion as a separate note or drop it. I'm
inclined towards the first of these but will need to think more about it.

6. Some of the "use cases" represent actual deployed systems. For those
then it may be appropriate to highlight "Lessons" or "Insights" rather
than just focus on unmet requirements.

# Detailed/minor comments (ordered and numbered by section)

## 1

o The first paragraph needs to be clarified to make the nature of the
requirements clearer (high level comment #1).

## 1.1

o Second bullet. s/"dimensions", e.g., the specific phenomenon "weight"/
"dimensions", the specific phenomenon e.g. "weight"/  (otherwise it
sounds like the measure is an example of a dimension)

o Figure. Not sure I see the value of this figure. Not all observations
are made by a person. Aggregate statistics are not observations in this
sense.

## 3.1

o The figure here (it would help if the figures were numbered) seems to
be a direct copy of that in reference [SMDX 2.1]. I'm not sure what the
W3C processes would be for obtaining and registering clearance for such
reproduction but suggest that we simply not go there. Either drop the
diagram or do a new one.  The flows in the current diagram are confusing
anyway.

## 3.2

o The requirement listed is not a requirement of this use case. COINS
works fine based on slices. I would be inclined to label this as "No
additional requirements beyond Data Cube". I would instead add a
subsection about "Lessons" and for COINS the lessen is that data cube
can be successfully used for publishing financial data, not just statistics.

## 3.3

o This use case seems to be a mix of a specific application (Dutch
historical census) and a generic use case (publishing spreadsheets). It
would be clearer if picked one or the other as the framing. I would
suggest the more concrete one.

o It is not clear from the write up why the first requirement emerges
from this use case. Why does this data need non-SKOS hierarchies?

## 3.4

o In the Turtle example the URIs need "<" replacing by "&lt;" so they
show up. The sdmx prefix probably should be defined, though maybe the
claim that this is "pseduo-turtle" is sufficient to duck that.

## 3.5

o This example does not really motivate the first requirement
(relationship to O&M). This publisher does not use ISO19156 here and
does not care about the relationship. In fact this is not directly a
sensor network problem (the data is the result of lab-based analysis,
validation and classification before publication).

o If you do add "Lessons" sections for some of the examples then the
lesson here is that "Data Cube can be successfully use for observation
and measurement data, as well as statistical data".

## 3.6

o Trivial s/have to/have too/

o Qcrumb.com is not explained or linked to, and I couldn't find any
useful information via Google.

o The first requirement seems to be about publishing advice or tooling
requirements, not a requirement on the vocabulary design.

## 3.7

o Is copyright OK on that diagram? Seems possible in that case.

## 3.8

o Is copyright OK on those diagrams? Seems possible in that case.

o Not quite clear how that requirement flows from the use case, but can
believe it does.

## 3.9

o The requirement does not flow from the use case. It seems like the
requirement that would follow is "Develop a mapping between Data Cube as
DSPL". That would be a reasonable requirement for the Data Cube
eco-system but not one for the vocabulary itself.

## 3.11

o The requirement is reasonable but it is not a requirement on Data Cube
vocabulary.

## 4

o Suggest separating this section into requirements on the Data Cube
vocabulary update and other associated requirements (4.8, 4.9, maybe
DSPL one from 3.9).

## 4.1

o This requirement is confusing. It is not a requirement of the GLD work
to build upon SMDX, Data Cube was already built upon SDMX. If we list
that they would would need to list all the other requirements that
pre-GLD Data Cube had to meet.
   I understand you to be saying there is a putative requirement to
update to SDMX 2.1 if there are specific use cases that demand it. If so
then should be made clearer in the title and drop the link to COINS -
there is no motivation to use SDMX 2.1 from the COINS use case.


Dave
Received on Tuesday, 28 May 2013 11:50:04 UTC