- From: Ulrich <ulrich.atz@theodi.org>
- Date: Fri, 5 Apr 2013 12:13:12 +0100
- To: public-gld-comments@w3.org
- Cc: Jeni Tennison <jeni@theodi.org>
- Message-Id: <2B68B904-5CC9-4ED5-9F67-F81A01DD8397@theodi.org>
Dear Editors,

Jeni brought this draft to my attention. After reading the document, here are a few comments/suggestions. Bear in mind that I have little experience with W3C standards, so they may seem obvious or irrelevant.

Overall comment: Data Cube seems to be geared towards official statistics; other audiences may find it harder to grasp. Your first priority should be the comments referring to example 5.3 and similar.

A link to the SDMX User Guide 2.1, especially section 2.2 "Background", would ease understanding for newcomers.

As an applied statistician, in my simple world, I think of datasets in tabular format: rows and columns (+ metadata). E.g. observed values of individuals (rows) across characteristics such as age (columns). Of course, a dataset may also consist of aggregated data. The point is, if the concept of a dataset is used in a more general sense, it may be misinterpreted. Introduce examples earlier.

I'd recommend avoiding the term "non-statistical data", as I have only heard it in the context of official statistics. Or what exactly makes data statistical? (see e.g. section 5.1)

2.3 Audience: expand with examples?

Section 5.1: "A set of values for all the dimension components is sufficient to identify a single observation." Would that imply there cannot be two individuals with the same characteristics? Or that such a dataset must include a unique ID, even if created artificially? What about data that is anonymised?

5.1 may also need more examples, say factor variables such as gender. They are usually stored as binaries or 2/1 and come with a label for female/male. Make explicit what the measure and attribute components would be, or refer to a later section that addresses sex.

The slice example is good, but could do with a shorter sentence.

I'm confused about the use of "metadata" now. Is it metadata about the whole dataset and/or about single observations?

For practical use example 5.3 is unwieldy; the long format is arguably more common [1]. What we see here I'd refer to as a data table, not a dataset. Most statistical programs only read data in tabular format. Perhaps include what happens to the metadata in example 5.3 as well ("StatsWales report number 003311" etc.).

Example 5.3: can we have some actual final code in there? Even if it anticipates some sections.

Example 6.3: I find it hard to see where we define the nested structure of the data. Include a reference to example 4, or call it something more telling than "example".

Section 7 (before 7.1; generally I'd avoid sections without subheaders): Can these definitions come earlier? They suddenly explained a lot more. So qb:dataSet and qb:DataSet are different…

Unfortunately, I cannot comment on sections 10 and 11.

Reading this guide, it might be relatively easy to provide a tool (algorithm) which translates a simple and "well-behaved" dataset into Data Cube syntax. This would greatly invite new users to play around and familiarise themselves with the vocabulary. There may be substantial reasons (e.g. manual specifications) why this is not possible, but I am not aware of the details.
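
To make the idea of such a translation tool concrete, here is a minimal sketch in Python, not a definitive implementation. It assumes the table has already been brought into one row per observation (region, period, sex, value). The `ex:` namespace, the property names `ex:refArea`, `ex:refPeriod`, `ex:sex` and `ex:lifeExpectancy`, the dataset name `ex:dataset-le1` and the helper names are all hypothetical placeholders; only `qb:Observation` and `qb:dataSet` are taken from the vocabulary itself. A real tool would also have to emit the data structure definition, which this sketch leaves out.

```python
QB = "http://purl.org/linked-data/cube#"
EX = "http://example.org/ns#"  # hypothetical namespace, for illustration only


def slug(text):
    """Turn a label such as 'Merthyr Tydfil' into a local name ('merthyr-tydfil')."""
    return text.strip().lower().replace(" ", "-")


def rows_to_turtle(rows, dataset="ex:dataset-le1"):
    """Emit one qb:Observation per (region, period, sex, value) row as Turtle.

    The dimension and measure properties (ex:refArea etc.) are assumed names.
    """
    lines = [f"@prefix qb: <{QB}> .", f"@prefix ex: <{EX}> .", ""]
    for n, (region, period, sex, value) in enumerate(rows, start=1):
        lines += [
            f"ex:obs{n} a qb:Observation ;",
            f"    qb:dataSet {dataset} ;",
            f"    ex:refArea ex:{slug(region)} ;",
            f'    ex:refPeriod "{period}" ;',
            f"    ex:sex ex:{slug(sex)} ;",
            f"    ex:lifeExpectancy {value} .",
            "",
        ]
    return "\n".join(lines)


# Two observations taken from the life-expectancy table in example 5.3.
sample = [
    ("Newport", "2004-2006", "Male", 76.7),
    ("Merthyr Tydfil", "2006-2008", "Female", 79.6),
]
turtle = rows_to_turtle(sample)
```

Calling `print(turtle)` shows the generated observations. The point of the sketch is only that, for a well-behaved table, the per-observation part of the mapping is mechanical.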
Hope that helps,
Ulrich

---
Ulrich Atz, Statistician at the ODI
+44 (0) 20 3598 9395
@panoramadata
The ODI, 65 Clifton Street, London EC2A 4JE

[1] The data table in example 5.3 as a classical dataset:

Long format

Region          Years      Male  Female
Newport         2004-2006  76.7  80.7
Cardiff         2004-2006  78.7  83.3
Monmouthshire   2004-2006  76.6  81.3
Merthyr Tydfil  2004-2006  75.5  79.1
Newport         2005-2007  77.1  80.9
Cardiff         2005-2007  78.6  83.7
Monmouthshire   2005-2007  76.5  81.5
Merthyr Tydfil  2005-2007  75.5  79.4
Newport         2006-2008  77.0  81.5
Cardiff         2006-2008  78.7  83.4
Monmouthshire   2006-2008  76.6  81.7
Merthyr Tydfil  2006-2008  74.9  79.6

And the less common wide format

Region          Male2004-2006  Female2004-2006  Male2005-2007  Female2005-2007  Male2006-2008  Female2006-2008
Newport         76.7           80.7             77.1           80.9             77.0           81.5
Cardiff         78.7           83.3             78.6           83.7             78.7           83.4
Monmouthshire   76.6           81.3             76.5           81.5             76.6           81.7
Merthyr Tydfil  75.5           79.1             75.5           79.4             74.9           79.6
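
The relationship between the two layouts above, and the question of whether the dimension values identify a single observation, can be sketched in plain Python. This is an illustrative sketch only: the function names are hypothetical, and treating region, period and sex as the three dimensions is an assumption about how the table maps onto Data Cube.

```python
# Long-format rows from the table above: (region, years, male, female).
LONG = [
    ("Newport", "2004-2006", 76.7, 80.7),
    ("Cardiff", "2004-2006", 78.7, 83.3),
    ("Monmouthshire", "2004-2006", 76.6, 81.3),
    ("Merthyr Tydfil", "2004-2006", 75.5, 79.1),
    ("Newport", "2005-2007", 77.1, 80.9),
    ("Cardiff", "2005-2007", 78.6, 83.7),
    ("Monmouthshire", "2005-2007", 76.5, 81.5),
    ("Merthyr Tydfil", "2005-2007", 75.5, 79.4),
    ("Newport", "2006-2008", 77.0, 81.5),
    ("Cardiff", "2006-2008", 78.7, 83.4),
    ("Monmouthshire", "2006-2008", 76.6, 81.7),
    ("Merthyr Tydfil", "2006-2008", 74.9, 79.6),
]


def to_observations(long_rows):
    """Split each row into one (region, years, sex, value) tuple per measured value."""
    observations = []
    for region, years, male, female in long_rows:
        observations.append((region, years, "Male", male))
        observations.append((region, years, "Female", female))
    return observations


def has_unique_keys(observations):
    """True if (region, years, sex) identifies exactly one observation."""
    keys = [(region, years, sex) for region, years, sex, _ in observations]
    return len(keys) == len(set(keys))


def to_wide(observations):
    """Pivot to one row per region with one column per sex/period combination."""
    wide = {}
    for region, years, sex, value in observations:
        wide.setdefault(region, {})[f"{sex}{years}"] = value
    return wide


observations = to_observations(LONG)
wide = to_wide(observations)
```

For this aggregated table the three dimension values are enough to pick out each of the 24 observations. For individual-level data with repeated characteristics, `has_unique_keys` would return `False` unless an identifier is added as a further dimension.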
Received on Friday, 5 April 2013 20:54:54 UTC