Re: [Data Cubes] Why this kind of Data Structure Definition from Dave Reynolds on 2012-08-16 (public-gld-comments@w3.org from August 2012)

From: Dave Reynolds <dave.e.reynolds@gmail.com>
Date: Thu, 16 Aug 2012 12:31:03 +0100
To: Thomas Bandholtz <thomas.bandholtz@innoq.com>
CC: public-gld-comments@w3.org
Message-ID: <502CD9F7.1040409@gmail.com>

Hi Thomas,

On 15/08/12 12:22, Thomas Bandholtz wrote:
> On 15.08.2012 12:43, Dave Reynolds wrote:

>> For the UoM for multi-measure cubes you could use an attribute on the
>> ComponentProperties as I mentioned before.
>
> I did not forget that. But I am looking for a more compact way so I can
> provide multiple measures in a single dataset though they have different
> UOM.

The standard route is reasonably compact, just one line for each measure.

    eg:hatSizeInCm  a qb:MeasureProperty;
        rdfs:range xsd:decimal;
        sdmx-attribute:unitMeasure  unit:Centimeter .

Then, to be clean, add one extra line to your DSD declaring that the 
unitMeasure is attached to the measure properties for all measures.

    eg:mydataset  qb:structure [
        qb:component [qb:measure   eg:hatSizeInCm; ];
        qb:component [qb:attribute sdmx-attribute:unitMeasure ;
                      qb:componentAttachment qb:MeasureProperty; ]
    ] .

> Secondly, the UOM may change per measure in my use case.

> That is why I
> have to provide the UoMs per dataset  (may be even per observation) not
> per measure property.

Having versions of measures for particular UoMs seems reasonable to me; 
you can always use qb:concept to relate those to a common concept being 
measured.
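
As a rough sketch of that, with made-up eg: names and assuming the QUDT 
unit vocabulary offers unit:Inch alongside unit:Centimeter:

    @prefix qb:   <http://purl.org/linked-data/cube#> .
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix sdmx-attribute:
                  <http://purl.org/linked-data/sdmx/2009/attribute#> .
    @prefix unit: <http://qudt.org/vocab/unit#> .   # assumed namespace
    @prefix eg:   <http://example.org/ns#> .        # illustrative only

    # The common concept being measured, independent of any unit
    eg:hatSize  a skos:Concept ;
        skos:prefLabel "hat size"@en .

    # Two unit-specific measure properties pointing at the same concept
    eg:hatSizeInCm  a qb:MeasureProperty ;
        qb:concept eg:hatSize ;
        sdmx-attribute:unitMeasure unit:Centimeter .

    eg:hatSizeInInches  a qb:MeasureProperty ;
        qb:concept eg:hatSize ;
        sdmx-attribute:unitMeasure unit:Inch .      # assumed unit name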

To vary per observation then you would have to use a single measure per 
observation and attach the attribute to the observation (the normal 
approach).
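
Roughly like this, using qb:measureType as the measure dimension. The 
eg: names, the values and unit:Inch are illustrative assumptions, not 
taken from your data:

    @prefix qb:   <http://purl.org/linked-data/cube#> .
    @prefix sdmx-attribute:
                  <http://purl.org/linked-data/sdmx/2009/attribute#> .
    @prefix unit: <http://qudt.org/vocab/unit#> .   # assumed namespace
    @prefix eg:   <http://example.org/ns#> .        # illustrative only

    eg:mydataset  a qb:DataSet ;
        qb:structure eg:dsd .

    # No qb:componentAttachment on the attribute, so it is expected
    # on each observation (the default attachment level).
    eg:dsd  a qb:DataStructureDefinition ;
        qb:component [ qb:dimension eg:person ] ,
                     [ qb:dimension qb:measureType ] ,
                     [ qb:measure   eg:hatSize ] ,
                     [ qb:attribute sdmx-attribute:unitMeasure ] .

    # One measure per observation, each carrying its own unit
    eg:obs1  a qb:Observation ;
        qb:dataSet     eg:mydataset ;
        eg:person      eg:alice ;
        qb:measureType eg:hatSize ;
        eg:hatSize     57.0 ;
        sdmx-attribute:unitMeasure unit:Centimeter .

    eg:obs2  a qb:Observation ;
        qb:dataSet     eg:mydataset ;
        eg:person      eg:bob ;
        qb:measureType eg:hatSize ;
        eg:hatSize     22.5 ;
        sdmx-attribute:unitMeasure unit:Inch .      # assumed unit name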

The alternative is to represent your values as structured objects with 
an associated UoM instead of as plain literals, for example using 
qudt:QuantityValue.
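
A minimal sketch of such a structured value. I'm assuming the QUDT 
schema and unit namespaces shown here, and the eg: names are made up:

    @prefix qb:   <http://purl.org/linked-data/cube#> .
    @prefix qudt: <http://qudt.org/schema/qudt#> .  # assumed namespace
    @prefix unit: <http://qudt.org/vocab/unit#> .   # assumed namespace
    @prefix eg:   <http://example.org/ns#> .        # illustrative only

    # The measure value is a node carrying both number and unit
    eg:obs1  a qb:Observation ;
        qb:dataSet  eg:mydataset ;
        eg:hatSize  [ a qudt:QuantityValue ;
                      qudt:numericValue 57.0 ;
                      qudt:unit         unit:Centimeter ] .

The price is that the range of the measure property is then the 
structured value rather than xsd:decimal, so queries have to go one 
step further to reach the number.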

>> Did you mean things like:
>>      myDef:year    rdfs:domain myDef:MixedObservation .
>>      myDef:age     rdfs:domain myDef:MixedObservation .
>>      myDef:hatSize rdfs:domain myDef:MixedObservation .
>> ?
>
> Exactly. Thanks.
>
>>
>> If so then that kind of works but now you can't use myDef:year etc on
>> another cube of a different shape. For many of our use cases we want
>> to reuse dimension and measure properties across cubes.
>
> I don't see the problem. If I publish such definitions under a distinct
> namespace (MyDef:), anybody can reuse them just like DSDs.

No, they cross-contaminate. If you then have a second cube and try to 
define its structure by, for example:

    myDef:year        rdfs:domain myDef:OtherCubeObservation .
    myDef:population  rdfs:domain myDef:OtherCubeObservation .

Then you now have two domain declarations for myDef:year and every 
instance of myDef:OtherCubeObservation will be inferred to be also a 
myDef:MixedObservation (because it will have myDef:year values) and vice 
versa.
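
To spell that out with a made-up observation (myDef:obs1 and the 
example.org namespace are purely illustrative):

    @prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix myDef: <http://example.org/myDef#> .    # illustrative only

    # Both domain declarations, taken together
    myDef:year        rdfs:domain myDef:MixedObservation ,
                                  myDef:OtherCubeObservation .
    myDef:population  rdfs:domain myDef:OtherCubeObservation .

    # An observation intended for the second cube only
    myDef:obs1  myDef:year        2012 ;
                myDef:population  1000 .

    # RDFS domain entailment (rule rdfs2) then yields both of:
    #   myDef:obs1 a myDef:OtherCubeObservation .
    #   myDef:obs1 a myDef:MixedObservation .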

Hence the need for OWL restrictions rather than global domain/range.
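
One possible shape for that, only as a sketch. The choice of 
cardinality restrictions and the example.org namespace are my 
illustration; other restriction types would do as well:

    @prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix owl:   <http://www.w3.org/2002/07/owl#> .
    @prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
    @prefix qb:    <http://purl.org/linked-data/cube#> .
    @prefix myDef: <http://example.org/myDef#> .    # illustrative only

    # Each class states what its own members must look like
    myDef:MixedObservation  rdfs:subClassOf qb:Observation ,
        [ a owl:Restriction ;
          owl:onProperty  myDef:year ;
          owl:cardinality "1"^^xsd:nonNegativeInteger ] ,
        [ a owl:Restriction ;
          owl:onProperty  myDef:hatSize ;
          owl:cardinality "1"^^xsd:nonNegativeInteger ] .

    myDef:OtherCubeObservation  rdfs:subClassOf qb:Observation ,
        [ a owl:Restriction ;
          owl:onProperty  myDef:year ;
          owl:cardinality "1"^^xsd:nonNegativeInteger ] ,
        [ a owl:Restriction ;
          owl:onProperty  myDef:population ;
          owl:cardinality "1"^^xsd:nonNegativeInteger ] .

These are necessary conditions on members of each class only. They say 
nothing about other resources that happen to have a myDef:year, so the 
property stays reusable across cubes.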

>> If the attribute mechanism is proving cumbersome for you then the
>> thing to consider would be to add a qb2:uom property to
>> qb:ComponentSpecification so that you can define the UoM as part of
>> the DSD:
>>
>> myDef:dataset a qb:DataSet ;
>>      qb:structure [ a qb:DataStructureDefinition;
>>        qb:component [qb:dimension myDef:year;    qb2:uom x:years] ;
>>        qb:component [qb:dimension myDef:region;  qb2:uom x:place] ;
>>        qb:component [qb:measure   myDef:hatSize; qb2:uom x:cm] ;
>>         ...
>>       ];
>>
>> Though I'm not yet convinced that is preferable to attaching the UOM
>> directly to the component property.
>
> So I would need a different DSD whenever a measure property switches UoM.
> I thought it would be better to handle this on the dataset level than in the
> DSD.

Having a new DSD in such cases seems reasonable to me.

> So, to focus this:
>
> What do you think of attaching multiple UoM to multiple measures on the
> dataset level, each by a pair of (measure property, UoM)?

I can see why you've ended up with that proposal but I worry that it is 
too different to the existing mechanism without a strong enough reason 
for the divergence.

The principle at the moment is that the DSD gives you all the information 
you need to interpret the data - it tells you how to determine where the 
observations lie (dimensions), what their values are (measures) and how 
to interpret those values (attributes).

Unit Of Measure is "just" one of the attributes and for single measure 
cubes or for multi-measure cubes where the measures are used uniformly 
the existing attribute mechanism seems to suffice.

I hear you that you would prefer the qb:MeasureProperty itself to be 
agnostic about units and to associate the unit with the cube somehow.
If we really wanted to support that then the information should 
still be part of the DSD, because that's where the interpretation is 
supposed to be defined. Hence my sketch above. However, personally I 
prefer the clarity of directly associating the unit with the measure, and 
it would need more thought and feedback from other users before going 
down the qb2:uom route.

I do agree that in a case where you have multiple measures and where the 
UoM is not uniform across the cube then there is a challenge but 
attaching UoM to the DataSet doesn't solve that. See earlier comments on 
how that can be approached.

Does that make sense?

> Thanks for being so patient!

Thanks for sharing your use case and requirements with us.

Dave
Received on Thursday, 16 August 2012 11:31:37 UTC