Re: Absence of mention of units of measure for columns is very surprising

On 14 February 2016 at 06:51, Simon Cox <dr.shorthair@gmail.com> wrote:
> Thanks Jeni
>
> From what I can see you have provided some guidance that was missing before.
> However, I don't think it is complete yet.

Quick thought - is there anything in the Data Cube examples in
https://github.com/w3c/csvw/blob/gh-pages/examples/rdf-data-cube-example.md
that addresses any of the points below? --Dan

> Firstly, I'd like to have seen 'units of measure' discussed in a wider
> context, of 'reference systems and scales'. Most tabular data (which I think
> is the scope of CSVW) is made up of columns which don't only share semantics
> (the column label), but whose value uses a common reference system. Term
> values are usually taken from a common vocabulary (aka 'nominal reference
> system') which may also be ordered (i.e. 'ordinal reference system', e.g.
> geologic timescale). Numeric values may be quantitative (with a unit of
> measure or scale) or positional (also with a datum and direction, as well as
> scale). A general treatment would enable any of these kinds of reference
> system to be associated with all the values in a column. This unifies
> consideration of a ubiquitous feature of tabular data. Given where you have
> gotten to with CSVW, I understand this would have to wait for a second
> edition.
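A minimal sketch of this column-level model in Python, assuming exactly one reference system per column. The class names (`Nominal`, `Ordinal`, `Quantitative`, `Positional`, `Column`) and the `check_cell` helper are hypothetical illustrations, not CSVW terms; the geologic periods are only sample ordinal terms.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical model of the column-level "reference systems" described above.
# None of these names come from CSVW; they are illustrative only.

@dataclass(frozen=True)
class Nominal:
    terms: Tuple[str, ...]      # shared vocabulary, no ordering implied

@dataclass(frozen=True)
class Ordinal:
    terms: Tuple[str, ...]      # vocabulary listed in order, first to last

    def rank(self, term: str) -> int:
        return self.terms.index(term)

@dataclass(frozen=True)
class Quantitative:
    unit: str                   # unit of measure or scale

@dataclass(frozen=True)
class Positional:
    unit: str                   # scale
    datum: str                  # origin the value is measured from
    direction: str              # which way increasing values point

ReferenceSystem = Union[Nominal, Ordinal, Quantitative, Positional]

@dataclass(frozen=True)
class Column:
    name: str
    system: ReferenceSystem     # applies to every cell in the column

def check_cell(column: Column, raw: str) -> bool:
    """Return True if a raw cell string is consistent with the column's system."""
    system = column.system
    if isinstance(system, (Nominal, Ordinal)):
        return raw in system.terms
    try:
        float(raw)              # quantitative and positional values are numbers
    except ValueError:
        return False
    return True

# Example columns (hypothetical):
period = Column("period", Ordinal(("Cambrian", "Ordovician", "Silurian", "Devonian")))
elevation = Column("elevation", Positional(unit="m", datum="mean sea level", direction="up"))
```

Keeping the reference system on the column rather than on each cell mirrors the point that all values in a column share one system.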
>
> Second, your discussion omits the central notion of 'quantity-type'. The
> quantity-type provides the dimensionality (e.g. 'length', usually symbolised
> L in dimensional analysis). Each quantity type can use one of a number of
> scales or units of measure. The semantics in context (e.g. property names
> like distance, offset, height or wavelength, which is what CSVW has focussed
> on) is something else again, also having a many-to-one relationship with
> the underlying quantity-type (length). While quantity-type can be inferred
> from the property-type, a complete analysis requires mention of all three.
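A minimal sketch of the three-way split between property, quantity-type and unit, assuming each unit converts to a reference unit of its quantity-type by a single multiplicative factor (so offset scales such as degrees Celsius are out of scope). `QuantityType`, `Unit`, `PROPERTY_QUANTITY`, `unit_fits_property` and `convert` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QuantityType:
    name: str
    dimension: str      # dimensional-analysis symbol, e.g. "L" for length

@dataclass(frozen=True)
class Unit:
    symbol: str
    quantity_type: QuantityType
    factor: float       # hypothetical: multiplier to the quantity type's reference unit

LENGTH = QuantityType("length", "L")
TIME = QuantityType("time", "T")

METRE = Unit("m", LENGTH, 1.0)
KILOMETRE = Unit("km", LENGTH, 1000.0)
NANOMETRE = Unit("nm", LENGTH, 1e-9)
SECOND = Unit("s", TIME, 1.0)

# Many properties share one quantity type (a many-to-one relationship).
PROPERTY_QUANTITY = {
    "distance": LENGTH,
    "offset": LENGTH,
    "height": LENGTH,
    "wavelength": LENGTH,
}

def unit_fits_property(prop: str, unit: Unit) -> bool:
    """True if a column for `prop` may be expressed in `unit`."""
    return PROPERTY_QUANTITY[prop] == unit.quantity_type

def convert(value: float, source: Unit, target: Unit) -> float:
    """Convert between two units of the same quantity type."""
    if source.quantity_type != target.quantity_type:
        raise ValueError(f"cannot convert {source.symbol} to {target.symbol}")
    return value * source.factor / target.factor
```

Here the property only constrains which quantity-type is acceptable; the unit chosen for a particular column is a separate decision within that quantity-type.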
>
> Finally, in section 6.1.2 you show how a specific named datatype can be
> applied. This is cute, but I'm a little uneasy as it does not tie back to
> external definitions of either the unit or the quantity-type, and I don't
> see how it can ...
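A minimal sketch of the named-datatype approach, using the datatype `@id` mechanism that Gregg quotes from the metadata spec further down this thread. The file name, property IRI and unit datatype IRI are hypothetical placeholders; in practice the `@id` could point at an external unit definition, although nothing here links that unit to a quantity-type. The `@id` checks are a simplified reading of the quoted constraints, and `typed_literal` only approximates what a CSV-to-RDF converter would emit.

```python
import json
import re

XSD = "http://www.w3.org/2001/XMLSchema#"
UNIT_MPA = "http://example.org/unit/megaPascal"            # hypothetical unit datatype
HAS_TENSILE = "http://example.org/prop/hasTensileStrength"  # hypothetical property

metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "alloys.csv",                                    # hypothetical CSV file
    "tableSchema": {
        "aboutUrl": "#{alloy}",
        "columns": [
            {"name": "alloy", "titles": "alloy"},
            {
                "name": "tensile_strength",
                "titles": "tensile strength",
                "propertyUrl": HAS_TENSILE,
                # "base" fixes the lexical space; "@id" names the unit-bearing datatype.
                "datatype": {"base": "decimal", "@id": UNIT_MPA},
            },
        ],
    },
}

# Lexical form of xsd:decimal: optional sign, digits, optional fractional part.
DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")

def check_datatype_id(datatype_id: str) -> None:
    """Simplified form of the @id constraints quoted later in the thread."""
    if datatype_id.startswith("_:"):
        raise ValueError("datatype @id must not start with _:")
    if datatype_id.startswith(XSD):
        # Rough stand-in for "must not be the URL of a built-in datatype".
        raise ValueError("datatype @id must not be a built-in datatype")

def typed_literal(cell: str, column: dict) -> str:
    """Render a cell as an N-Triples-style literal typed with the column's @id."""
    datatype = column["datatype"]
    check_datatype_id(datatype["@id"])
    if not DECIMAL.fullmatch(cell):
        raise ValueError(f"{cell!r} is not a valid decimal")
    return f'"{cell}"^^<{datatype["@id"]}>'

print(json.dumps(metadata, indent=2))
```

With this metadata, a cell of `300` in the tensile strength column would be rendered as `"300"^^<http://example.org/unit/megaPascal>`, the same shape as the typed literals in Gregg's Turtle example.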
>
> Simon
>
> On 14 February 2016 at 03:48, Jeni Tennison <jeni@jenitennison.com> wrote:
>>
>> Hi Simon,
>>
>> I just wanted to flag that there’s a section in the Primer here:
>>
>>   http://w3c.github.io/csvw/primer/#units-of-measure
>>
>> that talks explicitly about how units of measure can be handled in various
>> ways.
>>
>> It would be great if you could take the time to cast your eyes over it and
>> see if there’s anything that’s unclear or missing in the explanation.
>>
>> Thanks,
>>
>> Jeni
>>
>> > On 21 Sep 2015, at 17:56, Simon Cox <dr.shorthair@gmail.com> wrote:
>> >
>> > Thanks Gregg. Yes, you are correct that there is not a uniform
>> > convention on how to associate a uom with a value in RDF.
>> > It could be argued that this is a fundamental gap between computer
>> > science and the real world - in nature there are no floats, reals and
>> > doubles, just values that are expressed as scaled numbers - the scaling
>> > factor or unit of measure is essential in their evaluation ;-)
>> >
>> > But back to the practical matter: there is also a lot of variety in the
>> > specification and designation of the 'property' associated with a cell.
>> > This is usually common down a column, hence the x-ref in the spec from
>> > column annotations to cells on the topic of 'property URL'.
>> > I guess I don't fully understand why you deal with the one and not the
>> > other.
>> > The fact that the spec is silent on units is the surprise, and risks
>> > sending users looking to solve their whole problem elsewhere, which would
>> > not be a good outcome.
>> >
>> > Is there room for a statement of 'best practice' or maybe even an
>> > enumeration of some alternatives?
>> >
>> > Simon
>> >
>> > On 21 September 2015 at 16:19, Gregg Kellogg <gregg@greggkellogg.net>
>> > wrote:
>> > > On Sep 18, 2015, at 7:52 AM, Simon Cox <dr.shorthair@gmail.com> wrote:
>> > >
>> > > I am involved with the Research Data Alliance activity on Data Types
>> > > and Registries.
>> > > The goal of this is to
>> > > (i) develop a format/model for the description of the structure of
>> > > datasets
>> > > (ii) allow the descriptions to be registered, so they can be referred
>> > > to.
>> > > This is kinda like enhanced MIME types, so that client applications
>> > > know what's inside a dataset, not just the file format.
>> > > A prototype has already been developed by CNRI, with a test
>> > > deployment.
>> > >
>> > > There is clearly a significant shared concern with CSV on the web, so
>> > > in preparation for meetings next week I consulted the Candidate Specs,
>> > > particularly the "Model for Tabular Data and Metadata on the Web". I have
>> > > not read the full suite of documents in detail, but was surprised to find
>> > > that 'units of measure' is not mentioned in the set of 'core annotations'
>> > > for columns http://www.w3.org/TR/tabular-data-model/#columns (in most tables,
>> > > data in a single column will have a common unit of measure).
>> > >
>> > > I raised this with Jeremy, and he showed me the route which can be
>> > > followed, by adding a column or traversing through the QB vocabulary.
>> > > However, this is complicated, and not made immediately available or
>> > > even flagged in the text.
>> > > I strongly suggest
>> > > (i) at least alerting readers to how this very common requirement can
>> > > be managed
>> > > (ii) better still, consider adding uom as a standard column
>> > > annotation.
>> >
>> > Just my perspective, but I think the issue is that there is no one
>> > standard way of describing units in RDF data. As the basic data model used
>> > by CSV on the Web closely corresponds to RDF, the fact that literal values
>> > extracted from CSV cells don’t have more dimensions is related to this
>> > underlying lack of a data model for describing data with units.
>> >
>> > Searching for this indicated a couple of different ways to handle it:
>> >
>> > * Define an OWL datatype which describes the values with units (see
>> > http://stackoverflow.com/questions/20248369/units-of-measurement-in-owl-and-rdf)
>> >
>> > unit:megaPascal rdf:type   rdfs:Datatype ;
>> >                 rdfs:label "MPa" .
>> >
>> > unit:Pascal rdf:type   rdfs:Datatype ;
>> >                 rdfs:label "Pa" .
>> >
>> > :AlMg3 prop:hasTensileStrength "300"^^unit:megaPascal .
>> > :AlMg3 prop:hasYieldStrength   "2"^^unit:Pascal .
>> >
>> > QUDT (http://www.openphacts.org/specs/units/) also describes similar
>> > methods.
>> >
>> > CSVW already supports this by allowing an arbitrary datatype using the
>> > @id field on a datatype (see
>> > http://www.w3.org/TR/tabular-metadata/#datatypes).
>> >
>> > @id If included, @id is a link property that identifies the datatype
>> > described by this datatype description. The value of this property becomes
>> > the id annotation for the described datatype. It must not start with _: and
>> > it must not be the URL of a built-in datatype.
>> >
>> > * Use a structured value to represent the data, for example:
>> >
>> > :AlMg3 prop:hasTensileStrength [rdf:value 300 ; ex:units unit:megaPascal] .
>> >
>> > This can be supported using the virtual columns feature, which allows
>> > relationships to be created and columns to be allocated to different
>> > values. This might also be useful when the units vary from row to row.
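A minimal sketch of the structured-value approach with virtual columns, assuming a CSV with `alloy`, `strength` and `unit` columns where the unit may differ on each row. The column names, base URL and `example.org` IRIs are hypothetical. Because the sketch builds node identifiers from URI templates, each measurement gets a per-row IRI (`#measure-{_row}`) rather than the blank node in the Turtle above. `expand` is a simplified stand-in for RFC 6570 template expansion, and `row_triples` is a toy converter, not a CSVW processor.

```python
import re
from urllib.parse import quote, urljoin

BASE = "http://example.org/alloys.csv"   # hypothetical location of the CSV
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
EX = "http://example.org/"                # hypothetical namespace

columns = [
    # Used only inside templates, so suppressed from direct output.
    {"name": "alloy", "titles": "alloy", "suppressOutput": True},
    {"name": "strength", "titles": "tensile strength", "datatype": "decimal",
     "aboutUrl": "#measure-{_row}", "propertyUrl": RDF_VALUE},
    # The unit column lets each row carry its own unit.
    {"name": "unit", "titles": "unit",
     "aboutUrl": "#measure-{_row}", "propertyUrl": EX + "units",
     "valueUrl": EX + "unit/{unit}"},
    # Virtual column (not in the CSV) linking the alloy to its measurement node.
    {"virtual": True,
     "aboutUrl": "#{alloy}", "propertyUrl": EX + "prop/hasTensileStrength",
     "valueUrl": "#measure-{_row}"},
]

metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "alloys.csv",
    "tableSchema": {"columns": columns},
}

def expand(template: str, variables: dict) -> str:
    """Replace each {name} with its percent-encoded value (simplified RFC 6570)."""
    return re.sub(r"\{([^}]+)\}",
                  lambda m: quote(str(variables[m.group(1)]), safe=""),
                  template)

def row_triples(row_num: int, cells: dict) -> list:
    """Produce (subject, predicate, object) tuples for one row."""
    variables = dict(cells, _row=row_num)
    triples = []
    for col in metadata["tableSchema"]["columns"]:
        if col.get("suppressOutput"):
            continue
        subject = urljoin(BASE, expand(col["aboutUrl"], variables))
        if "valueUrl" in col:
            obj = urljoin(BASE, expand(col["valueUrl"], variables))
        else:
            obj = cells[col["name"]]   # plain literal; datatype handling omitted
        triples.append((subject, col["propertyUrl"], obj))
    return triples

for triple in row_triples(1, {"alloy": "AlMg3", "strength": "300", "unit": "megaPascal"}):
    print(triple)
```

Putting the unit in its own column, mapped through `valueUrl`, is what allows the unit to change per row; a fixed unit could instead be a second virtual column with a constant `valueUrl`.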
>> >
>> > I think describing a use case for this, and using it as an informative
>> > example in one of the documents or a primer, would be a good way to approach
>> > this right now. As common practice emerges, this could be incorporated into
>> > a future version of these specs, but this should be done in harmony with
>> > a standard way of describing dimensional data in RDF and JSON.
>> >
>> > Gregg
>> >
>> > > Simon Cox
>> > > CSIRO, co-convenor of RDA Data Types activity.
>> >
>> >
>>
>> --
>> Jeni Tennison
>> http://www.jenitennison.com/
>>
>>
>>
>>
>

Received on Sunday, 14 February 2016 12:06:41 UTC