Re: Absence of mention of units of measure for columns is very surprising from Simon Cox on 2016-02-14 (public-csv-wg-comments@w3.org from February 2016)

From: Simon Cox <dr.shorthair@gmail.com>
Date: Mon, 15 Feb 2016 06:21:03 +1100
To: Jeni Tennison <jeni@jenitennison.com>
Cc: public-csv-wg-comments@w3.org
Message-ID: <CAMcwmNx6TzZ7VrS3716jRA+O6q-WWu6ROwfy+ZpK0UsRCKNbhQ@mail.gmail.com>
Thanks Jeni -

We probably need to wait for some adoption and best practice to emerge from
here.

Simon

On 15 February 2016 at 04:55, Jeni Tennison <jeni@jenitennison.com> wrote:

> Hi Simon,
>
> I’ve attempted to address the missing pieces here:
>
>   https://github.com/w3c/csvw/pull/818
>
> by including information about how to add details about the quantity being
> measured by the column as well as its unit.
>
> Do you think this is sufficient?
>
> Thanks,
>
> Jeni
>
> > On 14 Feb 2016, at 06:51, Simon Cox <dr.shorthair@gmail.com> wrote:
> >
> > Thanks Jeni
> >
> > From what I can see you have provided some guidance that was missing before.
> > However, I don't think it is complete yet.
> >
> > Firstly, I'd like to have seen 'units of measure' discussed in a wider context, of 'reference systems and scales'. Most tabular data (which I think is the scope of CSVW) is made up of columns which don't only share semantics (the column label), but whose values use a common reference system. Term values are usually taken from a common vocabulary (aka 'nominal reference system') which may also be ordered (i.e. 'ordinal reference system', e.g. geologic timescale). Numeric values may be quantitative (with a unit of measure or scale) or positional (also with a datum and direction, as well as scale). A general treatment would enable any of these kinds of reference system to be associated with all the values in a column. This unifies consideration of a ubiquitous feature of tabular data. Given where you have gotten to with CSVW I understand this would have to wait for a second edition.
> >
> > Second, your discussion omits the central notion of 'quantity-type'. The quantity-type provides the dimensionality (e.g. 'length', usually symbolised L in dimensional analysis). Each quantity type can use one of a number of scales or units of measure. The semantics in context (e.g. property names like distance, offset, height, wavelength, which is what CSVW has focussed on), is something else again, also having a many-to-one relationship with the underlying quantity-type (length). While quantity-type can be inferred from the property-type, a complete analysis requires mention of all three.
> >
> > Finally, in section 6.1.2 you show how a specific named datatype can be applied. This is cute, but I'm a little uneasy as it does not tie back to external definitions of either the unit or the quantity-type, and I don't see how it can ...
> >
> > Simon
> >
> > On 14 February 2016 at 03:48, Jeni Tennison <jeni@jenitennison.com> wrote:
> > Hi Simon,
> >
> > I just wanted to flag that there’s a section in the Primer here:
> >
> >   http://w3c.github.io/csvw/primer/#units-of-measure
> >
> > that talks explicitly about how units of measure can be handled in various ways.
> >
> > It would be great if you could take the time to cast your eyes over it and see if there’s anything that’s unclear or missing in the explanation.
> >
> > Thanks,
> >
> > Jeni
> >
> > > On 21 Sep 2015, at 17:56, Simon Cox <dr.shorthair@gmail.com> wrote:
> > >
> > > Thanks Gregg. Yes, you are correct that there is not a uniform convention on how to associate a uom with a value in RDF.
> > > It could be argued that this is a fundamental gap between computer science and the real world - in nature there are no floats, reals and doubles, just values that are expressed as scaled numbers - the scaling factor or unit of measure is essential in their evaluation ;-)
> > >
> > > But back to the practical matter: there is also a lot of variety in the specification and designation of the 'property' associated with a cell.
> > > This is usually common down a column, hence the x-ref in the spec from column annotations to cells on the topic of 'property URL'.
> > > I guess I don't fully understand why you deal with the one and not the other.
> > > The fact that the spec is silent on units is the surprise, and risks sending users looking to solve their whole problem elsewhere, which would not be a good outcome.
> > >
> > > Is there room for a statement of 'best practice' or maybe even an enumeration of some alternatives?
> > >
> > > Simon
> > >
> > > On 21 September 2015 at 16:19, Gregg Kellogg <gregg@greggkellogg.net> wrote:
> > > > On Sep 18, 2015, at 7:52 AM, Simon Cox <dr.shorthair@gmail.com> wrote:
> > > >
> > > > I am involved with the Research Data Alliance activity on Data Types and Registries.
> > > > The goal of this is to
> > > > (i) develop a format/model for the description of the structure of datasets
> > > > (ii) allow the descriptions to be registered, so they can be referred to.
> > > > kinda like enhanced MIME-types, so that client applications know what's inside a dataset, not just the file format.
> > > > A prototype has already been developed by CNRI, with a test deployment.
> > > >
> > > > There is clearly a significant shared concern with CSV on the web, so in preparation for meetings next week I consulted the Candidate Specs, particularly the "Model for Tabular Data and Metadata on the Web". I have not read the full suite of documents in detail, but was surprised to find that 'units of measure' is not mentioned in the set of 'core annotations' for columns http://www.w3.org/TR/tabular-data-model/#columns (in most tables data in a single column will have a common unit of measure).
> > > >
> > > > I raised this with Jeremy, and he showed me the route which can be followed, by adding a column or traversing through the QB vocabulary.
> > > > However, this is complicated, and not made immediately available or even flagged in the text.
> > > > I strongly suggest
> > > > (i) at least alerting readers to how this very common requirement can be managed
> > > > (ii) better still, consider adding uom as a standard column annotation.
> > >
> > > Just my perspective, but I think the issue is that there is no one standard way of describing units in RDF data. As the basic data model used by CSV on the Web closely corresponds to RDF, the fact that literal values extracted from CSV cells don’t have more dimensions is related to this underlying lack of a data model for describing data with units.
> > >
> > > Searching for this indicated a couple of different ways to handle it:
> > >
> > > * Define an OWL datatype which describes the values with units (see http://stackoverflow.com/questions/20248369/units-of-measurement-in-owl-and-rdf)
> > >
> > > unit:megaPascal rdf:type   rdfs:Datatype ;
> > >                 rdfs:label "MPa" .
> > >
> > > unit:Pascal rdf:type   rdfs:Datatype ;
> > >             rdfs:label "Pa" .
> > >
> > > :AlMg3 prop:hasTensileStrength "300"^^unit:megaPascal .
> > > :AlMg3 prop:hasYieldStrength   "2"^^unit:Pascal .
> > >
> > > QUDT (http://www.openphacts.org/specs/units/) also describes similar methods.
> > >
> > > CSVW already supports this by allowing an arbitrary datatype using the @id field on a datatype (see http://www.w3.org/TR/tabular-metadata/#datatypes).
> > >
> > > @id If included, @id is a link property that identifies the datatype described by this datatype description. The value of this property becomes the id annotation for the described datatype. It must not start with _: and it must not be the URL of a built-in datatype.
> > >
> > > * Use a structured value to represent the data, for example:
> > >
> > > :AlMg3 prop:hasTensileStrength [ rdf:value 300 ; ex:units unit:megaPascal ] .
> > >
> > > This can be supported using the virtual columns feature, which allows relationships to be created and columns to be allocated to different values. This might also be useful when the units vary from row to row.
> > >
> > > I think describing a use case for this, and using this as an informative example in one of the documents, or a primer would be a good way to approach this right now. As common practice emerges, this could be incorporated into a future version of these specs, but this should be done in harmony with describing a standard way of describing dimensional data in RDF and JSON.
> > >
> > > Gregg
> > >
> > > > Simon Cox
> > > > CSIRO, co-convenor of RDA Data Types activity.
> > >
> > >
> >
> > --
> > Jeni Tennison
> > http://www.jenitennison.com/
> >
> >
>
> --
> Jeni Tennison
> http://www.jenitennison.com/
>
Received on Sunday, 14 February 2016 19:22:11 UTC
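
Editorial sketch (not part of the original thread): the two options Gregg lists above, a unit expressed as a datatype identified by @id and a structured value built with virtual columns, might look roughly like this as CSVW metadata. The Python below only assembles and prints the JSON. The file name alloys.csv, the column names, and every example.org URL are assumptions made for illustration; the property names used (datatype with @id and base, virtual, aboutUrl, propertyUrl, valueUrl, and the {_row} template variable) are taken from the CSVW metadata vocabulary, but how a given processor renders the result is not shown or tested here.

```python
import json

# All example.org URLs, the file name and the column names are hypothetical.
UNIT_MPA = "http://example.org/unit/megaPascal"
PROP_TENSILE = "http://example.org/prop/hasTensileStrength"
PROP_UNITS = "http://example.org/units"
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"


def unit_as_datatype(name):
    """Option 1: a numeric column whose datatype description carries the unit URL as @id."""
    return [{
        "name": name,
        "propertyUrl": PROP_TENSILE,
        "datatype": {"@id": UNIT_MPA, "base": "number"},
    }]


def unit_as_structured_value(name):
    """Option 2: hang the number and the unit off an intermediate per-row node."""
    node = "#row-{_row}-" + name  # URI template; {_row} is expanded per row
    return [
        # Real column: the cell value becomes rdf:value of the intermediate node.
        {"name": name, "aboutUrl": node, "propertyUrl": RDF_VALUE, "datatype": "number"},
        # Virtual column: row subject -> intermediate node.
        {"name": name + "_link", "virtual": True,
         "propertyUrl": PROP_TENSILE, "valueUrl": node},
        # Virtual column: intermediate node -> unit.
        {"name": name + "_unit", "virtual": True, "aboutUrl": node,
         "propertyUrl": PROP_UNITS, "valueUrl": UNIT_MPA},
    ]


def table_metadata(value_columns):
    """Wrap the columns in a minimal table description; virtual columns stay last."""
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": "alloys.csv",
        "tableSchema": {
            "aboutUrl": "http://example.org/alloy/{alloy}",
            "columns": [{"name": "alloy", "titles": "alloy"}] + value_columns,
        },
    }


option1 = table_metadata(unit_as_datatype("tensile_strength"))
option2 = table_metadata(unit_as_structured_value("tensile_strength"))

if __name__ == "__main__":
    print(json.dumps(option1, indent=2))
    print(json.dumps(option2, indent=2))
```

The second form is more verbose, but because the unit is attached through valueUrl it could in principle be driven by another column when units differ per row, which is the case Gregg mentions. Neither form records the quantity-type that Simon asks for; that would need a further assumed property.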