Re: Absence of mention of units of measure for columns is very surprising from Jeremy Tandy on 2015-09-22 (public-csv-wg-comments@w3.org from September 2015)

From: Jeremy Tandy <jeremy.tandy@gmail.com>
Date: Tue, 22 Sep 2015 07:57:48 +0000
To: Simon Cox <dr.shorthair@gmail.com>, Gregg Kellogg <gregg@greggkellogg.net>
Cc: public-csv-wg-comments@w3.org
Message-ID: <CADtUq_1OkmDiMg_kVD1Y3v-w_=haJa2P2K_tnF8qO1GBFrLpaw@mail.gmail.com>
Hi Simon-

One of my key goals for CSV on the Web was to be able to convey CSV-encoded
environmental data as RDF. Stating the units of measure explicitly is an
important part of conveying that data.

As Gregg points out, there is not a single convention for expressing units
of measure. Gregg defines two, and then there's the RDF Data Cube mechanism
of 'attaching' attributes (such as UoM) to the columns of data.

I am satisfied that I _can_ express units of measure in the CSV-metadata
... but agree that it's not entirely straightforward. Your insight into the
starting point of a scientist wanting to publish data is useful.

I think that the best way forward here is:
i) to ensure that the Primer that we plan to produce to accompany the
Recommendations has a section on "representing scaled values that have
units of measure" where we can illustrate each of the 3 mechanisms
outlined above.
ii) add a _note_ into the model document [1] (I think this is the best
place) indicating that units of measure are not formally part of the
tabular data model but that they can be incorporated in a number of ways
... perhaps with a reference to the Primer section.

Would that be sufficient to resolve your concerns? (at least in the interim
whilst there is no single convention for describing units of measure in RDF)

Jeremy

[1]: http://www.w3.org/TR/tabular-data-model/
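
As a rough illustration, the two mechanisms Gregg describes in the quoted
text below (a custom datatype identified by @id, and a structured value
built with virtual columns) might be written as CSVW metadata along the
following lines. This is only a sketch in Python that builds the JSON
descriptions. The http://example.org/ namespace, the file name
materials.csv, the column names and the two function names are all
assumptions made for the example; none of them come from the thread or
from the specifications.

```python
import json

# Hypothetical namespace: the thread never expands unit:, prop: or ex:.
EX = "http://example.org/"
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"


def datatype_metadata():
    """Mechanism 1: the unit is carried by a custom datatype (@id)."""
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": "materials.csv",  # assumed file name
        "tableSchema": {
            "aboutUrl": EX + "material/{material}",
            "columns": [
                {"name": "material", "titles": "material",
                 "suppressOutput": True},
                {
                    "name": "tensile_strength",
                    "titles": "tensile strength",
                    "propertyUrl": EX + "prop/hasTensileStrength",
                    # @id names the unit datatype; base says how to parse it.
                    "datatype": {"@id": EX + "unit/megaPascal",
                                 "base": "decimal"},
                },
            ],
        },
    }


def structured_value_metadata():
    """Mechanism 2: value and unit hang off an intermediate node."""
    # URI template for the intermediate node, one per material (assumed).
    node = EX + "material/{material}#tensile-strength"
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": "materials.csv",
        "tableSchema": {
            "aboutUrl": EX + "material/{material}",
            "columns": [
                {"name": "material", "titles": "material",
                 "suppressOutput": True},
                # The number from the CSV cell becomes rdf:value of the node.
                {"name": "tensile_strength", "titles": "tensile strength",
                 "aboutUrl": node, "propertyUrl": RDF_VALUE,
                 "datatype": "decimal"},
                # Virtual column: link the material to the node.
                {"name": "strength_link", "virtual": True,
                 "propertyUrl": EX + "prop/hasTensileStrength",
                 "valueUrl": node},
                # Virtual column: attach a fixed unit to the node.
                {"name": "strength_unit", "virtual": True,
                 "aboutUrl": node, "propertyUrl": EX + "units",
                 "valueUrl": EX + "unit/MegaPascal"},
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(datatype_metadata(), indent=2))
    print(json.dumps(structured_value_metadata(), indent=2))
```

Either description could be saved as materials.csv-metadata.json next to
the CSV file. Whether a given CSVW processor emits exactly the triples
shown in the quoted Turtle has not been checked here. If the unit varied
by row, the valueUrl of the last virtual column would presumably need to
be a template over a units column instead of a fixed URL.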

On Mon, 21 Sep 2015 at 17:57 Simon Cox <dr.shorthair@gmail.com> wrote:

> Thanks Gregg. Yes, you are correct that there is not a uniform convention
> on how to associate a uom with a value in RDF.
> It could be argued that this is a fundamental gap between computer science
> and the real world - in nature there are no floats, reals and doubles, just
> values that are expressed as scaled numbers - the scaling factor or unit of
> measure is essential in their evaluation ;-)
>
> But back to the practical matter: there is also a lot of variety in the
> specification and designation of the 'property' associated with a cell.
> This is usually common down a column, hence the x-ref in the spec from
> column annotations to cells on the topic of 'property URL'.
> I guess I don't fully understand why you deal with the one and not the
> other.
> The fact that the spec is silent on units is the surprise, and risks
> sending users looking to solve their whole problem elsewhere, which would
> not be a good outcome.
>
> Is there room for a statement of 'best practice' or maybe even an
> enumeration of some alternatives?
>
> Simon
>
> On 21 September 2015 at 16:19, Gregg Kellogg <gregg@greggkellogg.net>
> wrote:
>
>> > On Sep 18, 2015, at 7:52 AM, Simon Cox <dr.shorthair@gmail.com> wrote:
>> >
>> > I am involved with the Research Data Alliance activity on Data Types
>> and Registries.
>> > The goal of this is to
>> > (i) develop a format/model for the description of the structure of
>> datasets
>> > (ii) allow the descriptions to be registered, so they can be referred
>> to.
>> > kinda like enhanced MIME-types, so that client applications know what's
>> inside a dataset, not just the file format.
>> > A prototype has already been developed by CNRI, with a test deployment.
>> >
>> > There is clearly a significant shared concern with CSV on the web, so
>> in preparation for meetings next week I consulted the Candidate Specs,
>> particularly the "Model for Tabular Data and Metadata on the Web". I have
>> not read the full suite of documents in detail, but was surprised to find
>> that 'units of measure' is not mentioned in the set of 'core annotations'
>> for columns http://www.w3.org/TR/tabular-data-model/#columns (in most
>> tables data in a single column will have a common unit of measure).
>> >
>> > I raised this with Jeremy, and he showed me the route which can be
>> followed, by adding a column or traversing through the QB vocabulary.
>> > However, this is complicated, and not made immediately available or
>> even flagged in the text.
>> > I strongly suggest
>> > (i) at least alerting readers to how this very common requirement can
>> be managed
>> > (ii) better still, consider adding uom as a standard column annotation.
>>
>> Just my perspective, but I think the issue is that there is no one
>> standard way of describing units in RDF data. As the basic data model used
>> by CSV on the Web closely corresponds to RDF, the fact that literal values
>> extracted from CSV cells don’t have more dimensions is related to this
>> underlying lack of a data model for describing data with units.
>>
>> Searching for this indicated a couple of different ways to handle it:
>>
>> * Define an OWL datatype which describes the values with units (see
>> http://stackoverflow.com/questions/20248369/units-of-measurement-in-owl-and-rdf
>> )
>>
>> unit:megaPascal rdf:type   rdfs:Datatype ;
>>                 rdfs:label "MPa" .
>>
>> unit:Pascal rdf:type   rdfs:Datatype ;
>>             rdfs:label "Pa" .
>>
>> :AlMg3 prop:hasTensileStrength "300"^^unit:megaPascal .
>> :AlMg3 prop:hasYieldStrength   "2"^^unit:Pascal .
>>
>> QUDT (http://www.openphacts.org/specs/units/) also describes similar
>> methods.
>>
>> CSVW already supports this by allowing an arbitrary datatype using the
>> @id field on a datatype (see
>> http://www.w3.org/TR/tabular-metadata/#datatypes).
>>
>> @id If included, @id is a link property that identifies the datatype
>> described by this datatype description. The value of this property becomes
>> the id annotation for the described datatype. It must not start with _: and
>> it must not be the URL of a built-in datatype.
>>
>> * Use a structured value to represent the data, for example:
>>
>> :AlMg3 prop:hasTensileStrength [rdf:value 300 ; ex:units unit:MegaPascal] .
>>
>> This can be supported using the virtual columns feature, which allows
>> relationships to be created and columns to be allocated to different
>> values. This might also be useful when the units vary from row to row.
>>
>> I think describing a use case for this, and using this as an informative
>> example in one of the documents, or a primer would be a good way to
>> approach this right now. As common practice emerges, this could be
>> incorporated into a future version of these specs, but this should be done
>> in harmony with describing a standard way of describing dimensional data in
>> RDF and JSON.
>>
>> Gregg
>>
>> > Simon Cox
>> > CSIRO, co-convenor of RDA Data Types activity.
>>
>>
>
Received on Tuesday, 22 September 2015 07:58:32 UTC