Re: Absence of mention of units of measure for columns is very surprising

Thanks Jeni

From what I can see you have provided some guidance that was missing
before.
However, I don't think it is complete yet.

Firstly, I'd like to have seen 'units of measure' discussed in the wider
context of 'reference systems and scales'. Most tabular data (which I
think is the scope of CSVW) is made up of columns which don't only share
semantics (the column label), but whose values use a common reference
system. Term values are usually taken from a common vocabulary (aka
'nominal reference system') which may also be ordered (i.e. an 'ordinal
reference system', e.g. the geologic timescale). Numeric values may be
quantitative (with a unit of measure or scale) or positional (also with a
datum and direction, as well as scale). A general treatment would enable
any of these kinds of reference system to be associated with all the values
in a column. This unifies consideration of a ubiquitous feature of tabular
data. Given where you have gotten to with CSVW I understand this would have
to wait for a second edition.

Second, your discussion omits the central notion of 'quantity-type'. The
quantity-type provides the dimensionality (e.g. 'length', usually
symbolised L in dimensional analysis). Each quantity type can use one of a
number of scales or units of measure. The semantics in context (e.g.
property names like distance, offset, height, wavelength, which is what
CSVW has focussed on) is something else again, also having a many-to-one
relationship with the underlying quantity-type (length). While
quantity-type can be inferred from the property-type, a complete analysis
requires mention of all three.
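
A minimal sketch of these three levels (property, quantity-type, unit) may
make the many-to-one relationships concrete. The class names, the lookup
table and the conversion factors below are illustrative assumptions only;
they are not taken from CSVW or from any unit vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantityType:
    name: str       # e.g. "length"
    dimension: str  # symbol used in dimensional analysis, e.g. "L"


@dataclass(frozen=True)
class Unit:
    symbol: str
    quantity_type: QuantityType
    to_base: float  # factor converting a value to the base unit


LENGTH = QuantityType("length", "L")

# Several units share one quantity-type (many-to-one).
METRE = Unit("m", LENGTH, 1.0)
KILOMETRE = Unit("km", LENGTH, 1000.0)
NANOMETRE = Unit("nm", LENGTH, 1e-9)

# Several property names also share one quantity-type (many-to-one).
# Hypothetical table: a real system would draw these from a vocabulary.
PROPERTY_QUANTITY_TYPE = {
    "distance": LENGTH,
    "offset": LENGTH,
    "height": LENGTH,
    "wavelength": LENGTH,
}


def convert(value: float, source: Unit, target: Unit) -> float:
    """Convert between two units of the same quantity-type."""
    if source.quantity_type != target.quantity_type:
        raise ValueError("units belong to different quantity-types")
    return value * source.to_base / target.to_base


def unit_fits_property(prop: str, unit: Unit) -> bool:
    """True if the unit measures the quantity-type behind the property."""
    return PROPERTY_QUANTITY_TYPE[prop] == unit.quantity_type
```

With this layout a column labelled 'wavelength' could be checked against a
declared unit via unit_fits_property("wavelength", NANOMETRE), which is the
kind of validation that is only possible when all three levels are stated.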

Finally, in section 6.1.2 you show how a specific named datatype can be
applied. This is cute, but I'm a little uneasy as it does not tie back to
external definitions of either the unit or the quantity-type, and I don't
see how it can ...
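
For reference, the general shape of the named-datatype pattern can be
sketched as below. This is not a copy of the Primer's example: the column
name, the property URL and the datatype URL are all hypothetical, and the
helper only encodes the two constraints on @id that are quoted from the
metadata spec further down this thread.

```python
import json

# Hypothetical column description; every name and URL here is an
# illustrative assumption, not copied from the Primer.
column = {
    "name": "tensile_strength",
    "titles": "Tensile strength",
    "propertyUrl": "http://example.org/prop/hasTensileStrength",
    "datatype": {
        "@id": "http://example.org/unit/megaPascal",  # the named datatype
        "base": "number",                             # how cells are parsed
    },
}

# Deliberately partial sample of built-in datatype URLs, for the check below.
BUILTIN_SAMPLE = frozenset({
    "http://www.w3.org/2001/XMLSchema#double",
    "http://www.w3.org/2001/XMLSchema#string",
})


def datatype_id_is_allowed(datatype_id: str, builtin_urls: frozenset) -> bool:
    """Apply the two quoted rules: no '_:' prefix, not a built-in URL."""
    return not datatype_id.startswith("_:") and datatype_id not in builtin_urls


metadata_json = json.dumps(column, indent=2)
print(metadata_json)
```

Nothing in this fragment says what quantity-type the datatype belongs to;
any such link would have to live in whatever document the @id URL resolves
to.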

Simon

On 14 February 2016 at 03:48, Jeni Tennison <jeni@jenitennison.com> wrote:

> Hi Simon,
>
> I just wanted to flag that there’s a section in the Primer here:
>
>   http://w3c.github.io/csvw/primer/#units-of-measure
>
> that talks explicitly about how units of measure can be handled in various
> ways.
>
> It would be great if you could take the time to cast your eyes over it and
> see if there’s anything that’s unclear or missing in the explanation.
>
> Thanks,
>
> Jeni
>
> > On 21 Sep 2015, at 17:56, Simon Cox <dr.shorthair@gmail.com> wrote:
> >
> > Thanks Gregg. Yes, you are correct that there is not a uniform
> > convention on how to associate a uom with a value in RDF.
> > It could be argued that this is a fundamental gap between computer
> > science and the real world - in nature there are no floats, reals and
> > doubles, just values that are expressed as scaled numbers - the scaling
> > factor or unit of measure is essential in their evaluation ;-)
> >
> > But back to the practical matter: there is also a lot of variety in the
> > specification and designation of the 'property' associated with a cell.
> > This is usually common down a column, hence the x-ref in the spec from
> > column annotations to cells on the topic of 'property URL'.
> > I guess I don't fully understand why you deal with the one and not the
> > other.
> > The fact that the spec is silent on units is the surprise, and risks
> > sending users looking to solve their whole problem elsewhere, which would
> > not be a good outcome.
> >
> > Is there room for a statement of 'best practice' or maybe even an
> > enumeration of some alternatives?
> >
> > Simon
> >
> > On 21 September 2015 at 16:19, Gregg Kellogg <gregg@greggkellogg.net> wrote:
> > > On Sep 18, 2015, at 7:52 AM, Simon Cox <dr.shorthair@gmail.com> wrote:
> > >
> > > I am involved with the Research Data Alliance activity on Data Types
> > > and Registries.
> > > The goal of this is to
> > > (i) develop a format/model for the description of the structure of
> > > datasets
> > > (ii) allow the descriptions to be registered, so they can be referred
> > > to.
> > > kinda like enhanced MIME-types, so that client applications know
> > > what's inside a dataset, not just the file format.
> > > A prototype has already been developed by CNRI, with a test deployment.
> > >
> > > There is clearly a significant shared concern with CSV on the web, so
> > > in preparation for meetings next week I consulted the Candidate Specs,
> > > particularly the "Model for Tabular Data and Metadata on the Web". I have
> > > not read the full suite of documents in detail, but was surprised to find
> > > that 'units of measure' is not mentioned in the set of 'core annotations'
> > > for columns http://www.w3.org/TR/tabular-data-model/#columns (in most
> > > tables data in a single column will have a common unit of measure).
> > >
> > > I raised this with Jeremy, and he showed me the route which can be
> > > followed, by adding a column or traversing through the QB vocabulary.
> > > However, this is complicated, and not made immediately available or
> > > even flagged in the text.
> > > I strongly suggest
> > > (i) at least alerting readers to how this very common requirement can
> > > be managed
> > > (ii) better still, consider adding uom as a standard column annotation.
> >
> > Just my perspective, but I think the issue is that there is no one
> > standard way of describing units in RDF data. As the basic data model used
> > by CSV on the Web closely corresponds to RDF, the fact that literal values
> > extracted from CSV cells don’t have more dimensions is related to this
> > underlying lack of a data model for describing data with units.
> >
> > Searching for this indicated a couple of different ways to handle it:
> >
> > * Define an OWL datatype which describes the values with units (see
> > http://stackoverflow.com/questions/20248369/units-of-measurement-in-owl-and-rdf)
> >
> > unit:megaPascal rdf:type   rdfs:Datatype ;
> >                 rdfs:label "MPa" .
> >
> > unit:Pascal rdf:type   rdfs:Datatype ;
> >             rdfs:label "Pa" .
> >
> > :AlMg3 prop:hasTensileStrength "300"^^unit:megaPascal .
> > :AlMg3 prop:hasYieldStrength   "2"^^unit:Pascal .
> >
> > QUDT (http://www.openphacts.org/specs/units/) also describes similar
> > methods.
> >
> > CSVW already supports this by allowing an arbitrary datatype using the
> > @id field on a datatype (see
> > http://www.w3.org/TR/tabular-metadata/#datatypes).
> >
> > @id If included, @id is a link property that identifies the datatype
> > described by this datatype description. The value of this property becomes
> > the id annotation for the described datatype. It must not start with _: and
> > it must not be the URL of a built-in datatype.
> >
> > * Use a structured value to represent the data, for example:
> >
> > :AlMg3 prop:hasTensileStrength [ rdf:value 300 ; ex:units unit:megaPascal ] .
> >
> > This can be supported using the virtual columns feature, which allows
> > relationships to be created and columns to be allocated to different
> > values. This might also be useful when the units vary on each row.
> >
> > I think describing a use case for this, and using this as an informative
> > example in one of the documents, or a primer would be a good way to
> > approach this right now. As common practice emerges, this could be
> > incorporated into a future version of these specs, but this should be done
> > in harmony with describing a standard way of describing dimensional data in
> > RDF and JSON.
> >
> > Gregg
> >
> > > Simon Cox
> > > CSIRO, co-convenor of RDA Data Types activity.
> >
> >
>
> --
> Jeni Tennison
> http://www.jenitennison.com/
>

Received on Sunday, 14 February 2016 06:53:01 UTC