Re: Absence of mention of units of measure for columns is very surprising

Hi Simon,

I just wanted to flag that there’s a section in the Primer here:

  http://w3c.github.io/csvw/primer/#units-of-measure

that talks explicitly about how units of measure can be handled in various ways.

It would be great if you could take the time to cast your eyes over it and see if there’s anything that’s unclear or missing in the explanation.

Thanks,

Jeni

> On 21 Sep 2015, at 17:56, Simon Cox <dr.shorthair@gmail.com> wrote:
> 
> Thanks Gregg. Yes, you are correct that there is not a uniform convention on how to associate a uom with a value in RDF.
> It could be argued that this is a fundamental gap between computer science and the real world - in nature there are no floats, reals and doubles, just values that are expressed as scaled numbers - the scaling factor or unit of measure is essential in their evaluation ;-)
> 
> But back to the practical matter: there is also a lot of variety in the specification and designation of the 'property' associated with a cell.
> This is usually common down a column, hence the x-ref in the spec from column annotations to cells on the topic of 'property URL'.
> I guess I don't fully understand why you deal with the one and not the other.
> The fact that the spec is silent on units is the surprise, and risks sending users looking to solve their whole problem elsewhere, which would not be a good outcome.
> 
> Is there room for a statement of 'best practice' or maybe even an enumeration of some alternatives?
> 
> Simon
> 
> On 21 September 2015 at 16:19, Gregg Kellogg <gregg@greggkellogg.net> wrote:
> > On Sep 18, 2015, at 7:52 AM, Simon Cox <dr.shorthair@gmail.com> wrote:
> >
> > I am involved with the Research Data Alliance activity on Data Types and Registries.
> > The goal of this is to
> > (i) develop a format/model for the description of the structure of datasets
> > (ii) allow the descriptions to be registered, so they can be referred to.
> > kinda like enhanced MIME-types, so that client applications know what's inside a dataset, not just the file format.
> > A prototype has already been developed by CNRI, with a test deployment.
> >
> > There is clearly a significant shared concern with CSV on the web, so in preparation for meetings next week I consulted the Candidate Specs, particularly the "Model for Tabular Data and Metadata on the Web". I have not read the full suite of documents in detail, but was surprised to find that 'units of measure' is not mentioned in the set of 'core annotations' for columns http://www.w3.org/TR/tabular-data-model/#columns (in most tables, data in a single column will have a common unit of measure).
> >
> > I raised this with Jeremy, and he showed me the route which can be followed, by adding a column or traversing through the QB vocabulary.
> > However, this is complicated, and not made immediately available or even flagged in the text.
> > I strongly suggest
> > (i) at least alerting readers to how this very common requirement can be managed
> > (ii) better still, consider adding uom as a standard column annotation.
> 
> Just my perspective, but I think the issue is that there is no one standard way of describing units in RDF data. As the basic data model used by CSV on the Web closely corresponds to RDF, the fact that literal values extracted from CSV cells don’t have more dimensions is related to this underlying lack of a data model for describing data with units.
> 
> Searching for this indicated a couple of different ways to handle it:
> 
> * Define an OWL datatype which describes the values with units (see http://stackoverflow.com/questions/20248369/units-of-measurement-in-owl-and-rdf)
> 
> unit:megaPascal rdf:type   rdfs:Datatype ;
>                 rdfs:label "MPa" .
> 
> unit:Pascal rdf:type   rdfs:Datatype ;
>             rdfs:label "Pa" .
> 
> :AlMg3 prop:hasTensileStrength "300"^^unit:megaPascal .
> :AlMg3 prop:hasYieldStrength   "2"^^unit:Pascal .
> 
> QUDT (http://www.openphacts.org/specs/units/) also describes similar methods.
> 
> CSVW already supports this by allowing an arbitrary datatype using the @id field on a datatype (see http://www.w3.org/TR/tabular-metadata/#datatypes).
> 
> @id If included, @id is a link property that identifies the datatype described by this datatype description. The value of this property becomes the id annotation for the described datatype. It must not start with _: and it must not be the URL of a built-in datatype.
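
To make the datatype route concrete, here is a minimal sketch in Python that builds a metadata document in which a column's datatype carries an @id naming a unit. The file name materials.csv, the column names, the example.org unit URL and the helper unit_datatype are all assumptions made up for illustration; only @id, base and the two restrictions on @id come from the spec text quoted above. The built-in datatype check is deliberately simplified.

```python
import json

# Hypothetical unit datatype URL; the unit: prefix in the thread is not
# bound to a real namespace, so example.org stands in for it here.
MEGAPASCAL = "http://example.org/unit/megaPascal"
XSD = "http://www.w3.org/2001/XMLSchema#"


def unit_datatype(unit_url, base="decimal"):
    """Build a CSVW datatype description whose @id names a unit datatype."""
    if unit_url.startswith("_:"):
        raise ValueError("@id must not start with _:")
    # Simplified assumption: only XSD URLs are rejected here, which does
    # not cover every built-in datatype URL.
    if unit_url.startswith(XSD):
        raise ValueError("@id must not be the URL of a built-in datatype")
    return {"@id": unit_url, "base": base}


metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "materials.csv",  # hypothetical CSV file
    "tableSchema": {
        "columns": [
            {"name": "material", "titles": "Material"},
            {
                "name": "tensile_strength",
                "titles": "Tensile strength",
                "datatype": unit_datatype(MEGAPASCAL),
            },
        ]
    },
}

print(json.dumps(metadata, indent=2))
```

This only fits when every cell in the column shares one unit, since the datatype is declared once per column.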
> 
> * Use a structured value to represent the data, for example:
> 
> :AlMg3 prop:hasTensileStrength [ rdf:value 300 ; ex:units unit:megaPascal ] .
> 
> This can be supported using the virtual columns feature, which allows extra relationships to be created and lets the values from different columns be attached to different subjects. This might also be useful when the units vary from row to row.
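
As a rough sketch of the structured-value route, the Python below assembles a table schema in which two virtual columns link each material to a measurement node and attach a per-row unit to it. It assumes a hypothetical CSV with columns material, tensile_strength and unit, and hypothetical example.org namespaces for the prop: and ex: terms. I have not run this through a CSVW processor, so treat it as an outline of the idea rather than a tested recipe.

```python
import json

# Hypothetical namespaces and URI templates; {material} and {unit} are
# URI template variables filled in from the cells of each row.
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
MATERIAL = "http://example.org/material/{material}"
NODE = MATERIAL + "#tensile"  # one measurement node per row
UNIT = "http://example.org/unit/{unit}"

columns = [
    # Source columns that exist in the CSV file.
    {"name": "material", "suppressOutput": True},
    {
        "name": "tensile_strength",
        "aboutUrl": NODE,  # the number hangs off the measurement node
        "propertyUrl": RDF_VALUE,
        "datatype": "decimal",
    },
    {"name": "unit", "suppressOutput": True},
    # Virtual columns: no cells in the CSV, only generated relationships.
    {
        "name": "strength_link",
        "virtual": True,
        "aboutUrl": MATERIAL,
        "propertyUrl": "http://example.org/prop/hasTensileStrength",
        "valueUrl": NODE,
    },
    {
        "name": "strength_unit",
        "virtual": True,
        "aboutUrl": NODE,
        "propertyUrl": "http://example.org/ex/units",
        "valueUrl": UNIT,
    },
]

metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "materials.csv",  # hypothetical CSV file
    "tableSchema": {"columns": columns},
}

print(json.dumps(metadata, indent=2))
```

Because the unit URL is built from the unit cell, each row can carry its own unit, at the cost of an extra node per measurement.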
> 
> I think describing a use case for this, and including it as an informative example in one of the documents or in a primer, would be a good way to approach this right now. As common practice emerges, it could be incorporated into a future version of these specs, but that should be done in harmony with a standard way of describing dimensional data in RDF and JSON.
> 
> Gregg
> 
> > Simon Cox
> > CSIRO, co-convenor of RDA Data Types activity.
> 
> 

--
Jeni Tennison
http://www.jenitennison.com/

Received on Saturday, 13 February 2016 16:49:15 UTC