- From: Simon Cox <dr.shorthair@gmail.com>
- Date: Mon, 15 Feb 2016 06:21:03 +1100
- To: Jeni Tennison <jeni@jenitennison.com>
- Cc: public-csv-wg-comments@w3.org
- Message-ID: <CAMcwmNx6TzZ7VrS3716jRA+O6q-WWu6ROwfy+ZpK0UsRCKNbhQ@mail.gmail.com>
Thanks Jeni - We probably need to wait for some adoption and best practice to emerge from here.

Simon

On 15 February 2016 at 04:55, Jeni Tennison <jeni@jenitennison.com> wrote:

> Hi Simon,
>
> I’ve attempted to address the missing pieces here:
>
> https://github.com/w3c/csvw/pull/818
>
> by including information about how to add details about the quantity being measured by the column as well as its unit.
>
> Do you think this is sufficient?
>
> Thanks,
>
> Jeni
>
>
> On 14 Feb 2016, at 06:51, Simon Cox <dr.shorthair@gmail.com> wrote:
> >
> > Thanks Jeni
> >
> > From what I can see you have provided some guidance that was missing before.
> > However, I don't think it is complete yet.
> >
> > Firstly, I'd like to have seen 'units of measure' discussed in a wider context, of 'reference systems and scales'. Most tabular data (which I think is the scope of CSVW) is made up of columns which don't only share semantics (the column label), but whose values use a common reference system. Term values are usually taken from a common vocabulary (aka 'nominal reference system') which may also be ordered (i.e. 'ordinal reference system', e.g. geologic timescale). Numeric values may be quantitative (with a unit of measure or scale) or positional (also with a datum and direction, as well as scale). A general treatment would enable any of these kinds of reference system to be associated with all the values in a column. This unifies consideration of a ubiquitous feature of tabular data. Given where you have gotten to with CSVW I understand this would have to wait for a second edition.
> >
> > Second, your discussion omits the central notion of 'quantity-type'. The quantity-type provides the dimensionality (e.g. 'length', usually symbolised L in dimensional analysis). Each quantity-type can use one of a number of scales or units of measure. The semantics in context (e.g. property names like distance, offset, height, wavelength, which is what CSVW has focussed on) is something else again, also having a many-to-one relationship with the underlying quantity-type (length). While quantity-type can be inferred from the property-type, a complete analysis requires mention of all three.
> >
> > Finally, in section 6.1.2 you show how a specific named datatype can be applied. This is cute, but I'm a little uneasy as it does not tie back to external definitions of either the unit or the quantity-type, and I don't see how it can ...
> >
> > Simon
> >
> > On 14 February 2016 at 03:48, Jeni Tennison <jeni@jenitennison.com> wrote:
> > Hi Simon,
> >
> > I just wanted to flag that there’s a section in the Primer here:
> >
> > http://w3c.github.io/csvw/primer/#units-of-measure
> >
> > that talks explicitly about how units of measure can be handled in various ways.
> >
> > It would be great if you could take the time to cast your eyes over it and see if there’s anything that’s unclear or missing in the explanation.
> >
> > Thanks,
> >
> > Jeni
> >
> > > On 21 Sep 2015, at 17:56, Simon Cox <dr.shorthair@gmail.com> wrote:
> > >
> > > Thanks Gregg. Yes, you are correct that there is not a uniform convention on how to associate a uom with a value in RDF.
> > > It could be argued that this is a fundamental gap between computer science and the real world - in nature there are no floats, reals and doubles, just values that are expressed as scaled numbers - the scaling factor or unit of measure is essential in their evaluation ;-)
> > >
> > > But back to the practical matter: there is also a lot of variety in the specification and designation of the 'property' associated with a cell.
> > > This is usually common down a column, hence the x-ref in the spec from column annotations to cells on the topic of 'property URL'.
> > > I guess I don't fully understand why you deal with the one and not the other.
> > > The fact that the spec is silent on units is the surprise, and risks sending users looking to solve their whole problem elsewhere, which would not be a good outcome.
> > >
> > > Is there room for a statement of 'best practice' or maybe even an enumeration of some alternatives?
> > >
> > > Simon
> > >
> > > On 21 September 2015 at 16:19, Gregg Kellogg <gregg@greggkellogg.net> wrote:
> > > > On Sep 18, 2015, at 7:52 AM, Simon Cox <dr.shorthair@gmail.com> wrote:
> > > >
> > > > I am involved with the Research Data Alliance activity on Data Types and Registries.
> > > > The goal of this is to
> > > > (i) develop a format/model for the description of the structure of datasets
> > > > (ii) allow the descriptions to be registered, so they can be referred to.
> > > > kinda like enhanced MIME-types, so that client applications know what's inside a dataset, not just the file format.
> > > > A prototype has already been developed by CNRI, with a test deployment.
> > > >
> > > > There is clearly a significant shared concern with CSV on the web, so in preparation for meetings next week I consulted the Candidate Specs, particularly the "Model for Tabular Data and Metadata on the Web". I have not read the full suite of documents in detail, but was surprised to find that 'units of measure' is not mentioned in the set of 'core annotations' for columns http://www.w3.org/TR/tabular-data-model/#columns (in most tables data in a single column will have a common unit of measure).
> > > >
> > > > I raised this with Jeremy, and he showed me the route which can be followed, by adding a column or traversing through the QB vocabulary.
> > > > However, this is complicated, and not made immediately available or even flagged in the text.
> > > > I strongly suggest
> > > > (i) at least alerting readers to how this very common requirement can be managed
> > > > (ii) better still, consider adding uom as a standard column annotation.
> > >
> > > Just my perspective, but I think the issue is that there is no one standard way of describing units in RDF data. As the basic data model used by CSV on the Web closely corresponds to RDF, the fact that literal values extracted from CSV cells don’t have more dimensions is related to this underlying lack of a data model for describing data with units.
> > >
> > > Searching for this indicated a couple of different ways to handle it:
> > >
> > > * Define an OWL datatype which describes the values with units (see http://stackoverflow.com/questions/20248369/units-of-measurement-in-owl-and-rdf)
> > >
> > > unit:megaPascal rdf:type rdfs:Datatype ;
> > >     rdfs:label "MPa" .
> > >
> > > unit:Pascal rdf:type rdfs:Datatype ;
> > >     rdfs:label "Pa" .
> > >
> > > :AlMg3 prop:hasTensileStrength "300"^^unit:megaPascal .
> > > :AlMg3 prop:hasYieldStrength "2"^^unit:Pascal .
> > >
> > > QUDT (http://www.openphacts.org/specs/units/) also describes similar methods.
> > >
> > > CSVW already supports this by allowing an arbitrary datatype using the @id field on a datatype (see http://www.w3.org/TR/tabular-metadata/#datatypes).
> > >
> > > @id If included, @id is a link property that identifies the datatype described by this datatype description. The value of this property becomes the id annotation for the described datatype. It must not start with _: and it must not be the URL of a built-in datatype.
> > >
> > > * Use a structured value to represent the data, for example:
> > >
> > > :AlMg3 prop:hasTensileStrength [ rdf:value 300 ; ex:units unit:megaPascal ] .
> > >
> > > This can be supported using the virtual columns feature, which allows relationships to be created and columns to be allocated to different values. This might also be useful when the units vary on each row.
> > >
> > > I think describing a use case for this, and using this as an informative example in one of the documents, or a primer would be a good way to approach this right now. As common practice emerges, this could be incorporated into a future version of these specs, but this should be done in harmony with describing a standard way of describing dimensional data in RDF and JSON.
> > >
> > > Gregg
> > >
> > > > Simon Cox
> > > > CSIRO, co-convenor of RDA Data Types activity.
> >
> > --
> > Jeni Tennison
> > http://www.jenitennison.com/
>
> --
> Jeni Tennison
> http://www.jenitennison.com/
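
As a minimal sketch of the custom-datatype route Gregg outlines in the thread above, the following Python builds a CSVW metadata document in which a column's datatype description carries an @id naming the unit. The file name materials.csv, the column names, the example.org URLs and the helper unit_column are all hypothetical and chosen only for illustration; the @id and base properties of a datatype description are the only parts taken from the Metadata Vocabulary for Tabular Data.

```python
import json

# Hypothetical identifiers for illustration only; none come from the thread.
UNIT_MEGAPASCAL = "http://example.org/unit/megaPascal"
PROP_TENSILE = "http://example.org/prop/hasTensileStrength"


def unit_column(name, property_url, unit_url, base="number"):
    """Describe a column whose cell values all share one unit of measure.

    The unit is attached as the @id of a derived datatype, so the same
    datatype identifier applies to every cell in the column.
    """
    return {
        "name": name,
        "propertyUrl": property_url,
        "datatype": {"@id": unit_url, "base": base},
    }


metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "materials.csv",  # assumed CSV file name
    "tableSchema": {
        "columns": [
            {"name": "material", "datatype": "string"},
            unit_column("tensile_strength", PROP_TENSILE, UNIT_MEGAPASCAL),
        ]
    },
}

# Serialise the metadata so it could be saved alongside the CSV file.
metadata_json = json.dumps(metadata, indent=2)
print(metadata_json)
```

This only associates a unit identifier with the column. As Simon points out, it does not by itself tie the unit back to a quantity-type or to an external definition, so any such link would have to be added separately.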
Received on Sunday, 14 February 2016 19:22:11 UTC