- From: Dan Brickley <danbri@google.com>
- Date: Sun, 14 Feb 2016 12:06:07 +0000
- To: Simon Cox <dr.shorthair@gmail.com>
- Cc: Jeni Tennison <jeni@jenitennison.com>, public-csv-wg-comments@w3.org
On 14 February 2016 at 06:51, Simon Cox <dr.shorthair@gmail.com> wrote:
> Thanks Jeni
>
> From what I can see you have provided some guidance that was missing before.
> However, I don't think it is complete yet.

Quick thought - is there anything in the Data Cube examples in
https://github.com/w3c/csvw/blob/gh-pages/examples/rdf-data-cube-example.md
that addresses any of the points below?

--Dan

> Firstly, I'd like to have seen 'units of measure' discussed in a wider
> context, of 'reference systems and scales'. Most tabular data (which I think
> is the scope of CSVW) is made up of columns which don't only share semantics
> (the column label), but whose values use a common reference system. Term
> values are usually taken from a common vocabulary (aka 'nominal reference
> system') which may also be ordered (i.e. 'ordinal reference system', e.g.
> geologic timescale). Numeric values may be quantitative (with a unit of
> measure or scale) or positional (also with a datum and direction, as well as
> scale). A general treatment would enable any of these kinds of reference
> system to be associated with all the values in a column. This unifies
> consideration of a ubiquitous feature of tabular data. Given where you have
> gotten to with CSVW I understand this would have to wait for a second
> edition.
>
> Second, your discussion omits the central notion of 'quantity-type'. The
> quantity-type provides the dimensionality (e.g. 'length', usually symbolised
> L in dimensional analysis). Each quantity type can use one of a number of
> scales or units of measure. The semantics in context (e.g. property names
> like distance, offset, height, wavelength, which is what CSVW has focussed
> on), is something else again, also having a many-to-one relationship with
> the underlying quantity-type (length). While quantity-type can be inferred
> from the property-type, a complete analysis requires mention of all three.
>
> Finally, in section 6.1.2 you show how a specific named datatype can be
> applied. This is cute, but I'm a little uneasy as it does not tie back to
> external definitions of either the unit or the quantity-type, and I don't
> see how it can ...
>
> Simon
>
> On 14 February 2016 at 03:48, Jeni Tennison <jeni@jenitennison.com> wrote:
>>
>> Hi Simon,
>>
>> I just wanted to flag that there’s a section in the Primer here:
>>
>> http://w3c.github.io/csvw/primer/#units-of-measure
>>
>> that talks explicitly about how units of measure can be handled in various
>> ways.
>>
>> It would be great if you could take the time to cast your eyes over it and
>> see if there’s anything that’s unclear or missing in the explanation.
>>
>> Thanks,
>>
>> Jeni
>>
>> > On 21 Sep 2015, at 17:56, Simon Cox <dr.shorthair@gmail.com> wrote:
>> >
>> > Thanks Gregg. Yes, you are correct that there is not a uniform
>> > convention on how to associate a uom with a value in RDF.
>> > It could be argued that this is a fundamental gap between computer
>> > science and the real world - in nature there are no floats, reals and
>> > doubles, just values that are expressed as scaled numbers - the scaling
>> > factor or unit of measure is essential in their evaluation ;-)
>> >
>> > But back to the practical matter: there is also a lot of variety in the
>> > specification and designation of the 'property' associated with a cell.
>> > This is usually common down a column, hence the x-ref in the spec from
>> > column annotations to cells on the topic of 'property URL'.
>> > I guess I don't fully understand why you deal with the one and not the
>> > other.
>> > The fact that the spec is silent on units is the surprise, and risks
>> > sending users looking to solve their whole problem elsewhere, which would
>> > not be a good outcome.
>> >
>> > Is there room for a statement of 'best practice' or maybe even an
>> > enumeration of some alternatives?
>> >
>> > Simon
>> >
>> > On 21 September 2015 at 16:19, Gregg Kellogg <gregg@greggkellogg.net>
>> > wrote:
>> > > On Sep 18, 2015, at 7:52 AM, Simon Cox <dr.shorthair@gmail.com> wrote:
>> > >
>> > > I am involved with the Research Data Alliance activity on Data Types
>> > > and Registries.
>> > > The goal of this is to
>> > > (i) develop a format/model for the description of the structure of
>> > > datasets
>> > > (ii) allow the descriptions to be registered, so they can be referred
>> > > to.
>> > > kinda like enhanced MIME-types, so that client applications know
>> > > what's inside a dataset, not just the file format.
>> > > A prototype has already been developed by CNRI, with a test
>> > > deployment.
>> > >
>> > > There is clearly a significant shared concern with CSV on the web, so
>> > > in preparation for meetings next week I consulted the Candidate Specs,
>> > > particularly the "Model for Tabular Data and Metadata on the Web". I have
>> > > not read the full suite of documents in detail, but was surprised to find
>> > > that 'units of measure' is not mentioned in the set of 'core annotations'
>> > > for columns http://www.w3.org/TR/tabular-data-model/#columns (in most tables
>> > > data in a single column will have a common unit of measure).
>> > >
>> > > I raised this with Jeremy, and he showed me the route which can be
>> > > followed, by adding a column or traversing through the QB vocabulary.
>> > > However, this is complicated, and not made immediately available or
>> > > even flagged in the text.
>> > > I strongly suggest
>> > > (i) at least alerting readers to how this very common requirement can
>> > > be managed
>> > > (ii) better still, consider adding uom as a standard column
>> > > annotation.
>> >
>> > Just my perspective, but I think the issue is that there is no one
>> > standard way of describing units in RDF data.
>> > As the basic data model used
>> > by CSV on the Web closely corresponds to RDF, the fact that literal values
>> > extracted from CSV cells don’t have more dimensions is related to this
>> > underlying lack of a data model for describing data with units.
>> >
>> > Searching for this indicated a couple of different ways to handle it:
>> >
>> > * Define an OWL datatype which describes the values with units (see
>> > http://stackoverflow.com/questions/20248369/units-of-measurement-in-owl-and-rdf)
>> >
>> > unit:megaPascal rdf:type rdfs:Datatype ;
>> >     rdfs:label "MPa" .
>> >
>> > unit:Pascal rdf:type rdfs:Datatype ;
>> >     rdfs:label "Pa" .
>> >
>> > :AlMg3 prop:hasTensileStrength "300"^^unit:megaPascal .
>> > :AlMg3 prop:hasYieldStrength "2"^^unit:Pascal .
>> >
>> > QUDT (http://www.openphacts.org/specs/units/) also describes similar
>> > methods.
>> >
>> > CSVW already supports this by allowing an arbitrary datatype using the
>> > @id field on a datatype (see
>> > http://www.w3.org/TR/tabular-metadata/#datatypes).
>> >
>> > @id If included, @id is a link property that identifies the datatype
>> > described by this datatype description. The value of this property becomes
>> > the id annotation for the described datatype. It must not start with _: and
>> > it must not be the URL of a built-in datatype.
>> >
>> > * Use a structured value to represent the data, for example:
>> >
>> > :AlMg3 prop:hasTensileStrength [ rdf:value 300 ; ex:units unit:megaPascal ] .
>> >
>> > This can be supported using the virtual columns feature, which allows
>> > relationships to be created and columns to be allocated to different
>> > values. This might also be useful when the units vary on each row.
>> >
>> > I think describing a use case for this, and using this as an informative
>> > example in one of the documents, or a primer would be a good way to approach
>> > this right now.
>> > As common practice emerges, this could be incorporated into
>> > a future version of these specs, but this should be done in harmony with
>> > describing a standard way of describing dimensional data in RDF and JSON.
>> >
>> > Gregg
>> >
>> > > Simon Cox
>> > > CSIRO, co-convenor of RDA Data Types activity.
>> >
>>
>> --
>> Jeni Tennison
>> http://www.jenitennison.com/
>>
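
For illustration only, the datatype @id route that Gregg describes above might look something like the following in a CSVW metadata document. This is a rough sketch, not anything taken from the specs or the Primer: the unit and property URIs under example.org, the file name alloys.csv, the column names, and the helper column_with_unit are all assumptions made up for the example. It relies only on the "@id" and "base" properties of a datatype description.

```python
import json

# Hypothetical unit URI, for illustration only; the thread does not
# prescribe any particular unit vocabulary.
MEGAPASCAL = "http://example.org/unit/megaPascal"


def column_with_unit(name, property_url, unit_datatype_uri, base="number"):
    """Build a column description whose datatype is identified by a unit URI.

    The datatype description keeps a built-in base (here "number") and uses
    "@id" to point at the externally defined unit datatype.
    """
    return {
        "name": name,
        "propertyUrl": property_url,
        "datatype": {"@id": unit_datatype_uri, "base": base},
    }


metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "alloys.csv",  # hypothetical CSV file
    "tableSchema": {
        "columns": [
            {"name": "alloy", "datatype": "string"},
            column_with_unit(
                "tensile_strength",
                "http://example.org/prop/hasTensileStrength",  # hypothetical
                MEGAPASCAL,
            ),
        ]
    },
}

# Serialise the metadata as it would be saved alongside the CSV file.
print(json.dumps(metadata, indent=2))
```

This only names the unit; as Simon notes, it does not by itself tie the column back to a quantity-type, so whatever the unit URI dereferences to would have to carry that link.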
Received on Sunday, 14 February 2016 12:06:41 UTC