RE: Units of Measure (BP, Coverages, SSN,Time?)

Let's be clear about what QUDT and UCUM actually offer.

QUDT -

·         primarily provides a model for descriptions of units of measure, and of quantity-kinds (a.k.a. qualities, or “observable properties”); the model is formalized using OWL, and thus provides an RDF-based syntax for description of a uom or a quantity-kind

·         also provides some lists (called ‘vocabularies’) of individual unit- and quantity-kind- descriptions, but these are very idiosyncratic and incomplete (and include a whole bunch of currencies!)

·         there are no rules for how the labels or symbols for units are built in the QUDT vocabularies; they are not aligned with the ISO or SI standards (e.g. the label for the unit of length is spelled ‘Meter’, and the symbol for the unit of temperature is ‘degC’), capitalization is inconsistent, and use of non-ASCII characters is variable

·         the maintenance arrangements for QUDT are private (TopQuadrant + NASA) and the publication arrangements are flaky (QUDT v2.0 has been ‘on the way’ for about 3 years, and even though it is linked from the qudt.org website, it has been 404 for over a year).

UCUM –

·         Focuses on a rule for how to generate a symbol for a ‘derived uom’

·         uses a rigorous algorithm based on a theory of quantities and dimensional analysis, which starts from any base set of units in a rational system (SI, MKS, cgs, even pounds-feet-seconds if you want!)

·         UCUM provides a base set of symbols corresponding essentially with SI, plus symbols for the standard power of ten prefixes (micro/milli/kilo/mega etc). The base set has some fudging to get around the anomaly that the SI base unit for mass (kg) already has a power-of-ten prefix built in.

·         The algorithm and base set of symbols are such that symbols generated following UCUM are aligned with conventional usage, and with ISO 1000

·          There is some additional notation using {} and [] to allow for annotations and ‘conventional’ units, which I always get confused about.
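
As a rough illustration of the "rule for generating a symbol" idea, the sketch below composes a symbol for a derived unit from base-unit symbols and integer exponents. It assumes the UCUM conventions of "." for multiplication, a trailing signed integer for exponents, and "1" for unity. The function name and the dict representation are hypothetical, and prefixes, {} annotations and [] units are ignored.

```python
def derived_symbol(factors):
    """Compose a UCUM-style symbol from {base_symbol: exponent} pairs.

    Hypothetical helper for illustration only; it is not part of any
    UCUM library and does not validate the base symbols.
    """
    parts = []
    for symbol, exponent in factors.items():
        if exponent == 0:
            continue  # a zero exponent contributes nothing to the unit
        # Exponent 1 is implicit; others are appended, e.g. "s-2" or "m2".
        parts.append(symbol if exponent == 1 else f"{symbol}{exponent}")
    # An empty product is the unity, written "1".
    return ".".join(parts) if parts else "1"


# Force (the newton) expressed in base units.
print(derived_symbol({"kg": 1, "m": 1, "s": -2}))  # kg.m.s-2
```

The point is that the symbol is computed from the dimensional make-up of the unit rather than looked up in a list, so any derived unit gets a symbol.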

My assessment is that the QUDT Ontology v1.1 is good enough (I was on an Ontolog telecon with Pat Hayes, Ralph Hodgson, Gary Berg-Cross a couple of years ago where that was the clear consensus), but the QUDT vocabularies are not. So we need another set of URIs denoting uoms, with the expectation that dereferencing one of these would result in a QUDT-based representation.
Ideally we would have a reliable set of URIs for UOMs which could leverage the UCUM algorithm to build the URI, and which would resolve to a QUDT-based representation of the unit of measure. These representations should be built on-the-fly using the UCUM engine.
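
A minimal sketch of how a URI might be built from a UCUM code, assuming a made-up base URI (`http://example.org/def/uom/ucum/`, which is not a real service and is not proposed anywhere in this thread). The only real issue it shows is that UCUM codes can contain characters such as "/" and "[" that need percent-encoding in a URI path segment. Producing the QUDT-based representation on dereference is not shown.

```python
from urllib.parse import quote, unquote

# Hypothetical base URI, for illustration only.
UOM_BASE = "http://example.org/def/uom/ucum/"


def uom_uri(ucum_code):
    """Build a URI for a unit from its UCUM code."""
    # safe="" makes quote() escape "/" as well, so the code stays in
    # a single path segment.
    return UOM_BASE + quote(ucum_code, safe="")


def ucum_code_from_uri(uri):
    """Recover the UCUM code from a URI built by uom_uri()."""
    if not uri.startswith(UOM_BASE):
        raise ValueError(f"not a URI in the {UOM_BASE} set: {uri}")
    return unquote(uri[len(UOM_BASE):])


print(uom_uri("m/s2"))  # http://example.org/def/uom/ucum/m%2Fs2
print(ucum_code_from_uri(uom_uri("m/s2")))  # m/s2
```

Because the mapping is reversible, a resolver would not need a stored list of units: it could recover the code from the request URI and generate the description from that.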

Note that, using QUDT, a uom description is an OWL _individual_ (not a class), but with complete semantics, still supporting some reasoning. Rob – going with individuals doesn’t mean you have to use SKOS and certainly doesn’t lose semantic precision - probably best not to casually suggest that!

Simon


From: Rob Atkinson [mailto:rob@metalinkage.com.au]
Sent: Saturday, 2 July 2016 1:32 PM
To: Jon Blower <j.d.blower@reading.ac.uk>; Rob Atkinson <rob@metalinkage.com.au>; Cox, Simon (L&W, Clayton) <Simon.Cox@csiro.au>; m.riechert@reading.ac.uk; public-sdw-wg@w3.org
Subject: Re: Units of Measure (BP, Coverages, SSN,Time?)

Hi Jon

The encoding scheme issue raises a duality between class and instance - any UoM could be expressed as either an instance (with SKOS encoding as a natural default) or a Class - RDFS or OWL being the default options. In addition a meta-model of UoM could be defined in RDFS or OWL and used to drive encodings of instances.

Personally, I think that in the Web we should specify that a URI is used if one is available - and that an encoding of its details may be used as annotation. In the case of an "anonymous" UoM, then the encoding will still probably need to reference base units using URIs.

The wrinkles are whether URIs are explicit, or encoded as items in a namespace - and whether any encoding scheme (model) may be used or one is recommended, and if the model itself needs to be explicitly referenced (presumably this applies to JSON-LD, RDFa etc., as RDF will always use URIs to specify the model elements anyway).

A worked example set with:
1) just URI from a well-known vocabulary (UCUM)
2) An encoded UoM with one URI, and a simple label
3) ditto, with a more complex set of details
4) ditto with more than one URI (e.g. UCUM and QUDT)
5) a blank/anonymous encoded UoM with base measures.

Would we go so far as to recommend QUDT as the meta-model (as per example provided?) - or simply list a few in use and provide a couple of examples?

This will cover the "follow-your-nose" cases - however there is the case of a data encoding where the UoM is specified in metadata. The question here then is defining a BP for this metadata.
One option - we can use RDF-QB to define data structures and relevant UoM. I'm not sure there is an obvious alternative to ad-hoc metadata models and UoM specified in any non-interoperable way that emerges.

This option then speaks directly to the coverages metadata perspective (encoding of data using RDF-QB becomes a trivial case - we simply state that if RDF encoding, then BP would be to use RDF-QB encoding consistent with the RDF-QB metadata for the set, and the interesting and more generally useful case is describing an existing or compact encoding usefully)

Rob

On Sat, 2 Jul 2016 at 02:20 Jon Blower <j.d.blower@reading.ac.uk> wrote:
Hi Rob – yes, I think those are the missing bits, but, just to reiterate, it may not be (just) a “vocabulary” that we need (in the sense of a set of URIs), but a serialisation scheme for any unit.

For concrete examples, we should look at where we need to use units. I think we have:


1.       As part of coordinate systems and coordinate reference systems

2.       As part of measured quantities (e.g. the range of a coverage), linked to observed properties etc

3.       …

My last paragraph wasn’t very clear, sorry. I was trying to say that the different uses (coordinate systems, observed properties) might actually have different best practices in terms of the encoding of their units. We could feasibly decide that coordinate system units are best expressed as URIs, but the units of observed properties are better expressed as strings in a named serialisation scheme (like UCUM). Maybe, I don’t know – just raising the possibility.

Cheers,
Jon


From: Rob Atkinson <rob@metalinkage.com.au>
Date: Friday, 1 July 2016 14:39
To: Jon Blower <sgs02jdb@reading.ac.uk>, Rob Atkinson <rob@metalinkage.com.au>, "Simon.Cox@csiro.au" <Simon.Cox@csiro.au>, Maik Riechert <m.riechert@reading.ac.uk>, "public-sdw-wg@w3.org" <public-sdw-wg@w3.org>

Subject: Re: Units of Measure (BP, Coverages, SSN,Time?)


This is the type of recommendation I think we need. Let's refine... the missing bits are:
1 guidance on what vocabulary.. even noting that different communities use different ones and naming them is a help.
2 provision of mappings if you want to interoperate across community choice here.. do you embed multiple URIs, or provide some sort of sameAs service?
3 concrete examples

I don't quite follow the final paragraph and the implications for what the encoding would look like?

Rob

On Fri, 1 Jul 2016 11:12 am Jon Blower <j.d.blower@reading.ac.uk> wrote:
Just to add a little to this – units of measure are very tricky in general. The overall requirement, I think, is to have an unambiguous serialisation scheme for units, including both base units (the easy cases) and the infinite number of derived units (the hard cases) – that is to say, a spec for serialising units to ASCII strings. This allows clients to convert between units, which is a primary use case for having “strongly typed” units.
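
To make the conversion use case concrete, here is a minimal sketch of what a client can do once units arrive as strings in a known scheme. It assumes a tiny, made-up subset of a UCUM-like scheme (three atoms and four prefixes, with "u" as the ASCII spelling of micro) and handles only simple prefixed units. The table contents and function names are hypothetical.

```python
# Tiny illustrative subset; a real scheme defines many more of each.
ATOMS = {"m", "g", "s"}
PREFIXES = {"k": 1e3, "c": 1e-2, "m": 1e-3, "u": 1e-6}


def split_unit(symbol):
    """Split a simple unit symbol into (scale factor, atom)."""
    if symbol in ATOMS:
        return 1.0, symbol  # bare atom, e.g. "m" is the metre
    if len(symbol) >= 2:
        prefix, atom = symbol[0], symbol[1:]
        if prefix in PREFIXES and atom in ATOMS:
            return PREFIXES[prefix], atom
    raise ValueError(f"unrecognised unit symbol: {symbol!r}")


def convert(value, src, dst):
    """Convert a value between two units that share the same atom."""
    src_factor, src_atom = split_unit(src)
    dst_factor, dst_atom = split_unit(dst)
    if src_atom != dst_atom:
        raise ValueError(f"cannot convert {src!r} to {dst!r}")
    return value * src_factor / dst_factor


print(convert(2.5, "km", "m"))  # 2500.0
```

None of this works with an ad hoc string such as "kilometres" or "KM", which is why an unambiguous serialisation matters.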

In terms of serialisations, I’m aware of UCUM and UDUNITS (the latter is used extensively in climate/met/ocean and is connected with CF). I don’t think either is perfect in terms of governance, and I’m not even sure that UDUNITS has a formal spec.

Then there are URIs. QUDT has URIs for a lot of base and derived units, but it can’t possibly have them all, hence the need for a scheme that allows any unit to be serialised. So there will always be gaps, but I note that QUDT covers a lot of the common cases I can think of – so it’s not clear to me how important the gaps are.

Typical clients will just want to display the symbol for the unit, so we should make sure that, if we use URIs, we also transmit the symbol, as I doubt that a typical web client will want to resolve the URI and look up the symbol. This is effectively what Maik is doing, by transmitting the symbol plus a URI for the unit *scheme* rather than a URI for the unit itself.

(Question – does QUDT use UCUM as a means of generating the unit symbol?)

There are a few tricky cases in science – e.g. salinity, which strictly has no units and is a very weird kind of quantity – and sometimes these tricky cases lead to poor practice in real data files – i.e. expressing units incorrectly or inconsistently. (and of course, poor practice can happen in real-world data files anywhere).

I think an overall BP recommendation would be:


1.       Express units unambiguously if possible, using a named unit serialisation scheme or URI.

2.       Give the unit symbol, and perhaps a longer explanatory text string (e.g. a rdfs:label), to help simple clients understand the unit, even if they don’t want to resolve the full unit description.

3.       Also allow users to record “ad hoc” unit strings for fallback cases that don’t fit well with existing serialisation or URI schemes, making it clear that these are not really machine-understandable
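
The three points above could be sketched as a small data structure. This is one possible shape under stated assumptions, not a proposed schema: the class and field names are hypothetical, and only the scheme URI is taken from this thread.

```python
from dataclasses import dataclass
from typing import Optional

# Scheme URI that OGC assigned to UCUM, as quoted in this thread.
UCUM_SCHEME = "http://www.opengis.net/def/uom/UCUM/"


@dataclass
class UnitRef:
    """A unit as transmitted to a client (hypothetical structure)."""
    symbol: str                   # point 2: always give a displayable symbol
    label: Optional[str] = None   # point 2: optional longer text
    scheme: Optional[str] = None  # point 1: named serialisation scheme
    uri: Optional[str] = None     # point 1: URI for the unit itself

    @property
    def machine_readable(self):
        # Point 3: with neither a scheme nor a URI, the symbol is an
        # ad hoc string that clients can display but not interpret.
        return self.scheme is not None or self.uri is not None


celsius = UnitRef("Cel", label="degree Celsius", scheme=UCUM_SCHEME)
ad_hoc = UnitRef("practical salinity units")
print(celsius.machine_readable, ad_hoc.machine_readable)  # True False
```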

There may be cases where we can refine this further depending on the use case. For example, in CRS definitions, which tend to use simple units, it’s probably desirable to use well-known URIs to represent units. For recording the units of a measured quantity (e.g. the range of the coverage), I like methods like the one Maik suggested, as this maps more neatly to common practice in my community.

Cheers,
Jon


From: Rob Atkinson <rob@metalinkage.com.au>
Date: Friday, 1 July 2016 08:46
To: "Simon.Cox@csiro.au" <Simon.Cox@csiro.au>, "rob@metalinkage.com.au" <rob@metalinkage.com.au>, Maik Riechert <m.riechert@reading.ac.uk>, "public-sdw-wg@w3.org" <public-sdw-wg@w3.org>

Subject: Re: Units of Measure (BP, Coverages, SSN,Time?)
Resent-From: <public-sdw-wg@w3.org>
Resent-Date: Friday, 1 July 2016 08:47

Perfect Simon - thanks.
It's not that obvious from trawling the docs what the pragmatic aspects are.

So I would suggest then that a BP endorsed by OGC would have a minimum requirement that a mapping to UCUM is provided for any vocabulary used for UoM, to provide for compatibility with existing recommendations (can we call these BP?)

If it helps I could set up an OGC resource for UCUM - with redirects for specific terms - instead of to the containing spec (that's the way UCUM works) - or to a SKOS resource with skos:exactMatch relationships to the UCUM terms. I can also deploy a crosswalk to UCUM from another UoM vocab if we decide to recommend it.

The ongoing governance of such a resource in the context of the BP can be taken up as an action from the SDW to the OGC (what is the appropriate point of contact here? NA, OAB, TC, PC?)

Rob

On Fri, 1 Jul 2016 at 16:10 <Simon.Cox@csiro.au> wrote:

>  If OGC has adopted UCUM as a BP (can someone make a definitive statement on this …

OGC’s endorsement of UCUM comes from

1.      It is recommended in WMS [1]

2.      Ditto GML [2]

3.      There is a branch of the www.opengis.net/def/ URI set for UCUM - http://www.opengis.net/def/uom/UCUM/ - but it just redirects to the UCUM spec [3]

But that is purely pragmatic, as it seemed to be the best thing around at the time.
It has a fragile governance arrangement, and URIs are not de-referenceable.

[1] http://www.opengeospatial.org/standards/wms version 1.3 clause C.2.
[2] http://www.opengeospatial.org/standards/gml v3.2.1 clause 8.2.3.6
[3] http://unitsofmeasure.org/ucum.html


From: Rob Atkinson [mailto:rob@metalinkage.com.au]
Sent: Friday, 1 July 2016 1:46 AM
To: Maik Riechert <m.riechert@reading.ac.uk>; Rob Atkinson <rob@metalinkage.com.au>; SDW WG Public List <public-sdw-wg@w3.org>
Subject: Re: Units of Measure (BP, Coverages, SSN,Time?)

Thanks Maik,

If I read this right, this example assumes the client understands qudt - then uses the semantics of qudt:symbol to map instances (Cel) in another namespace to this. UCUM uses http://purl.oclc.org/NET/muo/ucum/unit/temperature/degree-Celsius as the id - but the information to map to that is not present. Is "Cel" just a dummy example - would you actually want to say "degree-Celsius" - and in turn want the OGC redirect to respect that and redirect
http://www.opengis.net/def/uom/UCUM/degree-Celsius to http://purl.oclc.org/NET/muo/ucum/unit/temperature/degree-Celsius?


What about the original assumption of using QUDT - why not use UCUM or another in the first instance. Coming from the outside and trying to identify a best practice, what exactly is this example saying?

If OGC has adopted UCUM as a BP (can someone make a definitive statement on this - it should be present in the BP when we talk about vocabulary re-use - a list of vocabularies in use in the OGC space) then we should start with that perhaps? If we are saying the BP requirement is to allow an emerging body of QUDT usage to interoperate then we need perhaps to recommend publishing the mappings as a resource - whatever we think is BP we need to communicate clearly to the average user who won't have years of exposure to the history and details to draw on - and will most likely simply want to maximise interoperability of a few cases.

Cheers
Rob

On Fri, 1 Jul 2016 at 01:00 Maik Riechert <m.riechert@reading.ac.uk> wrote:

Hi Rob,

I just wanted to throw in a slightly different/complementary view on this.

While it is useful to have URIs for any kind of unit, I think it is even more useful to have a symbolic coding in a certain coding scheme for those units, because then clients with support for that scheme can easily parse the unit, and transform it and the associated numbers. One scheme example is UCUM (http://unitsofmeasure.org/ucum.html). OGC gave it a URI as well: http://www.opengis.net/def/uom/UCUM/


In my opinion you would have something like that (JSON-LD):

{
  "@context": {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "qudt": "http://qudt.org/schema/qudt#",
    "skos": "http://www.w3.org/2004/02/skos/core#"
  },
  "rdf:value": 27.5, // for example purposes only
  "qudt:unit": {
    "@id": "qudt:DegreeCelsius",
    "skos:prefLabel": { "en": "Degree Celsius" },
    "qudt:symbol": {
      "@type": "http://www.opengis.net/def/uom/UCUM/",
      "@value": "Cel"
    }
  }
}
So the main point is that the value of "qudt:symbol" has a custom data type, in this case http://www.opengis.net/def/uom/UCUM/.
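
A sketch of how a simple client might consume such a document without resolving any URIs, assuming it treats the JSON-LD as plain JSON and recognises the UCUM datatype URI. The function name is hypothetical, and the embedded document is a trimmed copy of the example above with the comment removed so that it parses as strict JSON.

```python
import json

UCUM_DATATYPE = "http://www.opengis.net/def/uom/UCUM/"

# Trimmed copy of the example; "//" comments are not valid JSON.
DOC = json.loads("""
{
  "rdf:value": 27.5,
  "qudt:unit": {
    "@id": "qudt:DegreeCelsius",
    "qudt:symbol": {
      "@type": "http://www.opengis.net/def/uom/UCUM/",
      "@value": "Cel"
    }
  }
}
""")


def read_measurement(doc):
    """Return (value, unit code, is_ucum) from a document of this shape."""
    symbol = doc["qudt:unit"]["qudt:symbol"]
    # The datatype of the symbol tells the client which scheme to use
    # when parsing the code.
    is_ucum = symbol.get("@type") == UCUM_DATATYPE
    return doc["rdf:value"], symbol["@value"], is_ucum


print(read_measurement(DOC))  # (27.5, 'Cel', True)
```

A client that does not recognise the datatype can still display the code as an opaque string.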


Cheers

Maik

Am 30.06.2016 um 15:14 schrieb Rob Atkinson:
Hi,

I'm looking into the BP aspects around defining data dimensions as a framework for evaluating and contributing to various SDW threads. One issue seems to cut across them, but I haven't seen an explicit treatment of it: the UoM problem. I know I may have missed previous conversations - but I don't see any treatment in the current reviewable docs.

Specifically, if I was to follow the W3C Data on the Web Best Practices I would be led via BP #2

"To express frequency of update an instance from the Content-Oriented Guidelines developed as part of the W3C Data Cube Vocabulary efforts was used."

to this statement:
"To express the value of this attribute we would typically use a common thesaurus of units of measure. For the sake of this simple example we will use the DBpedia resource http://dbpedia.org/resource/Year which corresponds to the topic of the Wikipedia page on "Years".

If we have a Time ontology - surely we would be pointing to that as a recommendation for temporal units of measure.
Likewise, I would have thought that OGC would have an interest in binding CRS with their built-in units of measure to spatial dimensions.
One could argue that without interoperability at this level there is a question why the OGC would have any involvement in Web standards - but if there is a counter-argument then I feel this needs to be front-and-centre of the BP to explain to a potential user what they can expect, and where they are going to be left with making all the significant decisions.

If we have Time and CRS UoM, then we may be able to get away with not specifying a vocabulary for other UoM for measurements. Are there any obvious dimensions that need UoM vocabularies?

When I specify O&M profiles (my driving use case), I'll need to specify the UoM for measurements - is there any recommendation regarding which vocabulary to choose? And for CRS based dimensions?

Rob Atkinson

Received on Monday, 4 July 2016 00:14:42 UTC