Re: [SUMMARY] QuantitativeValue / Units of Measure - Proposal

Hi Alex,

On Aug 22, 2013, at 12:04 AM, Alex Milowski wrote:

> 
> 
> 
> On Wed, Aug 21, 2013 at 2:34 PM, Martin Hepp <martin.hepp@ebusiness-unibw.org> wrote:
> Hi Alex,
> 
> > >
> > >   1. XML Schema's durations aren't possible.  A great deal of effort went into making a value space and lexical representation that is based on ISO time standards (ISO 8601).
> > ...
> > I still fail to see why ISO 8601 semantics and syntax (akin to XML Schema's datatypes) shouldn't be directly allowed.
> >
> Because it is much more difficult to execute queries that check whether a certain quantitative criterion is met when the data can be either a point value or an interval. With the current mechanism, all you need to do is expand a point value to an interval with min=max, and this works with any XSD datatype. Doing the same with ISO 8601 requires a specific mechanism just for date and time datatypes.
> The current approach is a single solution for a wide range of datatypes.
> 
> Durations are a concept in themselves.  There is a huge difference between "30 minutes" and "from 5:30-6:00" and so a min/max concept doesn't work.
> 

A duration can be expressed with the current mechanism:

<div itemscope itemtype="http://schema.org/QuantitativeValue">
      <span itemprop="value">30</span> minutes
      <meta itemprop="unitCode" content="MIN" >
</div>

Where is the problem?
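To make the min=max argument concrete, here is a minimal sketch of how a consumer might treat the 30-minute example. It assumes a plain Python record mirroring the schema.org properties value, minValue, maxValue and unitCode; the class and function names are hypothetical, and unit conversion is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class QuantitativeValue:
    # Hypothetical record mirroring schema.org value / minValue / maxValue / unitCode
    unit_code: str
    value: Optional[float] = None
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def as_interval(qv: QuantitativeValue) -> Tuple[float, float]:
    """Expand a point value to [value, value]; pass intervals through unchanged."""
    if qv.value is not None:
        return (qv.value, qv.value)
    lo = qv.min_value if qv.min_value is not None else float("-inf")
    hi = qv.max_value if qv.max_value is not None else float("inf")
    return (lo, hi)


def within(qv: QuantitativeValue, lo: float, hi: float, unit_code: str) -> bool:
    """True if the whole (expanded) interval lies inside [lo, hi] in the given unit."""
    if qv.unit_code != unit_code:
        # A real consumer would convert units here; this sketch only compares like with like.
        raise ValueError("unit conversion is not handled in this sketch")
    q_lo, q_hi = as_interval(qv)
    return lo <= q_lo and q_hi <= hi


thirty_minutes = QuantitativeValue(unit_code="MIN", value=30)
twenty_to_forty = QuantitativeValue(unit_code="MIN", min_value=20, max_value=40)
```

The same `within` check answers "at most 45 minutes?" for both the point value and the range, which is the point: one comparison routine covers every datatype that can be ordered, with no special case for durations.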

> 
> >
> > >
> > >   2. The codes are odd for common use (e.g. Hour = HUR, instead of 'H' or 'h', ANN = year, instead of 'yr' or 'Y', GRM = gram, instead of 'g')
> > >
> > They are a bit cryptic, but on the other hand unique and reliable. For instance, there are many variants of miles, and UN/CEFACT has individual codes for all of them.
> > ...
> > Note that you can use the properties of depth, height, width, and weight for any object of choice in the current form.
> > ...
> > Not if they expect unit codes that make no sense in other contexts.
> 
> I fail to see why, since UN/CEFACT has reliable codes for all SI units.
> 
> Practically speaking, I doubt anyone would use UN/CEFACT codes in place of standard SI unit symbols. 


>  
> 
> > As of now, you could use a prefix with unitCode for SI codes.
> >
> > Prefix?  I'm not sure I understand what you mean: prefixed as in CURIE or prefixed with some string (e.g. "si.") ?
> >
> I meant a string prefix for a textual value, like uncefact:HUR or si:kg
> 
> That looks an awful lot like a CURIE.  Why not make it one? 

Because 
1. Someone would have to set-up, operate, and update a service for the respective URIs.
2. That someone may need to get IPR clearance, e.g. for using the official textual definitions for units (I tried for several years to get the legal permission to put up an OWL version of the UNSPSC...).
3. CURIEs are not supported in Microdata. (Of course you could use the respective full URI there.)

> 
> > >
> > > On the other hand, if I want to describe the weight of an object using SI units, I need a way to say that with a different property.  That is, I should be able to use the standard prefixing scheme of SI units to say "kg", "g" "mg" and so on without having to resort to looking up a strange code.
> >
> > ...
> > UN/CEFACT provides reliable codes for most if not all SI units plus many more, commonly needed ones for commerce.
> >
> > Also, many back-end systems readily support UN/CEFACT codes (e.g. PDM systems).
> >
> > I am well familiar with UN/CEFACT but I think their codes are a non-starter for any kind of scientific or sensor data.  Why would you adopt a unit symbol that isn't the actual accepted symbol used within the greater scientific community?
> 
> We are talking about marking up data on the Web here, so the codes used are intended mostly for machines, not humans. I think that short, unique three-letter codes from a small alphabet are a much more reliable mechanism than the original SI symbols, which require proper Unicode encoding etc.
> 
> Humans have to put those codes on the Web, either manually or through developing software.  As such, it does matter whether the mechanism makes sense to the consumer.  
> 
> A three letter code doesn't seem very extensible nor does it yield a syntax that will make sense to people over time.
> 
> Besides, Unicode is already supported by the browser and just about every other modern tool on the planet, so we shouldn't shy away from using it properly.  Would we exclude a particular character or glyph from any natural language?  That is, tell someone they can't use their native language?  I doubt that.  As such, using the ohm symbol (Ω) when that is explicitly what we mean seems very natural.

It is not a question of shying away. It is just that many, many sites don't get their character encoding right, so proposing a mechanism in schema.org that depends on characters outside of 7-bit ASCII is calling for trouble.
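A small illustration of the two failure modes, as a sketch using only the Python standard library: a UTF-8 Ω read back with a legacy 8-bit codec (Latin-1 is assumed here as the "wrong" decoder), and the fact that Unicode has two code points that both render as an ohm symbol. The helper names are made up for the example.

```python
import unicodedata

# Two distinct code points that both render as an ohm symbol.
OHM_SIGN = "\u2126"     # U+2126 OHM SIGN
GREEK_OMEGA = "\u03a9"  # U+03A9 GREEK CAPITAL LETTER OMEGA


def mojibake(text: str, wrong_codec: str = "latin-1") -> str:
    """Simulate a page written as UTF-8 but decoded with a legacy 8-bit codec."""
    return text.encode("utf-8").decode(wrong_codec)


def same_unit_symbol(a: str, b: str) -> bool:
    """Compare symbols after NFC normalization; plain string equality fails for the ohm pair."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


garbled = mojibake(GREEK_OMEGA)  # two bytes become two unrelated characters
ascii_code_survives = mojibake("HUR") == "HUR"  # an ASCII-only code is unaffected
```

A consumer receiving raw symbols would therefore need both encoding repair and normalization before it could even compare units, whereas an ASCII code such as HUR passes through a mis-declared charset unchanged.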

> >
> > > [2] http://www.qudt.org/
> >
> > As said, I am in general fine with
> >
> > - adding a unit property for pure textual unit references
> > - adding a prefix mechanism that allows additional unit codes to be used with unitCode
> >
> > Certainly good to know.
> >
> > In my opinion, I don't like the idea of a prefix.  The "unitCode" needs to be a URI so it can play well on the Web.
> >
> 
> From a "linked data" perspective, URIs may make sense. From a markup perspective, they are long and ugly and as long as you do not have an authoritative URI that is dereferenceable, then a URI is as good or bad as any unique string. In short, URIs for enumerations are overrated in the context of schema.org.
> 
> 
> I'm not buying that argument, as I see no distinction between the goals of the Linked Data crowd and those of anyone here attempting to use schema.org.  I'm not just thinking of Linked Data.  It is just a basic principle of the Web that we name things with URIs [1].
> 

> The Web does not critically depend on this axiom. It's broken a thousand times every day. 
> 
> >
> > > However, if you expand the range of options in the markup, you make consumption more difficult for clients, so fewer applications will be able to fully understand the data.
> >
> > Often true but the current properties feel too skewed towards UN/CEFACT-based commerce.  It is possible that these two domains may not mix well.
> >
> 
> > I think you are misunderstanding me. While schema:QuantitativeValue is a very generic element, the height/depth/width properties are (currently) meant and defined for http://schema.org/Product only.
> 
> I understand that, but if they are only ever going to be used for http://schema.org/Product, then they should be prefixed as productHeight/productDepth/etc. so that height/depth/width could be used in a broader context.

schema.org tries to find intuitive, short, yet unique names for properties. This sometimes conflicts with ontologically clean, generic properties. I have said in other contexts that there may be a point at which it will no longer be possible to keep up the idea of global property IDs, and we will have to use a strict frame-based approach, but that is a generic issue with the current model behind schema.org.

>  
> 
> I fail to see why http://schema.org/QuantitativeValue should be limited or biased towards commerce.
> 
> 
> In its current state, aligned with UN/CEFACT, it is biased towards the way UN/CEFACT sees the world.  That is not how most people handle scientific data.   Could they be forced?  Possibly.  I doubt anyone would make the effort; instead they'd just roll their own types.  The result is that we'd continue to lose interoperability in a broader context.

The only real bias is that the unitCode property currently suggests using UN/CEFACT strings for marking up units of measurement.
I made that choice for GoodRelations back in 2007 after carefully comparing the available standard codes for units of measurement in terms of authoritativeness and completeness/coverage. And there are many scenarios where you need unit codes for the non-SI units.

If you (and a broad audience of publishers of respective content) think that any other unit code standard should be supported, it can be easily handled by updating the definition of the unitCode property and allowing prefixes for other codes. This is quite clearly better than defining multiple properties for multiple unitCode standards. But even then I think that recommending the use of official unit symbols that require proper character encoding, like Ω for Ohm, is bad.
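Roughly, the consumer side of such a prefix mechanism could look like the sketch below. Everything in it is an assumption for illustration: the "uncefact:" and "si:" prefixes follow the examples earlier in this thread and are not defined by schema.org, the symbol-to-code table is a tiny hypothetical excerpt, and unprefixed values are treated as UN/CEFACT codes, as unitCode is defined today.

```python
# Hypothetical prefixes, taken from the examples in this thread; schema.org defines none.
KNOWN_SCHEMES = {"uncefact", "si"}

# Illustrative, incomplete mapping from unit symbols to UN/CEFACT common codes.
SI_TO_UNCEFACT = {"kg": "KGM", "g": "GRM", "h": "HUR", "min": "MIN"}


def parse_unit_code(raw: str) -> tuple:
    """Split 'scheme:code'; an unprefixed value is read as a UN/CEFACT code."""
    scheme, sep, code = raw.partition(":")
    if not sep:
        return ("uncefact", raw)
    if scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unknown unit-code scheme: {scheme!r}")
    return (scheme, code)


def to_uncefact(raw: str) -> str:
    """Normalize any supported unitCode string to a single UN/CEFACT code."""
    scheme, code = parse_unit_code(raw)
    if scheme == "uncefact":
        return code
    try:
        return SI_TO_UNCEFACT[code]
    except KeyError:
        raise ValueError(f"no UN/CEFACT equivalent known for si:{code}") from None
```

Because every variant collapses into one code list inside a single property, a client still needs only one comparison path, which is the advantage over introducing a separate property per unit-code standard.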

Martin

> 
> 
> [1] http://www.w3.org/TR/webarch/
> 
> 
> -- 
> --Alex Milowski
> "The excellence of grammar as a guide is proportional to the paucity of the
> inflexions, i.e. to the degree of analysis effected by the language
> considered."
> 
> Bertrand Russell in a footnote of Principles of Mathematics

--------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  hepp@ebusiness-unibw.org
phone:   +49-(0)89-6004-4217
fax:     +49-(0)89-6004-4620
www:     http://www.unibw.de/ebusiness/ (group)
         http://www.heppnetz.de/ (personal)
skype:   mfhepp 
twitter: mfhepp

Check out GoodRelations for E-Commerce on the Web of Linked Data!
=================================================================
* Project Main Page: http://purl.org/goodrelations/

Received on Wednesday, 21 August 2013 22:33:45 UTC