General comments on XSCH Datatypes note

My review of "XML Datatypes in RDF and OWL" [1] 

Overall, this is a good document.  It discusses a number
of issues related to the use of datatypes in RDF and OWL that were left
unresolved by the Recommendations.  It is comprehensive in addressing
the issues discussed, covering alternative approaches and providing
appropriate references and/or quotes as necessary.  In fact, because
of this comprehensiveness and the importance of the references,
reviewing this document was more of a project than I had originally
envisioned (although reading these references proved enlightening).

I have no major issues with this document, although I do have some
lesser concerns and comments.  These fall into two categories: general
and detailed.  The detailed concerns were already presented in an
email sent to the list yesterday [details].  The general concerns
follow below.


* The document covers a number of loosely related subjects.  It is
  like a bag of datatype issues and other related material.  Different
  parts will be of interest to different audiences.  I mentioned this
  before, but my main concern now is that someone reading linearly
  through the document will encounter the interpretation descriptions
  in 1.2, 1.3, and 1.4 and stop reading.  I think such material would
  be better placed as an appendix.  The purpose and role of this
  material in the document was also not clear to me.  By role, I mean:
  are the interpretation descriptions in 1.2 and 1.3 quoted from the
  RDF and OWL semantics documents respectively, or are they a
  different form of the same content?

* An important reference for datatypes in computing environments is
  the ISO standard on Language-independent datatypes - ISO/IEC
  11404:1996.  It provides an excellent framework for describing
  datatypes and appears to have been a strong influence on the XML
  Schema base types document [2] (which includes a reference to
  11404).  The XSCH note could benefit from referencing the ISO work
  directly and using some of its terminology, although I don't think
  that this is necessary for this iteration of the note.

* My primary interest in these datatype issues is with the treatment
  of numeric types being consistent with their use in engineering
  applications (or at least usable by those applications).  Loss in 
  precision or unexpected changes in values due to automatic type
  conversion could be problematic in an engineering environment.

  Engineering view of some numeric types:

  To explain the engineering point of view on this, let me mention
  three important numeric types for that domain: count, measurement,
  and constant.

  A count is an integer representing essentially the
  cardinal number for a set of things classified by some set of tests.
  An example would be the count of packages of candy available for
  shipment.  A count is an exact number.  Tests may include
  measurements, but a count is not an approximation of a sum of 
  these measurements nor is it a sum of the approximation of these 
  measurements.

  A measurement is an inexact numeric value (usually represented as a
  real) produced by some measurement method.  This value denotes a
  value range which includes the actual value.  The actual value is
  unknowable, but more precise measurement methods can reduce the
  range of uncertainty up to a point.  The precision or uncertainty is
  usually included with the measurement value, either implicitly
  using significant figures or explicitly using a separate property
  value such as an error range.

  A constant is an exact value used in computation.  It may or may not
  be possible to express exactly as a numeric.  An inch is exactly
  2.54 centimeters, but Pi is not 3.14159.

  This suggests some potential needs and concerns for a type system
  underlying this.  1. Because the value spaces for these types
  are different, measurements are disjoint from counts and constants.
  2.  Some means of capturing precision or error/uncertainty is needed
  for measurement values. 3. Some means is needed for denoting
  constants that cannot be expressed precisely in numeric form.
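
  As a rough illustration of the three needs above, here is a minimal
  Python sketch.  The class and field names (Count, Measurement,
  Constant, uncertainty, exact) are my own assumptions for the sake of
  the example; they are not taken from the XSCH note or from XML
  Schema.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional, Tuple


@dataclass(frozen=True)
class Count:
    """Exact cardinality of a set of things (hypothetical type)."""
    n: int

    def __post_init__(self):
        if self.n < 0:
            raise ValueError("a count cannot be negative")


@dataclass(frozen=True)
class Measurement:
    """Inexact value that denotes a range containing the actual value."""
    value: float
    uncertainty: float  # half-width of the range, in the same unit
    unit: str

    def interval(self) -> Tuple[float, float]:
        return (self.value - self.uncertainty, self.value + self.uncertainty)


@dataclass(frozen=True)
class Constant:
    """Exact value; `exact` stays None when no exact numeric form exists."""
    name: str
    exact: Optional[Fraction] = None


packages = Count(12)
length = Measurement(10.0, 0.5, "cm")
INCH_IN_CM = Constant("inch in cm", Fraction(254, 100))
PI = Constant("pi")  # no exact numeric literal available
```

  Because the three classes are distinct, Count(2) never compares
  equal to a Measurement whose value happens to be 2.0, which is one
  way of keeping the value spaces disjoint.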

  Some answers about how 1 and 2 can/must be handled with XML Schema
  types are revealed in the XML Schema Datatypes document.  In [2] the
  description for Decimal explicitly states that "Precision is not
  reflected in this value space; the number 2.0 is not distinct from
  the number 2.00."  Thus precision cannot be encoded in decimal
  values or in other types derived from or constructed with
  Decimal.  This means that objects must be used to state precision or
  error properties for measurements (this is not a bad approach
  anyway, since there are often other properties or metadata
  associated with a measurement, as mentioned previously by Bernard
  [3]).  Measurements on the SW are thus not datatypes and the
  disjoint type issue becomes moot.
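
  The effect can be imitated with Python's decimal module, used here
  only as an analogy and not as an XML Schema processor.  Python keeps
  the trailing zeros when printing a Decimal, but comparison works on
  values, which is the behaviour the quoted sentence describes.  The
  "reading" record and its keys are hypothetical.

```python
from decimal import Decimal

# Two lexical forms that differ only in trailing zeros.
a = Decimal("2.0")
b = Decimal("2.00")

same_value = (a == b)       # compared as values, not as strings
distinct_values = {a, b}    # a set keeps one entry per distinct value

# Hypothetical record: the precision travels as a separate property
# of an object instead of being encoded in the literal itself.
reading = {
    "value": Decimal("2.00"),
    "uncertainty": Decimal("0.005"),
    "unit": "cm",
}
low = reading["value"] - reading["uncertainty"]
high = reading["value"] + reading["uncertainty"]
```

  Here same_value is True and distinct_values has a single member, so
  any precision implied by "2.00" has to be stated separately, as the
  uncertainty property does in this sketch.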
  
  For issue 3, there remains no answer.  As far as I know there is no
  way to denote a rational value without using a numeric literal, but 
  many important values cannot be expressed precisely as numeric 
  literals.
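
  To make the gap concrete, the following Python sketch contrasts an
  exact rational with its decimal approximation.  It uses Python's
  fractions and decimal modules purely for illustration; nothing here
  is a proposal for how RDF or OWL should encode such values.

```python
from decimal import Decimal
from fractions import Fraction

# One third as an exact rational and as a finite decimal.
third_exact = Fraction(1, 3)
third_decimal = Decimal(1) / Decimal(3)  # rounded to the context precision

# Multiplying back by 3 recovers 1 only in the exact case.
exact_round_trip = (third_exact * 3 == 1)
decimal_round_trip = (third_decimal * 3 == 1)

# A constant that does have an exact decimal form.
inch_in_cm = Fraction("2.54")
```

  With the default decimal context, exact_round_trip is True while
  decimal_round_trip is False.  A value such as Pi has no exact form
  in either representation and would need a symbolic name.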

  Information on these issues may or may not belong in this datatype
  note; I am not sure.  I do think that the SWBPD WG should present
  these issues in one of its notes, though.

-Evan 

[1] http://www.w3.org/2001/sw/BestPractices/XSCH/xsch-sw/
[2] http://www.w3.org/TR/xmlschema-2/
[3] http://lists.w3.org/Archives/Public/public-swbp-wg/2004Dec/0119.html
[details] http://lists.w3.org/Archives/Public/public-swbp-wg/2005Jan/0040

Received on Thursday, 13 January 2005 17:57:30 UTC