Re: General comments on XSCH Datatypes note

Same as previous message, but on general comments.

ewallace@cme.nist.gov wrote:
> My review of "XML Datatypes in RDF and OWL" [1] 
> 
> Overall, this is a good document.  It discusses a number
> of issues related to the use of datatypes in RDF and OWL that were left
> unresolved by the Recommendations.  It is comprehensive in addressing
> the issues discussed: covering alternative approaches and providing
> appropriate references and/or quotes as necessary.  In fact, because
> of this comprehensiveness and the importance of the references,
> reviewing this document was more of a project than I had originally
> envisioned (although reading these references proved enlightening).
> 
> I have no major issues with this document, although I do have some
> lesser concerns and comments.  These fall into two categories: general
> and detailed.  The detailed concerns were already presented in an
> email sent to the list yesterday [details].  The general concerns
> follow below.
> 
> 
> * The document covers a number of loosely related subjects.  It is
>   like a bag of datatype issues and other related material.  Different
>   parts will be of interest to different audiences.  I mentioned this
>   before, but my main concern now is that someone reading linearly
>   through the document will encounter the interpretation descriptions
>   in 1.2, 1.3, and 1.4 and stop reading.  I think such material would
>   be better placed as an appendix.  It was also not clear to me the
>   purpose and role of such material in this document.  By role, I mean
>   are the interpretation descriptions in 1.2 and 1.3 quotes from the
>   RDF and OWL semantics documents respectively or a different form for
>   the same content?

We restructured the document: moved 1.2, 1.3 and 1.4 to an appendix, and
merged old 1.1 (now 1.3) with section 0 to make the new introductory section 1.
Added a new subsection 1.1, Reading this Document, as follows:
[[
While this document can be read from start to finish, many readers will 
benefit from skipping sections.

The intended reader is informed about RDF and/or OWL, and may be a 
creator or user of metadata or ontologies, or may be an implementor of 
systems that implement the RDF or OWL Recommendations, or may be the 
author or editor of related specifications.

The reader who is interested in defining their own datatypes should read 
section 2 and maybe appendix B, which gives a formal treatment in terms 
of OWL DL and user defined datatypes.

The reader who is interested in the correct use of datatypes should read 
section 3, concerning which values are the same, and section 5 
concerning numerics, particularly, but not exclusively, for engineering 
applications.

Implementors probably should read most of the document: appendix A 
summarizes the formal treatment of datatyping from the recommendations; 
section 3 gives an extended discussion about equality; section 2 
discusses the mapping from URIs to user defined types.

Readers most interested in formal semantics will find most value in 
appendix B, concerning user defined datatypes, and section 3 concerning 
equality. Such readers should start by reviewing appendix A, which 
should be familiar.

Section 4, on durations, is of more limited interest, but is significant 
to any reader who wishes to use, implement or build on top of duration 
datatypes.
]]



> 
> * An important reference for datatypes in computing environments is
>   the ISO standard on Language-independent datatypes - ISO/IEC
>   11404:1996.  It provides an excellent framework for describing
>   datatypes and appears to have been a strong influence on the XML
>   Schema base types document [2] (which includes a reference to
>   11404).  The XSCH note could benefit from referencing the ISO work
>   directly and using some of its terminology, although I don't think
>   that this is necessary for this iteration of the note.
> 
Added to references. Changed first para of section 1.3 (old section 1.1) 
to read:
[[
[XML SCHEMA2] defines facilities for defining simple types to be used in 
XML Schema as well as other XML specifications. %%It is influenced by 
earlier work on datatypes such as [ISO 11404].
]]
http://www.w3.org/2001/sw/BestPractices/XSCH/xsch-sw-20050127-changes/#sec-xmls-dt


> * My primary interest in these datatype issues is with the treatment
>   of numeric types being consistent with their use in engineering
>   applications (or at least usable by those applications).  Loss in 
>   precision or unexpected changes in values due to automatic type
>   conversion could be problematic in an engineering environment.
> 
>   Engineering view of some numeric types:
> 
>   To explain the engineering point of view on this, let me mention
>   three important numeric types for that domain: count, measurement,
>   and constant.
> 
>   A count is an integer representing essentially the
>   cardinal number for a set of things classified by some set of tests.
>   An example would be the count of packages of candy available for
>   shipment.  A count is an exact number.  Tests may include
>   measurements, but a count is not an approximation of a sum of 
>   these measurements nor is it a sum of the approximation of these 
>   measurements.
> 
>   A measurement is an inexact numeric value (usually represented as a
>   real) produced by some measurement method.  This value denotes a
>   value range which includes the actual value.  The actual value is
>   unknowable, but more precise measurement methods can reduce the
>   range of uncertainty up to a point.  The precision or uncertainty is
>   usually included with the measurement value, either implicitly
>   using significant figures or explicitly using a separate property
>   value such as error range.
> 
>   A constant is an exact value used in computation.  It may or may not
>   be possible to express exactly as a numeric.  An inch is exactly
>   2.54 centimeters, but Pi is not 3.14159.
> 
>   This suggests some potential needs and concerns for a type system
>   underlying this.  1. Because the value spaces for these types
>   are different, measurements are disjoint from counts and constants.
>   2.  Some means of capturing precision or error/uncertainty is needed
>   for measurement values. 3. Some means is needed for denoting
>   constants that cannot be expressed precisely in numeric form.
> 
>   Some answers about how 1 and 2 can/must be handled with XML Schema
>   types are revealed in the XML Schema Datatypes document. In [2] the
>   description for Decimal explicitly states that, "Precision is not
>   reflected in this value space, the number 2.0 is not distinct from
>   the number 2.00."  Thus precision cannot be encoded in decimal
>   values or other types derived from or constructed with
>   Decimal.  This means that objects must be used to state precision or
>   error properties for measurements (this is not a bad approach anyway,
>   since there are often other properties or metadata associated with a
>   measurement as mentioned previously by Bernard [3]).  Measurements
>   on the SW are thus not datatypes and the disjoint type issue becomes
>   moot.
>   
>   For issue 3, there remains no answer.  As far as I know there is no
>   way to denote a rational value without using a numeric literal, but 
>   many important values cannot be expressed precisely as numeric 
>   literals.
> 
>   Information on these issues may belong in this datatype note or not,
>   I am not sure.  I do think that the SWBPD wg should present these
>   issues in one of its notes, though.
>
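
To make the precision and conversion concerns above concrete, here is a
minimal Python sketch. It is not taken from the note: the helper names
same_value and to_single are made up for illustration, Python's decimal
module is only a rough stand-in for the xsd:decimal value space, and a
32-bit IEEE float is assumed as a stand-in for xsd:float.

```python
import struct
from decimal import Decimal


def same_value(lexical_a: str, lexical_b: str) -> bool:
    """Compare two decimal lexical forms by numeric value only."""
    return Decimal(lexical_a) == Decimal(lexical_b)


def to_single(x: float) -> float:
    """Round-trip a Python float (a double) through 32-bit IEEE storage."""
    return struct.unpack("f", struct.pack("f", x))[0]


# "2.0" and "2.00" denote the same value, so the number of digits
# written cannot carry precision information.
precision_is_lost = same_value("2.0", "2.00")

# Narrowing a double to single precision silently changes the value.
conversion_changes_value = to_single(2.54) != 2.54

# Decimal arithmetic on short decimal literals is exact; binary is not.
decimal_sum_is_exact = Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
binary_sum_is_exact = 0.1 + 0.2 == 0.3
```

Under these assumptions the first three flags are true and the last is
false, which is the kind of silent change an engineering application
would want to avoid.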

Added new section 5
http://www.w3.org/2001/sw/BestPractices/XSCH/xsch-sw-20050127-changes/#sec-numerics
which reads:

[[
5. The Use of Numeric Types

%% whole section is new

For much data on the Semantic Web a motivation for providing type 
information is to permit the use of the data by engineering 
applications, and interoperation between engineering applications. Most 
such data will be marked up using the numeric types from XML Schema.

Loss in precision or unexpected changes in values due to automatic type 
conversion could be problematic in an engineering environment.

In the engineering domain there are three important types of usage for 
numerics: count, measurement, and constant.

count
     A count is an integer representing essentially the cardinal number 
for a set of things classified by some set of tests. An example would be 
the count of packages of candy available for shipment. A count is an 
exact number. Tests may include measurements, but a count is not an 
approximation of a sum of these measurements nor is it a sum of the 
approximation of these measurements. A type such as xsd:integer or a 
type derived from xsd:integer is appropriate for counts.
measurement
     A measurement is an inexact numeric value (usually represented as a 
real) produced by some measurement method. This value denotes a value 
range which includes the actual value. The actual value is unknowable, 
but more precise measurement methods can reduce the range of uncertainty 
up to a point. The precision or uncertainty is usually included with the 
measurement value, either implicitly using significant figures or 
explicitly using a separate property value such as error range. Either 
the xsd:float or the xsd:double datatype is appropriate for measurement, 
but it should be noted that these do not include a precision or 
uncertainty, which should be included as the value of a separate 
property. [XML SCHEMA2] explicitly states for xsd:decimal that, 
"Precision is not reflected in this value space, the number 2.0 is not 
distinct from the number 2.00."
constant
     A constant is an exact value used in computation. It may or may not 
be possible to express exactly as a numeric. A millimeter is exactly 
0.001 meters, but Pi is not 3.14159. Often an xsd:decimal will be more 
appropriate than an xsd:float or xsd:double for expressing a constant.

This suggests some potential needs and concerns for a type system 
underlying this.

     * Because the value spaces for these types are different, 
measurements are disjoint from counts and constants.
     * Some means of capturing precision or error/uncertainty is needed 
for measurement values.
     * Some means is desirable for writing down constants that cannot be 
expressed precisely in numeric form.

The first of these issues will generally be reflected in the use of 
xsd:integer for counts, xsd:float and xsd:double for measurements, and 
xsd:decimal for constants.

The second issue, concerning precision of measurements, must be addressed 
at the modelling level by using objects to state precision or error 
properties for measurements. This is not a bad approach anyway, since there 
are often other properties or metadata associated with a measurement.

For the third issue, concerning some constants, no solution is offered.
]]
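
As a rough illustration of the modelling approach in the new section 5,
here is a minimal Python sketch. It is an assumption-laden analogy rather
than anything from the note: the Measurement class, its field names and
the overlaps helper are hypothetical, with int standing in for
xsd:integer, float for xsd:double and Decimal for xsd:decimal.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Measurement:
    """A measured value modelled as an object, not a bare literal."""
    value: float        # stand-in for an xsd:double literal
    uncertainty: float  # separate property, same unit as value

    def interval(self) -> tuple:
        """Range that is taken to include the (unknowable) actual value."""
        return (self.value - self.uncertainty, self.value + self.uncertainty)

    def overlaps(self, other: "Measurement") -> bool:
        """True if the two uncertainty ranges share at least one point."""
        lo_a, hi_a = self.interval()
        lo_b, hi_b = other.interval()
        return lo_a <= hi_b and lo_b <= hi_a


# Count: an exact integer.
packages_available: int = 12

# Constant: an exact decimal, here metres per millimetre.
METERS_PER_MILLIMETER = Decimal("0.001")

# Two measurements with the same numeric value but different precision.
coarse = Measurement(value=2.0, uncertainty=0.05)
fine = Measurement(value=2.00, uncertainty=0.005)
```

Here coarse.value == fine.value holds, yet coarse != fine because the
uncertainty is carried by its own property, which is the distinction the
literal alone cannot express.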

Added the following bullet point to intro
[[
%% Appropriate use of numeric types for engineering applications.
]]

Haven't changed abstract, hmm, probably should have done.


Jeremy

> -Evan 
> 
> [1] http://www.w3.org/2001/sw/BestPractices/XSCH/xsch-sw/
> [2] http://www.w3.org/TR/xmlschema-2/
> [3] http://lists.w3.org/Archives/Public/public-swbp-wg/2004Dec/0119.html
> [details] http://lists.w3.org/Archives/Public/public-swbp-wg/2005Jan/0040
> 

Received on Wednesday, 2 February 2005 17:37:28 UTC