Canonical, Lexical, and Value Spaces from Matthew Fuchs on 2002-07-10 (www-xml-schema-comments@w3.org from July to September 2002)

From: Matthew Fuchs <matt@westbridgetech.com>
Date: Wed, 10 Jul 2002 17:09:30 -0600
To: W3C XML Schema Comments list <www-xml-schema-comments@w3.org>
Message-Id: <5.1.0.14.1.20020710170826.02833c48@localhost>

[This mail bounced to me, because it came from an unknown
email address.  And then it got lost.  I've just found it
and forward it to the list with apologies to Matthew and
the WG for its four-week delay. -MSM]

As I mentioned during the call, the discussion on 110b leads me to think 
that we d be better off representing the relationship between the lexical 
space and the value space as an adjunction.  The manager s summary of what 
follows is that this would link lexical forms to the value space through 
the canonical space.  The value here is that we wouldn t have needed to 
claim that 2.0 and 2.00000 are the same in the value space as only 2.0 is a 
canonical form, only 2.0 is in the value space and (from this perspective) 
2.00000 is just another lexical form for 2.0 .  In other words, the current 
view seems to be that the following diagram commutes:



Lexicals                     Values

                    f

2.000 --------------à 2.000

      |                               |

      | a?                          | b

     \|/              g             \|/

2.0     <--------------à   2.0



The bottom arrow is claimed to be a bijection, and for that to work, b must 
be the identity arrow (i.e., 2.0 = 2.000).  But given that g is a 
bijection, we already have a unique surjection from the lexical 
representations to the canonical ones the arrow a? is g (b(f( 2.000 ))) 
(where g is the inverse of g [which is the restriction of f to the 
canonicals]).  But given that we already have a unique a?, why not dump f 
and b, giving the following diagram:



Lexicals                     Values



2.000

      |

      | a?

     \|/              g

2.0     <--------------à   2.0



In which case, it is obvious that no value space can support more 
information than is provided by the canonical forms.  The issue of whether 
2.000 = 2.0 is handled on the lexical side, and the issue of whether 
2.0=2.000 is rendered irrelevant.  While this may only apply to the numeric 
types, I think it may still be worth stating, as it better separates the 
value space from issues arriving from the lexical forms.  It s also useful 
in that the lexical forms could appear (as we now well know) to convey more 
information than the canonical form can, and this would unambiguously state 
otherwise.



I d also note that this makes the relationship between the lexical space 
and the value space an adjunction:



2.000 -------u------à 2.0 =UF( 2.000 )    2.0=F( 2.000 )

       |                            |                              |

       |                            |                              |

       >                         U(>)                          !>

       |                            |                              |

      \|/                          \|/                            \|/

01.654000 u----à 1.654                      1.654



While this may seem obscure at the moment, I think it becomes essential 
both for building operators on these types, and provides a basis for 
embedding all the numeric types into a single larger type, because the 
different datatypes have different adjunctions into the larger space.



Matthew

Received on Wednesday, 10 July 2002 19:12:10 UTC