RE: question about lexical and value spaces from Dave Peterson on 2008-01-18 (www-xml-schema-comments@w3.org from January to March 2008)

From: Dave Peterson <davep@iit.edu>
Date: Fri, 18 Jan 2008 10:10:33 -0500
To: noah_mendelsohn@us.ibm.com, Michael Kay <mike@saxonica.com>
Cc: "'Peter F. Patel-Schneider'" <pfps@research.bell-labs.com>, Schema IG <w3c-xml-schema-ig@w3.org>
Message-Id: <a06240803c3b66b46d3a1@[192.168.1.100]>
At 8:24 PM -0500 2008-01-17, noah_mendelsohn@us.ibm.com wrote (to the
comments list):

>Suggestion:  can we take this discussion to the schemas IG list where more
>WG members will see it?  As far as I know the comments list is tracked
>very carefully for picking up new issues and bug reports, but it is not
>necessarily subscribed by all members of the working group.

I agree.  Accordingly, to bring the IG list readers up to date, I'm
copying the entire msg of Noah's, and Michael Kay's response at the
end of this msg.  My comments follow, preceding those copies.

At 10:33 AM +0000 2008-01-18, Michael Kay wrote (to the comments list):
>>> MK> There is intense debate about whether "ineffable values"
>>> (values with no lexical representation) should be considered as being
>>> within the value space or not.
>>
>> NM> Really?  I thought we were always clear that if there was no
>> lexical form, there was no value.

>Although from a practical viewpoint I find the idea of the value space
>holding values that you can't write deeply unattractive, I have to concede
>that the description of union types becomes easier if we say that the value
>space of a union is the union of the value spaces of the members,
>disregarding the fact that some of these values are unreachable because of
>the "first match" rule.

I believe the union problem was dealt with by the decision that lexical
mappings need not be functions.  In those cases where the lexical mapping
is not a function (i.e., more than one value for a given literal), there
are to be rules determining which of the values to return in any given
circumstance.  In the case of unions, the rule says "in the absence of
xsi:type, take the value from the first member datatype in whose lexical
space the literal occurs."  This works because all of the member spaces
are ultimately atomic.  The special datatypes (anyAtomicType and
anySimpleType) must have a special rule; I believe the current candidate
is to use the string value.  For lists, list-values that can't have a
space-separated lexical representation are lost from the value space.
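
To make the union rule concrete, here is a minimal sketch in Python.  It
is only an illustration, not text from the spec: the MEMBERS table, the
union_value function, and the simplified regular expressions standing in
for the lexical spaces are all hypothetical.  It also assumes that
xsi:type, when present, simply names the member datatype to use.

```python
import re

# Hypothetical member datatypes of a union, in declaration order.
# Each entry: (name, simplified lexical space, lexical mapping).
# These are stand-ins for illustration, not the XSD definitions.
MEMBERS = [
    ("integer", re.compile(r"[+-]?[0-9]+"), int),
    ("string", re.compile(r".*", re.DOTALL), str),
]


def union_value(literal, xsi_type=None):
    """Pick the value of a literal under the "first match" rule.

    Without xsi_type, the first member whose lexical space contains the
    literal supplies the value.  With xsi_type (assumed here to name a
    member), only that member is considered.
    """
    for name, lexical_space, to_value in MEMBERS:
        if xsi_type is not None and name != xsi_type:
            continue
        if lexical_space.fullmatch(literal):
            return name, to_value(literal)
    raise ValueError(f"{literal!r} is not in the lexical space of the union")


print(union_value("42"))                     # integer member wins
print(union_value("42", xsi_type="string"))  # string value needs xsi:type
print(union_value("abc"))                    # only the string member matches
```

In this sketch the string "42" is in the value space of the string
member, but the literal 42 alone never yields it; that is the kind of
value Michael describes as unreachable under the "first match" rule.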

The real question that raised the possibility of ineffable values was
the desire on the part of one or more members that anySimpleType not
have its value space increase each time a new primitive is added--the
proposed solution is to simply include everything in that value space
right from the beginning.  Then a new primitive (or new way to deal with
the lost list-values) would make ineffable values effable, rather than
make them be added to the value space.

It's not clear whether the same no-new-values approach is to be used
on anyAtomicType.  Allowing atomic values that are not values of any
existing primitive datatype raises the question of what is an atomic
value.  One could at least imagine new primitives where the values were
lists, for example.  Should our current anyAtomicType include ineffable
list-values?

Here follows the complete text of the last two messages from the comments
list thread:
-------------
At 8:24 PM -0500 2008-01-17, noah_mendelsohn@us.ibm.com wrote (to the
comments list):
>Michael Kay writes:
>
>>  There is intense debate about whether "ineffable values" (values with no
>>  lexical representation) should be considered as being within the value
>>  space or not.
>
>Really?  I thought we were always clear that if there was no lexical form,
>there was no value.  For example, I thought it was pretty clear that if
>you used a pattern facet to restrict away all the lexical forms ending in
>the digit 4 in a type derived from xs:integer, then the numbers 4, 14 and
>so on were in fact not in the value space of the type.  Paul Biron and I
>tend to recall often the discussion we had many years ago in line waiting
>for dinner at a restaurant near the first New Orleans meeting at which we
>pointed out how impractically hard it would be to enforce such things in
>systems that in fact allow the values to be manipulated directly.  If you
>have an API that purports to establish some new value of a datatype, it
>can be very difficult to test whether there does or doesn't exist at least
>one lexical form for it in the face of complex patterns.  Still, the
>datatypes were focussed mainly on validation, and there is something very
>appealing about being able to say that every value has at least one
>serialization.  I was not aware that there was any serious consideration
>of changing this.
>
>Suggestion:  can we take this discussion to the schemas IG list where more
>WG members will see it?  As far as I know the comments list is tracked
>very carefully for picking up new issues and bug reports, but it is not
>necessarily subscribed by all members of the working group.
>
>Noah

At 10:33 AM +0000 2008-01-18, Michael Kay wrote (to the comments list):
>>> MK> There is intense debate about whether "ineffable values"
>>> (values with no lexical representation) should be considered as being
>>> within the value space or not.
>>
>> NM> Really?  I thought we were always clear that if there was no
>> lexical form, there was no value.
>
>I think that
>
>http://www.w3.org/Bugs/Public/show_bug.cgi?id=3243 and
>
>http://www.w3.org/Bugs/Public/show_bug.cgi?id=5058
>
>demonstrate that there are others who hold different views.
>
>Although from a practical viewpoint I find the idea of the value space
>holding values that you can't write deeply unattractive, I have to concede
>that the description of union types becomes easier if we say that the value
>space of a union is the union of the value spaces of the members,
>disregarding the fact that some of these values are unreachable because of
>the "first match" rule.
>
>I guess my own position (or at least, my attempt to achieve a workable
>compromise) is best summarized in
>
>http://www.w3.org/Bugs/Public/show_bug.cgi?id=3243#c5
>
>I've been known to say that we can define it either way and it makes no
>difference. That's true as far as the XML Schema specification is concerned.
>It does start to make a difference once you recognize that other
>specifications are using the set of types that we define, and they may
>define ways of creating values other than by parsing strings in the lexical
>space (for example, by means of arithmetic operators).
>
>Perhaps the real solution is to design the lexical space so that it does
>have a distinct representation of every value. That could be achieved, for
>example, by allowing an escapable separator in writing lists, and by some
>kind of microsyntax for unions: "(xs:int)3" for example.
>
>Michael Kay
>http://www.saxonica.com/

-- 
Dave Peterson

davep@iit.edu
Received on Friday, 18 January 2008 15:10:53 UTC