Re: Lexical representation of xsd:decimal and xsd:integer

These sound like valid concerns. I would suggest you open a bug [1]
against Part 2 of version 1.0 of the schema spec. Alternatively, you can
send an email to www-xml-schema-comments@w3.org, where all comments on the
schema spec should go, but it will probably take longer than Bugzilla to get
the WG's attention.

[1] http://www.w3.org/Bugs/Public/enter_bug.cgi?product=XML%20Schema

Thanks,
Sandy Gao
XML Parser Development, IBM Canada
(1-905) 413-3255
sandygao@ca.ibm.com

"Alessandro Triglia" <sandro@mclink.it>
Sent by: xmlschema-dev-request@w3.org
10/28/2005 01:00 PM
To: "[Public XML Schema-DEV]" <xmlschema-dev@w3.org>
Subject: Lexical representation of xsd:decimal and xsd:integer

Hi

There is a problematic area in Part 2.  Some of the related questions are:

- Is ".5" a valid lexical representation for xsd:decimal?

- Is "" a valid lexical representation for xsd:decimal?

- Is "" a valid lexical representation for xsd:integer?

- Is "0" or "" the canonical lexical representation of the integer value 0?


Part 2 says:

--------------------------------
decimal has a lexical representation consisting of a finite-length sequence
of decimal digits (#x30-#x39) separated by a period as a decimal indicator.
An optional leading sign is allowed. If the sign is omitted, "+" is assumed.
Leading and trailing zeroes are optional. If the fractional part is zero,
the period and following zero(es) can be omitted. For example: -1.23,
12678967.543233, +100000.00, 210.
--------------------------------
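To make the quoted paragraph concrete, here is a minimal Python sketch that
splits a literal into sign, integer part, and fraction part, supplying "+"
when the sign is omitted. The helper name split_decimal is hypothetical and
nothing here is defined by the spec. It intentionally accepts empty digit
runs, because whether those are legal is exactly what is in question.

```python
import re

# Hypothetical helper: sign, then a digit run, then an optional period
# followed by another digit run. Empty digit runs are NOT rejected here.
_PARTS = re.compile(r"([+-]?)([0-9]*)(?:\.([0-9]*))?")

def split_decimal(s):
    """Return (sign, integer_part, fraction_part), or None if s does not fit."""
    m = _PARTS.fullmatch(s)
    if m is None:
        return None
    sign, int_part, frac_part = m.groups()
    # "If the sign is omitted, '+' is assumed."
    return (sign or "+", int_part, frac_part if frac_part is not None else "")

for lit in ["-1.23", "12678967.543233", "+100000.00", "210"]:
    print(lit, split_decimal(lit))
```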

There are three problematic terms in this paragraph: "finite-length
sequence", "separated", and "leading zeros". All three are ambiguous, so I
have tried to work out what they mean by looking at other parts of the
document.

"Finite-length sequence" is used in many other places for the length of
lists, strings, the binary octets of hexBinary, the binary octets of
base64Binary, etc. Obviously, lists, strings, and binary octets must be
allowed to have a zero length. Therefore, at least in these cases (and
possibly in all cases), "finite-length" includes zero-length. This is
supported by the use of the phrase "finite, non-zero-length" for NMTOKENS,
IDREFS, and ENTITIES, and by the addition of "(possibly empty)" after
"finite-length" in the definition of list.

So if "finite-length sequence of digits" in xsd:decimal includes zero
digits, then all of the following are valid lexical representations for
this type: "", "+", ".", "3.", ".3".
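Under that permissive reading, the grammar would amount to something like
the following regular expression. This is a hypothetical Python sketch that
assumes every digit run may be empty; all five strings listed above match it.

```python
import re

# Assumed "permissive" reading: both digit runs may have zero length.
PERMISSIVE_DECIMAL = re.compile(r"[+-]?[0-9]*(?:\.[0-9]*)?")

def is_decimal_permissive(s):
    return PERMISSIVE_DECIMAL.fullmatch(s) is not None

for lit in ["", "+", ".", "3.", ".3"]:
    print(repr(lit), is_decimal_permissive(lit))
```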

This has several other implications:

-- "" is a valid lexical representation of the integer value 0

-- "" and "E" are valid lexical representations of the float value 0
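The same permissive reading, carried over to xsd:integer and to the mantissa
and exponent of xsd:float, can be sketched as follows. These are again
hypothetical patterns, and the special float values INF, -INF, and NaN are
left out.

```python
import re

# Assumed permissive patterns: every digit run may be empty.
PERMISSIVE_INTEGER = r"[+-]?[0-9]*"
PERMISSIVE_DECIMAL = r"[+-]?[0-9]*(?:\.[0-9]*)?"
# Float: a decimal mantissa, optionally followed by "E"/"e" and an integer exponent.
PERMISSIVE_FLOAT = re.compile(
    PERMISSIVE_DECIMAL + r"(?:[eE]" + PERMISSIVE_INTEGER + r")?"
)

def is_float_permissive(s):
    return PERMISSIVE_FLOAT.fullmatch(s) is not None

print(re.fullmatch(PERMISSIVE_INTEGER, "") is not None)  # True: "" as an integer
print(is_float_permissive(""))        # True
print(is_float_permissive("E"))       # True: empty mantissa, empty exponent
print(is_float_permissive("1.5E-3"))  # True
```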

Is all of the above intended?  Is this the common understanding?  If not,
there is a defect in the specification of the lexical representation of
xsd:decimal and xsd:integer (it should not say "finite-length").

Actually, I doubt that the above was intended. One of the reasons is that
none of the examples in Part 2 shows a decimal number with an empty integer
part, or an "empty" integer, but a stronger reason comes from the definition
of the canonical lex rep of xsd:integer. While the definition of the
canonical lex rep of xsd:decimal requires the presence of at least one digit
in the integer part, the definition of the canonical lex rep of xsd:integer
does not include the same requirement -- it says that leading zeros are
prohibited, full stop. Is the single "0" in the representation of the
integer value 0 a "leading zero" or not? Depending on the answer, the
canonical lex rep of the integer value 0 will be either "" or "0". Which is
it?
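The ambiguity is easy to reproduce in code. Below are two hypothetical
Python canonicalizers, assuming input that is already a valid integer
literal: one strips every leading zero literally, the other keeps a single
"0" for the value zero. Both drop the "+" sign. They agree on every input
except zero.

```python
def _split(s):
    # Hypothetical helper: separate the sign and strip all leading zeros.
    sign = "-" if s.startswith("-") else ""
    return sign, s.lstrip("+-").lstrip("0")

def canonical_strip_all(s):
    """Reading 1: the lone "0" counts as a leading zero, so zero becomes ""."""
    sign, digits = _split(s)
    return sign + digits if digits else ""

def canonical_keep_zero(s):
    """Reading 2: the lone "0" is not a leading zero, so zero stays "0"."""
    sign, digits = _split(s)
    return sign + digits if digits else "0"

for lit in ["0", "+000", "007", "-0042"]:
    print(lit, repr(canonical_strip_all(lit)), repr(canonical_keep_zero(lit)))
```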

If the latter is intended, then "finite-length sequence" for xsd:integer
apparently does not include a zero length, and "leading zeros" apparently
does not include the single zero digit representing the integer value zero.
But if the former is intended (and so the canonical lex rep of the integer 0
is ""), then I wonder why "0.x" (rather than ".x") was chosen as the
canonical lex rep of decimals with a null integer part.

Should "finite-length sequence of digits" have been in both cases "finite,
but non-zero-length, sequence of digits"?
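For xsd:integer, the difference between the two readings comes down to a
single quantifier. In this hypothetical sketch, the permissive pattern uses
* (zero or more digits) and the "finite, non-zero-length" pattern uses +
(one or more digits).

```python
import re

# Two assumed readings of "finite-length sequence of digits" for xsd:integer.
PERMISSIVE = re.compile(r"[+-]?[0-9]*")  # zero-length allowed
STRICT = re.compile(r"[+-]?[0-9]+")      # "finite, non-zero-length"

def classify(s):
    """Return (accepted by permissive reading, accepted by strict reading)."""
    return (PERMISSIVE.fullmatch(s) is not None, STRICT.fullmatch(s) is not None)

for lit in ["", "+", "0", "-12"]:
    print(repr(lit), classify(lit))
```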

Also, should "separated by a period" have been "with a period optionally
inserted at any point in the sequence of digits"?

Or perhaps "with a period optionally inserted at any point in the sequence
of digits except before the first digit and after the last digit"?  (or
equivalently "between any two digits"?)

Or perhaps "with a period optionally inserted at any point in the sequence
of digits except after the last digit"?
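For comparison, the three alternative wordings can each be written as a
regular expression. These are hypothetical Python patterns, each assuming
that at least one digit is required. [0-9] is used instead of \d because
Python's \d in str patterns also matches non-ASCII digits, whereas the spec
limits digits to #x30-#x39.

```python
import re

# Hypothetical encodings of the three proposed wordings (sign optional in all).
ALTERNATIVES = {
    # period at any point, including before the first or after the last digit
    "anywhere": re.compile(r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)"),
    # period only between two digits
    "between two digits": re.compile(r"[+-]?[0-9]+(?:\.[0-9]+)?"),
    # period anywhere except after the last digit
    "not after last digit": re.compile(r"[+-]?(?:[0-9]*\.)?[0-9]+"),
}

def accepts(rule, s):
    return ALTERNATIVES[rule].fullmatch(s) is not None

for lit in ["3.", ".3", "3.3", "33", ".", ""]:
    print(repr(lit), {rule: accepts(rule, lit) for rule in ALTERNATIVES})
```

Only "3." and ".3" separate the three readings; none of them accepts "." or
"" once a digit is required.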

I think the first big question is: what was intended? And the second is: is
there anything to be mended in the text, and how?

Alessandro Triglia
OSS Nokalva

Received on Tuesday, 1 November 2005 14:32:20 UTC