Re: XML schema draft comments: "space-separated" ambiguous

From: Dave Peterson <davep@iit.edu>
Date: Wed, 9 Jun 2004 18:17:54 -0400
Message-Id: <a0521061dbced26f3a4ab@[]>
To: Daniel Barclay <daniel@fgm.com>, www-xml-schema-comments@w3.org

At 4:13 PM -0400 040609, Daniel Barclay wrote:
>Regarding the draft at
>Section says:
>   The ·lexical space· of a ·list· datatype is a set of literals whose
>   internal structure is a space-separated sequence of literals of
>   the ·atomic· datatype of the items in the ·list·.
>It doesn't seem to specify whether "space-separated" means
>"separated by space characters" or "separated by space" (each
>contiguous group of space characters).

To quote from the section on the whiteSpace facet:

>All occurrences of #x9 (tab), #xA (line feed) and #xD (carriage
>return) are replaced with #x20 (space)
>After the processing implied by replace, contiguous sequences
>of #x20's are collapsed to a single #x20, and leading and trailing
>#x20's are removed.

>For all datatypes ·derived· by ·list· the value of whiteSpace is collapse

The point of all this is that whitespace normalization occurs *before*
you get to the lexical space, so in the lexical representation of a
list there is never more than one space (#x20) character between
adjacent items, and none at either end.  I think that the 1.0 editors
were pretty consistent in using "space" to mean one space (#x20)
character, and "whitespace" when they meant a sequence of any or all
of the whitespace characters.  Mayhap the 1.1 revision will be more
explicit.
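
To make the order of operations concrete, here is a minimal Python
sketch of the replace and collapse steps quoted above, followed by
splitting the normalized literal into list items.  The function names
(ws_replace, ws_collapse, list_items) are made up for illustration and
do not come from the Recommendation or from any schema processor; the
sketch also does no validation of the items against the item type.

```python
import re

def ws_replace(s: str) -> str:
    # whiteSpace = replace: each #x9, #xA and #xD becomes #x20.
    return re.sub(r"[\t\n\r]", " ", s)

def ws_collapse(s: str) -> str:
    # whiteSpace = collapse: apply replace first, then collapse runs
    # of #x20 to a single #x20 and drop leading/trailing #x20's.
    s = ws_replace(s)
    s = re.sub(r" {2,}", " ", s)
    return s.strip(" ")

def list_items(raw: str) -> list:
    # Hypothetical helper: normalize, then split on the single #x20
    # that is the only separator left after collapsing.
    normalized = ws_collapse(raw)
    return normalized.split(" ") if normalized else []

# "a b" and "a  b" (two spaces) normalize to the same literal.
print(ws_collapse("a b") == ws_collapse("a  b"))
print(list_items(" 1 \t2\r\n3 "))
```

Under these assumptions, differently spaced inputs reach the lexical
space as the same single-space-separated literal, which is why the
separator needs no further canonicalization.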

At 4:22 PM -0400 040609, Daniel Barclay wrote ("XML schema draft
comments: is list canonical form underspecified?"):
>Section says:
>    The canonical-lexical-representation for the ·list· datatype is
>    defined as the lexical form in which each item in the ·list· has
>    the canonical lexical representation of its ·itemType·.
>Is that canonical form underspecified?  Specifically, doesn't
>it need to specify canonical form of space-separating the list
>of item lexical values?
>_If_ "a b" and "a  b" (two spaces) are both legal lexical values,
>which is the canonical lexical representation?

See above.

Hope this helps.
Dave Peterson

Received on Wednesday, 9 June 2004 21:09:51 UTC
