Re: XML schema draft comments: "space-separated" ambiguous

From: Dave Peterson <davep@iit.edu>
Date: Wed, 9 Jun 2004 18:17:54 -0400
Message-Id: <a0521061dbced26f3a4ab@[192.168.0.3]>
To: Daniel Barclay <daniel@fgm.com>, www-xml-schema-comments@w3.org

At 4:13 PM -0400 040609, Daniel Barclay wrote:
>Regarding the draft at
>http://www.w3.org/TR/2004/PER-xmlschema-2-20040318/:
>
>Section 2.5.1.2 says:
>
>   The ·lexical space· of a ·list· datatype is a set of literals whose
>   internal structure is a space-separated sequence of literals of
>   the ·atomic· datatype of the items in the ·list·.
>
>It doesn't seem to specify whether "space-separated" means
>"separated by space characters" or "separated by space" (each
>contiguous group of space characters).

To quote from the section on the whiteSpace facet:

>replace
>All occurrences of #x9 (tab), #xA (line feed) and #xD (carriage
>return) are replaced with #x20 (space)
>collapse
>After the processing implied by replace, contiguous sequences
>of #x20's are collapsed to a single #x20, and leading and trailing
>#x20's are removed.

>For all datatypes ·derived· by ·list· the value of whiteSpace is collapse

The point of all this is that whitespace normalization occurs *before*
you get to the lexical space, so in the lexical representations of
lists there is never more than one space (#x20) character between
items.  I think that the 1.0 editors were pretty consistent in using
"space" to mean a single space (#x20) character, and "whitespace" to
mean a sequence of any or all of the whitespace characters.  Mayhap
the 1.1 revision will be more explicit.
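
As a rough illustration of the replace and collapse rules quoted above,
here is a minimal Python sketch.  The function names (ws_replace,
ws_collapse, list_items) are my own and appear nowhere in the spec;
this only mimics the quoted wording and is not a conforming processor.

```python
def ws_replace(s: str) -> str:
    """whiteSpace=replace: each #x9, #xA, and #xD becomes one #x20."""
    return s.translate({0x9: " ", 0xA: " ", 0xD: " "})


def ws_collapse(s: str) -> str:
    """whiteSpace=collapse: replace, squeeze runs of #x20, trim both ends."""
    # Split on #x20 only. A bare str.split() would also split on other
    # Unicode whitespace, which the quoted rules do not mention.
    parts = [p for p in ws_replace(s).split(" ") if p]
    return " ".join(parts)


def list_items(raw: str) -> list[str]:
    """Hypothetical helper: the item literals of a list after collapse."""
    collapsed = ws_collapse(raw)
    return collapsed.split(" ") if collapsed else []


if __name__ == "__main__":
    # One space and two spaces collapse to the same literal.
    print(repr(ws_collapse("a b")), repr(ws_collapse("a  b")))
    print(list_items(" a\t\nb "))
```

Under this reading, "a b" and "a  b" both reach the lexical space as
"a b", so the separator is always exactly one #x20.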

At 4:22 PM -0400 040609, Daniel Barclay wrote ("XML schema draft
comments: is list canonical form underspecified?"):
>Section 2.5.1.2 says:
>
>    The canonical-lexical-representation for the ·list· datatype is
>    defined as the lexical form in which each item in the ·list· has
>    the canonical lexical representation of its ·itemType·.
>
>Is that canonical form underspecified?  Specifically, doesn't
>it need to specify canonical form of space-separating the list
>of item lexical values?
>
>_If_ "a b" and "a  b" (two spaces) are both legal lexical values,
>which is the canonical lexical representation?

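See above.  Once whitespace has been collapsed, "a  b" (two spaces) is
no longer distinct from "a b", so only the single-space form is left
in the lexical space.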
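
To make that concrete, here is a small Python sketch of how a canonical
list form falls out if the items are always joined by a single #x20.
Everything in it is an assumption for illustration: canonical_list and
canonical_integer are hypothetical names, and canonical_integer is only
a loose stand-in for an integer item type (Python's int() accepts more
than the schema type would).

```python
import re
from typing import Callable


def canonical_list(raw: str, canonical_item: Callable[[str], str]) -> str:
    """Join the canonical form of each item with exactly one #x20."""
    # Assumption: splitting on runs of #x9/#xA/#xD/#x20 has the same
    # effect as applying whiteSpace=collapse and then splitting on #x20.
    items = [tok for tok in re.split(r"[\t\n\r ]+", raw) if tok]
    return " ".join(canonical_item(tok) for tok in items)


def canonical_integer(literal: str) -> str:
    """Hypothetical stand-in: drop a leading '+' and leading zeros."""
    return str(int(literal))


if __name__ == "__main__":
    print(repr(canonical_list("a  b", str)))
    print(repr(canonical_list(" +007\t-0  42 ", canonical_integer)))
```

With the separator fixed at one space, the only freedom left in the
canonical form is the canonical representation of each item, which is
what the quoted sentence from 2.5.1.2 specifies.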

Hope this helps.
-- 
Dave Peterson
SGMLWorks!

davep@iit.edu
Received on Wednesday, 9 June 2004 21:09:51 UTC
