Comments and questions on the last call XML Schema working draft from Falk, Alexander on 2000-04-10 (www-xml-schema-comments@w3.org from April to June 2000)

From: Falk, Alexander <falk@icon.at>
Date: Mon, 10 Apr 2000 14:03:36 +0200
To: "'www-xml-schema-comments@w3.org'" <www-xml-schema-comments@w3.org>
Message-ID: <0FED160BABE4D311AD2E0050DA4657850226BE@MEDUSA>
Hi,

I was studying the new April 7 version of the XML Schema working draft
throughout the weekend, as we are in the process of finalizing the beta 3
version of XML Spy 3.0 (see http://www.xmlspy.com/version30.asp), and I have
a first list of comments and questions - especially regarding the changes to
the datatypes (part 2).

Part 1 - Structures

A. Schema for Schemas
Why does the Public Identifier URN for the DOCTYPE statement still use
19991216 as its date, when the DTD for Schemas (Appendix B) is v1.1 dated
2000/04/06? This Public Identifier URN seems to imply that the Schema for
Schemas is itself written in compliance with the old December 1999 XML
Schema draft, which it is not.
Along the same lines: the year in the XML Schema namespace URI is also still
fixed with 1999 - is that going to change for the final recommendation?
While it is understandable from an implementor's point of view that the URN
should remain constant over the time of the draft and recommendation
creation, it would IMHO be rather confusing for all future schema authors,
if the date given here is not identical to the date of the final
recommendation.

G. Tabulation of Changes
The comments in this list are not very useful at all. Compared with "H
Revisions from previous draft" in Part 2, which is ideal for implementors
and saves us the burden of re-reading the entire specs again and again, the
list of changes in Part 1 is too minimal. Comments like "Lots of edits" or
"more from Noah" are simply not comprehensible without the background that
only insiders of the WG can have. Please provide a more meaningful change
history in the future (or none at all).

Part 2 - Datatypes

3.2.2.2 Constraining facets on boolean datatype
Other than specifically restricting the lexical space to either {0,1} or
{true, false} for a certain schema, what is the intention of allowing a
pattern facet for booleans?
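
A minimal Python sketch of the one use case named above, narrowing boolean to a single spelling. The helper name is_valid_boolean is hypothetical, and Python's re module only stands in for the schema regex language; neither comes from the draft.

```python
import re

# The four lexical forms implied by the two sets mentioned above.
BOOLEAN_LEXICAL_SPACE = {"true", "false", "1", "0"}


def is_valid_boolean(lexical, pattern=None):
    """Hypothetical check: the literal must be a boolean lexical form and,
    if a pattern facet is supplied, must match that pattern in full."""
    if lexical not in BOOLEAN_LEXICAL_SPACE:
        return False
    if pattern is not None and re.fullmatch(pattern, lexical) is None:
        return False
    return True
```

With the pattern "0|1", the literal "1" is accepted and "true" is rejected; without a pattern, both are accepted.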

3.2.8.1 Constraining facets on binary datatype
As binary currently only offers two different encodings that specify the
respective lexical spaces, defining a pattern facet on binary doesn't make
much sense - other than e.g. restricting the letters a-f to uppercase-only
or lower-case only. However, with base64 the alphabet is strictly defined in
the RFC. To answer the question contained in the Ed.Note of this chapter, I
would therefore suggest, from an implementor's standpoint, omitting the
pattern facet here, as its benefits are rather limited and the potential
confusion would outweigh them.

3.2.3 - 3.2.5 Lexical notation of floating-point numbers
While it is very nice from an implementor's standpoint to know that all sorts
of float, double, or decimal numbers will only use the period as a decimal
separator, I wonder if this is really satisfying for many European and other
non-US users. Specifically, when XML is being used to supplant existing
systems, it is often necessary to interpret floating-point or decimal numbers
with other decimal separators (most notably ',') and in some cases also
including thousands separators (e.g. 4,560,758.99 vs. 4.560.758,99). Why is
there no means provided to support these formatting styles in the XML schema
draft? Just like the encoding facet for binaries, this "formatting" or
"picture" facet (to use an old COBOL-coined term that was also suggested in
the DCD submission to the W3C in July 1998) could be used to specify the
various aspects of the lexical space of these datatypes. If we were to
consider XML schemas for B2B e-Commerce scenarios only, it would be
understandable to only allow one format that can be easily processed - but
XML schemas should be thought of in much broader terms.
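
As a rough illustration of what such a "picture" facet might let a processor do, here is a minimal Python sketch. The function parse_decimal and its decimal_sep and group_sep parameters are hypothetical names chosen for this example; nothing like them exists in the draft.

```python
from decimal import Decimal


def parse_decimal(lexical, decimal_sep=".", group_sep=None):
    """Hypothetical 'picture'-style parse: drop the grouping separator,
    normalise the decimal separator to '.', then parse the result."""
    if group_sep:
        lexical = lexical.replace(group_sep, "")
    if decimal_sep != ".":
        lexical = lexical.replace(decimal_sep, ".")
    return Decimal(lexical)


# The two spellings from the example above denote the same value.
us_style = parse_decimal("4,560,758.99", decimal_sep=".", group_sep=",")
eu_style = parse_decimal("4.560.758,99", decimal_sep=",", group_sep=".")
```

The sketch does not check that grouping separators sit at every third digit; a real facet would have to say whether that is required.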

3.3 A general question concerning constraining facets in derived types:
Most of the derived datatypes have certain facets that distinguish them from
the primitive types. However, each one of the derived types still lists the
very facets that were used to generate it from the primitive types in its
list of applicable constraining facets. Consider the case of recurringDay,
which is derived from recurringDuration by fixing the duration facet with
"PT24H" and the period facet with "P1M". This type still lists duration and
period as possible constraining facets - yet they are absolutely fixed by
the very definition of recurringDay. How should a validating processor treat
a new type derived from recurringDay that actually tries to use one of these
facets in its definition? I see two possible solutions to this dilemma:
a) you integrate some kind of "final" method to fix constraining facets
(e.g. the definition of recurringDay would use the period and duration
facets with this "final" mechanism to explicitly forbid any further
attempts at adding additional constraints through the same facets).
b) if this seems to be too complicated, it would also be possible to make the
above mechanism mandatory for ANY kind of facet (e.g. once a derived type
was generated by using any one facet, that facet cannot be used anymore to
further derive from that derived type). This would, perhaps, result in some
of the currently defined derived built-in types being redefined as
primitive types, but would resolve all potential ambiguities arising from
multiple use of the same facet for any sort of grandchildren-derived type.
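
Solution (a) can be sketched in a few lines of Python. The classes SimpleType and FacetError and the fixed argument are hypothetical, invented only to show how a processor could reject a derivation that touches a facet fixed by an ancestor; the facet values are the ones quoted above.

```python
class FacetError(Exception):
    """Raised when a derivation tries to re-constrain a fixed facet."""


class SimpleType:
    def __init__(self, name, base=None, facets=None, fixed=()):
        self.name = name
        self.base = base
        facets = dict(facets or {})
        inherited_fixed = base.fixed if base else frozenset()
        # Option (a): refuse any facet that an ancestor declared "final".
        clash = inherited_fixed.intersection(facets)
        if clash:
            raise FacetError(
                f"{name}: facet(s) {sorted(clash)} are fixed by an ancestor type"
            )
        # Facets accumulate down the derivation chain.
        self.facets = {**(base.facets if base else {}), **facets}
        self.fixed = inherited_fixed | frozenset(fixed)


recurring_duration = SimpleType("recurringDuration")
recurring_day = SimpleType(
    "recurringDay",
    base=recurring_duration,
    facets={"duration": "PT24H", "period": "P1M"},
    fixed=("duration", "period"),
)
```

Deriving from recurring_day with a period facet raises FacetError, while a derivation that only adds other facets succeeds. Solution (b) would correspond to passing every facet used in a derivation as fixed.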

3.3.29.1 Lexical representation of recurringDay
If this is a left truncated ISO-8601 day, then it should be ----DD, not
---DD

A. Schema for Datatype Definitions
The part.xsd schema document includes the namespace
"http://www.w3.org/XML/1998/namespace" from a schemaLocation
"../structures/xml.xsd" yet I was unable to locate this file on the W3C
web-server. Can you please provide a URL that will allow me to access the
xml.xsd file? Furthermore, would it be possible for a future draft or the
final recommendation to include one downloadable archive file (ZIP, gzip, or
any other common formats) that includes all required files in one neat
package (i.e. the specs and their respective DTDs and XSL files plus the
non-normative Schema DTDs, XSDs, and any other required file)?

E. Regular Expressions
From an implementor's position I don't see why defining {,m} as a shorthand
form of {0,m} would be a problem. It would seem logical to add this, now
that {n,} is allowed. I don't think it is relevant whether or not Perl
includes such a quantifier. If it is more consistent and could potentially
help schema authors, then it should be added.
Along these same lines: I doubt that there is any meaningful use for {0,0}
apart from effectively "commenting out" the preceding atom. Furthermore,
{0,0} could then potentially be written as {,} which is even more confusing.
Apart from being a logical consequence of the {n,m} quantifier, what was the
reason for adding {0,0} to the table as a separate line?
Another problem: it is currently impossible to define a pattern that uses
the vertical bar '|' as a character, because this is defined as a separator
between branches, and there is no single character escape defined for \|.
The only workaround is to include the vertical bar inside of a positive
character group in a character class escape: [|]. Wouldn't it be better
(i.e. more consistent) to add \| as a single char escape?
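
The quantifier and vertical-bar points can be tried out with a short Python sketch. Python's re module is only a stand-in here and is not the regular expression language of the draft; it happens to accept both {,m} and \|, which makes the comparison easy. The helper name matches is hypothetical.

```python
import re


def matches(pattern, text):
    """Return True if the whole of text matches pattern (Python re syntax)."""
    return re.fullmatch(pattern, text) is not None


# {,m} read as {0,m}: zero to three occurrences of 'a'.
shorthand_ok = matches(r"a{,3}", "") and matches(r"a{,3}", "aaa")
shorthand_too_long = matches(r"a{,3}", "aaaa")

# {0,0} effectively removes the preceding atom from the pattern.
zero_zero_without_atom = matches(r"xa{0,0}y", "xy")
zero_zero_with_atom = matches(r"xa{0,0}y", "xay")

# A bare '|' separates branches; [|] and \| both denote the literal character.
bare_bar = matches(r"a|b", "a|b")
bracket_bar = matches(r"a[|]b", "a|b")
escaped_bar = matches(r"a\|b", "a|b")
```

Under these Python semantics the bare a|b pattern accepts only "a" or "b", whereas the bracketed and escaped forms accept the three-character string "a|b".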

Sincerely,

Alexander Falk

... Icon Information-Systems
... ALEXANDER FALK
... President, CEO
... http://www.icon-is.com/falk
Received on Monday, 10 April 2000 08:03:39 UTC