Re: Comments and questions on the last call XML Schema working draft from petsa@us.ibm.com on 2000-04-14 (www-xml-schema-comments@w3.org from April to June 2000)

From: <petsa@us.ibm.com>
Date: Fri, 14 Apr 2000 13:44:22 -0400
To: "Falk, Alexander" <falk@icon.at>
cc: "'www-xml-schema-comments@w3.org'" <www-xml-schema-comments@w3.org>
Message-ID: <852568C1.006162F1.00@D51MTA03.pok.ibm.com>
Thanks for your comments.  I've added my remarks in the body of your note
preceded by AM>>

All the best, Ashok


"Falk, Alexander" <falk@icon.at>@w3.org on 04/10/2000 08:03:36 AM

Sent by:  www-xml-schema-comments-request@w3.org


To:   "'www-xml-schema-comments@w3.org'" <www-xml-schema-comments@w3.org>
cc:
Subject:  Comments and questions on the last call XML Schema working draft





Hi,

I was studying the new April 7 version of the XML Schema working draft
throughout the weekend, as we are in the process of finalizing the beta 3
version of XML Spy 3.0 (see http://www.xmlspy.com/version30.asp), and I
have a first list of comments and questions - especially regarding the
changes to the datatypes (part 2).

Part 1 - Structures

A. Schema for Schemas
Why does the Public Identifier URN for the DOCTYPE statement still use
19991216 as its date, when the DTD for Schemas (Appendix B) is v1.1 dated
2000/04/06? This Public Identifier URN seems to imply that the Schema for
Schemas is itself written in compliance with the old December 1999 XML
Schema draft, which it is not.

Along the same lines: the year in the XML Schema namespace URI is also
still fixed with 1999 - is that going to change for the final
recommendation? While it is understandable from an implementor's point of
view that the URN should remain constant over the time of the draft and
recommendation creation, it would IMHO be rather confusing for all future
schema authors, if the date given here is not identical to the date of the
final recommendation.

G. Tabulation of Changes
The comments in this list are not very useful at all. Compared with "H
Revisions from previous draft" in Part 2, which is ideal for implementors
and saves us the burden of re-reading the entire Specs again and again, the
list of changes in Part 1 is too minimal. Comments like "Lots of edits" or
"more from Noah" are simply not comprehensible without the background that
only insiders of the WG can have. Please provide a more meaningful change
history in the future (or none at all).

Part 2 - Datatypes

3.2.2.2 Constraining facets on boolean datatype
Other than specifically restricting the lexical space to either {0,1} or
{true, false} for a certain schema, what is the intention of allowing a
pattern facet for booleans?
AM>> You could specify a pattern that only allowed "0", for example.
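
As a rough illustration of the reply above, the sketch below mimics a
pattern facet that narrows boolean to the single literal "0". It uses
Python's re module, which is not the XML Schema regular expression
language, and the function name and the set of boolean literals are
assumptions made for the example only.

```python
import re

# Lexical forms of boolean as discussed above (assumed: "0", "1", "true", "false").
BOOLEAN_LITERALS = {"0", "1", "true", "false"}

def is_valid_restricted_boolean(literal: str, pattern: str) -> bool:
    """Accept a literal only if it is a boolean lexical form AND matches the pattern.

    A pattern facet constrains the whole lexical value, so fullmatch is used.
    """
    return literal in BOOLEAN_LITERALS and re.fullmatch(pattern, literal) is not None
```

With the pattern "0", only the literal "0" passes; "1", "true" and "false"
are rejected even though they are legal booleans.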

3.2.8.1 Constraining facets on binary datatype
As binary currently only offers two different encodings that specify the
respective lexical spaces, defining a pattern facet on binary doesn't make
much sense - other than e.g. restricting the letters a-f to uppercase-only
or lower-case only. However, with base64 the alphabet is strictly defined
in the RFC. To answer the question contained in the Ed.Note of this
chapter, I would therefore suggest, from an implementor's standpoint,
omitting the pattern facet here, as its benefits are rather limited and
the potential confusion would outweigh them.
AM>> The utility of "pattern" is questionable for some datatypes.  Thanks
for your feedback.

3.2.3 - 3.2.5 Lexical notation of floating-point numbers
While it is very nice from an implementor's standpoint to know that all
sorts of float, double, or decimal numbers will only use the period as a
decimal separator, I wonder if this is really satisfying for many European
and other non-US users. Specifically, when XML is being used to supplant
existing systems, it is often necessary to interpret floating-point or
decimal numbers with other decimal separators (most notably ',') and in some
cases also including thousands separators (e.g. 4,560,758.99 vs.
4.560.758,99). Why is there no means provided to support these formatting
styles in the XML schema draft? Just like the encoding facet for binaries,
this "formatting" or "picture" facet (to use an old COBOL-coined term that
was also suggested in the DCD submission to the W3C in July 1998) could be
used to specify the various aspects of the lexical space of these
datatypes. If we were to consider XML schemas for B2B e-Commerce scenarios
only, it would be understandable to only allow one format that can be
easily processed - but XML schemas should be thought of in much broader
terms.
AM>> This argument was made by several people, but there was a strong
AM>> sentiment for a single lexical representation.
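
With a single lexical representation, locally formatted numbers would have
to be converted by the application before they reach the schema processor.
A minimal sketch of such a conversion follows; the function name and its
parameters are hypothetical and not part of any draft.

```python
from decimal import Decimal

def parse_localized_decimal(text: str, decimal_sep: str = ",", group_sep: str = ".") -> Decimal:
    """Convert a locally formatted number to a Decimal via the period-only form."""
    if decimal_sep == group_sep:
        raise ValueError("decimal and grouping separators must differ")
    # Drop grouping separators first, then map the local decimal separator to '.'.
    canonical = text.replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(canonical)
```

Both examples from the note reduce to the same value:
parse_localized_decimal("4.560.758,99") and
parse_localized_decimal("4,560,758.99", decimal_sep=".", group_sep=",").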

3.3 A general question concerning constraining facets in derived types:
Most of the derived datatypes have certain facets that distinguish them
from the primitive types. However, each one of the derived types still
lists the very facets that were used to generate it from the primitive
types in its list of applicable constraining facets. Consider the case of
recurringDay, which is derived from recurringDuration by fixing the
duration facet with "PT24H" and the period facet with "P1M". This type
still lists duration and period as possible constraining facets - yet they
are absolutely fixed by the very definition of recurringDay. How should a
validating processor treat a new type derived from recurringDay that
actually tries to use one of these facets in its definition? I see two
possible solutions to this dilemma:
AM>> The facets that have been given values during the refinement process
AM>> cannot be changed.  They are included in the post-validation infoset
AM>> because their actual values may be useful in some cases.

a) you integrate some kind of "final" method to fix constraining facets
(e.g. the definition of recurringDay would use the period and duration
facets with this "final" mechanism to explicitly forbid any further
attempts at adding additional constraints through the same facets).

b) if this seems to be too complicated, it would also be possible to make the
above mechanism mandatory for ANY kind of facet (e.g. once a derived type
was generated by using any one facet, that facet cannot be used anymore to
further derive from that derived type). This would, perhaps, result in some
of the currently defined derived built-in types being redefined
as primitive types, but would resolve all potential ambiguities arising
from multiple use of the same facet for any sort of grandchildren-derived
type.
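
To make option a) concrete, here is a small sketch of how a processor might
track facets that a derivation has fixed and reject later attempts to set
them. The class, method, and exception names are invented for illustration;
nothing here is taken from the draft.

```python
class FixedFacetError(Exception):
    """Raised when a derivation tries to constrain a facet that is already fixed."""

class SimpleType:
    def __init__(self, name, facets=None, fixed=frozenset()):
        self.name = name
        self.facets = dict(facets or {})
        self.fixed = frozenset(fixed)

    def derive(self, name, facets, fix=()):
        # Reject any attempt to set a facet the base type has marked as fixed.
        blocked = self.fixed & set(facets)
        if blocked:
            raise FixedFacetError(f"{name}: facets fixed by {self.name}: {sorted(blocked)}")
        merged = {**self.facets, **facets}
        return SimpleType(name, merged, self.fixed | set(fix))

# recurringDay fixes duration and period, as described in the note above.
recurring_duration = SimpleType("recurringDuration")
recurring_day = recurring_duration.derive(
    "recurringDay", {"duration": "PT24H", "period": "P1M"}, fix=("duration", "period")
)
```

Deriving from recurring_day with a new "period" value raises
FixedFacetError, while constraining some other facet is still allowed.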

3.3.29.1 Lexical representation of recurringDay
If this is a left truncated ISO-8601 day, then it should be ----DD, not
---DD
AM>> ISO 8601 says that the definition in the document is correct.

A. Schema for Datatype Definitions
The part.xsd schema document includes the namespace
"http://www.w3.org/XML/1998/namespace" from a schemaLocation
"../structures/xml.xsd", yet I was unable to locate this file on the W3C
web-server. Can you please provide a URL that will allow me to access the
xml.xsd file? Furthermore, would it be possible for a future draft or the
final recommendation to include one downloadable archive file (ZIP, gzip,
or any other common formats) that includes all required files in one neat
package (i.e. the specs and their respective DTDs and XSL files plus the
non-normative Schema DTDs, XSDs, and any other required files)?

E. Regular Expressions
From an implementor's position I don't see why defining {,m} as a shorthand
form of {0,m} would be a problem. It would seem logical to add this, now
that {n,} is allowed. I don't think it is relevant whether or not Perl
includes such a quantifier. If it is more consistent and could potentially
help schema authors, then it should be added.

Along these same lines: I doubt that there is any meaningful use for {0,0}
apart from effectively "commenting out" the preceding atom. Furthermore,
{0,0} could then potentially be written as {,} which is even more
confusing. Apart from being a logical consequence of the {n,m} quantifier,
what was the reason for adding {0,0} to the table as a separate line?

Another problem: it is currently impossible to define a pattern that uses
the vertical bar '|' as a character, because this is defined as a separator
between branches, and there is no single character escape defined for \|.
The only workaround is to include the vertical bar inside of a positive
character group in a character class escape: [|]. Wouldn't it be better
(i.e. more consistent) to add \| as a single char escape?

AM>> These are good suggestions.  We decided to stick closely to Perl for
AM>> the sake of consistency.
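
For comparison only, the sketch below shows how the two points raised above
behave in Python's re module, which happens to accept {,m} and \| as well
as [|]. This says nothing about what the XML Schema regular expression
language allows; it merely illustrates the intended semantics.

```python
import re

# {,m} as shorthand for {0,m}: Python's re accepts both spellings.
short = re.compile(r"a{,3}")
explicit = re.compile(r"a{0,3}")
samples = ["", "a", "aaa", "aaaa"]
same = all(
    (short.fullmatch(s) is None) == (explicit.fullmatch(s) is None) for s in samples
)

# A literal vertical bar: inside a character group it is not a branch separator.
bar_in_group = re.compile(r"a[|]b")
# Python also accepts the escaped form \|, which the note above asks for.
bar_escaped = re.compile(r"a\|b")
```

Here both quantifier spellings accept and reject the same sample strings,
and both bar patterns match the literal string "a|b" but not "a" or "b"
alone.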

Sincerely,

Alexander Falk

... Icon Information-Systems
... ALEXANDER FALK
... President, CEO
... http://www.icon-is.com/falk
Received on Friday, 14 April 2000 13:44:41 UTC