Comments on schema datatypes WD of 6 May

1) datatypes-cowan-maxlength: says string length is defined in bytes,
but it should be characters.
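
A minimal sketch of why the unit matters, in Python. The sample value is my own, not taken from the draft; any non-ASCII string shows the same divergence between character count and byte count.

```python
# Hypothetical sample value: two Cyrillic letters (U+0434, U+0430).
value = "\u0434\u0430"

# Length in characters (Unicode code points), which is what a schema
# author almost certainly means by "maxlength".
char_length = len(value)

# Length in bytes depends on the encoding chosen; UTF-8 is assumed here.
byte_length = len(value.encode("utf-8"))

print(char_length, byte_length)  # 2 4
```

A maxlength of 3 would accept this value if counted in characters and reject it if counted in UTF-8 bytes.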

2) datatypes-cowan-boolean-parochial: the values "yes", "no", "true", and
"false" are anglocentric and thus unacceptable.  Even "1" and "0" will
look bad in all-Arabic XML, where "٠" and "١" would fit in
much better.  Therefore, the lexical-representation facet needs to
say definitely what is logical true and what is logical false.
In some contexts ".TRUE." and ".FALSE." may be the right thing (Fortran),
or "T" and "F", or "si" and "no", or "да" and
"нет" (Russian).

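One way to picture such a facet, as a rough Python sketch: the schema author declares exactly one lexical form for true and one for false, and nothing else is accepted. The name `make_boolean_parser` is hypothetical and not part of the draft.

```python
def make_boolean_parser(true_form, false_form):
    """Build a parser for one declared pair of boolean lexical forms."""
    def parse(text):
        if text == true_form:
            return True
        if text == false_form:
            return False
        raise ValueError(f"{text!r} is not a declared boolean form")
    return parse

# Two of the pairs mentioned above, each declared explicitly.
parse_fortran = make_boolean_parser(".TRUE.", ".FALSE.")
parse_russian = make_boolean_parser("\u0434\u0430", "\u043d\u0435\u0442")

print(parse_fortran(".FALSE."))  # False
```
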
3) datatypes-cowan-y2k: It is simply beyond belief that the 2-digit
dates of ISO 8601 would be perpetuated in a standard being written
in 1999!  These should be flushed out; there should be no standardized
way to represent dates that cannot be properly interpreted ("the
current century"?  Which current century?)
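
To make the ambiguity concrete, here is a small Python sketch. The expansion rule (reuse the century of the year in which the data is read) is my assumption about what "the current century" would have to mean; the function name is hypothetical.

```python
def expand_two_digit_year(yy, current_year):
    """Expand a 2-digit year using the century of current_year (assumed rule)."""
    century = current_year - current_year % 100
    return century + yy

# The same stored value "99", interpreted by readers two years apart.
print(expand_two_digit_year(99, 1999))  # 1999
print(expand_two_digit_year(99, 2001))  # 2099
```

The document has not changed, but its meaning has shifted by a hundred years.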

4) uri-scheme-facet: If URI is made a subtype of string, then scheme
could be encoded using a regular expression.
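
A sketch of the idea in Python, assuming the generic URI syntax in which a scheme is a letter followed by letters, digits, "+", "-", or "." and is terminated by ":". The helper name `restrict_to_schemes` is hypothetical.

```python
import re

def restrict_to_schemes(*schemes):
    """Compile a pattern accepting only URIs with one of the given schemes."""
    alternatives = "|".join(re.escape(s) for s in schemes)
    # Schemes are case-insensitive; everything after the colon is unconstrained.
    return re.compile(rf"(?:{alternatives}):.*", re.IGNORECASE | re.DOTALL)

# A "web URI" subtype of string, expressed purely as a regex constraint.
web_uri = restrict_to_schemes("http", "https")

print(bool(web_uri.fullmatch("http://www.w3.org/")))      # True
print(bool(web_uri.fullmatch("mailto:cowan@ccil.org")))   # False
```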

5) picture-or-regex:  Pictures are slightly shorter and are familiar
to one community, but regexes are familiar to several overlapping
communities (Unix and Perl), and are a superset of pictures.

6) perl-regex: Adopting Perl's syntax wholesale sounds like a good
idea, but there are several i18n problems: ranges (what do they
mean, Unicode value or collating sequence?) and the \0nnn and
\xnn escapes, which "know" that the octal or hex digits encode
only 8 bits' worth of character.

In addition, "[a-z]" sometimes means literally a character in the
range U+0061 to U+007A, sometimes it means "any Latin lowercase
letter", and sometimes it means "any lowercase letter".  In ASCII
these are the same; in Unicode they are not.  It would be better
to eliminate ranges and have ways to say "any lowercase letter",
"any digit (European or not)", "any XML name char", "any XML name-start
char", etc.
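
The three readings can be told apart with a short Python sketch. The function names are hypothetical, and testing for "Latin" by looking at the Unicode character name is only a rough approximation I am using for illustration.

```python
import unicodedata

def in_ascii_range(ch):
    """Literal code point range U+0061..U+007A."""
    return "a" <= ch <= "z"

def is_latin_lowercase(ch):
    """Lowercase letter whose Unicode name starts with LATIN (approximation)."""
    return (unicodedata.category(ch) == "Ll"
            and unicodedata.name(ch, "").startswith("LATIN"))

def is_any_lowercase(ch):
    """Any character in Unicode general category Ll."""
    return unicodedata.category(ch) == "Ll"

# q, e with acute accent (U+00E9), Cyrillic de (U+0434)
for ch in ("q", "\u00e9", "\u0434"):
    print(ch, in_ascii_range(ch), is_latin_lowercase(ch), is_any_lowercase(ch))
```

Only "q" satisfies all three tests; the accented e fails the first, and the Cyrillic letter fails the first two.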

7) nmtoken-primitive-or-generated:  If the above is done, then it
makes sense to define NMTOKEN etc. as subtypes of "string" constrained
by regexes.

8) three-valued-logic: I don't feel strongly about this, but I think
unknown = NULL is a fair assumption.  If you really need more than
three values, use an enum.

9) datatypes-cowan-enum: There seems to be no equivalent of the XML
enumerated attribute type.  This could be provided in general by
allowing a fundamental facet "values" (or the like) specifying
exact values.  Then enums would be a subtype of NMTOKEN, but
there could be other enums such as {14,18,23,28,34,Times Square}
(stations of the IRT subway in Manhattan).
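
A rough Python sketch of what a "values" facet amounts to: a type whose only constraint is membership in an explicit list. The name `make_enum` is hypothetical, and the values are treated as plain strings here rather than as NMTOKENs.

```python
def make_enum(*values):
    """Build a validator that accepts exactly the listed lexical values."""
    allowed = frozenset(values)
    def validate(text):
        if text not in allowed:
            raise ValueError(f"{text!r} is not one of the enumerated values")
        return text
    return validate

# The subway example: not all values are NMTOKENs ("Times Square" has a space).
irt_station = make_enum("14", "18", "23", "28", "34", "Times Square")

print(irt_station("Times Square"))  # Times Square
```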

10) dateTime-lexical-representation:  The reason that dates like
4/3/1943 aren't supported in ISO 8601 is that they mean April 3rd
in some places and March 4th in other places.  They should be
discouraged, not supported.
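
The ambiguity is easy to demonstrate in Python: the same text parses without error under both conventions and yields two different days. The variable names are my own.

```python
from datetime import datetime

text = "4/3/1943"

# Month-first reading versus day-first reading of the same string.
month_first = datetime.strptime(text, "%m/%d/%Y").date()
day_first = datetime.strptime(text, "%d/%m/%Y").date()

# The ISO 8601 forms are unambiguous.
print(month_first.isoformat())  # 1943-04-03
print(day_first.isoformat())    # 1943-03-04
```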


-- 
John Cowan					cowan@ccil.org
		e'osai ko sarji la lojban.

Received on Sunday, 9 May 1999 00:25:29 UTC