- From: <petsa@us.ibm.com>
- Date: Tue, 11 May 1999 10:11:14 -0400
- To: John Cowan <cowan@locke.ccil.org>
- cc: Www-Xml-Schema-Comments@w3.org
- Message-ID: <8525676E.004DED9D.00@D51MTA03.pok.ibm.com>
John:

Thank you for your thoughtful comments on the datatypes spec. Paul and I will discuss them next week when he gets back from WWW8.

Regards, Ashok

(Embedded image moved to file: pic29844.pcx)

John Cowan <cowan@locke.ccil.org>
05/09/99 12:35 AM

To: www-xml-schema-comments@w3.org
cc: (bcc: Ashok Malhotra/Watson/IBM)
Subject: Comments on schema datatypes WD of 6 May

1) datatypes-cowan-maxlength: The WD says string length is defined in bytes, but it should be defined in characters.

2) datatypes-cowan-boolean-parochial: The values "yes", "no", "true", and "false" are anglocentric and thus unacceptable. Even "1" and "0" will look bad in all-Arabic XML, where "٠" and "١" would fit in much better. Therefore, the lexical-representation facet needs to say definitely what is logical true and what is logical false. In some contexts ".TRUE." and ".FALSE." may be the right thing (Fortran), or "T" and "F", or "si" and "no", or "да" and "нет" (Russian).

3) datatypes-cowan-y2k: It is simply beyond belief that the 2-digit dates of ISO 8601 would be perpetuated in a standard being written in 1999! These should be flushed out; there should be no standardized way to represent dates that cannot be properly interpreted ("the current century"? Which current century?).

4) uri-scheme-facet: If URI is made a subtype of string, then scheme could be encoded using a regular expression.

5) picture-or-regex: Pictures are slightly shorter and are familiar to one community, but regexes are familiar to several overlapping communities (Unix and Perl), and are a superset of pictures.

6) perl-regex: Adopting Perl's syntax wholesale sounds like a good idea, but there are several i18n problems with ranges (what do they mean, Unicode value or collating sequence?) and with the \0nnn and \xnn escapes, which "know" that the number of octal or hex digits is only 8 bits' worth.

In addition, "[a-z]" sometimes means literally a character in the range a to z, sometimes it means "any Latin lowercase letter", and sometimes it means "any lowercase letter". In ASCII these are the same; in Unicode they are not. It would be better to eliminate ranges and have ways to say "any lowercase letter", "any digit (Euro or not)", "any XML name char", "any XML name-start char", etc.

7) nmtoken-primitive-or-generated: If the above is done, then it makes sense to define NMTOKEN etc. as subtypes of "string" constrained by regexes.

8) three-valued-logic: I don't feel strongly about this, but I think unknown = NULL is a fair assumption. If you really need more than three values, use an enum.

9) datatypes-cowan-enum: There seems to be no equivalent of the XML enumerated attribute type. This could be provided in general by allowing a fundamental facet "values" (or the like) specifying exact values. Then enums would be a subtype of NMTOKEN, but there could be other enums such as {14,18,23,28,34,Times Square} (stations of the IRT subway in Manhattan).

10) dateTime-lexical-representation: The reason that dates like 4/3/1943 aren't supported in ISO 8601 is that they mean April 3rd in some places and March 4th in other places. They should be discouraged, not supported.

-- 
John Cowan cowan@ccil.org
e'osai ko sarji la lojban.
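Points 1 and 6 of the quoted comments can be illustrated with a minimal sketch. Python is an assumption here (the message names no implementation language), and the helper names char_and_byte_length and is_lowercase_letter are hypothetical, introduced only for illustration. The sketch shows that character length and UTF-8 byte length diverge for non-ASCII strings, and that the range "[a-z]" is narrower than the Unicode "lowercase letter" category.

```python
import re
import unicodedata


def char_and_byte_length(text):
    """Return (length in characters, length in UTF-8 bytes)."""
    return len(text), len(text.encode("utf-8"))


def is_lowercase_letter(ch):
    """True if ch has Unicode general category Ll (lowercase letter)."""
    return unicodedata.category(ch) == "Ll"


# The literal range a..z, as opposed to "any lowercase letter".
ASCII_RANGE = re.compile(r"[a-z]")

# Point 1: characters and bytes agree only for ASCII text.
for sample in ("no", "да", "нет"):
    print(sample, char_and_byte_length(sample))

# Point 6: the range and the category disagree outside ASCII.
for ch in ("z", "\u00e9", "д"):
    in_range = ASCII_RANGE.fullmatch(ch) is not None
    print(ch, in_range, is_lowercase_letter(ch))
```

For example, "да" is 2 characters but 4 bytes in UTF-8, so a maxlength facet counted in bytes would halve the usable length of Cyrillic strings; likewise "д" is a lowercase letter that "[a-z]" does not match.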
Attachments
- application/octet-stream attachment: pic29844.pcx
Received on Tuesday, 11 May 1999 10:11:40 UTC