- From: Rick Jelliffe <ricko@gate.sinica.edu.tw>
- Date: Thu, 6 Apr 2000 07:00:23 +0800 (CST)
- To: yergeau@alis.com
- cc: xml-editor@w3.org, w3c-i18n-ig@w3.org
On Wed, 5 Apr 2000, John Cowan wrote:

> In principle and as XML 1.0 is written, there might be an encoding
> named "UTF-+ADc-" in which case there would be no straightforward
> way of discriminating between it and UTF-7 to a processor which understood
> both.

I think we can afford to wait for this problem to arise.

(And in any case, I think "+" is not an allowed character in a MIME Content-Type Header Field according to RFC 2045, so the problem would only occur if someone made an encoding called UTF-n, where "n" is any character allowed by MIME except 7, and where that encoding codes "n" as "+ADc-". I would be surprised if IANA would let anyone but Unicode/ISO register another UTF-n, and I would be most surprised if such a UTF-n had that property. In fact, if XML had propagated a method relying on UTF-+ADc- meaning UTF-7, I think it most improbable that IANA would register an encoding with the naughty name.)

> The meaning is that the procedure *of Appendix F* does not reliably
> detect UTF-7.

Yes, that is why I would like better wording. A sentence like

  "Limitations: An implementation of autodetection which follows the algorithm given in this appendix will fail to detect the encoding of a UTF-7 entity if its XML header contains encoded characters. The autodetection algorithm given in this appendix may be enhanced to cope with this and with other rarer or anomalous encodings."

would be fine if the WG does not want to spell out the details of +ABC-.

(I don't think we need to put in a warning about sending an external parsed entity in UTF-8 if it starts with an XML header for UTF-7 encoded as UTF-7. If one of our i18n hopes is that everything will converge towards UTF-*, then I think we have to be scrupulous to avoid giving developers the idea that anything to do with character encodings is worse or more difficult than it is. We need to foster a can-do, "I can do that" attitude; developers will run a mile if they think things are too hard, and the direction they run might not be towards UTF-*. If they think the infrastructure is broken, they won't use it. And this is one part of the infrastructure that is proving itself not broken AFAIK.)

> That is true of Appendix F autodetection, which is explicitly described
> as non-normative. The most that is said is that autodetection is
> "not entirely hopeless".

I think this should be removed. If there is no known problem with entities that have explicit XML headers, then autodetection is not "not entirely hopeless" but "entirely satisfactory". (I take that phrase as rhetorical rather than descriptive; a palliative to prevent panic and depression by readers who might come anticipating problems rather than solutions.)

Rick Jelliffe
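
Illustration of the UTF-7 limitation discussed above, as a minimal Python sketch rather than the Appendix F algorithm itself. It assumes Python's built-in "utf-7" codec; the helper names `utf7_shift` and `naive_detect` are hypothetical and exist only for this example. It shows that "7" written as a UTF-7 shifted sequence is "+ADc-", and that a byte-prefix check of the kind used for ASCII-compatible encodings no longer matches once the start of the XML header is written in shifted form.

```python
import base64


def utf7_shift(text: str) -> bytes:
    """Write text as one UTF-7 shifted sequence:
    '+' + base64 of the UTF-16BE bytes (padding removed) + '-'."""
    b64 = base64.b64encode(text.encode("utf-16-be")).rstrip(b"=")
    return b"+" + b64 + b"-"


def naive_detect(data: bytes) -> str:
    """Hypothetical prefix check: only recognises a header whose
    first bytes are the literal ASCII characters '<?xml'."""
    if data.startswith(b"<?xml"):
        return "ascii-compatible"
    return "unknown"


# The digit 7 as a shifted sequence, as in the name "UTF-+ADc-".
seven = utf7_shift("7")  # b'+ADc-'

# The same XML header twice: once with every character written directly,
# once with the leading '<?xml' written as a shifted sequence.
plain = b'<?xml version="1.0" encoding="UTF-7"?>'
shifted = utf7_shift("<?xml") + b' version="1.0" encoding="UTF-7"?>'

# Both byte strings decode to the same characters under UTF-7 ...
same_text = shifted.decode("utf-7") == plain.decode("utf-7")

# ... but only the first one is recognised by the prefix check.
results = (naive_detect(plain), naive_detect(shifted))
```

An enhanced detector would have to try decoding a leading "+...-" run as UTF-7 before giving up, which is the kind of extension the proposed "Limitations" wording leaves room for.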
Received on Wednesday, 5 April 2000 19:00:53 UTC