- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 29 May 2002 14:36:01 +0900
- To: Keith Moore <moore@cs.utk.edu>
- Cc: Tim Bray <tbray@textuality.com>, www-tag@w3.org
At 00:35 02/05/28 -0400, Keith Moore wrote:
> > While indeed currently http is defined to use %hh escaping,
> > why would there be a need to restrict over the wire to ASCII,
> > in particular for future protocols? TCP/IP doesn't have any
> > problems with 8-bit data.
>
>TCP doesn't have any problems with arbitrary binary data
>either, but for some reason people often prefer to use text.
>The popularity of XML illustrates that rather well.

Fully agreed. And XML provides the whole range of Unicode characters
for content, and a very wide subset for element/attribute names.

>For similar reasons, people often prefer to restrict the
>set of characters that are used for certain purposes. For instance,
>it's useful if resource identifiers are transmitted in a form
>that can be displayed on any terminal, transcribed on most
>keyboards, and printed on any printer.

For some people, this is useful. For others, having the identifier
in a form that they can easily read and relate to is more important.

>In other words, it's not TCP that's the problem - it's the
>inability of most human beings and their keyboards to cope
>with the tremendous diversity of characters that are in use.
>TCP is data transparent, but human eyes, minds, voices,
>and fingers aren't.

Your argument doesn't provide any support for Tim Bray's original
proposal of "IRIs for documents, %hh for protocols".

Text at the protocol level is mostly there for debugging. Having to
map from one representation at the document level to another at the
protocol level is very painful. This is obvious for a native reader
of the script, but in many cases it also applies to an outsider. Why?
Even an outsider finds it easier to check that two strings in an
unfamiliar script are identical than to look up unknown characters
in a code table and verify the escaping.

Also, for debugging 8-bit data, it is really easy to set things up so
that bytes above 0x7F are displayed with some escaping, e.g. something
like \324, as seen in many versions of Emacs. This puts a tiny
additional burden on those who have the easiest job anyway, which is
not a bad idea.

Regards,   Martin.
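To make the contrast in the message concrete, here is a minimal Python sketch (not from the thread; the function names `percent_escape` and `octal_display` are hypothetical, and it assumes identifiers are encoded as UTF-8 before escaping). It shows the same path rendered as %hh escapes for protocol use and as a raw-byte debug view where only bytes above 0x7F are shown as backslash-octal escapes, in the style of \324.

```python
from urllib.parse import quote


def percent_escape(text: str) -> str:
    """%hh-escape a string over its UTF-8 bytes, keeping '/' literal."""
    # urllib.parse.quote encodes str input as UTF-8 and emits uppercase hex.
    return quote(text, safe="/")


def octal_display(text: str) -> str:
    """Show the UTF-8 bytes of text; each byte above 0x7F becomes a
    backslash followed by three octal digits. ASCII bytes pass through
    unchanged (backslashes are not escaped, so this is a debug view only)."""
    out = []
    for b in text.encode("utf-8"):
        if b > 0x7F:
            out.append("\\%03o" % b)
        else:
            out.append(chr(b))
    return "".join(out)


if __name__ == "__main__":
    path = "/résumé"
    print(percent_escape(path))  # /r%C3%A9sum%C3%A9
    print(octal_display(path))   # /r\303\251sum\303\251
```

In both escaped forms a reader has to decode byte values to see which characters are meant, whereas the unescaped `/résumé` can be compared by eye; the octal view is only needed when a tool must show raw bytes.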
Received on Wednesday, 29 May 2002 03:17:38 UTC