Re: Comments on charmod from Chris

At 00:35 02/05/28 -0400, Keith Moore wrote:
> > While indeed currently http is defined to use %hh escaping,
> > why would there be a need to restrict over the wire to ASCII,
> > in particular for future protocols? TCP/IP doesn't have any
> > problems with 8-bit data.
>
>TCP doesn't have any problems with arbitrary binary data
>either, but for some reason people often prefer to use text.
>The popularity of XML illustrates that rather well.

Fully agreed. And XML provides the whole range of Unicode
characters for content, and a very wide subset for
element/attribute names.


>For similar reasons, people often prefer to restrict the
>set of characters that are used for certain purposes.  For instance,
>it's useful if resource identifiers are transmitted in a form
>that can be displayed on any terminal, transcribed on most
>keyboards, and printed on any printer.

For some people, this is useful. For others, having the
identifier in a form that they can easily read and
associate with is more important.


>In other words, it's not TCP that's the problem - it's the
>inability of most human beings and their keyboards to cope
>with the tremendous diversity of characters that are in use.
>TCP is data transparent, but human eyes, minds, voices,
>and fingers aren't.

Your argument doesn't provide any support for Tim Bray's original
proposal of "IRIs for documents, %hh for protocols". Text at the
protocol level is mostly useful for debugging, and having to map
from one representation at the document level to another at the
protocol level is very painful.

This is obvious for the native reader, but in many cases it also
applies to an outsider. Why? Even an outsider has an easier time
checking that two strings in an unknown script are the same than
looking up unknown characters in a code table and checking the
escaping.
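
To make the mapping concrete, here is a rough sketch in Python. It
assumes that the %hh escapes are applied to the UTF-8 bytes of each
character; the function name and the example path are made up for
illustration only.

```python
from urllib.parse import quote, unquote

def iri_path_to_uri_path(iri_path: str) -> str:
    """Percent-escape a path; non-ASCII characters become %hh of their UTF-8 bytes."""
    # quote() encodes as UTF-8 by default and escapes everything except
    # unreserved ASCII characters and those listed in `safe`.
    return quote(iri_path, safe="/")

doc_form = "/日本語"  # hypothetical path as it would appear in a document
wire_form = iri_path_to_uri_path(doc_form)

print(wire_form)                       # /%E6%97%A5%E6%9C%AC%E8%AA%9E
print(unquote(wire_form) == doc_form)  # True
```

Comparing the two forms of doc_form by eye is easy even without
knowing the script; checking wire_form against a code table is not.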

Also, when debugging 8-bit data, it is really easy to set things up
so that bytes above 0x7F are displayed with some escaping, e.g.
something like \324, as seen in many versions of Emacs.
This puts a tiny additional burden on those who have the
easiest job anyway, which is not a bad idea.
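
A minimal sketch of such a display filter, in Python (the function
name is hypothetical, and bytes up to 0x7F are passed through
unchanged, including control characters):

```python
def show_bytes(data: bytes) -> str:
    """Render bytes for debugging: ASCII as-is, bytes above 0x7F as \\ooo octal."""
    parts = []
    for b in data:
        if b <= 0x7F:
            parts.append(chr(b))         # plain ASCII byte, shown unchanged
        else:
            parts.append("\\%03o" % b)   # e.g. 0xD4 is shown as \324
    return "".join(parts)

print(show_bytes(b"GET /caf\xd4 HTTP/1.1"))  # GET /caf\324 HTTP/1.1
```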


Regards,    Martin.

Received on Wednesday, 29 May 2002 03:17:38 UTC