Typeable characters from Martin J. Duerst on 1996-12-19 (uri@w3.org from December 1996)

From: Martin J. Duerst <mduerst@ifi.unizh.ch>
Date: Thu, 19 Dec 1996 18:37:35 +0100 (MET)
To: uri@bunyip.com
Message-Id: <Pine.SUN.3.95.961219172057.245O-100000@enoshima>

This message addresses the issue of typeability of ASCII characters.
The issue of typeability in general will be addressed in a separate mail.

First the relevant parts of the URL syntax draft:

From  1.3. URL Transcribability

>    The URL syntax has been designed to promote transcribability over all
>    other concerns.  ....

>    These design concerns are not always in alignment.  For example, it
>    is often the case that the most meaningful name for a URL component
>    would require characters which cannot be typed on most keyboards.
>    In such cases, the ability to access a resource is considered more
>    important than having its URL consist of the most meaningful of
>    components.


From 2.3.2. When to Escape and Unescape

>    An exception to the unescaping rules is allowed when it is known that
>    some older systems are escaping a character that does not need to be
>    escaped, and when it is possible to reliably discriminate between
>    such an escaped data character and any reserved use for that
>    character.  For example, it is generally safe to unescape "%7e" when
>    it occurs near the beginning of an http URL path, since many older
>    systems automatically escape the "~" character even though it is
>    unreserved.


From 2.3.3. Excluded Characters

>    Other characters are excluded because gateways and other transport
>    agents are known to sometimes modify such characters.
> 
>       unwise      = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"


From F.2:

>    The tilde "~" character was added to those in the "unreserved" set,
>    since it is extensively used on the Internet in spite of the
>    difficulty to transcribe it with some keyboards.


And some additional facts: A check of my Mac keyboard (standard
Swiss German keyboard) and of my ECMA registry of ISO 646
versions showed that at least the following characters also
cannot be assumed to be widely typeable (not that I want to
imply "widely" from "Swiss", but I know that the situation is
similar all around Europe, and probably no better in Asia):

"@", "$", "#" (for fragments).

Together with "~", these characters may also sometimes be modified
in gateways and other transports.
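
To make the keyboard observation concrete, here is a minimal Python
sketch. It assumes the usual list of twelve code positions that may
differ between national versions of ISO 646; the names ISO646_VARIANT,
UNWISE and hard_to_type, and the example.org URL, are made up for
illustration and do not come from the draft.

```python
# Code positions that may differ between national versions of ISO 646
# (assumed list of twelve); all other printable ASCII is invariant.
ISO646_VARIANT = set("#$@[\\]^`{|}~")

# The "unwise" set quoted from section 2.3.3 of the draft.
UNWISE = set("{}|\\^[]`")


def hard_to_type(url):
    """Return the variant-position characters of `url`, in order of first appearance."""
    seen = []
    for ch in url:
        if ch in ISO646_VARIANT and ch not in seen:
            seen.append(ch)
    return seen


# Variant positions that the "unwise" set does not cover.
print(sorted(ISO646_VARIANT - UNWISE))  # ['#', '$', '@', '~']

# Hypothetical URL, used only as test input.
print(hard_to_type("http://user@example.org/~martin/a$b#top"))  # ['@', '~', '$', '#']
```

Under that assumption, the characters left over after removing the
"unwise" set are exactly "#", "$", "@" and "~".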


As a result, the draft gives the impression that it treats
typeability above all other concerns. The reader has to go to
the appendix to find an indication that this might not be
true. And there, only 1/4 of the problem (the "~") is mentioned.

To clear things up, I guess the following changes have to be
made:

Change design considerations from:

>    The URL syntax has been designed to promote transcribability over all
>    other concerns.  ....

to:
>    The URL syntax has been designed with transcribability as
>    its main concern.

and from:


>    In such cases, the ability to access a resource is considered more
>    important than having its URL consist of the most meaningful of
>    components.

to:
>    In such cases, the ability to type a URL has generally been
>    favored. In some cases, existing usage has led to the
>    introduction of exceptions.

If you really want to be honest, you will also add:
>    These exceptions favor users of US-American keyboards over others.

In 2.2. Unreserved Characters:

>       mark        = "$" | "-" | "_" | "." | "!" | "~" |
>                     "*" | "'" | "(" | ")" | ","
> 
>    Unreserved characters can be escaped without changing the semantics
>    of the URL, but this should not be done unless the URL is being used
>    in a context which does not allow the unescaped character to appear.

Add a note that "$" and "~" are not available on many keyboards.
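
The quoted rule says that unreserved characters can be escaped without
changing the semantics of the URL. A minimal Python sketch of what that
means for "$" and "~" follows; escape_chars is a hypothetical helper
written for this illustration (it assumes ASCII input), and the
example.org URL is made up.

```python
from urllib.parse import unquote


def escape_chars(url, chars):
    """Replace each character of `url` that occurs in `chars` by its %XX escape."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in chars else ch
        for ch in url
    )


# Hypothetical URL containing both characters in question.
original = "http://example.org/~martin/price$list"
escaped = escape_chars(original, "$~")
print(escaped)  # http://example.org/%7Emartin/price%24list

# Unescaping gives back the original string.
assert unquote(escaped) == original
```

A user who cannot type "~" or "$" could in principle type the escaped
form instead, provided the receiving software treats both spellings
alike.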


In 2.3.2. When to Escape and Unescape:

>    The angle-bracket "<" and ">" and double-quote (`"') characters are
>    excluded because they are often used as the delimiters around URLs in
>    text documents and protocol fields.  The character "#" is excluded
>    because it is used to delimit a URL from a fragment identifier in URL
>    references.  The percent character "%" is excluded because it is used
>    for the encoding of escaped characters.

Add a note saying that "#" is not available on many keyboards.


>    An exception to the unescaping rules is allowed when it is known that
>    some older systems are escaping a character that does not need to be
>    escaped, and when it is possible to reliably discriminate between
>    such an escaped data character and any reserved use for that
>    character.  For example, it is generally safe to unescape "%7e" when
>    it occurs near the beginning of an http URL path, since many older
>    systems automatically escape the "~" character even though it is
>    unreserved.

Either clearly say *here* that this is done despite the limited
typeability of "~", or go back to the original state of treating
"~" as unwise (it does indeed appear to be changed by some gateways).
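
For illustration, the unescaping exception quoted above might be
sketched in Python as follows. This is only one possible reading: it
assumes that "near the beginning of an http URL path" means the first
path segment starts with "%7e", and unescape_leading_tilde is a
hypothetical name, not something defined by the draft.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def unescape_leading_tilde(url):
    """Unescape "%7e" at the very start of an http URL path; leave the rest alone."""
    parts = urlsplit(url)
    if parts.scheme.lower() != "http":
        # The draft's example only speaks about http URLs.
        return url
    # Assumption: only a "%7e" directly after the leading "/" is safe to unescape.
    path = re.sub(r"^/%7[eE]", "/~", parts.path)
    return urlunsplit(parts._replace(path=path))


# Hypothetical URLs, used only as test input.
print(unescape_leading_tilde("http://example.org/%7Emartin/a%7eb"))
# http://example.org/~martin/a%7eb
print(unescape_leading_tilde("ftp://example.org/%7Emartin"))
# ftp://example.org/%7Emartin
```

Later occurrences of "%7e" are left escaped, since the quoted text only
calls the case near the beginning of the path generally safe.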


Regards,	Martin.


----
Dr.sc.  Martin J. Du"rst			    ' , . p y f g c R l / =
Institut fu"r Informatik			     a o e U i D h T n S -
der Universita"t Zu"rich			      ; q j k x b m w v z
Winterthurerstrasse  190			     (the Dvorak keyboard)
CH-8057   Zu"rich-Irchel   Tel: +41 1 257 43 16
 S w i t z e r l a n d	   Fax: +41 1 363 00 35   Email: mduerst@ifi.unizh.ch
----
Received on Thursday, 19 December 1996 12:38:06 UTC