Date: Thu, 19 Dec 1996 21:35:19 +0100 (MET)
From: "Martin J. Duerst" <email@example.com>
To: firstname.lastname@example.org
Subject: URL syntax: Typeability
Message-Id: <Pine.SUN.3.95.961219205452.245Z-100000@enoshima>

Continuing with my comments on draft-fielding-url-syntax-02.txt:

The draft stresses transcribability very strongly, and with a very specific understanding of it. While a justification of why the current syntax was chosen is definitely a good thing, I think the draft clearly overdoes it in this area, for several reasons:

- As shown in an earlier mail, the existing syntax has its own problems, e.g. with European keyboards.

- The draft itself rightly mentions the use of meaningful components to help people remember URLs. This is a great step toward recognizing a very important fact that some earlier designs and documents ignored or excluded.

- The draft even goes as far as admitting that there is practice beyond what is allowed:

  > Excluded characters must be escaped in order to be properly
  > represented within a URL. However, there do exist some systems that
  > allow characters from the "unwise" and "national" sets to be used in
  > URL references; a robust implementation should be prepared to handle
  > those characters when it is possible to do so.

  This may be taken as an indication that some users see keyboarding and other aspects of transcribability as less important.

- The draft ignores a couple of arguments showing that the value of a lowest-common-denominator approach, one that lets "everybody" transcribe "every" URL, is overestimated. I developed these arguments mostly in the recent URN discussion (special thanks to "devil's advocate" Keith Moore :-). Despite the important differences between URNs and URLs, they apply to URLs as well.
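To make the escaping rule quoted above concrete, here is a minimal sketch of what it means for a Japanese filename: every non-ASCII byte has to be written as a %XX escape. The filename is hypothetical, and because the draft does not say which character encoding should produce those bytes, the sketch assumes two plausible choices, UTF-8 and Shift_JIS.

```python
from urllib.parse import quote, unquote

# Hypothetical Japanese filename, for illustration only.
filename = "日本語.html"

# Assumption: the characters are first converted to bytes in some encoding,
# then each non-ASCII byte is escaped as %XX. The draft fixes no encoding,
# so two plausible choices are shown.
utf8_form = quote(filename, encoding="utf-8")
sjis_form = quote(filename, encoding="shift_jis")

print(utf8_form)  # %E6%97%A5%E6%9C%AC%E8%AA%9E.html
print(sjis_form)

# Both results are pure ASCII and syntactically acceptable, but decoding
# only recovers the name if the reader guesses the same encoding.
assert unquote(utf8_form, encoding="utf-8") == filename
assert unquote(sjis_form, encoding="shift_jis") == filename
```

Both forms are conforming, yet neither can be recognized or typed from memory by the Japanese readers who would actually use the resource, and the same name yields two different URLs depending on an encoding choice the user never sees.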
Requiring that all URLs exist in ASCII, and only in ASCII, because many people cannot type anything else, sounds like requiring that all newspaper text be printed in a minimum type size of 16pt because many people cannot read smaller print :-). Doing so would make newspapers overly clumsy for most readers (likewise, ASCII makes many URLs very clumsy for the people who actually use them; transcribability, when averaged over the weighted set of potential users, is significantly lower). It is also unnecessary, because there are glasses (it's easy to construct a Java applet or Web page that provides any keyboard whatsoever and any additional input support whatsoever).

Requiring all URLs to exist only in an ASCII subset also causes many problems when creating them. How do you create a (syntactically correct!) URL for Japanese files? You are not supposed to just type the filename into your HTML page!

As a consequence of the above considerations, I argue for "downsizing" the general transcribability concerns in the various places where they are mentioned. One particular issue in this context, remaining from an earlier mail, is the following text:

> 2. URL Characters and Character Escaping
>
> All URLs consist of a restricted set of characters, chosen to
> maximize their transcribability and usability across varying computer
> systems, natural languages, and nationalities. This restricted set
> corresponds to a subset of the graphic printable characters of the
> US-ASCII coded character set .

If one wants to maximize transcribability across computer systems and languages (we already dropped nationalities), the best solution is to choose the URL representation that is most practical for the majority of the people who will use that URL. For a URL describing some Greek resource, that will most probably be Greek, and so on. It is therefore probably best to shorten that paragraph to:

> All URLs consist of a restricted set of characters. This restricted set
> corresponds to a subset of the graphic printable characters of the
> US-ASCII coded character set .

And while we are at it: US-ASCII (as referenced) contains neither SPACE nor DELETE, nor any control characters; all of its characters are printable, so "graphic printable" is redundant. See ECMA registration No. 6. Therefore, the text becomes:

> All URLs consist of a restricted set of characters. This restricted set
> corresponds to a subset of the US-ASCII coded character set .

It's late now; it looks like I will have to postpone the real I18N issues until tomorrow.

Regards,
Martin.