Date: Tue, 15 Apr 1997 15:05:53 -0400 (EDT)
From: Gary Adams - Sun Microsystems Labs BOS <Gary.Adams@east.sun.com>
Subject: Re: revised "generic syntax" internet draft
To: Gary.Adams@east.sun.com, email@example.com
Cc: firstname.lastname@example.org, email@example.com, Harald.T.Alvestrand@uninett.no
Message-Id: <libSDtMail.9704151505.29976.gra@zeppo>

> Date: Tue, 15 Apr 1997 10:30:53 PDT
> From: Larry Masinter <firstname.lastname@example.org>
> To: Gary Adams - Sun Microsystems Labs BOS <Gary.Adams@East>
> CC: email@example.com, firstname.lastname@example.org, Harald.T.Alvestrand@uninett.no
> Subject: Re: revised "generic syntax" internet draft
>
> > Are there any "facts" still in need of investigation
> > or are the only unresolved issues questions of "opinion"? (My opinion
> > is that the current system is already broken, if this could be
> > substantiated would that invalidate the "status quo" as a viable
> > alternative?)
>
>
> At this point, I think we need not just "facts" but some
> actual "design". Exactly how does this all work in a way that
> actually solves the problem?

This is a fair challenge, and it also addresses the point I was trying
to make about whether or not the current system is indeed broken, i.e.
how would the status quo address these difficult transitions in URL
transcribability?

>
> Let's suppose someone wants to publish information
> about their product and put up a URL in a magazine.

Let's also consider two modes of operation: one targeted at a
common-language-centric (monolingual) group, and another targeted at a
wider multilingual audience, e.g.

    http://ccim.org/    (iso8859-1 text with images of Chinese text)
    http://www.un.org/  (The link "Français" points to URL "/french/"
                         and "Español" points to the URL "/spanish/")

>
> a) what URLs do they support in their server?

Today the only option is to support a native encoding of the platform,
exposed via a URL which is opaque or ambiguous to the client application.
The proposal for UTF-8 %HH escaped URLs would provide a canonical
external representation of the URL.

> b) what gets printed in the magazine?

Assuming I can read the characters in the magazine in an unambiguous
fashion, I should also be able to enter the characters of the URL on my
local computer. If the name of the document and the content of the
document are in different languages, then choosing a name that reaches
the largest audience makes the most sense, e.g. phone numbers are more
recognizable in a language portability sense than advertisement-style
1-800-CALL-NOW mixed messages.

Without opening the metadata can of worms, the ideal printed form of a
URL would be completely unambiguous about the contents of what it
promises to deliver, e.g. language, encoding, time to live. Basically, a
URL is an incomplete contract for a disconnected resource when it is
placed in print.

    ( "Content-Encoding" "ISO8859-5",
      "Content-Language" "ru-lt" "Lithuanian Russian",
      "Expires:" "..."
      "<URL:http://.../>" )

> c) what does the user type into the browser?

Since it was a Braille magazine, they probably type in the same Braille
characters. I'm not sure why the input and output questions are asked
separately here. If the fonts on the screen or in the printed media are
different than the labels on the input device, then some tools will be
required to transcribe the information reliably, e.g. laser printers
should have corresponding OCR scanners if the information is not human
typable.

> d) what does the browser do with what the user typed
> in order to turn it into the URL that was generated in (a).

Today the only alternative is to say that the platform-specific encoding
of the server system must be %HH encoded as raw octets and published in
the magazine. The user enters this as a raw ASCII string, which is
transmitted to the server, where it is %HH decoded and handed to the
local data store.
i.e., the URL is only meaningful to the local server and is opaque to
the magazine, the end user and the browser.

If the encoding is labeled (or known to be UTF-8), then the magazine
could publish either the native character representation or a %HH
escaped URL. Similarly, the browser could support input of native
characters or a %HH escaped URL. Finally, the %HH escaped UTF-8 URL is
transmitted to the server and converted for use in accessing the local
resource.

I could be wrong, but this second scenario seems more transparent to
users: meaningful names can be presented to a wider audience, and users
have more ways to spot and correct errors when the URL is understandable
to the user community.

>
> how does this work for
> 1) Japanese (16-bit characters)

I'm not sure if the 16-bit character question is directed at eventually
using binary URLs to eliminate the expansion problems of UTF-8, e.g.
Java uses an internal 16-bit character. For now only 7-bit clean UTF-8
%HH escaped URLs are "on the table" for discussion in the generic URL
syntax document.

> 2) Hebrew (right to left)

If I understand the issue correctly, the moment a bidirectional
character is permitted in a URL, the rendering could be ambiguous for
"http://system/ABC" vs. "http://system/CBA". I don't have a good answer
for this situation. Perhaps someone with "file manager" experience on a
Hebrew platform could shed some light on what conventions are typically
used for navigating hierarchical file systems?

>
> What happens with "/" and the path components? How does
> directionality get represented? What are the considerations
> for ambiguity beyond the familiar 0O0O0O1l1l1l for ASCII?

In the NFS server specification, RFC 2055 section 6.1, they had to
address the "/" issue for "canonical vs native path" considerations. The
simple answer is that "/" is the separator for multi-component lookup,
whether the native file system uses "/" or ":" or whatever internally.
The escaped form "%2f" could be used when the name actually contains a
"/".

>
> When the details of this are worked out, and we actually
> have something that works to allow non-ASCII URLs, then
> we can look and see if %xx-hex encoded UTF-8 encoded Unicode
> actually forms part of the solution. But it doesn't seem
> "trivial" to me, or at all certain that the current proposal
> is actually part of the solution.
>

If the current proposal doesn't solve the problem, where is the next
place where a solution could be considered? e.g., to address the
security issue a new URL scheme https was used to introduce SSL
communication. Another approach could be to pursue an "executable
content" solution to a portion of the problem space.

I think the I18N named resource problem is a real need in the market
today. It could be that I18N URLs are not the right way to meet that
need. I'm open to other ways to meet that need, but so far the UTF-8 %HH
escaped proposal has appeared to be the most open and understandable
approach.

    %HH escaped unspecified character set (current spec)
    %HH escaped ISO-8859-1 character set (common European practice)
    %HH escaped ISO-2022 character set (common Asian practice ?)
    %HH escaped UTF-8 Unicode 2.0 character set (proposed)
    others?

> Regards,
>
> Larry
> --
> http://www.parc.xerox.com/masinter

Thanks for still listening.
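
P.S. As a rough illustration of the UTF-8 %HH escaped form discussed
above, here is a minimal Python sketch. It uses only the standard
library's urllib.parse; the helper names (escape_path_segment,
escape_path, unescape_path) and the sample names are made up for this
example and are not part of any proposal text. It shows the same name
producing different octets under ISO-8859-1 and UTF-8, and a "/" inside
a name surviving as "%2F" when the path is split on the separator
before decoding.

```python
from urllib.parse import quote, unquote


def escape_path_segment(segment: str) -> str:
    """Escape one path component as %HH-encoded UTF-8 octets.

    safe="" means a "/" inside the name is escaped too (as %2F),
    so it cannot be mistaken for the component separator.
    """
    return quote(segment, safe="", encoding="utf-8")


def escape_path(segments: list[str]) -> str:
    """Join escaped components with "/" as the canonical separator."""
    return "/" + "/".join(escape_path_segment(s) for s in segments)


def unescape_path(path: str) -> list[str]:
    """Split on "/" first, then decode each component as UTF-8."""
    return [unquote(part, encoding="utf-8")
            for part in path.lstrip("/").split("/")]


if __name__ == "__main__":
    # Same characters, different octets depending on the charset.
    print(quote("Français", safe="", encoding="latin-1"))
    print(escape_path_segment("Français"))

    # A name containing "/" round-trips as a single component.
    escaped = escape_path(["Español", "a/b"])
    print(escaped)
    print(unescape_path(escaped))
```

The server side would still have to map the decoded names onto whatever
encoding and separator its local file system uses; the sketch does not
attempt that step.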