- From: Gary Adams - Sun Microsystems Labs BOS <Gary.Adams@east.sun.com>
- Date: Wed, 30 Apr 1997 08:37:40 -0400
- To: Dan.Oscarsson@trab.se, masinter@parc.xerox.com
- Cc: uri@bunyip.com
> From: Dan Oscarsson <Dan.Oscarsson@trab.se>
...
> >
> > Dan, for each item in a directory listing, there are two entries.
> >
> > <A HREF="this-is-the-URL">this-is-what-the-user-sees</A>
> >
> > The URL in the 'this-is-the-URL' part should use hex-encoded-UTF8,
> > no matter what the user sees.
> >
>
> If you use hex-encoding, yes. But NOT if you use the native character set
> of the document. In that case, the 'this-is-the-URL' part must
> use the same character set as the rest of the html document. Raw UTF-8
> may only be used in a UTF-8 encoded html document, not in a iso 8859-1
> encoded document.

The document character set for HTML 2.0 and 3.2 was ISO 8859-1. The
document character set for HTML 4.0 and XML will be ISO 10646. From
what little I know about SGML, the document must be converted to a
single document character set before the SGML parser is allowed to
operate on the markup.

http://www.w3.org/pub/WWW/MarkUp/Cougar/
http://www.w3.org/pub/WWW/TR/WD-xml-961114.html#sec2.2
http://www.w3.org/pub/WWW/TR/WD-xml-961114.html#sec4.2.3

If I use a multilingual text editor to create my *ML documents and
"paste" a raw UTF-8 URL into the HREF field, the editor either
negotiates for the encoding information with the desktop clipboard
service or assumes the sending application is using the same encoding
the editor expects. So when I cut an EUC-JP URL from my browser's
"location" window and paste it into my editor, the editor may simply
assume the bits are ISO 8859-1 characters.

For experimenting with combined document authoring and browsing, the
"w3 for emacs" browser and the "psgml-mode" editor in XEmacs 20.0
(with MULE support) provide a good platform.

http://www.xemacs.org/faq/xemacs-faq.html#internationalization

>
> A large amount of html documents are hand written in a text editor. A user
> can not be expected to use a different encoding when typing the URLs
> in a document.
But they might have to use a different encoding when saving the file
to disk, and the document itself might be converted as it is saved.
These are common functions in a multibyte plain-text editor, just as
intelligent cut and paste functions are needed in a shared desktop
environment.

I think your point about "authoring URLs" within HTML documents with a
"plain text editor" is that the user will have a local input method for
entering native characters (e.g., compose key sequences, virtual
keyboard, radical composition) which operates the same way for document
text and for URL characters. Since the authoring tools offered no means
of recording the character encoding, a web server has no way to tell
which encoding a document uses when it transmits that document on the
wire.

\ /gra
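To illustrate the distinction drawn in this thread between hex-encoded UTF-8 in the HREF and raw bytes in the document's own character set, here is a minimal Python sketch. It assumes Python 3's `urllib.parse`; the directory entry name and the helper names `href_for`, `name_from_href`, and `misread_as_latin1` are hypothetical, chosen only for illustration.

```python
from urllib.parse import quote, unquote


def href_for(name: str) -> str:
    """Hex-encode a file name as UTF-8 for an HREF attribute (hypothetical helper).

    The name is encoded to UTF-8 bytes and every byte outside the
    unreserved ASCII set becomes %XX, so the result is plain ASCII and
    reads the same whatever the document character set is.
    """
    return quote(name, safe="", encoding="utf-8")


def name_from_href(href: str) -> str:
    """Reverse href_for: decode %XX escapes and interpret the bytes as UTF-8."""
    return unquote(href, encoding="utf-8")


def misread_as_latin1(text: str, source_encoding: str) -> str:
    """Simulate pasting raw bytes into an editor that assumes ISO 8859-1.

    The text is encoded in the sender's encoding (e.g. "utf-8" or
    "euc_jp") and then decoded byte-for-byte as ISO 8859-1, which is
    what happens when no encoding information travels with the paste.
    """
    return text.encode(source_encoding).decode("iso-8859-1")


# Hypothetical directory entry with non-ASCII characters.
entry = "Göteborg-日本.html"
anchor = f'<A HREF="{href_for(entry)}">{entry}</A>'
```

For `Göteborg-日本.html`, `href_for` yields `G%C3%B6teborg-%E6%97%A5%E6%9C%AC.html`, which is the same ASCII string in an ISO 8859-1, UTF-8, or EUC-JP document. By contrast, `misread_as_latin1("ö", "utf-8")` gives `Ã¶`, the kind of garbage an editor produces when it assumes pasted bits are ISO 8859-1.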
Received on Wednesday, 30 April 1997 08:38:19 UTC