Re: revised "generic syntax" internet draft

Jon Knight (jon@net.lut.ac.uk)
Wed, 16 Apr 1997 11:04:23 +0100 (BST)


To: Gary Adams - Sun Microsystems Labs BOS <Gary.Adams@east.sun.com>
Cc: uri@bunyip.com, fielding@kiwi.ICS.UCI.EDU, Harald.T.Alvestrand@uninett.no
In-Reply-To: <libSDtMail.9704151153.13046.gra@zeppo>
Message-Id: <Pine.SUN.3.95.970416104120.6402N-100000@weeble.lut.ac.uk>

On Tue, 15 Apr 1997, Gary Adams - Sun Microsystems Labs BOS wrote:
> Using the HotJava browser yesterday to view
> 
>    http://www.alis.com:8085/~yergeau/url_utf8.htm
>    
> I was able to manually select the "View"->"Character Set" -> "Other" -> UTF8   
> and see the accented characters in the document text as well as in the 
> presentation of the URL. This worked for the 8bit UTF8 bytes, but was 
> not implemented for the %HH escaped characters. This would be a very
> useful feature to support in an I18N browser.
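
For reference, showing the %HH form the same way as the raw 8-bit form
mostly means unescaping to bytes first and only then decoding those bytes.
A minimal Python sketch, assuming the escapes really are UTF-8 (nothing in
the URL itself says so); the path and the display_form name are made up
for illustration and are not taken from the test page:

```python
from urllib.parse import quote, unquote_to_bytes


def display_form(escaped_path: str, charset: str = "utf-8") -> str:
    """Turn %HH escapes into raw bytes, then decode them as `charset`."""
    raw = unquote_to_bytes(escaped_path)
    # errors="replace" keeps a malformed sequence from aborting the display;
    # it shows up as U+FFFD instead.
    return raw.decode(charset, errors="replace")


# Hypothetical example path (an assumption, not the URL from the test page).
escaped = "/docs/r%C3%A9sum%C3%A9.htm"
shown = display_form(escaped)

# Going the other way: UTF-8 encode, then %HH-escape the non-ASCII bytes.
round_trip = quote(shown)
```

The point of doing it in two steps is that the escapes describe bytes, not
characters, so the browser still has to pick a charset before it can
display anything.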

A few more datapoints on the above URL:

* Netscape Navigator 3.01 for X11 running under SunOS 4.1.4 (as
  are all the tests below) displays both that page and the two pages
  linked to from it (or is it one page with two different URLs?  Whatever
  - they both get displayed).  One of the URLs contains lots of accented
  characters, which get displayed in the URL window, in the above
  document's text and in the bottom left hand corner when the cursor is
  over the appropriate URL in the above document (Netscape is set to have
  a document encoding of "Japanese (auto-detect)" by the way). 

* X Mosaic 2.7b5 doesn't work with the above page or the pages linked to
  from it.  As far as I can tell, this is because there is a charset
  attribute following the "text/html" on the Content-Type header; I think 
  this is confusing it.

* Telnet (yes, I use telnet to get HTML pages once in a while) can
  retrieve the page linked to above.  However cut'n'paste under X11R6
  doesn't cut'n'paste the non-ASCII characters for me so the I18N'ed URL
  can't be cut'n'pasted (either from Netscape's URL window or from the
  document that telnet returned in an xterm).  I notice that the web
  server is returning the charset attribute even though I'm making an
  HTTP/1.0 request.  Is that right?  I thought things like charset were an
  HTTP/1.1 thing?

* Lynx version 2.7.1 blows up spectacularly on the above URL, most likely
  because of the charset parameter on the Content-Type header again (it
  complains that "Start file could not be found or is not text/html or
  text/plain" after dumping the raw HTML out to the screen).  The
  document with the %-escaped URL suffers the same fate, and the I18N
  version can't even be tried: it can't be cut'n'pasted and I've no idea
  how to type all the accented characters on my keyboard.

* The CERN line mode browser v3.0 blew up on the above URL with a failed
  system call after complaining that it couldn't display it.
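
Assuming the charset parameter is indeed what trips up Mosaic and Lynx
(which is only a guess here, not something checked against their source),
the difference between an exact string match and a parameter-aware parse
looks roughly like this.  A Python sketch; the header value and the
function names are hypothetical, and the simple split on ";" does not cope
with quoted strings that contain semicolons:

```python
def parse_content_type(value: str):
    """Split a Content-Type value into (media_type, params)."""
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower()          # media types are case-insensitive
    params = {}
    for part in parts[1:]:
        if "=" in part:
            name, val = part.split("=", 1)
            params[name.strip().lower()] = val.strip().strip('"')
    return media_type, params


def naive_is_html(value: str) -> bool:
    # Guess at the failure mode: compare the whole header value.
    return value == "text/html"


def is_html(value: str) -> bool:
    # Look only at the media type and ignore any parameters.
    return parse_content_type(value)[0] == "text/html"


# Hypothetical header value (an assumption about what the server sends).
header = "text/html; charset=UTF-8"
media_type, params = parse_content_type(header)
```

With that header the naive check says "not HTML" while the parsed one
still recognises text/html and leaves the charset available for whoever
wants it.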

As I say folks, these are just some more datapoints; interpret as you will.

Tatty bye,

Jim'll

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Jon "Jim'll" Knight, Researcher, Sysop and General Dogsbody, Dept. Computer
Studies, Loughborough University of Technology, Leics., ENGLAND.  LE11 3TU.
* I've found I now dream in Perl.  More worryingly, I enjoy those dreams. *