Re: revised "generic syntax" internet draft from Gary Adams - Sun Microsystems Labs BOS on 1997-04-16 (uri@w3.org from April 1997)

From: Gary Adams - Sun Microsystems Labs BOS <Gary.Adams@east.sun.com>
Date: Wed, 16 Apr 1997 11:18:52 -0400
To: fielding@kiwi.ICS.UCI.EDU, mduerst@ifi.unizh.ch
Cc: uri@bunyip.com
Message-Id: <199704161518.LAA03459@zeppo.East.Sun.COM>
> From: "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>
> 
> Yes, because at present we don't tell the client to transcode the URL.
> Any transcoding is guaranteed to fail on some systems, because the
> URL namespace has always been private to the generator (the server in
> "http" or "ftp" or "gopher" URLs, the filesystem in "file" URLs, etc.).

This point about the private namespace has got to be understood by
everyone engaged in this current discussion. While the notion of URL
generators was not clearly spelled out in the original syntax documents,
it is clear how current practice has evolved to its current state,
given the minimal semantic requirements placed on URLs.

> 
> Proposal 1b allows cooperating systems to have localized URLs that work
> (at least locally) on systems deployed today.

This is a point that I don't totally agree with. If Japanese glyphs
were printed in the magazine, a client and server could only exchange
information if they used the same encoding assumptions. Therefore
only the ASCII URL would be safe to print and exchange.
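
To make the encoding-assumption problem concrete, here is a minimal
sketch in Python. The resource name "日本" and the helper roundtrips()
are hypothetical, chosen only for illustration. It shows the same
printed glyphs producing three different escaped URLs depending on the
charset the generator assumes, and a client failing to recover the name
when it guesses a different charset.

```python
from urllib.parse import quote, unquote

# Hypothetical resource name as it might appear in print.
name = "日本"

# The same glyphs escape to different octets under each charset.
encodings = ["euc_jp", "shift_jis", "utf-8"]
escaped = {enc: quote(name, encoding=enc) for enc in encodings}

def roundtrips(server_enc, client_enc):
    """True if a client decoding with client_enc recovers the name
    that the server escaped with server_enc."""
    wire = quote(name, encoding=server_enc)
    return unquote(wire, encoding=client_enc, errors="replace") == name

if __name__ == "__main__":
    for enc, url in escaped.items():
        print(f"{enc:10s} {url}")
    print("same charset:     ", roundtrips("euc_jp", "euc_jp"))
    print("mismatched charset:", roundtrips("euc_jp", "shift_jis"))
```

Only the escaped ASCII form is unambiguous on the wire; mapping it back
to glyphs still depends on knowing which charset the generator used.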

> 
> Unless they are currently using iso-8859-1 characters in URLs, on pages
> encoded using iso-8859-1, which are also displayed correctly by far more
> browsers than just the ones you refer to.  Likewise for EUC URLs on
> EUC-encoded pages, and iso-2022-kr URLs on iso-2022-kr-encoded pages.
> The fact is, these browsers treat the URL as part of the HTML data
> stream and, for the most part, display it according to that charset
> and not any universal charset.

This is another point that I don't agree with. The contents of a document
are only constrained by the document authoring tools, while the character
set of the URL is constrained by the data store. For example, an entire
filesystem might be restricted to EUC-JP on a Unix server, while the
documents might be encoded for Korean, Chinese, etc. user communities.

The goals for I18N include both monolingual and multilingual servers.

> 
> You keep waving around this "easily" remark without understanding the
> internals of a server.  If I thought it was easy to do those things,
> they would have been implemented two years ago.
> What do you do if you have two legacy resources, one of which has the
> same octet encoding as the UTF-8 transcoding of the other?  How do you
> justify the increased time to access a resource due to the failed
> (usually filesystem) access on every request?  Why should a busy server
> do these things when they are not necessary to support existing web
> services?

From a legacy system perspective, I found out today that NFS v3
has no means of asking about the encoding used in the mounted
filesystem. So two mounted filesystems could present EUC and SJIS
pathnames with no means for the client to reliably render Japanese
glyphs for the remote file system pathnames.
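
As a rough illustration of why the client cannot simply guess, the
sketch below (Python; the two-octet pathname component is a made-up
example) takes octets that are well-formed in both EUC-JP and
Shift_JIS. Both decodes succeed without error, but they yield different
text, so nothing in the bytes themselves tells the client which
rendering is right.

```python
# Hypothetical pathname component as raw octets, with no charset label,
# which is all an unlabeled filesystem protocol hands to the client.
raw = b"\xa4\xa2"

# Strict decoding: an exception would be raised if the octets were
# malformed in either charset. Neither decode fails here.
as_euc = raw.decode("euc_jp", errors="strict")
as_sjis = raw.decode("shift_jis", errors="strict")

if __name__ == "__main__":
    print("EUC-JP reading:   ", as_euc)
    print("Shift_JIS reading:", as_sjis)
    print("same text?", as_euc == as_sjis)
```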

URLs have the same "legacy problem".

> 
> That's all fine and good, but it doesn't solve the problem.  If we don't
> need to solve the problem, then the draft should progress as it stands.
> After all, there are at least a hundred other problems that the draft
> *does* solve, and you are holding it up.

I think we do still need to solve the problem, but I'm beginning to see
your point that the syntax document will not be enough to actually
solve the underlying problem. A better place to start may be to deploy
new servers which do generate UTF-8 based resource names and are clearly
labeled as using Unicode for external communication.
Initially, Jeeves, Jigsaw and WebNFS servers might be better places
to start than the Apache code base. These have smaller installed bases
and are much more flexible to modify at this point in time.

/gra
Received on Wednesday, 16 April 1997 19:27:09 UTC