Re: iDNR, an alternative name resolution protocol from Martin J. Duerst on 1998-09-02 (uri@w3.org from September 1998)

From: Martin J. Duerst <duerst@w3.org>
Date: Wed, 02 Sep 1998 12:35:24 +0900
To: "Sam Sun" <ssun@ns.cnri.reston.va.us>
Cc: "Larry Masinter" <masinter@parc.xerox.com>, "Harald Tveit Alvestrand" <Harald.Alvestrand@maxware.no>, "Jon Davis" <jdavis@inetinit.org>, "URI distribution list" <uri@Bunyip.Com>
Message-Id: <199809020838.RAA23672@sh.w3.mag.keio.ac.jp>

Hello Sam,

Many thanks for your comments. We are still working on the draft,
so any comments are welcome. In particular, I think one of the
hard pieces is that in many places, this draft is a meta-spec,
i.e. it says what other specs should do. That requires very careful
wording; I think Larry has already done some very good work on that.

As for your comments, here are my answers:

At 16:33 98/09/01 -0400, Sam Sun wrote:

> The draft defines URI as "... both for transmission in network protocols and
> representation in spoken and written human communication". However, it seems
> that the URI defined for network protocol may have different set of
> requirements from URI targeted for human communication. URI defined for
> network protocol doesn't need to be concerned with "user friendly" as much
> as URI defined for human consumption. And I think URI defined human
> communication should not require "everyone in the world be able to read or
> enter", because no single language is "friendly" to everyone in the world.
> 
> For any particular URI scheme defined for a specific network protocol (e.g.
> http), it makes it simpler to have a uniform encoding. However, if URI is
> defined as the guideline for every network protocol to be integrated with
> web browser, it doesn't seem practical to enforce any specific encoding.
> Different URI schemes may map to different network protocols, and different
> protocols may have their very own encoding (already) defined. In fact, most
> URI scheme specific Resolver (telnet, ftp, ldap, ...) treats its URI as
> "human entered" and converts it into the protocol encoding before sending
> out the request.

We have to clearly distinguish three things here:

- The URIs as they are seen by humans. On a napkin, a cardboard box, or
  whatever you like, they don't have an encoding. On the screen, they have
  an encoding, but the user doesn't and shouldn't care about it.

- URIs as they are used in protocols. Up to now, the main protocol I know
  that uses URIs is HTTP. FTP, telnet, LDAP,... don't use URIs [directly].
  Even HTTP in many cases uses only a part of a URI.

- Information in URIs that is somehow used in protocols. These are not URIs.
  You are right that each protocol should be able to use whatever encoding
  is appropriate. If the draft says or implies anything else, we have to
  make it clearer. While FTP i18n is defined to use UTF-8 anyway, and so
  FTP doesn't make for a particularly interesting example, there is already
  an example that shows this very clearly: The IMAP URI, RFC 2192.
  IMAP uses a different encoding for its folder names (usually denoted
  as "modified UTF-7"). The RFC gives code for conversion between this
  and UTF-8.
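
To make the IMAP case concrete, here is a minimal Python sketch of that
kind of conversion. It is not the code from the RFC; it is an
illustration written under the assumption that "modified UTF-7" means
what the IMAP spec describes: printable ASCII stands for itself, "&" is
written "&-", and everything else is UTF-16BE in base64 with "," in place
of "/" and no padding, bracketed by "&" and "-". The function names,
including mailbox_to_url_path, are hypothetical.

```python
import base64
from urllib.parse import quote


def encode_mutf7(text: str) -> str:
    """Encode a Unicode folder name as IMAP modified UTF-7 (sketch)."""
    out = []
    pending = []  # characters waiting to be emitted as one base64 run

    def flush():
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii")
            # Modified base64: no "=" padding, "," instead of "/".
            out.append("&" + b64.rstrip("=").replace("/", ",") + "-")
            pending.clear()

    for ch in text:
        if ch == "&":
            flush()
            out.append("&-")
        elif 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append(ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)


def decode_mutf7(name: str) -> str:
    """Decode an IMAP modified UTF-7 folder name to a Unicode string."""
    out = []
    i = 0
    while i < len(name):
        if name[i] != "&":
            out.append(name[i])
            i += 1
            continue
        end = name.index("-", i)  # raises ValueError if the run is unterminated
        chunk = name[i + 1:end]
        if chunk == "":
            out.append("&")  # "&-" is a literal ampersand
        else:
            b64 = chunk.replace(",", "/")
            b64 += "=" * (-len(b64) % 4)  # restore the stripped padding
            out.append(base64.b64decode(b64).decode("utf-16-be"))
        i = end + 1
    return "".join(out)


def mailbox_to_url_path(name: str) -> str:
    """Hypothetical helper: protocol form -> %-escaped UTF-8 for a URI."""
    return quote(decode_mutf7(name), safe="/")
```

For example, decode_mutf7("&U,BTFw-") gives the two characters U+53F0
U+5317, and mailbox_to_url_path("&U,BTFw-") gives "%E5%8F%B0%E5%8C%97".
The point is only that the protocol keeps its own encoding while the URI
side stays in one encoding.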



> These said, it seems more appropriate to define URI "for representation in
> spoken and written human communication" ONLY. And the URI encoding should be
> defined as scheme specific. Some URI schemes (e.g. "http:") may require a
> single encoding. While other URI schemes (e.g. "hld:") would allow any
> native encoding to be used. The conversion from the human entered URI to the
> network protocol is handled by the scheme specific Resolver.

I agree with you that it might be a good idea to start with "representation in
(spoken and) written human communication". As far as I understand, that was
what was done in RFC 2396, too. But I would want to give a strong and
serious warning against anything that makes encodings in URIs dependent on
scheme specifics. Currently, we can have a look at any URI and always
read the characters (as long as they are limited to the ASCII repertoire),
independent of the scheme. If we need individual converters and display
logic for each URI type, we lose all the benefits of having a *Uniform*
resource identifier.
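
A small Python sketch of the difference, purely as an illustration. It
assumes UTF-8 as the single encoding behind %-escapes, and the
"x-legacy" scheme and the SCHEME_ENCODINGS table are made up for the
example; neither comes from any spec.

```python
from urllib.parse import unquote

# Hypothetical table that generic software would have to carry around
# if each scheme were free to pick its own encoding.
SCHEME_ENCODINGS = {
    "http": "utf-8",
    "x-legacy": "iso-8859-1",
}


def display_uniform(uri: str) -> str:
    """One rule for every scheme: %-escapes are UTF-8 (assumption)."""
    return unquote(uri, encoding="utf-8", errors="replace")


def display_per_scheme(uri: str) -> str:
    """Scheme-specific rule: fails for any scheme not in the table."""
    scheme = uri.split(":", 1)[0].lower()
    encoding = SCHEME_ENCODINGS[scheme]  # KeyError for an unknown scheme
    return unquote(uri, encoding=encoding, errors="replace")
```

With the uniform rule, display_uniform("http://example.org/caf%C3%A9")
works without looking at the scheme at all. With the per-scheme rule,
every new scheme means a new table entry in every piece of software that
wants to show a URI to a user, and an unknown scheme cannot be displayed.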


Regards,   Martin.

Received on Wednesday, 2 September 1998 04:44:52 UTC