Re: iDNR, an alternative name resolution protocol

Sam Sun (ssun@CNRI.Reston.VA.US)
Fri, 4 Sep 1998 12:37:16 -0400


Message-ID: <05d401bdd822$473ed630$1c1e1b0a@ssun.CNRI.Reston.Va.US>
From: "Sam Sun" <ssun@CNRI.Reston.VA.US>
To: "Larry Masinter" <masinter@parc.xerox.com>,
Cc: "URI distribution list" <uri@Bunyip.Com>
Date: Fri, 4 Sep 1998 12:37:16 -0400
Subject: Re: iDNR, an alternative name resolution protocol

>No, if you're going to update your software, update it to generate UTF-8,
>don't update it to add some encoding-declaration. That is, we _don't_
>want to recommend some new practice that will further the current situation
>where there is no interoperability.
>
>> This allows URI parsers to convert to UTF-8 (or any
>> other encoding used by the protocol) correctly without checking the
>> document context.
>
>A 'URI interpreter' isn't a 'URI parser'. The parsing itself is simple.
>


According to the W3C implementation (http://www.w3.org/Library/), all the
'URI interpreter' seems to do is 'parse' out the URI reference and hand it to
the protocol-specific 'filters' (see
http://ssun.cnri.reston.va.us/ietf/w3c-libwww-5.1e/Library/User/Using/Filters.html)
to 'interpret'.

For example, any "ftp URL" is handed to HTFTPParseURL() in HTFTP.c, and
HTFTPParseURL() then 'interprets' the "ftp URL" and extracts "uid",
"passwd", etc.

Because each network protocol does things (including use of encoding)
differently, I don't quite understand why it's necessary for the 'URI
interpreter' to care about the exact encoding.
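
To make the hand-off concrete, here is a rough Python sketch of the idea. It
is not libwww's code: parse_ftp_url(), FILTERS and interpret() are
hypothetical names, and the example URL is made up. The generic step only
finds the scheme; everything else is left to the protocol-specific filter.

```python
from urllib.parse import urlsplit


def parse_ftp_url(url):
    """Hypothetical stand-in for a protocol-specific filter
    (playing the role HTFTPParseURL() plays for ftp URLs)."""
    parts = urlsplit(url)
    return {
        "uid": parts.username,
        "passwd": parts.password,
        "host": parts.hostname,
        "path": parts.path,
    }


# Hypothetical registry: one filter per URI scheme.
FILTERS = {"ftp": parse_ftp_url}


def interpret(uri):
    """Generic step: pick out the scheme, then delegate to its filter."""
    scheme = uri.split(":", 1)[0].lower()
    handler = FILTERS.get(scheme)
    if handler is None:
        raise ValueError("no filter registered for scheme: " + scheme)
    return handler(uri)


info = interpret("ftp://sam:secret@ftp.example.org/pub/file.txt")
print(info["uid"], info["host"], info["path"])
```

Note that interpret() never looks at the character encoding; any such
decision would live inside the individual filter.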


>> Otherwise, it could be hard for URI parsers to figure out the
>> encoding of any particular URI, especially in multilingual document or on
>> platforms with multiple input methods installed.
>
>The point is that it doesn't need to 'figure it out'.
>
>> For example, the URI in HTML document may be defined as:
>>
>> <uri scheme> ":" [ <encoding> "@" ] <uri scheme specific string>
>>
>> The <encoding> is optional, and is not needed if the <uri scheme specific
>> string> uses UTF-8.
>


>This suggestion would continue to propagate non-interoperability and
>has no migration path.
>

I'm not sure I understand your points here. Could you elaborate? (I assume
we are talking about the "URI" encoding in the HTML document, not what gets
transmitted over the wire.) I thought the suggestion would ENCOURAGE the use
of UTF-8 (because every other encoding requires extra typing). In the
meantime, for platforms where UTF-8 is not practical, it defines a mechanism
that will help protocol-specific 'filters' (e.g. HTFTPParseURL()) to
correctly convert to the encoding used by the protocol.
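
A minimal Python sketch of how a filter might use the optional <encoding>
prefix follows. The function names split_encoding() and to_wire() are
hypothetical, and the sketch makes assumptions the proposal does not spell
out: non-ASCII bytes are %-escaped in the declared encoding, and the text
before the first "@" counts as an encoding label only if it contains no "/"
and names a known codec. That rule is still ambiguous for something like
"mailto:ascii@example.org", so treat this as an illustration only.

```python
import codecs
from urllib.parse import quote, unquote


def split_encoding(uri):
    """Split '<scheme>:[<encoding>@]<rest>' into (scheme, encoding, rest).

    Assumption: a prefix is an encoding label only if it has no '/'
    and Python knows a codec by that name; otherwise UTF-8 is assumed.
    """
    scheme, rest = uri.split(":", 1)
    label, sep, tail = rest.partition("@")
    if sep and "/" not in label:
        try:
            codecs.lookup(label)
            return scheme, label, tail
        except LookupError:
            pass
    return scheme, "utf-8", rest


def to_wire(uri, wire_encoding="utf-8"):
    """Re-escape the scheme-specific part in the protocol's encoding."""
    scheme, declared, rest = split_encoding(uri)
    # Decode the %-escapes using the declared encoding.
    text = unquote(rest, encoding=declared)
    # Re-escape using the encoding the protocol expects on the wire.
    # (A real filter would have to preserve escaped reserved characters.)
    return scheme + ":" + quote(text, safe="/:@?&=#", encoding=wire_encoding)


print(to_wire("http:iso-8859-1@//example.org/caf%E9"))
print(to_wire("ftp://sam@ftp.example.org/pub"))
```

A URI with no prefix passes through unchanged, which is the sense in which
UTF-8 remains the cheapest choice to type.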


Regards,
Sam