- From: Sam Sun <ssun@CNRI.Reston.VA.US>
- Date: Fri, 4 Sep 1998 12:37:16 -0400
- To: "Larry Masinter" <masinter@parc.xerox.com>, "Martin J. Duerst" <duerst@w3.org>
- Cc: "URI distribution list" <uri@Bunyip.Com>
>No, if you're going to update your software, update it to generate UTF-8,
>don't update it to add some encoding-declaration. That is, we _don't_
>want to recommend some new practice that will further the current situation
>where there is no interoperability.
>
>> This allows URI parsers to convert to UTF-8 (or any
>> other encoding used by the protocol) correctly without checking the document
>> context.
>
>A 'URI interpreter' isn't a 'URI parser'. The parsing itself is simple.
>

According to the W3C implementation (http://www.w3.org/Library/), all
the 'URI interpreter' seems to do is 'parse' out the URI reference and
hand it to the protocol-specific 'filters' (see
http://ssun.cnri.reston.va.us/ietf/w3c-libwww-5.1e/Library/User/Using/Filters.html)
to 'interpret'. For example, any "ftp URL" is handed to HTFTPParseURL()
in HTFTP.c, which 'interprets' the "ftp URL" and extracts "uid",
"passwd", etc. Because each network protocol does things (including its
use of encoding) differently, I don't quite understand why it's
necessary for the 'URI interpreter' to care about the exact encoding.

>> Otherwise, it could be hard for URI parsers to figure out the
>> encoding of any particular URI, especially in multilingual document or on
>> platforms with multiple input methods installed.
>
>The point is that it doesn't need to 'figure it out'.
>
>> For example, the URI in HTML document may be defined as:
>>
>> <uri scheme> ":" [ <encoding> "@" ] <uri scheme specific string>
>>
>> The <encoding> is optional, and is not needed if the <uri scheme specific
>> string> uses UTF-8.
>
>This suggestion would continue to propagate non-interoperability and
>has no migration path.
>

I'm not sure I understand your points here. Could you elaborate? (I
assume we are talking about the "URI" encoding in the HTML document,
not what gets transmitted over the wire.) I thought the suggestion
would ENCOURAGE the use of UTF-8 (because every other encoding requires
extra typing).
In the meantime, for platforms where UTF-8 is not practical, it defines
a mechanism that will help protocol-specific 'filters' (e.g.
HTFTPParseURL()) to correctly convert to the encoding used by the
protocol.

Regards,
Sam
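
To make the suggested syntax concrete, here is a minimal Python sketch of
how a protocol-specific filter might handle the optional <encoding> "@"
prefix and convert the escaped bytes to UTF-8. It is only an illustration
under stated assumptions: the function names split_encoding() and
to_utf8_uri() are hypothetical, the prefix is accepted only when it names
a charset that Python's codecs module knows (so that "mailto:user@host"
is not misread), and only %XX-escaped bytes are transcoded.

```python
import codecs
import re
from urllib.parse import quote, unquote_to_bytes

# Assumed grammar from the message:
#   <uri scheme> ":" [ <encoding> "@" ] <uri scheme specific string>
_PREFIX = re.compile(
    r"^([A-Za-z][A-Za-z0-9+.-]*):(?:([A-Za-z0-9._-]+)@)?(.*)$", re.S
)
_ESCAPES = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def split_encoding(uri):
    """Return (scheme, encoding, rest); encoding defaults to UTF-8."""
    m = _PREFIX.match(uri)
    if m is None:
        raise ValueError("not an absolute URI: %r" % uri)
    scheme, label, rest = m.groups()
    if label is not None:
        try:
            codecs.lookup(label)
        except LookupError:
            # Heuristic (an assumption of this sketch): not a known charset
            # name, so treat it as ordinary URI text and put it back.
            rest = label + "@" + rest
            label = None
    return scheme, label or "utf-8", rest


def to_utf8_uri(uri):
    """Drop the encoding prefix and re-escape escaped bytes as UTF-8."""
    scheme, encoding, rest = split_encoding(uri)

    def transcode(match):
        raw = unquote_to_bytes(match.group(0))
        # Decode with the declared charset, then escape again as UTF-8.
        return quote(raw.decode(encoding), safe="")

    return scheme + ":" + _ESCAPES.sub(transcode, rest)


# Example: escaped EUC-JP bytes declared with the hypothetical prefix.
legacy = "ftp:euc-jp@//host/" + quote("日本".encode("euc-jp"))
print(to_utf8_uri(legacy))
```

A URI without the prefix passes through with its escapes interpreted as
UTF-8, which matches the intent that UTF-8 needs no extra typing. The
check against known charset names is a design choice of this sketch, not
part of the suggestion itself; the syntax as written is ambiguous for
schemes whose scheme-specific part may itself begin with a name followed
by "@".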
Received on Friday, 4 September 1998 12:45:59 UTC