
RE: "How to Compare URIs" update 3

From: Misha Wolf <Misha.Wolf@reuters.com>
Date: Sat, 22 Feb 2003 21:15:41 +0000
Message-ID: <T6093cd38c2c407b707788@DTCSEUVIG3.dtc.lon.ime.reuters.com>
To: duerst@w3.org
Cc: www-tag@w3.org, uri@w3.org

Hi Martin,

I find your diagram very useful, and I agree that it, or something
like it, would be a good addition to the new version of RFC 2396.
The one piece of terminology I have some trouble with, and which
is already in RFC 2396, is the phrase "original character sequence".
Presumably, the sequence is "original" in the sense that the entity
managing the resource has used this character sequence (e.g. a file
pathname) to identify it. If that is the case, then my problem is
simply due to the (possibly selfish) perception that the characters
I enter into the browser's address box are the "original" characters,
and that these are transformed in various ways before arriving at
the entity managing the resource. The direction of the arrows in the
RFC 2396 diagram strengthens this way of perceiving the flow. I
wonder whether some word other than "original" would be clearer?

Thanks,
Misha


> -----Original Message-----
> From: Martin Duerst [mailto:duerst@w3.org] 
> Sent: 22 February 2003 20:36
> To: Tim Bray
> Cc: WWW-Tag; uri@w3.org
> Subject: Re: "How to Compare URIs" update 3

[...]

> RFC 2396 gives three levels, condensed in the following line:
> 
> URI character sequence->octet sequence->original character sequence
> 
> In practice, there are two more layers, one on each side.
> We then get:
> 
> a) substrate: paper, metal, audio waves, ascii, UTF-16, EBCDIC,...
>     We don't want to limit that to a particular encoding.
>     ^
>     |   conversion depending on substrate representation
>     V
> b) URI character sequence (just characters)
>     ^
>     |   conversion defined by RFC 2396 (always US-ASCII!)
>     V
> c) octet sequence (just octets)
>     ^
>     |   conversion currently scheme/server dependent, moving towards UTF-8
>     V
> d) original character sequence (file names on server, query strings,...)
>     ^
>     |   conversion server-dependent
>     V
> e) original octet sequence (e.g. UTF-16 for a filename on WinNT,
>    EBCDIC on an EBCDIC server, and so on)
> 
> Maybe this diagram should go into the new version of RFC 2396.

[...]
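As a rough illustration of the layers in Martin's diagram, here is a minimal Python sketch that walks one file name through levels a) to e). The helper names `original_to_uri` and `uri_to_original`, the sample name, and the choice of charsets are assumptions made for illustration only: the diagram itself says the mapping between c) and d) is scheme/server dependent, so UTF-8 and ISO-8859-1 are shown side by side rather than as the rule. Python's `urllib.parse.quote` and `unquote_to_bytes` handle the %XX escaping between b) and c).

```python
from urllib.parse import quote, unquote_to_bytes

# Hypothetical helpers illustrating layers b) to d) of the diagram.
# The charset used between c) and d) is an assumption: the diagram says
# it is scheme/server dependent, moving towards UTF-8.

def original_to_uri(original: str, charset: str = "utf-8") -> str:
    """d) original characters -> c) octets -> b) URI characters."""
    octets = original.encode(charset)      # d -> c: server/scheme dependent
    return quote(octets, safe="")          # c -> b: %XX escapes, US-ASCII only

def uri_to_original(uri_chars: str, charset: str = "utf-8") -> str:
    """b) URI characters -> c) octets -> d) original characters."""
    octets = unquote_to_bytes(uri_chars)   # b -> c: undo the %XX escapes
    return octets.decode(charset)          # c -> d: server/scheme dependent

name = "résumé"                                   # d) e.g. a file name on the server
uri_utf8 = original_to_uri(name)                  # 'r%C3%A9sum%C3%A9'
uri_latin1 = original_to_uri(name, "iso-8859-1")  # 'r%E9sum%E9'

# e) how the server might store the same name, e.g. UTF-16 on Windows NT
stored = name.encode("utf-16-le")

# a) the URI characters written onto a substrate, e.g. EBCDIC (code page 500)
on_substrate = uri_utf8.encode("cp500")

print(uri_utf8, uri_latin1)
print(uri_to_original(uri_utf8) == name)
print(uri_to_original(uri_latin1, "iso-8859-1") == name)
```

The same original characters yield two different URI character sequences depending on the charset assumed between c) and d), which is why that arrow is marked as scheme/server dependent. Read from the user's side, as Misha describes, the characters typed into an address box would run through these functions in the opposite direction before reaching the server.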




Any views expressed in this message are those of the individual
sender, except where the sender specifically states them to be
the views of Reuters Ltd.
Received on Saturday, 22 February 2003 16:16:04 GMT
