Re: When is percent-encoding required. from Charles Lindsey on 2010-01-10 (uri@w3.org from January 2010)

From: Charles Lindsey <chl@clerew.man.ac.uk>
Date: Sun, 10 Jan 2010 12:27:32 -0000
To: Julien ÉLIE <julien@trigofacile.com>
Cc: URI <uri@w3.org>
Message-ID: <op.u6bgv6rx6hl8nm@clerew.man.ac.uk>
I am forwarding this from Julien Élie, who seems to have difficulty
posting to this list. My own comments are added.

On Sat, 09 Jan 2010 20:16:36 -0000, Julien ÉLIE <julien@trigofacile.com>  
wrote:

> Hi Bob,
>
> [I hope this mail will reach the mailing-list.  I do not understand
> why I cannot post -- I receive mails from the list, but I never had
> the one for the archive approval system.]
>
>>>> For certain, you should percent-encode that "%" as well, but I'm
>>>> inclined to believe you should percent encode the "^`{|}" also.  I
>>>> think this would be the correct normalized form:
>>>> news:foo@bar.!%23$%25&'*+%2F=%3F%5E%60%7B%7C%7D.example
>>>
>>> I have just tested to write that line in IE8 and it works fine:
>>
>> I would argue that IE8 *doesn't* work fine.
>>
>>> ARTICLE <foo@bar.!#$%&'*+/=?^`{|}.example> is sent.
>>>
>>> However, with Firefox 3.5.6, the Windows file explorer or
>>> Windows Mail (a newsreader), it fails:
>>>
>>> ARTICLE <foo@bar.!%23$%25&'*+%2F=%3F%5E%60%7B%7C%7D.example>
>>>
>>> is sent.
>>
>> I consider that to be correct behavior.
>
> Then that's a problem because you will never be able to read
> that article.  All you will receive is a 430 error code
> (message-ID not found).
> A news server expects a real message-ID, not an encoded message-ID.

Sure. That example was entirely artificial regarding what happens in the  
Real World, and I doubt those characters will ever appear in real  
message-ids; but the specification needs to get it right, just in case.
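
To make the artificial example concrete, here is a minimal Python sketch that produces the normalized form quoted above from the raw message-id. The helper name `news_uri` and the `SAFE` set are assumptions for illustration only: the set simply lists the punctuation left unencoded in the quoted example, and is not taken from any specification text.

```python
from urllib.parse import quote

# Assumption: punctuation left bare in the normalized form quoted in this thread.
# Letters, digits and "_.-~" are never encoded by quote().
SAFE = "@!$&'*+="


def news_uri(message_id: str) -> str:
    """Build a news: URI from a raw message-id (without angle brackets)."""
    # Everything outside SAFE and the unreserved set is percent-encoded,
    # including "%" itself, "#", "/", "?" and "^`{|}".
    return "news:" + quote(message_id, safe=SAFE)


if __name__ == "__main__":
    print(news_uri("foo@bar.!#$%&'*+/=?^`{|}.example"))
```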

>
> A message-ID is parsed as a byte string by a news server.
>
>
>>>> However, I believe virtually all URI parsers will interpret
>>>> "news:foo@bar.!%23$%25&'*+%2F=%3F^`{|}.example" as intended.
>>>
>>> Works fine in IE8 but Firefox, the Windows file explorer
>>> and Windows Mail still re-encode it:
>>
>> When I said "URI parsers" I specifically meant the parser itself — as
>> in, the parser won't misinterpret some component as something other
>> than what it is, and the value of all components will be available to
>> the application.  As Martin said, those characters aren't really
>> supposed to show up in a URI and have to be encoded.  Browsers that
>> figure out what you meant and encode the URI before sending it are
>> following Postel's law and, in my opinion, doing the right thing.
>
> I do not know what browsers are supposed to do but if that is the
> right thing, then it does not work with the NNTP protocol.

But nobody is expecting it to be sent to NNTP in that form.
Whoever/whatever interprets that URI (possibly a browser) should decode
it, then open an NNTP dialog with the NNTP server, and then send the
ARTICLE command with the funny characters already decoded. Perhaps I
should mention that in the RFC.
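
The decoding step described above might look like the following Python sketch. It only builds the command text and does not open a connection. The function name `article_command` is hypothetical, and treating any news: URI containing "@" as a message-id reference is an assumption made for this illustration.

```python
from urllib.parse import unquote


def article_command(news_uri: str) -> str:
    """Turn a news: URI that names a message-id into an NNTP ARTICLE command."""
    scheme, sep, rest = news_uri.partition(":")
    if not sep or scheme.lower() != "news":
        raise ValueError("not a news: URI")
    # Percent-decode first: the server compares the message-id byte for byte.
    message_id = unquote(rest)
    if "@" not in message_id:
        # Assumption: without "@" the URI names a newsgroup, not an article.
        raise ValueError("URI does not contain a message-id")
    # The caller is expected to append CRLF when writing this to the socket.
    return f"ARTICLE <{message_id}>"


if __name__ == "__main__":
    print(article_command("news:foo@bar.!%23$%25&'*+%2F=%3F%5E%60%7B%7C%7D.example"))
```

The partially encoded form discussed earlier, with "^`{|}" left bare, decodes to the same command, since unquote() leaves characters that are not percent-encoded untouched.
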
>
>
> When Joseph said it worked fine in his newsreader (Unison), what
> was the ARTICLE command actually sent to the news server?
>
> It MUST be:
>
> ARTICLE <foo@bar.!#$%&'*+/=?^`{|}.example>
>
> Otherwise, the newsreader is broken.
>
>
> If someone wants to test message-IDs, I can create relevant articles
> with these message-IDs locally on my news server, so that you could
> test to retrieve them.  (I can also post them to a worldwide group
> like misc.test if you have access to it and prefer your usual news
> server.)
>



-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131                          Web: http://www.cs.man.ac.uk/~chl
Email: chl@clerew.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5
Received on Sunday, 10 January 2010 12:28:02 UTC