Re: PROPOSAL: i74: Encoding for non-ASCII headers

Frank Ellermann wrote:
> Julian Reschke wrote:
>  
>>> If there is a chance that these values have to be displayed in
>>> HTML pages or used in XML files the NCR form &#xnnnnnn; might
>>> work "as is", for \u'nnnnnn' something needs to determine a
>>> corresponding UTF-16, hex. NCR, or UTF-8.
>  
>> Not sure I understand this.
>  
>> 1) Even if you want to use a value in HTML or XML, you will
>> need to decode first, then re-encode, otherwise you'll end up
>> with something like "&xnnnnnnn;".
> 
> Not for "work as is", where decoding hex. NCRs is the job of a 
> browser, or in the XML case unnecessary.  If you want something
> better than "as is" for various Unicode security considerations
> both notations are fine.

Nope. Sorry.

There are characters allowed in HTTP headers that need to be escaped 
both in HTML and XML, such as "<". So, to create HTML or XML from the 
header contents, you will need to HTML- or XML-escape the text anyway 
(everything else is a hack).

If you do so, you can't simply include a "&#xNNNNNN;" form from the HTTP 
header; the escaping step would turn its "&" into "&amp;", so you need to 
decode it first.
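
To illustrate, here is a minimal Python sketch. It assumes a hypothetical 
header convention in which non-ASCII characters travel as hex NCRs and a 
literal "&" travels as "&amp;"; the function names are made up for this 
example and are not taken from any spec or from the proposal itself.

```python
import html
import re

# Assumed (hypothetical) header convention: "&#xNNNNNN;" for a code point,
# "&amp;" for a literal ampersand. Everything else is taken literally.
_TOKEN = re.compile(r"&#x([0-9A-Fa-f]{1,6});|&amp;")


def decode_header_value(raw: str) -> str:
    """Turn the header notation back into plain Unicode text (single pass)."""
    def repl(match):
        if match.group(1) is None:      # matched "&amp;"
            return "&"
        code_point = int(match.group(1), 16)
        if code_point > 0x10FFFF:       # not a Unicode code point: leave as-is
            return match.group(0)
        return chr(code_point)
    return _TOKEN.sub(repl, raw)


def to_html_text(raw: str) -> str:
    """Decode first, then escape for use as HTML/XML character data."""
    return html.escape(decode_header_value(raw), quote=False)


raw = "a < b &#xE9;"

# Escaping the header value "as is" also escapes the "&" of the NCR.
naive = html.escape(raw, quote=False)   # "a &lt; b &amp;#xE9;"

# Decoding first yields the intended character, with "<" escaped.
correct = to_html_text(raw)             # "a &lt; b é"
```

The naive variant displays the literal text "&#xE9;" to the user. Skipping 
the escape step instead would leave the "<" unescaped.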

> To protect encodings both forms allow this, your proposal &amp;
> is okay for HTML and XML, maybe the &#x26; in RFC 5137 is more
> general.  For the \u form RFC 5137 mentions \\u as protection,
> in essence that is "double all backslashes" (for a shell prompt
> when I had to do this "manually" it made me nervous... ;-)  
> 
> In both cases you'd have to explain what you want and how this
> works, for NCRs that might be simpler (YMMV).

I'm still not sure what you're talking about. Could you please provide 
an example?

 > ...

BR, Julian

Received on Tuesday, 25 March 2008 13:14:52 UTC