- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Tue, 25 Mar 2008 14:14:04 +0100
- To: Frank Ellermann <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>
- CC: ietf-http-wg@w3.org
Frank Ellermann wrote:
> Julian Reschke wrote:
>
>>> If there is a chance that these values have to be displayed in
>>> HTML pages or used in XML files the NCR form &#xnnnnnn; might
>>> work "as is", for \u'nnnnnn' something needs to determine a
>>> corresponding UTF-16, hex. NCR, or UTF-8.
>
>> Not sure I understand this.
>
>> 1) Even if you want to use a value in HTML or XML, you will
>> need to decode first, then re-encode, otherwise you'll end up
>> with something like "&xnnnnnnn;".
>
> Not for "work as is", where decoding hex. NCRs is the job of a
> browser, or in the XML case unnecessary. If you want something
> better than "as is" for various Unicode security considerations
> both notations are fine.

Nope. Sorry.

There are characters allowed in HTTP headers that need to be escaped
in both HTML and XML, such as "<". So, to create HTML or XML from the
header contents, you will need to HTML- or XML-escape the text anyway
(everything else is a hack). Once you do that, you can't simply copy a
"&#xNNNNNN;" form from the HTTP header into the output, because the
escaping step would also escape its "&"; you need to decode it first.

> To protect encodings both forms allow this, your proposal &
> is okay for HTML and XML, maybe the & in RFC 5137 is more
> general. For the \u form RFC 5137 mentions \\u as protection,
> in essence that is "double all backslashes" (for a shell prompt
> when I had to do this "manually" it made me nervous... ;-)
>
> In both cases you'd have to explain what you want and how this
> works, for NCRs that might be simpler (YMMV).

I'm still not sure what you're talking about. Could you please provide
an example?

> ...

BR, Julian
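To make the decode-then-escape point above concrete, here is a minimal
Python sketch. It is an illustration only, not something the thread or
any spec prescribes: the helper names decode_hex_ncrs and
header_to_html are hypothetical, and it assumes the header value
carries hex NCRs of the form "&#xNNNNNN;".

```python
import html
import re

# Assumption: the header uses hex NCRs such as "&#x20AC;" (1 to 6 hex digits).
_HEX_NCR = re.compile(r"&#x([0-9A-Fa-f]{1,6});")


def decode_hex_ncrs(value: str) -> str:
    """Replace each hex NCR with the character it denotes (hypothetical helper)."""
    def _sub(match: re.Match) -> str:
        code_point = int(match.group(1), 16)
        # Leave out-of-range values and surrogates untouched rather than guess.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)

    return _HEX_NCR.sub(_sub, value)


def header_to_html(value: str) -> str:
    """Decode the NCRs first, then HTML-escape the resulting text."""
    return html.escape(decode_hex_ncrs(value), quote=True)


raw = "a<b &#x20AC;"

# Escaping the raw header value also escapes the "&" of the NCR,
# so a browser would display the literal text "&#x20AC;".
naive = html.escape(raw)

# Decoding first keeps the character and still escapes "<".
correct = header_to_html(raw)
```

Here naive ends up with a double-escaped NCR, while correct contains
the decoded character next to an escaped "<". The same ordering applies
to XML output, with an XML escaping function in place of html.escape.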
Received on Tuesday, 25 March 2008 13:14:52 UTC