- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Tue, 25 Mar 2008 14:01:25 +0100
- To: ietf-http-wg@w3.org
Julian Reschke wrote:

>> If there is a chance that these values have to be displayed in
>> HTML pages or used in XML files the NCR form &#xnnnnnn; might
>> work "as is", for \u'nnnnnn' something needs to determine a
>> corresponding UTF-16, hex. NCR, or UTF-8.

> Not sure I understand this.

> 1) Even if you want to use a value in HTML or XML, you will
> need to decode first, then re-encode, otherwise you'll end up
> with something like "&xnnnnnnn;".

Not for "work as is", where decoding hex. NCRs is the job of a
browser, or in the XML case unnecessary.  If you want something
better than "as is" for various Unicode security considerations
both notations are fine.

To protect encodings both forms allow this, your proposal & is
okay for HTML and XML, maybe the & in RFC 5137 is more general.

For the \u form RFC 5137 mentions \\u as protection, in essence
that is "double all backslashes" (for a shell prompt when I had
to do this "manually" it made me nervous... ;-)

In both cases you'd have to explain what you want and how this
works, for NCRs that might be simpler (YMMV).

> the only difference between the two formats (BCP137, 5.1 and
> 5.2) is how they are embedded.

Yes, both specify Unicode points with hex. digits minus leading
zeros, \u uses 4..6 digits, NCRs use 2..6 digits.  Both forms
have a clear trailing terminator as recommended in Charmod.

Mark Nottingham wrote:

| BCP137 does note the ugliness factor WRT NCRs.

Yes, a matter of taste, in that case John's taste.  Pick what
you like better, for implementors both forms are trivial if
the protection is clear (if it is at all needed, I'm not sure
when).

 Frank
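To illustrate the "both forms are trivial" point, here is a minimal
Python sketch of the two notations as described above: \u'...' with
4..6 hex digits and &#x...; with 2..6 hex digits.  The function names
are hypothetical, not taken from RFC 5137 or any library, and the
"protection" step (doubling backslashes, or escaping the ampersand)
is deliberately left out.  Rejecting surrogates and values above
U+10FFFF is an assumption of this sketch, not something stated in
the thread.

```python
import re

# \u'HHHH' form: 4 to 6 hex digits between single quotes.
BACKSLASH_U = re.compile(r"\\u'([0-9A-Fa-f]{4,6})'")
# Hex NCR form: 2 to 6 hex digits between "&#x" and ";".
HEX_NCR = re.compile(r"&#x([0-9A-Fa-f]{2,6});")


def _to_char(match):
    """Turn one matched escape into its character.

    Assumption of this sketch: surrogates and values above U+10FFFF
    are left untouched rather than decoded.
    """
    cp = int(match.group(1), 16)
    if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return match.group(0)
    return chr(cp)


def decode_backslash_u(text):
    """Decode every \\u'HHHH' escape in text."""
    return BACKSLASH_U.sub(_to_char, text)


def decode_hex_ncr(text):
    """Decode every &#xHH; escape in text."""
    return HEX_NCR.sub(_to_char, text)


def encode_backslash_u(text):
    """Escape non-ASCII characters, padding to at least 4 hex digits."""
    return "".join(
        ch if ord(ch) < 0x80 else "\\u'%04X'" % ord(ch) for ch in text
    )


def encode_hex_ncr(text):
    """Escape non-ASCII characters as hex NCRs without leading zeros."""
    return "".join(
        ch if ord(ch) < 0x80 else "&#x%X;" % ord(ch) for ch in text
    )
```

Both decoders share the same helper; as noted in the thread, the only
real difference is how the hex digits are embedded, and the trailing
' or ; makes the end of each escape unambiguous.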
Received on Tuesday, 25 March 2008 13:00:02 UTC