- From: Dirk-Willem van Gulik <dirkx@webweaving.org>
- Date: Thu, 9 Mar 2006 01:43:49 -0800 (PST)
- To: public-rdf-dawg@w3.org
.. always UTF8 ...

> Unicode code points may also be expressed using an \uXXXX (U+0 to
> U+FFFF) or \UXXXXXXXX syntax (for U+10000 onwards) where X is a
> hexadecimal digit [0-9A-F]

I assume that what is meant here is the use of 7-bit-safe characters to express Unicode code points. This raises some questions:

-> Can this be mixed with true UTF-8 in the same payload?

-> My advice would be NOT to allow this; think of cross-site scripting as an example of the pain you may get into at some point in the future.

-> Is there 'escaping' for the \u and \U sequences themselves? And if there is, can this be mixed in with UTF-8? And if not, how does one know for a fact what mode one is in?

Or in other words:

-> If you really want this, better define it more narrowly, OR

-> Drop it altogether.

This would give strict parsers in hostile environments a chance.

DW
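To make the concern above concrete, here is a minimal sketch in Python of what a decoder for the quoted \uXXXX / \UXXXXXXXX syntax might look like. It is an illustration only: `decode_escapes` is a hypothetical helper, not taken from any specification, and treating `\\` as an escape for a literal backslash is an assumption, since the quoted text does not say how the escape sequence itself is escaped.

```python
import re

# Matches an escaped backslash, a \uXXXX escape, or a \UXXXXXXXX escape.
# Uppercase hex digits only, following the quoted grammar [0-9A-F].
# ASSUMPTION: "\\" stands for a literal backslash; the quoted text does not define this.
_ESCAPE = re.compile(r'\\(\\|u[0-9A-F]{4}|U[0-9A-F]{8})')


def decode_escapes(text: str) -> str:
    """Replace \\uXXXX and \\UXXXXXXXX escapes with the code points they name.

    Raw (already decoded) characters pass through untouched, so escaped and
    raw forms can be mixed freely in one payload. Raises ValueError if an
    escape names a value above U+10FFFF.
    """
    def replace(match: re.Match) -> str:
        body = match.group(1)
        if body == '\\':
            return '\\'               # escaped backslash becomes one backslash
        return chr(int(body[1:], 16))  # drop the leading u/U, parse the hex digits

    return _ESCAPE.sub(replace, text)


# A naive filter that inspects the payload BEFORE the escapes are decoded.
payload = '\\u003Cscript\\u003E'
print('<' in payload)                  # the filter sees no angle bracket
print('<' in decode_escapes(payload))  # the consumer after decoding does
```

With mixing allowed, the escaped and raw spellings of a character decode to the same string, so any check that runs before decoding can be bypassed by switching spelling. A narrower definition (for example, exactly one decoding pass at one defined stage) or dropping the syntax removes that ambiguity.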
Received on Thursday, 9 March 2006 09:43:54 UTC