[Last Call] Registration of media type application/sparql-query (fwd) from Dirk-Willem van Gulik on 2006-03-09 (public-rdf-dawg@w3.org from January to March 2006)

From: Dirk-Willem van Gulik <dirkx@webweaving.org>
Date: Thu, 9 Mar 2006 01:43:49 -0800 (PST)
To: public-rdf-dawg@w3.org
Message-ID: <20060309013957.F70758@skutsje.san.webweaving.org>

.. always UTF8 ...

>  Unicode code points may also be expressed using an \uXXXX (U+0 to
>  U+FFFF) or \UXXXXXXXX syntax (for U+10000 onwards) where X is a
>  hexadecimal digit [0-9A-F]
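A minimal Python sketch of what the quoted escape syntax implies for a decoder. It assumes escapes are replaced in a single left-to-right pass over text that has already been decoded from UTF-8, and that only uppercase hex digits count, as the quoted [0-9A-F] says. The function name is hypothetical, not taken from any SPARQL implementation.

```python
import re

# Escape syntax as quoted above: \uXXXX for U+0..U+FFFF, \UXXXXXXXX for
# U+10000 onwards. Hex digits restricted to [0-9A-F] (assumption: uppercase only).
_ESCAPE = re.compile(r'\\u([0-9A-F]{4})|\\U([0-9A-F]{8})')


def decode_codepoint_escapes(text: str) -> str:
    """Hypothetical helper: replace each escape with its code point in one pass."""
    def replace(m):
        hex_digits = m.group(1) or m.group(2)  # whichever alternative matched
        cp = int(hex_digits, 16)
        if cp > 0x10FFFF:
            raise ValueError(f'code point out of range: {hex_digits}')
        return chr(cp)
    return _ESCAPE.sub(replace, text)
```

Note that a single pass turns `\u005Cu0041` into the literal text `\u0041`, which a second decoding stage would then turn into "A". Whether a payload is decoded once or twice therefore changes its meaning.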

I assume that what is meant here is the use of 7-bit-safe characters to express
Unicode code points. This raises the question:

->	Can this be mixed with true UTF-8 in the same payload?

	-> My advice would be NOT to allow this; think of cross-
	site scripting as an example of the pain you may get
	into at some point in the future.

->	Is there 'escaping' for the \u and \U sequences themselves?

	And if there is, can it be mixed with UTF-8? And if not,
	how does one know for a fact which mode one is in?

Or in other words:

->	If you really want this, better define it more narrowly

OR

->	Drop it altogether.

This would give strict parsers in hostile environments a chance.
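A minimal Python sketch of the strict, single-mode rule advocated above: a payload may carry raw non-ASCII UTF-8 or \u/\U escapes, but not both. The function and exception names are hypothetical, and treating "any byte above 0x7F" as "true UTF-8" is an assumption made for illustration.

```python
import re

# Hypothetical strict-mode check; escape pattern follows the quoted [0-9A-F] syntax.
_ESCAPE = re.compile(r'\\u[0-9A-F]{4}|\\U[0-9A-F]{8}')


class MixedEncodingError(ValueError):
    """Raised when a payload mixes raw UTF-8 with \\u/\\U escapes."""


def check_single_mode(payload: bytes) -> str:
    # Strict UTF-8 decoding: malformed bytes raise UnicodeDecodeError.
    text = payload.decode('utf-8')
    has_raw = any(ord(ch) > 0x7F for ch in text)
    has_escape = _ESCAPE.search(text) is not None
    if has_raw and has_escape:
        raise MixedEncodingError('payload mixes raw UTF-8 with \\u/\\U escapes')
    if has_raw:
        return 'utf-8'
    if has_escape:
        return 'escaped'
    return 'ascii'
```

Rejecting the mixed case outright, rather than guessing, means a parser never has to decide which mode a given character was written in.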

DW

Received on Thursday, 9 March 2006 09:43:54 UTC