Re: [Last Call] Registration of media type application/sparql-query (fwd) from Eric Prud'hommeaux on 2006-04-21 (public-rdf-dawg@w3.org from April to June 2006)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Fri, 21 Apr 2006 12:51:57 -0400
To: Dirk-Willem van Gulik <dirkx@webweaving.org>
Cc: public-rdf-dawg@w3.org
Message-ID: <20060421165157.GI26709@w3.org>

On Thu, Mar 09, 2006 at 01:43:49AM -0800, Dirk-Willem van Gulik wrote:
> 
> 
> .. always UTF8 ...
> 
> >  Unicode code points may also be expressed using an \uXXXX (U+0 to
> >  U+FFFF) or \UXXXXXXXX syntax (for U+10000 onwards) where X is a
> >  hexadecimal digit [0-9A-F]
> 
> I assume that what is ment here is the use of 7bit safe chars to express
> unicode code points. This begs the question:
> 
> ->	can this be mixed with true utf8 in the same payload.
> 
> 	-> my advise would be NOT to allow this; think cross
> 	site scripting for an example of the pain you may get
> 	into at some point in the future.

I think it is safer to allow mixing than to forbid it. Forbidding it
would mean striking "always UTF8" and adding text to say that a query
encoded entirely in 7 bits may no longer share a payload with other
Unicode (including ASCII?).

There is currently no specification for 7-bit SPARQL. One could escape
a query to the point that there are no wide characters over the wire,
but it is still UTF-8. If you then send it over a 7-bit wire, we don't
specify how to do so.
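
To illustrate, here is a minimal Python sketch, not anything the spec
defines. The helper name escape_non_ascii is hypothetical. It rewrites
every code point above U+007F using the \uXXXX / \UXXXXXXXX forms
quoted above; the result is pure ASCII, and those bytes are still a
valid UTF-8 payload.

```python
def escape_non_ascii(query: str) -> str:
    """Hypothetical helper: rewrite each non-ASCII code point as an escape.

    Uses \\uXXXX for U+0080..U+FFFF and \\UXXXXXXXX for U+10000 onwards.
    ASCII characters, including any backslashes already present, are
    passed through untouched.
    """
    out = []
    for ch in query:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)
        else:
            out.append("\\U%08X" % cp)
    return "".join(out)


escaped = escape_non_ascii('ASK { ?s ?p "日本" }')
# Every character is now below U+0080, so the UTF-8 and ASCII
# encodings of the escaped query are byte-for-byte identical.
wire_bytes = escaped.encode("utf-8")
print(escaped)
```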

I'm afraid I don't see the mechanics of how this enables cross-site
scripting (any more than any other wide char format). Was this a
hunch, or a worked-out screw case?

> ->	Is there 'escaping' for the \u and \U sequence itself ?
> 
> 	And if there is - can this be mixed in utf8 ? And if not
> 	- how does one know for a fact what mode one is ?

I believe the current text implies one level of escaping. I believe
that, given an application/sparql-query media type
[[
ASK { ?s ?p "\u005Cu0041" }
]]
the interpretation of the object is '\\'+'u'+'0'+'0'+'4'+'1'. The
query
[[
ASK { ?s ?p "\u0041" }
]]
is valid application/sparql-query, but is not the same query, and
no intermediate processor should treat it as such.
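
A rough Python sketch of the "one level" reading, under my own
assumptions: decode_once is a hypothetical name, and the regular
expression only covers the \uXXXX and \UXXXXXXXX forms with uppercase
hex digits as quoted above, not the full SPARQL grammar. The point is
that the decoder makes a single pass and never rescans its own output.

```python
import re

# Matches \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
_ESCAPE = re.compile(r"\\u([0-9A-F]{4})|\\U([0-9A-F]{8})")


def decode_once(text: str) -> str:
    """Hypothetical helper: decode code point escapes in one pass.

    re.sub does not rescan replacement text, so a backslash produced by
    decoding \\u005C is never combined with the characters after it.
    """
    def replace(match):
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    return _ESCAPE.sub(replace, text)


# Raw strings keep the backslashes literal, as they would be on the wire.
first = decode_once(r"\u005Cu0041")   # backslash, 'u', '0', '0', '4', '1'
second = decode_once(r"\u0041")       # the single character 'A'
print(repr(first), repr(second))
```

An intermediary that ran the decoder a second time over `first` would
get 'A' and so turn the first query into the second, which is exactly
the conflation to avoid.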

> Or on other words:
> 
> ->	If you really want this - better define it narrower

I think we really do want escaping. It will make life better for a lot
of folks who can't edit Chinese and Russian and ... directly in their
editor. Do you have some specific text in mind?

In general, I think escaping requires implementation precision and
that there's no way to avoid it. One could rule \\ out of quoting,
but it would not keep one from constructing things that looked like
non-equivalent application/sparql-query. For instance,

ASK { ?query foo:serialization "ASK { ?s ?p \"\u005Cu0041\" }" }
or
ASK { ?query foo:serialization "ASK { ?s ?p \u0022\u005Cu0041\u0022 }" }

> OR
> 
> ->	Drop it altogether.
> 
> As to give strict parsers in hostile environments a chance.

Did ietf-types ever see this?
I'm about to ask for the media type (through other channels) and want
to make sure they don't know of some dissent that I have noticed.

Do you think this is well-enough specified that we can ask IETF for
our media-type?
-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +81.90.6533.3882

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Friday, 21 April 2006 16:52:07 UTC