Re: [css3-values] Invalid URI and IRI from Simon Sapin on 2012-05-17 (www-style@w3.org from May 2012)

From: Simon Sapin <simon.sapin@kozea.fr>
Date: Thu, 17 May 2012 16:26:34 +0200
To: Julian Reschke <julian.reschke@gmx.de>
CC: www-style list <www-style@w3.org>
Message-ID: <4FB50A9A.6000803@kozea.fr>

On 17/05/2012 12:37, Julian Reschke wrote:
> On 2012-05-17 12:12, Simon Sapin wrote:
>> RFC 3986 (the latest on URIs) only uses a subset of ASCII characters.
>> Everything else is invalid/illegal, including all characters above U+007F.
>
> ...because characters above U+007F are not ASCII characters.

Yes, of course. I only wanted to point out that it is easy for authors to 
write something that is not valid according to RFC 3986.

> Also, my understanding is that HTML5 doesn't make anything valid that is
> invalid as IRI.

If I read correctly, compared to URIs, IRIs "only" add non-ASCII code 
points outside the private-use ranges to the list of unreserved 
characters. In HTML5, on the other hand, every code point that is neither 
reserved nor '%' is unreserved. For example, '>' is valid in the latter 
but not in the former.

http://tools.ietf.org/html/rfc3987#section-2.2
http://www.w3.org/TR/html5/urls.html#parsing-urls
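
To make the comparison concrete, here is a rough Python sketch of the three 
character classes. The RFC 3986 sets and the RFC 3987 ucschar ranges follow 
the ABNF in those RFCs; the HTML5 function is only the simplified reading 
described above (anything that is not reserved and not '%'), not the actual 
HTML5 URL parsing algorithm. The function names are made up for 
illustration.

```python
import string

# RFC 3986, section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
URI_UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")

# RFC 3986, section 2.2: reserved = gen-delims / sub-delims
URI_RESERVED = frozenset(":/?#[]@" + "!$&'()*+,;=")

# RFC 3987, section 2.2: ucschar (non-ASCII, excluding private-use ranges)
UCSCHAR_RANGES = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
# Planes 1 to 13: U+10000-1FFFD ... U+D0000-DFFFD
UCSCHAR_RANGES += [(p << 16, (p << 16) | 0xFFFD) for p in range(1, 14)]
# Plane 14 starts after the tag characters
UCSCHAR_RANGES.append((0xE1000, 0xEFFFD))


def is_uri_unreserved(ch: str) -> bool:
    """True if ch is an RFC 3986 unreserved character."""
    return ch in URI_UNRESERVED


def is_iri_unreserved(ch: str) -> bool:
    """True if ch is an RFC 3987 iunreserved character (ASCII unreserved or ucschar)."""
    cp = ord(ch)
    return ch in URI_UNRESERVED or any(lo <= cp <= hi for lo, hi in UCSCHAR_RANGES)


def is_html5_unreserved(ch: str) -> bool:
    """Simplified reading of the HTML5 rule (assumption): not reserved and not '%'."""
    return ch not in URI_RESERVED and ch != "%"


if __name__ == "__main__":
    for ch in (">", "é"):
        print(repr(ch), is_uri_unreserved(ch), is_iri_unreserved(ch), is_html5_unreserved(ch))
```

Run as a script, this prints False False True for '>' and False True True 
for 'é': only the HTML5-style rule accepts '>'.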

>> For defining the <url> type, both css21 and css3-values have a reference
>> to RFC 3986. Do we really want to be that restrictive? In CSS syntax,
>> this declaration parses with a valid URI token. Should the URI inside be
>> invalid?
>>
>> list-style-image: url("Hello<世界>.png");
>
> What do implementations do with it?

Simpler test case:

http://dabblet.com/gist/2719200
<div style="background: url('>é')">

In both Firefox 12 and Chrome 18, an HTTP request is sent for %3E%C3%A9.
That is, a string that is invalid per RFC 3987 ('>' is not allowed) is 
accepted as valid and handled according to RFC 3987 (UTF-8 encoding, then 
%-encoding).
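
For reference, this result can be approximated in Python with 
urllib.parse.quote, which encodes the string as UTF-8 and then 
percent-encodes every byte outside its safe set. The helper name and the 
choice of safe="/%" are assumptions picked to reproduce the results 
observed in this thread; this is not what Firefox or Chrome actually 
implement.

```python
from urllib.parse import quote


def encode_like_browsers(url_text: str) -> str:
    """Hypothetical helper approximating the observed browser behavior.

    quote() encodes the text as UTF-8, then percent-encodes every byte that
    is not an ASCII letter, digit, one of "_.-~", or listed in `safe`.
    Keeping "/" and "%" in `safe` leaves path separators and literal
    percent signs untouched.
    """
    return quote(url_text, safe="/%")


if __name__ == "__main__":
    print(encode_like_browsers(">é"))               # %3E%C3%A9
    print(encode_like_browsers("%é"))               # %%C3%A9
    print(encode_like_browsers("Hello<世界>.png"))  # Hello%3C%E4%B8%96%E7%95%8C%3E.png
```

Leaving '%' in the safe set mirrors the browsers passing a literal '%' 
through unchanged, which is how a malformed result can reach the server.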

Another test case: url('%é') is turned into %%C3%A9 by the browser without 
any warning, but the server refuses it with 400 Bad Request. (Both forms 
are invalid, whether as an IRI or as a URI.)
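
The 400 is consistent with RFC 3986 section 2.1, where a '%' must start a 
pct-encoded triplet ("%" HEXDIG HEXDIG). A minimal character-level check in 
Python (hypothetical function name; it does not parse the full URI grammar) 
shows why %%C3%A9 fails while %3E%C3%A9 passes:

```python
import re

# Characters RFC 3986 allows literally: unreserved (section 2.3) plus the
# reserved gen-delims and sub-delims (section 2.2), or a pct-encoded
# triplet "%" HEXDIG HEXDIG (section 2.1).
_URI_CHARS = re.compile(r"(?:[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=]|%[0-9A-Fa-f]{2})*")


def uses_only_uri_chars(s: str) -> bool:
    """Character-level check only; it does not validate the URI structure."""
    return _URI_CHARS.fullmatch(s) is not None


if __name__ == "__main__":
    for candidate in ("%3E%C3%A9", "%%C3%A9", "%é", ">é"):
        print(candidate, uses_only_uri_chars(candidate))
```

The first '%' in %%C3%A9 is followed by another '%' rather than two hex 
digits, so no production matches it.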

>> I suggest we relax the syntax and do something like HTML5. Maybe mention
>> IRIs and their conversion to URIs.
>
> I recommend sticking with the relevant specs, such as either URI or IRI.

I’m fine with that, as long as it is explicit in a spec.

>> 3. Make sure that all Unicode strings are parsable/valid. (I don’t know
>> if this is doable *or* a good idea.)
>
> Making something valid which is invalid "even" in HTML5 doesn't seem
> like a good idea to me.

Yes indeed.

-- 
Simon Sapin

Received on Thursday, 17 May 2012 14:27:10 UTC