- From: Simon Sapin <simon.sapin@kozea.fr>
- Date: Thu, 17 May 2012 16:26:34 +0200
- To: Julian Reschke <julian.reschke@gmx.de>
- CC: www-style list <www-style@w3.org>
On 17/05/2012 12:37, Julian Reschke wrote:
> On 2012-05-17 12:12, Simon Sapin wrote:
>> RFC 3986 (the latest on URIs) only uses a subset of ASCII characters.
>> Everything else is invalid/illegal, including all characters above U+007F.
>
> ...because characters above U+007F are not ASCII characters.
Yes, of course. I only wanted to point out that it is easy for authors to
write something that is not valid according to RFC 3986.
> Also, my understanding is that HTML5 doesn't make anything valid that is
> invalid as IRI.
If I read correctly, compared to URIs, IRIs "only" add non-private
non-ASCII codepoints to the list of unreserved characters. In HTML5 on
the other hand, every codepoint that is not reserved or '%' is
unreserved. For example, '>' is valid in the latter but not in the former.
http://tools.ietf.org/html/rfc3987#section-2.2
http://www.w3.org/TR/html5/urls.html#parsing-urls
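
To make the IRI side of that comparison concrete, here is a rough Python
sketch. It is an illustration only, not a full RFC 3987 validator: the helper
names are made up, it checks characters one by one instead of parsing the IRI
grammar, and it ignores the private-use ranges that are only allowed in the
query component. The ucschar ranges are the ones listed in RFC 3987
section 2.2.

```python
import string

# Characters RFC 3986 allows literally: unreserved, gen-delims, sub-delims, '%'.
URI_CHARS = set(
    string.ascii_letters + string.digits + "-._~"  # unreserved
    + ":/?#[]@"                                    # gen-delims
    + "!$&'()*+,;="                                # sub-delims
    + "%"                                          # start of a %-escape
)

# ucschar ranges that RFC 3987 adds to "unreserved" (iprivate is ignored here).
UCSCHAR = (
    [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
    + [(p << 16, (p << 16) + 0xFFFD) for p in range(0x1, 0xE)]  # planes 1 to 13
    + [(0xE1000, 0xEFFFD)]
)


def allowed_in_iri(ch):
    """True if this single character may appear literally somewhere in an IRI."""
    return ch in URI_CHARS or any(lo <= ord(ch) <= hi for lo, hi in UCSCHAR)


def invalid_iri_chars(s):
    """Return the characters of s that no IRI component allows literally."""
    return [c for c in s if not allowed_in_iri(c)]


print(invalid_iri_chars("Hello<世界>.png"))  # ['<', '>']
```

Under these assumptions the CJK characters pass and only '<' and '>' are
flagged, which is the gap between IRIs and what HTML5 accepts.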
>> For defining the <url> type, both css21 and css3-values have a reference
>> to RFC 3986. Do we really want to be that restrictive? In CSS syntax,
>> this declaration parses with a valid URI token. Should the URI inside be
>> invalid?
>>
>> list-style-image: url("Hello<世界>.png");
>
> What do implementations do with it?
Simpler test case:
http://dabblet.com/gist/2719200
<div style="background: url('>é')">
In both Firefox 12 and Chrome 18, an HTTP request is sent to %3E%C3%A9.
That is, a string that is invalid in RFC 3987 ('>' is not allowed) is
accepted as valid and handled according to RFC 3987 (UTF-8, then %-encoding).
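
The "UTF-8 then %-encoding" step can be reproduced with Python's standard
library. This is only a sketch of the conversion for this test string, not
the browsers' actual algorithm (for one thing, real browsers leave characters
such as '/' alone); the function name is made up.

```python
from urllib.parse import quote


def to_request_path(s):
    """Encode s as UTF-8, then %-encode every byte outside the unreserved set."""
    # safe="" means that only letters, digits and "-._~" are left as-is.
    return quote(s, safe="", encoding="utf-8")


print(to_request_path(">é"))  # %3E%C3%A9
```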
Other test case: url('%é') is turned into %%C3%A9 without a warning by the
browser, but is refused by the server with 400 Bad Request. (Both forms
are invalid, either as an IRI or URI.)
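
A small Python sketch of why this case breaks, assuming the browser leaves a
literal '%' untouched while escaping the non-ASCII character (that is my
reading of the observed output, not something taken from either browser's
source). Both helper names are hypothetical.

```python
import re
from urllib.parse import quote

# A '%' that is not followed by two hex digits is not a valid %-escape.
STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def encode_keeping_percent(s):
    """%-encode s as UTF-8 but pass any existing '%' through unchanged."""
    return quote(s, safe="%")


def has_stray_percent(s):
    """True if s contains a '%' that does not start a two-hex-digit escape."""
    return STRAY_PERCENT.search(s) is not None


encoded = encode_keeping_percent("%é")
print(encoded)                         # %%C3%A9
print(has_stray_percent(encoded))      # True
print(has_stray_percent("%3E%C3%A9"))  # False
```

The first '%' in %%C3%A9 is followed by another '%' and not by two hex
digits, which is presumably what the server rejects.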
>> I suggest we relax the syntax and do something like HTML5. Maybe mention
>> IRIs and their conversion to URIs.
>
> I recommend sticking with the relevant specs, such as either URI or IRI.
I’m fine with that, as long as it is explicit in a spec.
>> 3. Make sure that all Unicode strings are parsable/valid. (I don’t know
>> if this is doable *or* a good idea.)
>
> Making something valid which is invalid "even" in HTML5 doesn't seem
> like a good idea to me.
Yes indeed.
--
Simon Sapin
Received on Thursday, 17 May 2012 14:27:10 UTC