Re: [css3-values] Invalid URI and IRI

On 2012-05-17 12:12, Simon Sapin wrote:
> Hi,
>
>
> There are multiple definitions of what is a valid URL/URI/IRI:
>
> RFC 3986 (the latest on URIs) only uses a subset of ASCII characters.
> Everything else is invalid/illegal, including all characters above U+007F.

...because characters above U+007F are not ASCII characters.

> IRIs (RFC 3987) extend the grammar to allow most non-ASCII Unicode
> characters, and define how to turn an IRI into a URI (in short:
> UTF-8 then %-encode)
>
> HTML5 (chapter 2.6: URLs) goes even further and allows all characters
> from U+0 to U+10FFFF although it has a convoluted way of saying it, and
> some strings can still be invalid.

The "URL" definition in HTML5 is a moving target. My understanding is 
that the webapps WG is now working on a "URL" document; and that HTML5
is going to reference that once it's ready.

Also, my understanding is that HTML5 doesn't make anything valid that is
invalid as an IRI.

> For defining the <url> type, both css21 and css3-values have a reference
> to RFC 3986. Do we really want to be that restrictive? In CSS syntax,
> this declaration parses with a valid URI token. Should the URI inside be
> invalid?
>
> list-style-image: url("Hello <世界>.png");

What do implementations do with it?
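
For reference, here is a rough sketch of what the IRI-to-URI step mentioned
above (UTF-8, then %-encode) would do with that string. This is only a
character-level illustration, not a full RFC 3986 or RFC 3987 grammar check,
and the function names are made up for this example.

```python
import string

# ASCII characters that RFC 3986 allows to appear somewhere in a URI:
# unreserved, reserved (gen-delims and sub-delims), and "%" for escapes.
URI_ASCII = set(string.ascii_letters + string.digits + "-._~:/?#[]@!$&'()*+,;=%")


def encode_non_ascii(iri):
    """Hypothetical helper: %-encode the UTF-8 bytes of each non-ASCII character.

    ASCII characters are passed through untouched, so anything that is
    already illegal in ASCII stays illegal.
    """
    out = []
    for ch in iri:
        if ord(ch) > 0x7F:
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


def disallowed_ascii(s):
    """Hypothetical helper: ASCII characters in s outside the RFC 3986 set."""
    return sorted({ch for ch in s if ord(ch) <= 0x7F and ch not in URI_ASCII})


converted = encode_non_ascii("Hello <世界>.png")
print(converted)                    # Hello <%E4%B8%96%E7%95%8C>.png
print(disallowed_ascii(converted))  # [' ', '<', '>']
```

So the two CJK characters are covered by the conversion, but the space and
the angle brackets are still left over, and those are allowed neither in a
URI nor in an IRI.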

> I suggest we relax the syntax and do something like HTML5. Maybe mention
> IRIs and their conversion to URIs.

I recommend sticking with the relevant specs, that is, either URI or IRI.

> Wherever the limit for validity ends up at, what should happen to
> invalid URIs? The options I can think of are:
>
> 1. Make the value and thus the declaration/rule invalid. The cascade
> does its usual fallback. Just like only some HASH tokens are valid
> hexadecimal <color> values, only some URI tokens would be valid <url>
> values.
>
> 2. Have them resolve to an invalid URI that always fails to be fetched.
> As with an HTTP 404 error, other fallbacks occur (list-style-type is
> used instead of list-style-image, ...)
>
> 3. Make sure that all Unicode strings are parsable/valid. (I don’t know
> if this is doable *or* a good idea.)

Making something valid which is invalid "even" in HTML5 doesn't seem 
like a good idea to me.

Best regards, Julian

Received on Thursday, 17 May 2012 10:38:28 UTC