[css3-values] Invalid URI and IRI

From: Simon Sapin <simon.sapin@kozea.fr>
Date: Thu, 17 May 2012 12:12:21 +0200
Message-ID: <4FB4CF05.3070101@kozea.fr>
To: www-style list <www-style@w3.org>

There are multiple definitions of what is a valid URL/URI/IRI:

RFC 3986 (the latest on URIs) only allows a subset of ASCII characters. 
Everything else is invalid/illegal, including all characters above U+007F.
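
As a rough illustration, here is a minimal Python sketch that lists the 
characters of a string that RFC 3986 never allows. It only looks at the 
character set, not at the full grammar, and the name disallowed_chars is 
made up for this example.

```python
import string

# Characters that RFC 3986 permits somewhere in a URI. This is a
# character-set check only; it does not validate the URI grammar.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
ALLOWED = UNRESERVED | GEN_DELIMS | SUB_DELIMS | {"%"}  # "%" starts an escape


def disallowed_chars(text):
    """Return, in order, the characters of text outside the RFC 3986 set."""
    return [c for c in text if c not in ALLOWED]
```

With this check, spaces, angle brackets and every non-ASCII character are 
reported, while an already %-encoded URI passes.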

IRIs (RFC 3987) extend the grammar to allow most non-ASCII Unicode 
characters, and define how to turn an IRI into a URI (in short: encode 
as UTF-8, then %-encode).
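
The "UTF-8 then %-encode" step can be sketched in Python with 
urllib.parse.quote. This is a simplification under stated assumptions: 
it ignores any special handling of the host part, and it also escapes 
characters such as spaces that are not valid in an IRI to begin with. 
The names SAFE and iri_to_uri are made up for this example.

```python
from urllib.parse import quote

# Characters left untouched: the RFC 3986 reserved and unreserved
# punctuation, plus "%" so that existing escapes are not encoded twice.
SAFE = "-._~:/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri):
    """Encode iri as UTF-8, then %-encode every byte not in SAFE.

    ASCII letters and digits are always left as-is by quote().
    """
    return quote(iri, safe=SAFE, encoding="utf-8")
```

For instance, iri_to_uri("http://example.com/世界.png") gives 
"http://example.com/%E4%B8%96%E7%95%8C.png".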

HTML5 (chapter 2.6: URLs) goes even further and allows all characters 
from U+0 to U+10FFFF although it has a convoluted way of saying it, and 
some strings can still be invalid.

For defining the <url> type, both css21 and css3-values have a reference 
to RFC 3986. Do we really want to be that restrictive? In CSS syntax, 
the declaration below parses with a valid URI token. Should the URI 
inside be considered valid?

list-style-image: url("Hello <世界>.png");

I suggest we relax the syntax and do something like HTML5. Maybe mention 
IRIs and their conversion to URIs.

Wherever the limit for validity ends up, what should happen to 
invalid URIs? The options I can think of are:

1. Make the value and thus the declaration/rule invalid. The cascade 
does its usual fallback. Just like only some HASH tokens are valid 
hexadecimal <color> values, only some URI tokens would be valid <url> 
values.

2. Have them resolve to an invalid URI that always fails to be fetched. 
As with an HTTP 404 error, other fallbacks occur (list-style-type is 
used instead of list-style-image, ...)

3. Make sure that all Unicode strings are parsable/valid. (I don’t know 
if this is doable *or* a good idea.)
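
To make the difference between options 1 and 2 concrete, here is a toy 
Python model. Everything in it is an assumption for illustration: the 
ASCII-only validity rule is a placeholder for whatever limit is chosen, 
and the cascade is reduced to "last declaration wins".

```python
def is_valid_url(url):
    # Placeholder rule (assumption): ASCII-only stands in for the real limit.
    return url.isascii()


def winning_url(declarations, drop_invalid):
    """Return the list-style-image URL that wins the cascade, or None.

    declarations: URLs in cascade order; the last one wins.
    drop_invalid=True models option 1: invalid values are discarded at
    parse time, so an earlier valid declaration can still win.
    drop_invalid=False models option 2: the last declaration always wins.
    """
    if drop_invalid:
        declarations = [u for u in declarations if is_valid_url(u)]
    return declarations[-1] if declarations else None


def marker(declarations, drop_invalid):
    """Return what is rendered: the image URL or the list-style-type fallback."""
    url = winning_url(declarations, drop_invalid)
    if url is None or not is_valid_url(url):
        return "list-style-type"  # no image, or the fetch always fails
    return url
```

With the declarations ["bullet.png", "世界.png"], option 1 falls back to 
the earlier image "bullet.png", while option 2 lets the invalid URL win 
the cascade and then falls back to list-style-type.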

Simon Sapin
Received on Thursday, 17 May 2012 10:12:57 UTC
