
Unicode Character 'PILE OF POO' (U+1F4A9) and validator test suite

From: Mark Rogers <mark.rogers@powermapper.com>
Date: Sat, 8 Nov 2014 13:59:34 -0600
To: "www-validator@w3.org" <www-validator@w3.org>
Message-ID: <1F68EA0E0CBFBE44A9A64274E1AC01A122F74256A7@DFW1MBX23.mex07a.mlsrvr.com>
Hi

Is the Unicode character U+1F4A9 really invalid in URLs? The conformance checker test suite marks it as novalid in files like:

conformance-checkers/html/elements/a/href/userinfo-username-contains-pile-of-poo-novalid.html

In RFC 3987 this character falls within the %x10000-1FFFD range of the ucschar production, which iuserinfo reaches via iunreserved (iuserinfo -> iunreserved -> ucschar):

   iuserinfo      = *( iunreserved / pct-encoded / sub-delims / ":" )

   iunreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar

   ucschar        = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
                  / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
                  / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
                  / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
                  / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
                  / %xD0000-DFFFD / %xE1000-EFFFD
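To make the grammar argument concrete, here is a minimal Python sketch that checks a string against iuserinfo as quoted above. The function names are hypothetical and not taken from any validator. The sub-delims set is assumed from RFC 3986, which RFC 3987 reuses. Under this grammar the pile-of-poo character matches through ucschar.

```python
# Minimal sketch: test a string against the RFC 3987 iuserinfo rule.
# Function names are hypothetical; ranges are copied from ucschar above.
UCSCHAR_RANGES = [
    (0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF),
    (0x10000, 0x1FFFD), (0x20000, 0x2FFFD), (0x30000, 0x3FFFD),
    (0x40000, 0x4FFFD), (0x50000, 0x5FFFD), (0x60000, 0x6FFFD),
    (0x70000, 0x7FFFD), (0x80000, 0x8FFFD), (0x90000, 0x9FFFD),
    (0xA0000, 0xAFFFD), (0xB0000, 0xBFFFD), (0xC0000, 0xCFFFD),
    (0xD0000, 0xDFFFD), (0xE1000, 0xEFFFD),
]
SUB_DELIMS = set("!$&'()*+,;=")   # sub-delims, as defined in RFC 3986
IUNRESERVED_ASCII = set("-._~")
HEXDIGITS = set("0123456789ABCDEFabcdef")


def is_ucschar(cp: int) -> bool:
    """True if code point cp falls in one of the ucschar ranges."""
    return any(lo <= cp <= hi for lo, hi in UCSCHAR_RANGES)


def is_iunreserved(ch: str) -> bool:
    """ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar (ALPHA/DIGIT are ASCII only)."""
    if ch.isascii():
        return ch.isalnum() or ch in IUNRESERVED_ASCII
    return is_ucschar(ord(ch))


def is_iuserinfo(s: str) -> bool:
    """*( iunreserved / pct-encoded / sub-delims / ":" )"""
    i = 0
    while i < len(s):
        ch = s[i]
        if ch == "%":
            # pct-encoded = "%" HEXDIG HEXDIG
            pair = s[i + 1:i + 3]
            if len(pair) != 2 or not set(pair) <= HEXDIGITS:
                return False
            i += 3
        elif is_iunreserved(ch) or ch in SUB_DELIMS or ch == ":":
            i += 1
        else:
            return False
    return True


print(is_ucschar(0x1F4A9))             # True: inside %x10000-1FFFD
print(is_iuserinfo("user\U0001F4A9"))  # True
print(is_iuserinfo("user name"))       # False: space is not allowed
```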

In the WHATWG URL Standard it’s listed as a valid URL code point. It will be converted to percent encoding during the normalisation process, but won’t be flagged as an error. See:
https://url.spec.whatwg.org/#url-code-points

https://url.spec.whatwg.org/#authority-state
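For comparison, below is a minimal Python sketch of the URL code point test. It is transcribed from the definition at the #url-code-points link above. The URL Standard is a living document, so details may differ from the text as it stood in 2014. The helper names are hypothetical. Python's urllib.parse.quote stands in for the spec's percent-encoder. It is not the WHATWG algorithm, but for a non-ASCII code point it gives the same UTF-8 bytes with uppercase hex.

```python
from urllib.parse import quote

# Hypothetical helper names; the set below is the ASCII punctuation part
# of the WHATWG "URL code points" definition (ASCII alphanumerics aside).
URL_ASCII_PUNCT = set("!$&'()*+,-./:;=?@_~")


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, or the last two code points of any plane (xFFFE, xFFFF)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFF) >= 0xFFFE


def is_url_code_point(ch: str) -> bool:
    cp = ord(ch)
    if ch.isascii():
        return ch.isalnum() or ch in URL_ASCII_PUNCT
    # U+00A0..U+10FFFD, excluding surrogates and noncharacters
    is_surrogate = 0xD800 <= cp <= 0xDFFF
    return 0xA0 <= cp <= 0x10FFFD and not is_surrogate and not is_noncharacter(cp)


poo = "\U0001F4A9"
print(is_url_code_point(poo))  # True
# UTF-8 percent-encoding of the character (Python's encoder, used here
# as a stand-in for the spec's percent-encode step).
print(quote(poo, safe=""))     # %F0%9F%92%A9
```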


Best Regards
Mark

Mark Rogers - mark.rogers@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL


Received on Saturday, 8 November 2014 19:59:45 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 13 September 2016 06:30:31 UTC