- From: Michael[tm] Smith <mike@w3.org>
- Date: Mon, 10 Nov 2014 10:34:41 +0900
- To: Mark Rogers <mark.rogers@powermapper.com>
- Cc: "www-validator@w3.org" <www-validator@w3.org>
- Message-ID: <20141110013441.GP4173@jay.w3.org>
Hi Mark,

Mark Rogers <mark.rogers@powermapper.com>, 2014-11-08 13:59 -0600:

> Is the Unicode character U+1F4A9 used in the conformance checker test
> suite for URLs really invalid?

No, it's valid. Thanks for catching this and taking the time to report it.

> It’s marked as novalid in test suite files like:
>
> conformance-checkers/html/elements/a/href/userinfo-username-contains-pile-of-poo-novalid.html

Yeah, I'll need to fix that. But before I do, I'll wait for a fix to the
upstream code of the URL parsing library the validator uses, called
galimatias. I've already filed a pull request with a proposed fix:

https://github.com/smola/galimatias/pull/46

I expect that'll get fixed relatively soon.

> In RFC 3987 this character is listed in the 10000-1FFFD range in the
> iuserinfo -> iunreserved -> ucschar production:
>
> iuserinfo = *( iunreserved / pct-encoded / sub-delims / ":" )
>
> iunreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar
>
> ucschar = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
>         / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
>         / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
>         / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
>         / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
>         / %xD0000-DFFFD / %xE1000-EFFFD
>
> In the Whatwg URL standard it’s listed as a valid URL code point, and
> will be converted to percent encoding during the normalisation process,
> but won’t flag an error. See
> https://url.spec.whatwg.org/#url-code-points
> https://url.spec.whatwg.org/#authority-state

Yup. Your reading of the spec is right. I'd made the mistake of being lazy
and having the test suite just follow the behavior of galimatias (buggy in
this particular case) on this, rather than checking it against the spec.

I'll follow up here after I've got it all fixed.

  --Mike

-- 
Michael[tm] Smith https://people.w3.org/mike
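For anyone who wants to check the quoted grammar by hand, here is a minimal Python sketch of the ucschar membership test, using the ranges exactly as quoted from RFC 3987 above. The names `UCSCHAR_RANGES` and `is_ucschar` are made up for illustration; this is not how galimatias or the validator implements the check. The last line shows the UTF-8 percent-encoded form of the character, using the standard library's `urllib.parse.quote`, as a rough stand-in for the percent-encoding step the URL standard describes.

```python
from urllib.parse import quote

# ucschar ranges copied from the RFC 3987 production quoted in this thread.
# (Hypothetical helper names; not taken from galimatias or the validator.)
UCSCHAR_RANGES = [
    (0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF),
    (0x10000, 0x1FFFD), (0x20000, 0x2FFFD), (0x30000, 0x3FFFD),
    (0x40000, 0x4FFFD), (0x50000, 0x5FFFD), (0x60000, 0x6FFFD),
    (0x70000, 0x7FFFD), (0x80000, 0x8FFFD), (0x90000, 0x9FFFD),
    (0xA0000, 0xAFFFD), (0xB0000, 0xBFFFD), (0xC0000, 0xCFFFD),
    (0xD0000, 0xDFFFD), (0xE1000, 0xEFFFD),
]


def is_ucschar(ch: str) -> bool:
    """Return True if the single character ch falls in a ucschar range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in UCSCHAR_RANGES)


pile_of_poo = "\U0001F4A9"

print(is_ucschar(pile_of_poo))  # True: U+1F4A9 is in 10000-1FFFD
print(quote(pile_of_poo))       # %F0%9F%92%A9 (UTF-8 bytes, percent-encoded)
```

This only covers the ucschar alternative of iunreserved; a full iuserinfo check would also need ALPHA, DIGIT, the listed punctuation, pct-encoded, sub-delims, and ":".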
Received on Monday, 10 November 2014 01:34:43 UTC