- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Mon, 25 Jul 2011 19:58:14 +0900
- To: "Phillips, Addison" <addison@lab126.com>
- CC: Chris Weber <chris@lookout.net>, "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
On 2011/07/22 9:30, Phillips, Addison wrote:

> In IRI terms, there are characters and there are "random octets". When mapping to URI, percent encoding is applied to both. However, the UTF-8 sequences can be decoded back to characters. The random octets not so much.

> In other words, leaving aside the query part for a moment, shouldn't IRI really say that valid UTF-8 sequences are interpreted as characters and invalid UTF-8 sequences are treated as bytes?

Yes, it should. And RFC 3987 already does.

> Looking at your test page, I'm not sure how valid a test it is. The page declares an encoding of ISO 8859-1. Having a "UTF-8 encoded path" in the page is a lie. Those bytes are all valid windows-1252 characters (per HTML5, nearly all browsers treat ISO8859-1 as windows-1252). So the path isn't actually "UTF-8 encoded". To me the test looks broken.

A test isn't broken if it tests weird coincidences. It may be that the description can be improved, though.

Regards, Martin.
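
For readers of the archive, here is a rough Python sketch of the rule discussed above: percent-encoded octets that form valid UTF-8 become characters, and anything else stays as percent-encoded bytes. It is only an illustration under simplifying assumptions, not the full URI-to-IRI procedure of RFC 3987: percent-encoded ASCII is left untouched, and the RFC's further checks on which characters are appropriate in an IRI are skipped. The names `uri_to_iri` and `misread_as_windows_1252` are hypothetical. The second helper shows the coincidence in the test page: the UTF-8 bytes of a character are also valid windows-1252 bytes, just for different characters.

```python
import re

# One or more consecutive percent-encoded octets, e.g. "%C3%A9".
_PCT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _decode_run(match):
    """Decode one run of percent-encoded octets, keeping invalid ones encoded."""
    octets = bytes.fromhex(match.group(0).replace("%", ""))
    out = []
    i = 0
    while i < len(octets):
        char = None
        if octets[i] >= 0x80:
            # Try the possible multi-byte lengths; Python's decoder is strict
            # (it rejects overlong forms, surrogates, and truncated sequences).
            for length in (2, 3, 4):
                try:
                    char = octets[i:i + length].decode("utf-8")
                    break
                except UnicodeDecodeError:
                    pass
        if char is None:
            # ASCII (simplification: never decoded here) or an octet that is
            # not part of a valid UTF-8 sequence: keep it percent-encoded.
            out.append("%{:02X}".format(octets[i]))
            i += 1
        else:
            out.append(char)
            i += length
    return "".join(out)


def uri_to_iri(uri):
    """Hypothetical helper: valid UTF-8 becomes characters, the rest stays bytes."""
    return _PCT_RUN.sub(_decode_run, uri)


def misread_as_windows_1252(text):
    """What a page declared as ISO 8859-1 shows when its bytes are really UTF-8."""
    return text.encode("utf-8").decode("cp1252")


print(uri_to_iri("/caf%C3%A9"))          # valid UTF-8: decoded to a character
print(uri_to_iri("/caf%E9"))             # lone 0xE9 is not UTF-8: stays %E9
print(misread_as_windows_1252("é"))      # the two UTF-8 bytes read as two characters
```

In this sketch `/caf%C3%A9` comes back as `/café`, while `/caf%E9` is returned unchanged, because a lone 0xE9 octet cannot be decoded back to a character.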
Received on Monday, 25 July 2011 10:59:38 UTC