Re: How browsers display IRIs with mixed encodings

On 2011/07/22 9:30, Phillips, Addison wrote:

> In IRI terms, there are characters and there are "random octets". When mapping to URI, percent encoding is applied to both. However, the UTF-8 sequences can be decoded back to characters. The random octets not so much.

> In other words, leaving aside the query part for a moment, shouldn't IRI really say that valid UTF-8 sequences are interpreted as characters and invalid UTF-8 sequences are treated as bytes?

Yes, it should. And RFC 3987 already does.
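
As a rough illustration of that rule (not the normative text of RFC 3987), here is a minimal Python sketch. The function name uri_to_iri_path is hypothetical, and the sketch makes simplifying assumptions: it leaves percent-encoded ASCII untouched and ignores the additional exclusions the RFC lists for characters that should stay escaped. Percent-escapes that form a valid UTF-8 sequence become characters; any other octet stays percent-encoded.

```python
import re

# One or more consecutive percent-escapes, e.g. "%C3%A9%E9"
_PCT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _decode_run(match):
    """Decode one run of percent-escapes, octet by octet."""
    raw = bytes.fromhex(match.group(0).replace("%", ""))
    out = []
    i = 0
    while i < len(raw):
        # Try a 4-, 3-, then 2-octet UTF-8 sequence starting at i.
        # Single (ASCII) octets are deliberately left escaped.
        for n in (4, 3, 2):
            chunk = raw[i:i + n]
            if len(chunk) != n:
                continue
            try:
                text = chunk.decode("utf-8")
            except UnicodeDecodeError:
                continue
            if len(text) == 1:  # exactly one character was encoded
                out.append(text)
                i += n
                break
        else:
            # Not the start of a valid multi-octet sequence:
            # treat it as a plain byte and keep it percent-encoded.
            out.append("%%%02X" % raw[i])
            i += 1
    return "".join(out)


def uri_to_iri_path(path):
    """Hypothetical helper: show valid UTF-8 as characters, the rest as bytes."""
    return _PCT_RUN.sub(_decode_run, path)


print(uri_to_iri_path("/caf%C3%A9"))  # valid UTF-8, shown as a character
print(uri_to_iri_path("/caf%E9"))     # lone Latin-1 octet, stays escaped
```

A mixed path such as "/%C3%A9%E9" comes out with the first two octets as a character and the third still percent-encoded, which is the "characters versus random octets" split described above.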


> Looking at your test page, I'm not sure how valid a test it is. The page declares an encoding of ISO 8859-1. Having a "UTF-8 encoded path" in the page is a lie. Those bytes are all valid windows-1252 characters (per HTML5, nearly all browsers treat ISO 8859-1 as windows-1252). So the path isn't actually "UTF-8 encoded". To me the test looks broken.

A test isn't broken if it tests weird coincidences. It may be that the description can be improved, though.
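
To make the coincidence concrete, here is a small Python sketch of what can happen to UTF-8 bytes sitting in a page that is read as windows-1252. The function names are hypothetical, the sample character "é" is my own choice rather than anything from the test page, and the assumption that the path is re-encoded as UTF-8 before percent-encoding is only one possible browser behaviour.

```python
from urllib.parse import quote


def as_seen_in_cp1252(text):
    """Characters a windows-1252 decoder sees for the UTF-8 bytes of text."""
    return text.encode("utf-8").decode("cp1252")


def escaped_via_utf8(chars):
    """Percent-encode the characters using UTF-8 (assumed behaviour)."""
    return quote(chars, encoding="utf-8")


def escaped_via_page_encoding(chars):
    """Percent-encode the characters using the page encoding instead."""
    return quote(chars, encoding="cp1252")


seen = as_seen_in_cp1252("é")           # two characters, not one
print(seen)
print(escaped_via_utf8(seen))           # four octets: doubly encoded
print(escaped_via_page_encoding(seen))  # two octets: the original bytes
```

Under these assumptions, the same two bytes in the page can end up on the wire either unchanged or doubly encoded, depending on which encoding is applied when the path is escaped.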

Regards,    Martin.

Received on Monday, 25 July 2011 10:59:38 UTC