- From: Phillips, Addison <addison@lab126.com>
- Date: Tue, 26 Jul 2011 18:13:30 -0700
- To: Leif H Silli <xn--mlform-iua@xn--mlform-iua.no>, "duerst@it.aoyama.ac.jp" <duerst@it.aoyama.ac.jp>
- CC: "chris@lookout.net" <chris@lookout.net>, "public-iri@w3.org" <public-iri@w3.org>
> >> It is one thing that %FC needs to work (in some sense - like
> >> quirks-mode pages also have to work even if it is not valid). But if
> >> there is no good necessary usecase for %FC, then we should help
> >> authors avoid problems by encourage validators to warn against it use.
> >
> > There's nothing invalid with %FC.
>
> My suggestion was that it should *become* invalid/get a warning in - let's say -
> HTML5 docs.

Making the literal sequence %FC invalid would be a Bad Thing. It would make it impossible to encode certain resources that are otherwise completely valid.

> > A URI that contains %FC is perfectly valid (check RFC 3986). Because it's a
> > valid URI, it's also a valid IRI.
>
> But an author which -today- inserts %FC is likely to do a mistake - or at least
> make a bad choice, no?

An author who inserts u-umlaut and expects to get %FC is making a mistake. An author who inserts %FC and expects to see u-umlaut is making a mistake (or should be). But an author who inserts %FC because that's what her server expects? Valid. And an author who inserts u-umlaut and expects it to display as u-umlaut and be sent (as %C3%BC in URI form)? Also valid, IMHO.

> > And it's useful in some circumstances. Imagine a server where all the
> > resource names are encoded in iso-8859-1 (or any other legacy (single-byte)
> > encoding). What you tell http (or whatever other scheme/protocol) by
> > using %FC is that you want the resource with the name with the <0xFC> byte in
> > it.
>
> How common are such servers these days?

They should be really really common, since that's what URI *says* %FC means.

> My focus is authors. And of course it could be the author meant %FC. But might
> it not more often be simply a result of a bad %-encoder or on a misconception?

The problem, as I see it, is not with the sequence %FC. It is with the character U+00FC appearing in an HTML document inside a URI path.
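To make the distinction concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the use of Python's standard `urllib.parse` module and the variable names are mine, not anything a browser or server actually runs. It shows that the character U+00FC percent-encodes differently depending on the charset chosen, while the sequence %FC itself always denotes the single byte 0xFC, which carries no charset information of its own.

```python
from urllib.parse import quote, unquote_to_bytes

# The character under discussion: U+00FC, LATIN SMALL LETTER U WITH DIAERESIS.
U_UMLAUT = "\u00fc"

# Percent-encoding a *character* requires choosing a charset first.
utf8_form = quote(U_UMLAUT, encoding="utf-8")          # IRI-to-URI style mapping
latin1_form = quote(U_UMLAUT, encoding="iso-8859-1")   # legacy single-byte mapping

# Percent-decoding a URI yields *bytes*; %FC is just the byte 0xFC.
raw = unquote_to_bytes("%FC")

# Reading that byte as a character needs context the URI does not carry.
as_latin1 = raw.decode("iso-8859-1")
try:
    raw.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    # A lone 0xFC byte is not a well-formed UTF-8 sequence.
    is_valid_utf8 = False

print(utf8_form, latin1_form, raw, as_latin1, is_valid_utf8)
```

In this sketch the only step that needs outside knowledge is turning bytes back into characters, which is the context an IRI (or URI) does not supply.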
I tend to think that the interpretation of %FC using page encoding is bad because an IRI (or URI) lacks the necessary context to make that determination. I agree with Boris's earlier message on the list that showing %FC is a bad user experience.

But shouldn't we be trying to close on a well-defined set of behaviors that content authors (and others) can understand? I think such an approach would include the behavior described above, even at the expense of some usability. And who looks at those really long URIs full of percent gunk anyway? :-))

Addison
Received on Wednesday, 27 July 2011 01:14:01 UTC