- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Thu, 16 Jan 2014 16:11:47 +0100
- To: Nicolas Mailhot <nicolas.mailhot@laposte.net>
- CC: Zhong Yu <zhong.j.yu@gmail.com>, Gabriel Montenegro <gabriel.montenegro@microsoft.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>, Osama Mazahir <osamam@microsoft.com>, Dave Thaler <dthaler@microsoft.com>, Mike Bishop <michael.bishop@microsoft.com>, Matthew Cox <macox@microsoft.com>
On 2014-01-16 15:57, Nicolas Mailhot wrote:
> ...
>>> And it's useless if you can't interpret it reliably. May as well log the
>>> output of /dev/random at the time. Don't have time to get humans comb
>>> millions of log lines to fix encoding errors.
>>
>> Define "encoding error" in the context of a URI.
>
> Any URI that can not be reliably decoded in the textual representation the
> URL creator preferred by a random http processor (web site, intermediary,
> web client) without outside help.

A valid URI is all US-ASCII. There's nothing that needs to be decoded at all.

> And there *is* a preferred textual representation because you know, people
> do not enter URLs in binary editors.
>
>>>>> I favour making URLs UTF-8 by default in HTTP/2 (just as it was in XML,
>>>>> that's one part of the XML spec that worked very well) and require http/1
>>>>> to 2 bridges to translate to the canonical form. Helping clients push
>>>>> local 8bits encodings will just perpetuate pre-2000 legacy mess.
>>>>
>>>> How do you translate a URI with unknown URI encoding to UTF-8?
>>>
>>> You treat it as UTF-8. If it fails UTF-8 sanity rules you fail with an
>>> error. That will make people fix their encodings quickly.
>>
>> This is not going to work:
>>
>> a) People may have chosen a non-UTF8 encoding by accident (system locale
>> etc) and can't change it retroactively,
>
> They can add an UTF-8 translator gateway at http/2 adoption time. No
> different and much easier than the mass of documents that needed to be
> fixed once people started exchanging them in binary not dead wood form.

And that translator rewrites all URIs that appear in payloads?

> Some past mistakes need correction you can't grandfather them eternally at
> the cost of eternal future interop problems.

The only interop problem I'm aware of is when clients construct *new* URIs, such as is common in WebDAV. The way to fix this is to *advocate* UTF-8.
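
For readers following the exchange, the "treat it as UTF-8 and fail on error" rule quoted above might look roughly like the following Python sketch. It is only an illustration under stated assumptions: the function name `decode_uri_path_strict` and the sample paths are hypothetical and do not come from the thread, and it says nothing about how any HTTP/2 implementation behaves.

```python
from urllib.parse import unquote_to_bytes


def decode_uri_path_strict(path: str) -> str:
    """Percent-decode an ASCII URI path and read the octets as UTF-8.

    Hypothetical helper for illustration only. Raises UnicodeEncodeError
    if the path is not pure US-ASCII, and UnicodeDecodeError if the
    percent-decoded octets are not valid UTF-8.
    """
    # The URI as transmitted is expected to be all US-ASCII.
    path.encode("ascii")
    # Percent-decoding yields raw octets; their encoding is not stated.
    raw = unquote_to_bytes(path)
    # The proposed rule: assume UTF-8 and fail hard otherwise.
    return raw.decode("utf-8", errors="strict")


# "é" percent-encoded from its UTF-8 octets decodes cleanly.
print(decode_uri_path_strict("/caf%C3%A9"))

# The same character percent-encoded from ISO-8859-1 is rejected,
# as is any binary payload that is not valid UTF-8.
try:
    decode_uri_path_strict("/caf%E9")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

Both sample paths are valid, all-ASCII URIs on the wire. The difference only appears once a recipient chooses to interpret the percent-decoded octets as text, which is the step the two sides of the thread disagree about.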
But even if everybody agrees on UTF-8 there's still the NFC/NFD mismatch between OSX and the rest of the world.

>> b) There might be actual *binary* data in the URI.
>
> So just define the canonical binary-to-utf8 mapping. If you don't your URL
> will crash as soon as it needs to be displayed in an address bar, network
> console or activity log. Again, much easier to define a single
> binary-to-utf8 mapping than random encoding to random display encoding
> rules (hint: it is not possible and that's the core problem).

I still have no clue what problem you are trying to solve. Sorry.

> ...
>> Hm, no. They just happen to work in a way different from your
>> preference, but they do just work fine.
>
> No, they don't work. Working is not "avoid any automated processing and
> for christsakes never use anything but ASCII because the state is
> undefined and things will randomly break"

The state is fully defined. It's just that you don't like that state.

> ...

It seems we aren't getting anywhere. Can somebody else help me understanding what this is all about? :-)

Best regards, Julian
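
The NFC/NFD mismatch mentioned in the message can be illustrated with a short Python sketch. It assumes only the standard library; the sample name "café" is a hypothetical example, not taken from the thread. It shows that the same visible text, encoded as UTF-8 in both cases, still produces two different URIs depending on the Unicode normalization form.

```python
import unicodedata
from urllib.parse import quote

# Hypothetical sample: "café", written with a precomposed "é" (U+00E9).
name = "caf\u00e9"

nfc = unicodedata.normalize("NFC", name)  # composed: é is one code point
nfd = unicodedata.normalize("NFD", name)  # decomposed: e + U+0301

# quote() percent-encodes the UTF-8 octets of each form.
print(quote(nfc))  # caf%C3%A9
print(quote(nfd))  # cafe%CC%81

# Same text on screen, both valid UTF-8, yet the URIs do not match.
print(quote(nfc) == quote(nfd))  # False
```

A server that compares URIs octet by octet would treat these as two distinct resources, so agreeing on UTF-8 alone does not settle which form a client should send.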
Received on Thursday, 16 January 2014 15:12:20 UTC