- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Fri, 29 Aug 2014 15:32:43 +0900
- To: John Cowan <cowan@mercury.ccil.org>, John C Klensin <john+w3c@jck.com>
- CC: Larry Masinter <masinter@adobe.com>, Richard Ishida <ishida@w3.org>, "Phillips, Addison" <addison@lab126.com>, www-international@w3.org
On 2014/08/29 12:04, John Cowan wrote:
> John C Klensin scripsit:
>
>> it is also noteworthy that the number of web
>> browsers, or even web servers, in use is fairly small. By
>> contrast, the number of SMTP clients and servers, including
>> independently-developed submission clients built into embedded
>> devices, is huge and the number of mail user agents even larger.
>
> Very true. But the number of web pages reduces all the distinct
> Internet software programs in the world to nothing at all.

It's not so much the overall number that counts (otherwise we would
have to count the number of emails, too) but the relationship between
the number of 'senders' and 'recipients'. The Web is characterized by
an extreme number of senders (essentially, each Web page is its own
sender) and only very few recipients (browsers). For mail, each sender
is also a recipient. This creates very different ecosystems for the
standards in those different fields.

>> So an instruction from the IETF (or W3C or some other entity) to
>> those email systems to abandon the IANA Registry's definitions
>> in favor of some other norm would, pragmatically, be likely to
>> make things worse rather than better,
>
> +1

It's not only email systems. There are many libraries that provide
encoding conversion. These libraries are used in all kinds of contexts
(programming languages, databases, applications, ...). These contexts
expect the same consistency over time in encoding definitions as Web
pages expect from Web browsers.

As the creator and sometime maintainer of such a library (the one in
the Ruby programming language since v1.9), I can assure everybody that
such maintainers will not change what e.g. "US-ASCII" or "iso-8859-1"
means any time soon, because they would shoot themselves and most of
their users in the feet.

And not making changes doesn't mean these libraries won't be
compatible with the Web. It very much depends on where and how they
are used. Being used on a server to generate content will not be a
problem, because for any *sane* data, the result will be okay by the
"Encoding" spec, too. There's no problem converting non-ASCII data to
numeric character references if a page is served as "US-ASCII", and
there is no problem serving "iso-8859-1" content without bytes in the
C1 range. (Both points are made concrete in the two short Ruby
sketches further down.)

Even when used to consume Web content, there's no big problem. The
average Web spider project doesn't suffer significantly from a few
encoding hiccups, because statistically, the pages that conform to
both IANA and "Encoding" are a large majority. Even long before you
get to Google scale, mislabelings are probably a bigger problem than
encoding details in otherwise correctly labeled data. On the other
hand, changing how a transcoding library works risks putting off lots
of users and creating lots of difficult-to-trace bugs, and so is best
avoided.

That libraries have to stay on the conservative side is made clear by
the fact that Microsoft has difficulties implementing the "Encoding"
spec. IE uses a common Windows library for transcoding, and both
changing this library and splitting it into separate Web and non-Web
versions are highly unattractive to Microsoft. My guess is that
Microsoft will just sit things out until UTF-8 is the only thing that
counts.

So claims such as "Which applications don't want to be compatible with
the web?" (implying that a single overreaching unification is
desirable and possible) ignore the (messy) reality on the ground.
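To make the divergence concrete: the IANA and "Encoding" definitions
of "iso-8859-1" disagree only for the bytes 0x80-0x9F. A minimal Ruby
sketch (Ruby 1.9 or later assumed; the byte 0x85 is just one example
from that range):

    # Per IANA/ISO, 0x85 is the C1 control NEL (U+0085).
    raw = 0x85.chr                  # a single C1-range byte
    iana = raw.force_encoding("ISO-8859-1").encode("UTF-8")
    p iana.codepoints.map { |c| format("U+%04X", c) }  # ["U+0085"]

    # Per the "Encoding" spec, the label iso-8859-1 means
    # windows-1252, where 0x85 is the horizontal ellipsis.
    web = raw.force_encoding("Windows-1252").encode("UTF-8")
    p web.codepoints.map { |c| format("U+%04X", c) }   # ["U+2026"]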
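And the safe server case, in the same vein: escaping everything
outside the target repertoire as numeric character references yields
pure ASCII bytes, which decode identically under the IANA registry and
under the "Encoding" spec (whose mapping for the US-ASCII label agrees
with ASCII in the 0x00-0x7F range). Again a sketch, not production
code:

    # Non-ASCII characters become hexadecimal character references;
    # xml: :text also escapes &, <, and > as entities.
    s = "Gr\u00FC\u00DFe"           # "Gruesse" with u-umlaut and sharp s
    puts s.encode("US-ASCII", xml: :text)
    # => Gr&#xFC;&#xDF;e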
>> Sure. But that and scale measured in numbers of deployed
>> independent implementations and the difficulties associated with
>> changing them, would seem to argue strongly for at least mostly
>> changing the web browsers to conform to what is in the IANA
>> registry

In theory, this is correct. If all browsers agreed to do it, they
could do so easily. But it's a prisoner's dilemma type of situation,
with more than two players who can ruin things, and there are enough
equivalent situations in the HTML5 (and related) areas to strongly
suggest that going back to the IANA registry won't happen.

>> (possibly there are Registry entries that might need
>> tuning too -- the IETF Charset procedures don't allow that at
>> present but, as you point out, they could, at least in
>> principle, be changed)

You are right that RFC 2978 (http://tools.ietf.org/html/rfc2978)
doesn't mention any procedure for updating registrations. But that
hasn't made updates impossible. Updates can be and have been made in
analogy to new registrations (think re-registration). But they need
more backing than "Web browsers do it this way, so that's what
everybody else has to do, too". So the problem is not one of process,
but one of "rough consensus and running code". And there's lots of
running code on the IANA side, too.

Regards,   Martin.

>> rather than trying to retune the Internet
>> to match what a handful of browser vendors are doing.
>
> Both are hopeless efforts, and each group must maintain its own
> standards.
Received on Friday, 29 August 2014 06:33:24 UTC