Re: [whatwg/encoding] Allow other encodings (#207)

Going along with your line of reasoning for the sake of argument, I have to make a couple of remarks. I'll alternate between prescriptive and descriptive points of view.

First of all, I can understand the desire to make browsers behave more uniformly, but in this particular case I think you may be taking things one step too far. Not every aspect can be 100% identical, not every aspect is 100% identical and not every aspect will be 100% identical. In the case of IBM437, some browsers support it (Mobile: Edge; Desktop: Chromium-based browsers on Linux, Safari, the old Edge and Internet Explorer) whereas some don't (Mobile: all browsers I checked on iOS and Android; Desktop: Chromium-based browsers on Windows (including the new Edge) and Firefox).

Secondly, while the spec does say that other encodings should not be supported by user agents, it does not (to the best of my knowledge) say how this non-support should work out in practice. The purest solution would be to display an error message instead of the requested page, but that may be too draconian. At the very least I would expect a warning message saying the page may not be displayed correctly, but none of the browsers I checked exhibit this behavior.

Finally, the fact that there is a list of supported legacy encodings at all must be some kind of compromise. I could understand (but not fully agree with) a decision to support UTF-8 only, but instead there is a pretty long list of encodings that, frankly, seems (and objectively speaking probably is) rather arbitrary. So even if I were to accept your premise that different browsers supporting different encodings is somehow harmful (which I don't), I don't see that idea worked out consistently in the current specification.

Because I get the feeling that I'm not going to convince you to go with any of my three suggestions, I'd like to ask you, as an alternative, to add IBM437 to the list of supported encodings. This encoding is very prevalent among text files originating from the early eighties to about the mid-nineties and sees some usage in HTML documents as well. Many users, including me, also use their browsers to view text files they find online, and I would rather not suggest that browsers support IBM437 for text files but not for HTML files. Besides, the ability to view IBM437-encoded HTML files is useful in itself.
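
To illustrate what is at stake, here is a minimal Python sketch (my own example, not taken from the spec or from any browser). The sample bytes are hypothetical, and the choice of windows-1252 as the "wrong" decoder is only an assumption about what a fallback might look like; the actual fallback depends on the browser and locale.

```python
# Hypothetical sample: four bytes that form the top edge of a box
# in an IBM437-encoded text file.
raw = b"\xc9\xcd\xcd\xbb"


def decode_as(data: bytes, encoding: str) -> str:
    """Decode the same bytes under a given encoding (Python codec name)."""
    return data.decode(encoding)


# Decoded as IBM437 (Python calls this codec "cp437").
as_ibm437 = decode_as(raw, "cp437")

# Decoded as windows-1252, assumed here as a stand-in for whatever
# fallback a browser without IBM437 support might pick.
as_windows_1252 = decode_as(raw, "cp1252")

print(as_ibm437)
print(as_windows_1252)
```

The first result consists of box-drawing characters, the second of accented Latin letters and a guillemet, which is the kind of garbling a reader sees when IBM437 content is decoded as something else.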

I expect my claim that IBM437 used to be a common (if not the most common) encoding for text files to be uncontroversial, but I realize that some might say its use in HTML files is very rare. I do encounter IBM437-encoded HTML files once in a while, and I tried to look for objective research to support my point. Unfortunately I did not find any, although I didn't find any refutation either. The closest I came to objective numbers relates to Usenet and email:

http://quetzalcoatal.blogspot.com/2014/03/understanding-email-charsets.html

I'm not advocating adding any other encodings along with IBM437, but (hypothetically) if you were to say "If we include IBM437 we have to include IBM850 too, or else the spec will become inconsistent" or something along those lines, I would not oppose that either. I only want to make it clear that I'm not advocating adding a sundry list of encodings to the spec, just IBM437.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/207#issuecomment-618080351

Received on Wednesday, 22 April 2020 22:52:24 UTC