Re: [whatwg/encoding] Allow other encodings (#207)

> How "page loading" works is defined by HTML and it does go into your questions to some extent, though not as far as I would like. The Encoding Standard primarily concerns itself with mapping labels to encodings and byte sequences to scalar values (and vice versa).
> Looking at https://en.wikipedia.org/wiki/Code_page_437#Character_set there's a good reason to never support that: https://encoding.spec.whatwg.org/#security-background.

Assuming you are talking about the control character glyphs, I agree that it would be wise to define this explicitly: I support ignoring those glyphs and decoding every byte below 0x80 as ASCII.
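A minimal Python sketch of the behavior proposed above, assuming Python's built-in `cp437` codec as a stand-in for IBM437. That codec already maps every byte below 0x80 to the ASCII code point of the same value, so control bytes are not turned into the glyphs the original hardware displayed. The helper name `decode_ibm437` is hypothetical.

```python
def decode_ibm437(data: bytes) -> str:
    """Decode IBM437 text, reading every byte below 0x80 as plain ASCII.

    Python's cp437 codec maps 0x00-0x7F to U+0000-U+007F, so a byte such
    as 0x01 stays a control character instead of becoming the smiley
    glyph (U+263A) that the IBM PC displayed for it.
    """
    return data.decode("cp437")


# The lower half is the identity mapping; the upper half holds the
# code page's letters, symbols, and box-drawing characters.
print(repr(decode_ibm437(b"\x01 \xc4")))  # '\x01 ─'
```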

> Do these text files declare IBM437? If not, how would viewing English text files that use the non-ASCII parts of IBM437 only for box drawing work in browsers that don't have a character encoding menu? (Presumably, an encoding detector would want to err on the side of the box drawing bytes being non-ASCII letters from windows-125x, since those cases are far more common on the Web.)
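To illustrate the ambiguity described above: without a declaration, the same box-drawing bytes also read as ordinary Latin letters under windows-1252, which is why a detector has no reliable signal. A small Python comparison, assuming Python's `cp437` and `cp1252` codecs as stand-ins for IBM437 and windows-1252; the byte string and the helper `decode_both` are hypothetical.

```python
def decode_both(raw: bytes) -> dict[str, str]:
    """Decode the same bytes as IBM437 and as windows-1252 for comparison."""
    return {codec: raw.decode(codec) for codec in ("cp437", "cp1252")}


# A hypothetical DOS-style box top: corner, three rules, corner.
for codec, text in decode_both(b"\xda\xc4\xc4\xc4\xbf").items():
    print(codec, text)
# cp437 ┌───┐
# cp1252 ÚÄÄÄ¿
```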

Generally, a correctly configured HTTP server will declare IBM437 in the `charset` parameter of the `Content-Type` header (e.g. `Content-Type: text/plain; charset=IBM437`). Browsers that respect that declaration will show the box drawing characters correctly.
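A sketch of how a client could honor such a declaration, assuming the server sends a header like the one above. It parses the header with the standard library's `email.message.Message` and resolves the label through Python's codec registry, which accepts `ibm437` as an alias of `cp437`. This is Python's label handling, not the Encoding Standard's label table (which has no IBM437 entry); the function `decode_body` and the windows-1252 fallback are assumptions for illustration.

```python
import codecs
from email.message import Message


def decode_body(content_type: str, body: bytes, fallback: str = "windows-1252") -> str:
    """Decode a response body using the charset parameter of its Content-Type."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lowercased label, or None if absent
    try:
        codec = codecs.lookup(charset) if charset else codecs.lookup(fallback)
    except LookupError:
        # Unknown label: use the (assumed) default instead.
        codec = codecs.lookup(fallback)
    return body.decode(codec.name, errors="replace")


print(decode_body("text/plain; charset=IBM437", b"\xda\xc4\xbf"))  # ┌─┐
print(decode_body("text/plain", b"\xda\xc4\xbf"))  # ÚÄ¿
```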

> I understand that there is certain historical appeal to e.g. textfiles.com serving the original bytes instead of converting to UTF-8 on the server side. However, all files that I've encountered there that use the non-ASCII parts of IBM437 use parts that are the same in IBM866, so they'd work in browsers if declared as IBM866.

Well, I suppose you could pretend that an IBM437 file is IBM866 as long as it uses none of the byte values on which the two differ, but that seems rather hackish to me. I can also imagine IBM437-encoded texts that use letters with diacritics and/or Greek letters alongside the box drawing characters, although I don't have an example at hand. I also note that there are multiple variants of IBM866, so without checking the details of the one you selected I can't say for certain whether it contains all the mathematical symbols included in IBM437.
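The relabeling trick discussed above can be checked mechanically: list the upper-half byte values on which IBM437 and IBM866 decode differently, then test whether a file avoids all of them. A Python sketch, assuming Python's `cp437` and `cp866` codecs match the variants in question (not guaranteed for IBM866, as noted); the helper names are hypothetical.

```python
def differing_bytes() -> list[int]:
    """Upper-half byte values that decode differently in cp437 and cp866."""
    return [
        b
        for b in range(0x80, 0x100)
        if bytes([b]).decode("cp437") != bytes([b]).decode("cp866")
    ]


def safe_to_label_as_ibm866(data: bytes) -> bool:
    """True if the bytes read the same whether labeled IBM437 or IBM866."""
    # Both are single-byte encodings, so the two decodings agree exactly
    # when the data contains none of the differing byte values.
    return data.decode("cp437") == data.decode("cp866")


print([hex(b) for b in differing_bytes()])
print(safe_to_label_as_ibm866(b"\xc9\xcd\xbb Hello \xba"))  # True
print(safe_to_label_as_ibm866(b"caf\x82"))  # False
```

A box drawn only with double-line characters passes, while a word using IBM437's `é` (0x82) does not, because 0x82 is a Cyrillic letter in IBM866.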

-- 
View it on GitHub:
https://github.com/whatwg/encoding/issues/207#issuecomment-618364420

Received on Thursday, 23 April 2020 12:08:20 UTC