Re: [whatwg/encoding] Add UTF-7 to replacement encoding list? (#68)

> Blink began to use Compact Encoding Detector ( google/compact_enc_det ) when no encoding label is found (http, meta).

Whoa! Do you mean Blink now uses more unspecified heuristics than before even for non-Japanese locales? Why more heuristics? Why without a spec? ಠ_ಠ

> When 7-bit encoding detection is on, it detects ISO-2022-{KR,CN}, HZ-GB AND UTF-7 in addition to ISO-2022-JP. 7-bit encoding detection is ON for ISO-2022-JP, but we want to suppress other 7-bit encodings. I think the best way to 'suppress' (unsupport) them is to turn the whole input to U+FFFD. 

It seems to me that it's bad for Blink to adopt, as a black box, a library that doesn't do what Blink needs and then tweak the black box's output, as opposed to writing code that does what's needed in the first place.

As for the issue of mapping the label "utf-7" to the replacement encoding generally, I think it's a matter of determining whether real Web content relies on the label "utf-7" being unrecognized so that it falls back on an ASCII-compatible encoding.

If Blink is willing to experiment with shipping a mapping from "utf-7" to replacement and assessing breakage, that would be cool.
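
To make the trade-off concrete, here is a minimal sketch of what such a mapping would do, assuming a simplified label table and a windows-1252 fallback for unrecognized labels. The names `LABELS`, `get_encoding`, and `decode` are illustrative only and are not Blink's code; the `"utf-7"` entry is the proposed addition, not something the spec is known to contain here. The replacement decoder is modeled as emitting a single U+FFFD for any non-empty input.

```python
REPLACEMENT = "replacement"

# Simplified, hypothetical label table (a tiny subset of real labels).
LABELS = {
    "utf-8": "UTF-8",
    "iso-2022-jp": "ISO-2022-JP",
    # Labels the Encoding Standard already maps to replacement.
    "iso-2022-kr": REPLACEMENT,
    "iso-2022-cn": REPLACEMENT,
    "hz-gb-2312": REPLACEMENT,
    # The proposed addition under discussion (an assumption in this sketch).
    "utf-7": REPLACEMENT,
}


def get_encoding(label):
    """Return the encoding name for a label, or None if it is unrecognized."""
    # Strip ASCII whitespace and compare case-insensitively.
    return LABELS.get(label.strip("\t\n\x0c\r ").lower())


def decode(data, label, fallback="windows-1252"):
    """Decode bytes using the label, falling back when the label is unknown."""
    name = get_encoding(label) or fallback
    if name == REPLACEMENT:
        # Replacement decoder: one U+FFFD for non-empty input, nothing otherwise.
        return "\ufffd" if data else ""
    return data.decode(name, errors="replace")


if __name__ == "__main__":
    payload = b"+ADw-script+AD4-"
    print(repr(decode(payload, "utf-7")))          # label mapped to replacement
    print(repr(decode(payload, "no-such-label")))  # unrecognized label, fallback
```

With the `"utf-7"` entry removed from the table, the same payload would take the fallback path and come through as plain ASCII text, which is the existing behavior that real content might be relying on.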

---
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/68#issuecomment-237504955

Received on Thursday, 4 August 2016 12:08:02 UTC