Re: [whatwg/encoding] Amount of bytes to sniff for encoding detection (#102)

I'd like some more detail on this detector. I assume the goal is to avoid reloading, so would we essentially stall the parser when we hit a non-ASCII byte within the first 4k, and then wait until the full 4k has arrived (or end-of-file) before determining the encoding for the remainder of the bytes?
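
A minimal sketch of the strategy being asked about, assuming the detector only looks at the first 4096 bytes and the parser stalls at the first byte >= 0x80. The names `plan_sniff` and `SNIFF_LIMIT` are hypothetical and not taken from any implementation:

```python
from typing import Optional, Tuple

SNIFF_LIMIT = 4096  # the "4k" window from the question (assumed to mean 4096 bytes)


def plan_sniff(data: bytes, limit: int = SNIFF_LIMIT) -> Tuple[Optional[int], bytes]:
    """Return (stall_at, window).

    stall_at: offset of the first non-ASCII byte inside the window, i.e. where
              the parser would have to stop and wait; None if the window is
              pure ASCII and no stall is needed.
    window:   the bytes a detector would get to look at, which is the first
              `limit` bytes or everything up to end-of-file if shorter.
    """
    window = data[:limit]
    stall_at = next((i for i, b in enumerate(window) if b >= 0x80), None)
    return stall_at, window


# Parser can consume "abc" immediately, then stalls at offset 3.
stall_at, window = plan_sniff(b"abc\xe9def")
print(stall_at, len(window))
```

If `stall_at` is `None`, the window was pure ASCII and the parser never has to wait under this model; a non-ASCII byte that first appears after the window is not covered by the sketch.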

Does that also mean we'd never detect ISO-2022-JP or UTF-16, which are ASCII-incompatible?
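
To illustrate the concern: in ISO-2022-JP, and in UTF-16 without a BOM when the text is in the ASCII range, every byte is below 0x80, so a trigger keyed on the first non-ASCII byte would never fire. A small check using Python's built-in codecs (the helper name `has_non_ascii_byte` is made up for this example):

```python
def has_non_ascii_byte(data: bytes) -> bool:
    """True if any byte has the high bit set (>= 0x80)."""
    return any(b >= 0x80 for b in data)


# ISO-2022-JP switches character sets with ESC sequences and stays 7-bit.
iso2022jp = "日本語".encode("iso2022_jp")
# UTF-16LE without a BOM: ASCII-range text becomes ASCII bytes interleaved with NULs.
utf16le = "<html>".encode("utf-16-le")
# Shift_JIS, by contrast, uses lead bytes with the high bit set.
shift_jis = "日本語".encode("shift_jis")

print(has_non_ascii_byte(iso2022jp))  # False
print(has_non_ascii_byte(utf16le))    # False
print(has_non_ascii_byte(shift_jis))  # True
```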

Does it only run as a last resort?

I also know that Henri has been exploring alternative strategies for locale-specific fallback: rather than using the user's locale, use the locale inferred from the domain's TLD. It would be interesting to know how that compares with a detector.
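
A rough sketch of what a TLD-based fallback could look like, purely for comparison. The table entries, the default, and the function name `fallback_for_url` are illustrative assumptions, not Henri's actual mapping or any browser's behavior:

```python
from urllib.parse import urlsplit

# Hypothetical TLD-to-fallback-encoding table (illustrative entries only).
TLD_FALLBACK = {
    "jp": "Shift_JIS",
    "ru": "windows-1251",
    "kr": "EUC-KR",
}
DEFAULT_FALLBACK = "windows-1252"  # assumed default for unlisted TLDs


def fallback_for_url(url: str) -> str:
    """Pick a fallback encoding from the last label of the URL's host."""
    host = urlsplit(url).hostname or ""  # hostname is None for non-URLs
    tld = host.rsplit(".", 1)[-1]
    return TLD_FALLBACK.get(tld, DEFAULT_FALLBACK)


print(fallback_for_url("https://example.jp/page"))
print(fallback_for_url("https://example.com/"))
```

Unlike a detector, this needs no bytes from the document at all, so it never stalls the parser, but it cannot correct a page whose encoding differs from what its TLD suggests.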

https://github.com/whatwg/encoding/issues/102#issuecomment-302633363

Received on Friday, 19 May 2017 07:45:09 UTC