- From: John Cowan <cowan@mercury.ccil.org>
- Date: Mon, 18 Nov 2013 13:41:53 -0500
- To: Pete Cordell <petejson@codalogic.com>
- Cc: Tim Bray <tbray@textuality.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, "Henry S. Thompson" <ht@inf.ed.ac.uk>, IETF Discussion <ietf@ietf.org>, JSON WG <json@ietf.org>, Anne van Kesteren <annevk@annevk.nl>, www-tag@w3.org
Pete Cordell scripsit:

> Do you mean that the presence of a UTF-8 BOM sequence doesn't prove
> that it's not Windows cp-1252 or do you mean you can tell apart a
> UTF-8 and cp-1252 file without BOMs?

I meant the latter, but the former is true, too. A plain text document
beginning "ï»¿" in Windows-1252 will appear to begin with a UTF-8 BOM
in the absence of out-of-band information.

> If the latter, do the relevant tools take the time to distinguish
> the 2 without BOMs?

Some tools do, some don't. The IRC client I use, XChat, attempts to
convert input as UTF-8, and if that fails, converts it as Latin-1.
I have not yet seen it produce mojibake.

-- 
John Cowan    cowan@ccil.org    http://www.ccil.org/~cowan
Most languages are dramatically underdescribed, and at least one is
dramatically overdescribed. Still other languages are simultaneously
overdescribed and underdescribed. Welsh pertains to the third category.
        --Alan King
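The two points in the message above (the UTF-8 BOM bytes are also valid Windows-1252 text, and a tool can try UTF-8 first and fall back to Latin-1) can be illustrated with a short Python sketch. This is not XChat's actual code; the function name `decode_with_fallback` is hypothetical, and the sketch assumes the input arrives as a complete byte string.

```python
import codecs


def decode_with_fallback(data: bytes) -> str:
    """Decode as UTF-8 when the bytes are valid UTF-8, else as Latin-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte value to a code point, so this cannot fail.
        return data.decode("latin-1")


# The UTF-8 BOM is the three bytes EF BB BF; each one is also a printable
# character in Windows-1252, which is why the BOM alone proves nothing.
bom_as_cp1252 = codecs.BOM_UTF8.decode("cp1252")

utf8_input = "café".encode("utf-8")   # b"caf\xc3\xa9"
latin1_input = b"caf\xe9"             # not valid UTF-8: lone lead byte

print(repr(bom_as_cp1252))
print(decode_with_fallback(utf8_input))
print(decode_with_fallback(latin1_input))
```

The fallback works in practice because text in a legacy 8-bit encoding that contains non-ASCII characters is rarely also well-formed UTF-8, so a failed UTF-8 decode is a strong hint that the input is something else.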
Received on Monday, 18 November 2013 18:42:29 UTC