RE: XHTML validator doesn't completely support Unicode from Peter Sheerin on 2001-04-29 (www-validator@w3.org from April 2001)

From: Peter Sheerin <psheerin@cmp.com>
Date: Sun, 29 Apr 2001 19:32:22 -0400 (EDT)
To: "'Bertilo Wennergren'" <bertilow@bertilo.se.fm>, <www-validator@w3.org>
Message-ID: <002701c0d102$28b8bfb0$8810960a@cadencesheerin>

 
> Are you sure it's the end of line characters that give the problem?
> 
> I'd guess it's the BOM ("U+FEFF") that's the culprit. It's 
> not very common to use a BOM in UTF-8 files. Some even say 
> it's not allowed in UTF-8. It's certainly not necessary to 
> use a BOM in UTF-8.

Actually, I do find it necessary, and it exists for a very good reason.

Without it at the beginning of a file, an editor has no sure way of
determining which encoding the text file uses. In practical testing with SC
UniPad, I have found that the use of either the BOM or EOL character is
the only sure way for the editor to open the file as UTF-8. Their
absence causes the editor to assume plain ASCII-UCN, which causes real
problems if you don't notice and correct it.
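
To make the BOM argument concrete, here is a minimal sketch of the kind of
check an editor could perform on the first bytes of a file. It is only an
illustration under stated assumptions: the function name sniff_encoding and
the "unknown" fallback are hypothetical, and this is not a description of how
UniPad actually detects encodings.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Guess an encoding from the leading bytes of a file (hypothetical helper)."""
    # U+FEFF encoded as UTF-8 is the three bytes EF BB BF.
    if data.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" is Python's UTF-8 codec that strips the leading BOM on decode.
        return "utf-8-sig"
    # Without a signature the bytes alone do not say which encoding was meant,
    # so the caller has to fall back on a default or a heuristic.
    return "unknown"


with_bom = codecs.BOM_UTF8 + "caf\u00e9".encode("utf-8")
without_bom = "caf\u00e9".encode("utf-8")

print(sniff_encoding(with_bom))
print(sniff_encoding(without_bom))
```

With the BOM present the check is unambiguous; without it, the same text
yields no signature at all, which is the situation described above.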

Received on Sunday, 29 April 2001 21:21:23 UTC