On Wed, 6 Dec 2006, Eric J. Bowman wrote:

> The case in point is Macromedia HomeSite, which is still widely used by
> working web developers but is not Unicode compliant. Opening and saving XML
> documents in HomeSite will lead to multiple BOMs -- the first one may be
> standards-compliant but the rest are unsightly!

"Multiple BOMs" is not an error, and doesn't even exist. The character
U+FEFF is to be interpreted as BOM only at the start of a file or data
stream. Otherwise, it has the semantics suggested by its Unicode name,
ZERO WIDTH NO-BREAK SPACE. Such usage is not recommended in the standard;
we are supposed to use U+2060 WORD JOINER instead. (Here on Earth, however,
U+FEFF seems to be better supported than U+2060.)

Yet, such usage is standards-conforming, and conforming software must not
simply remove "the second BOM" when it gets data that starts with
U+FEFF U+FEFF. (It may make an informed decision to ignore the latter code
point but only because it decides to ignore a leading zero-width no-break
space.)

Of course, generating several U+FEFF at the start of a file is a bad idea
and may confuse software that purports to support Unicode but doesn't.

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Wednesday, 6 December 2006 15:34:08 GMT
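To illustrate the rule described in the message, here is a minimal Python sketch. The helper name strip_leading_bom and the sample data are assumptions made for illustration only; they do not come from the message or from any particular tool. The sketch removes at most one leading U+FEFF and leaves any later U+FEFF in place as an ordinary character.

```python
BOM = "\ufeff"  # U+FEFF: a BOM only at the very start of a stream


def strip_leading_bom(text: str) -> str:
    """Remove at most one leading U+FEFF (hypothetical helper).

    Any further U+FEFF is ordinary character data (a zero-width
    no-break space) and is deliberately left untouched.
    """
    if text.startswith(BOM):
        return text[1:]
    return text


# Hypothetical sample: a file that was saved with two leading U+FEFF.
raw = (BOM + BOM + "<?xml version='1.0'?>").encode("utf-8")

# Python's standard "utf-8-sig" codec behaves the same way on decoding:
# it skips one leading UTF-8 signature and keeps everything after it.
decoded = raw.decode("utf-8-sig")

print(repr(strip_leading_bom(BOM + BOM + "abc")))
print(repr(decoded))
```

A program that decides to drop the second U+FEFF as well would be making a separate choice, namely to ignore a leading zero-width no-break space, and should do so explicitly rather than as part of BOM handling.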