BOM in plain text data transfer from Karl Tomlinson on 2010-06-14 (www-math@w3.org from June 2010)

From: Karl Tomlinson <w3@karlt.net>
Date: Mon, 14 Jun 2010 22:31:39 +1200
To: www-math@w3.org
Message-ID: <87r5k9ucqc.fsf@karlt.net>

I was a bit surprised to see the recommendation "the BOM SHOULD be
omitted for Unicode text encoded as UTF-16" for inter-application
data transfer at
http://www.w3.org/TR/MathML3/chapter6.html#world-int-transf-flavors

I can understand the spec making recommendations about the content
of the transfer, but I tend to think of the BOM more as packaging
for the content, and IMO it doesn't seem right for the spec to
specify how plain text should be transferred.

Is the assumption here that the platform will have some other
mechanism, such as a charset specification, to indicate the byte
order?

I infer from the surrounding context that the motivation for this
recommendation is legacy applications.

However, I thought of the BOM more as an indication of the
encoding used in the data transfer so that an application can
decode the binary data (converting to its own internal encoding)
to extract the content.
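
To make that reading concrete, here is a minimal Python sketch of how a
receiving application might use the BOM to pick the byte order before
decoding UTF-16 data. It is only an illustration, not something taken
from the spec: the function name decode_utf16_transfer and the
default_order parameter are hypothetical, and the fallback for BOM-less
data is an assumption standing in for whatever out-of-band information
(for example, a flavor name) the platform provides.

```python
import codecs


def decode_utf16_transfer(data: bytes, default_order: str = "little") -> str:
    """Decode UTF-16 bytes, using a leading BOM to choose the byte order.

    Hypothetical helper: default_order stands in for out-of-band
    knowledge of the byte order when no BOM is present.
    """
    # A BOM, if present, settles the byte order and is not part of the text.
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # No BOM: the receiver has to rely on something other than the data.
    codec = "utf-16-le" if default_order == "little" else "utf-16-be"
    return data.decode(codec)


text = "x\u00b2"
with_bom = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
without_bom = text.encode("utf-16-be")

decoded_with_bom = decode_utf16_transfer(with_bom)
decoded_right_guess = decode_utf16_transfer(without_bom, default_order="big")
decoded_wrong_guess = decode_utf16_transfer(without_bom, default_order="little")
```

With the BOM, the text round-trips whatever the receiver's default is.
Without it, the result depends entirely on the assumed byte order, and a
wrong guess yields different characters rather than an error.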

I should confess that I haven't researched the conventions here,
but I thought it easier to raise the issue now rather than later,
to check that this is going to be practical.

If the situation that led to this recommendation is only relevant
on platforms where the flavor name already indicates the encoding,
then maybe this recommendation should be restricted to such
platforms.

Thanks,
Karl.

Received on Monday, 14 June 2010 10:32:17 UTC