Re: BOMs from Norm Tovey-Walsh on 2023-04-12 (public-ixml@w3.org from April 2023)

From: Norm Tovey-Walsh <norm@saxonica.com>
Date: Wed, 12 Apr 2023 19:11:00 +0100
To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
Cc: Steven Pemberton <steven.pemberton@cwi.nl>, public-ixml@w3.org
Message-ID: <m2v8i17yue.fsf@saxonica.com>

> If an implementation's I/O routines don't handle BOMs, then surely an
> implementor can work around that with an ad hoc routine when opening a
> stream?
>
> Presumably I'm missing something.  What is it?

My impression is that I/O subsystems consume the BOM on UTF-16 (and
wider) input because they need it to work out the byte order. They
don’t need it for UTF-8, where it’s discouraged anyway, so they don’t
interpret it and simply pass it through.
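
As an illustration of that behaviour, here is a minimal sketch using
Python's standard codecs. It shows one I/O library only; others may
behave differently, and the sample string is made up.

```python
# Illustration with Python's built-in codecs; other I/O stacks may differ.
text = "abc"

# The generic "utf-16" encoder prepends a BOM in the platform's byte order.
utf16_bytes = text.encode("utf-16")

# UTF-8 input that happens to start with a BOM (bytes EF BB BF).
utf8_bytes = b"\xef\xbb\xbf" + text.encode("utf-8")

# The "utf-16" decoder uses the BOM to pick the byte order and consumes it.
from_utf16 = utf16_bytes.decode("utf-16")

# The plain "utf-8" decoder has no use for the BOM and passes it through
# as the character U+FEFF at the start of the string.
from_utf8 = utf8_bytes.decode("utf-8")

print(repr(from_utf16))
print(repr(from_utf8))
```

Python also ships a separate "utf-8-sig" codec that drops a leading BOM
on decode, which is one way an implementation could opt in to ignoring
it.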

I think ignoring the BOM on input grammars is perfectly reasonable and
we should say that.

Ignoring the BOM on input documents is a little harder: what if I am
parsing a non-text document that happens to begin with #FEFF? But I’d
leave that to the discretion of the implementor, because I think it’s
very unlikely.

My plan is to add an option to CoffeePot to ignore the BOM if the input
file is UTF-8, with the default set to “true”.
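
A rough sketch of what such an option could look like, in Python rather
than CoffeePot's own code. The function name decode_utf8 and the
parameter ignore_bom are hypothetical and are not CoffeePot's actual
API.

```python
BOM = "\ufeff"


def decode_utf8(data: bytes, ignore_bom: bool = True) -> str:
    """Decode UTF-8 bytes, optionally dropping one leading BOM.

    Hypothetical helper for illustration; ignore_bom defaults to True.
    """
    text = data.decode("utf-8")
    # Only a BOM at the very start of the input is treated as a signature;
    # U+FEFF anywhere else is left alone.
    if ignore_bom and text.startswith(BOM):
        return text[len(BOM):]
    return text


grammar = decode_utf8(b"\xef\xbb\xbfS: 'a'.")
raw = decode_utf8(b"\xef\xbb\xbfS: 'a'.", ignore_bom=False)
```

With the default, grammar starts directly at "S"; with ignore_bom set to
False, raw still begins with U+FEFF, which covers the unlikely case of
input that really does start with that character.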

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica

Received on Wednesday, 12 April 2023 18:13:58 UTC