Re: BOMs

> If an implementation's I/O routines don't handle BOMs, then surely an
> implementor can work around that with an ad hoc routine when opening a
> stream?
>
> Presumably I'm missing something.  What is it?

My impression is that I/O subsystems consume the BOM on UTF-16 and wider
encodings because they need it to work out the byte order. They don’t
need it on UTF-8, and it’s discouraged on UTF-8, so they don’t treat it
specially and just pass it through.
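
As an illustration only, Python's built-in codecs behave this way (other
I/O libraries may differ): the generic "utf-16" decoder uses the BOM to
pick the byte order and drops it, while the plain "utf-8" decoder hands
U+FEFF back as the first character.

```python
import codecs

# UTF-16 with a little-endian BOM: the decoder uses the BOM and drops it.
utf16_bytes = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
utf16_text = utf16_bytes.decode("utf-16")

# UTF-8 with a BOM: the plain decoder keeps U+FEFF as the first character.
utf8_bytes = codecs.BOM_UTF8 + "hi".encode("utf-8")
utf8_text = utf8_bytes.decode("utf-8")

print(repr(utf16_text))  # 'hi'
print(repr(utf8_text))   # '\ufeffhi'
```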

I think ignoring the BOM on input grammars is perfectly reasonable and we
should say that.

Ignoring the BOM on input documents is a little harder: what if I am
parsing a non-text document that happens to begin with #FEFF? But I’d
leave that to the discretion of the implementor because I think it’s
very unlikely.

My plan is to add an option to CoffeePot to ignore the BOM if the input
file is UTF-8, with the default set to “true”.
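
A minimal sketch of what such an option could look like, written in
Python for brevity. The names strip_bom, read_utf8, and ignore_bom are
hypothetical and are not CoffeePot's actual API; the sketch assumes the
input has already been identified as UTF-8.

```python
BOM = "\ufeff"  # U+FEFF, which a UTF-8 BOM decodes to


def strip_bom(text: str, ignore_bom: bool = True) -> str:
    """Drop a single leading U+FEFF when ignore_bom is set."""
    if ignore_bom and text.startswith(BOM):
        return text[len(BOM):]
    return text


def read_utf8(data: bytes, ignore_bom: bool = True) -> str:
    """Decode UTF-8 bytes, optionally ignoring a leading BOM."""
    return strip_bom(data.decode("utf-8"), ignore_bom)


with_bom = b"\xef\xbb\xbfS: 'a'."
print(repr(read_utf8(with_bom)))                    # BOM dropped (default)
print(repr(read_utf8(with_bom, ignore_bom=False)))  # BOM kept
```

Only one leading U+FEFF is removed, so an implementor who really does
need to parse input beginning with #FEFF can switch the option off.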

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica

Received on Wednesday, 12 April 2023 18:13:58 UTC