Re: BOMs

> We could change the ixml grammar to start:
>
>  ixml: BOM?, s, prolog?, rule++RS, s.
>  -BOM: -#FEFF.
>
> so that the processor doesn't complain about them, but I'm less sure
> what to do about input.

I think we should say that if the grammar is in UTF-8 and begins with a
BOM, the BOM must be ignored by the processor. I don’t see any reason to
surface this wart in the grammar.
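
As a rough sketch of what "ignored by the processor" could mean in
practice (Python purely for illustration; the function names are
hypothetical and not taken from any ixml implementation), the processor
would drop the three-byte UTF-8 encoding of #FEFF before the grammar
ever reaches the parser:

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if any."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def decode_grammar(data: bytes) -> str:
    """Decode a UTF-8 grammar, ignoring a leading BOM if one is present."""
    # Assumption: the grammar is known to be UTF-8; other encodings
    # would need their own handling.
    return strip_utf8_bom(data).decode("utf-8")


with_bom = codecs.BOM_UTF8 + b'a: "x".'
without_bom = b'a: "x".'

# Both forms yield the same grammar text.
same = decode_grammar(with_bom) == decode_grammar(without_bom)
```

Python's built-in "utf-8-sig" codec does the same thing on decode, so
this is mostly a matter of choosing the right decoder rather than
touching the grammar.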

> My current feeling is we should warn users that if their inputs are
> likely to start with a BOM to add them to the grammar, and that we
> don't automatically ignore them.

If I understood the Slack discussion, it’s very hard to tell Windows
*not* to put the BOM on the front of UTF-8 files, so anyone using
Windows is going to have this problem. That means everyone who writes a
grammar is going to end up putting the “ignore BOM” wart on the front of
it. That strikes me as even worse than putting it in our grammar.

The only reservation I have about saying a processor must ignore the BOM
on inputs is that there’s nothing preventing someone from writing a
grammar to parse binary inputs where that sequence isn’t a BOM.

But that seems like something that’s only going to affect the tiniest
minority of users, unlike the BOM thing, which becomes everyone’s problem
as soon as iXML has enough regular users on Windows.

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica

Received on Wednesday, 12 April 2023 10:48:07 UTC