Re: BOMs

Er, if UTF-8 files created on Windows will very often have BOMs, then
it's presumably not enough to have the BOM ignored at the beginning of
a grammar; it will also be necessary to ignore it at the beginning of
every input grammar.

I would rather find some way of saying that this is handled by
lower-level systems and is invisible to ixml -- a bit like whether the
machine is big- or little-endian.

If an implementation's I/O routines don't handle BOMs, then surely an
implementor can work around that with an ad hoc routine when opening a
stream?
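
For what it's worth, the kind of ad hoc routine meant here can be quite
small. What follows is only a sketch in Python, assuming the stream is
UTF-8; the names decode_ignoring_bom and read_text_ignoring_bom are made
up for illustration and come from no ixml implementation. It relies on
Python's standard "utf-8-sig" codec, which drops one leading BOM if it
is present and leaves any later U+FEFF alone.

```python
def decode_ignoring_bom(raw: bytes) -> str:
    """Decode UTF-8 bytes, dropping a single leading BOM if present.

    Hypothetical helper for illustration only. The 'utf-8-sig' codec
    strips the byte sequence EF BB BF at the very start and nothing else.
    """
    return raw.decode("utf-8-sig")


def read_text_ignoring_bom(path: str) -> str:
    """Open a file as UTF-8 text so the parser never sees a leading BOM.

    Also hypothetical: the point is that the BOM is dealt with when the
    stream is opened, below the level of the grammar.
    """
    with open(path, encoding="utf-8-sig", newline="") as stream:
        return stream.read()
```

Under that assumption the same routine serves for grammars and for
inputs alike. It does not help with the binary-input case, where the
caller would have to read the bytes without any such decoding.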

Presumably I'm missing something.  What is it?

Michael


Norm Tovey-Walsh <norm@saxonica.com> writes:

>> We could change the ixml grammar to start:
>>
>>  ixml: BOM?, s, prolog?, rule++RS, s.
>>  -BOM: -#FEFF.
>>
>> so that the processor doesn't complain about them, but I'm less sure
>> what to do about input.
>
> I think we should say that if the grammar is in UTF-8 and begins with a
> BOM, the BOM must be ignored by the processor. I don’t see any reason to
> surface this wart in the grammar.
>
>> My current feeling is that we should warn users that, if their inputs
>> are likely to start with a BOM, they should add it to their grammar,
>> and that we don't automatically ignore it.
>
> If I understood the Slack discussion, it’s very hard to tell Windows
> *not* to put the BOM on the front of UTF-8 files, so anyone using
> Windows is going to have this problem. That means everyone who writes a
> grammar is going to end up putting the “ignore BOM” wart on the front of
> it. That strikes me as even worse than putting it in our grammar.
>
> The only reservation I have about saying a processor must ignore the BOM
> on inputs is that there’s nothing preventing someone from writing a
> grammar to parse binary inputs where that sequence isn’t a BOM.
>
> But that seems like something that’s only going to affect the tiniest
> minority of users, unlike the BOM thing which becomes everyone’s problem
> as soon as iXML has enough regular users on Windows.
>
>                                         Be seeing you,
>                                           norm


-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Wednesday, 12 April 2023 16:21:51 UTC