- From: John Lumley <john@saxonica.com>
- Date: Wed, 12 Apr 2023 17:26:06 +0100
- To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Cc: Norm Tovey-Walsh <norm@saxonica.com>, Steven Pemberton <steven.pemberton@cwi.nl>, public-ixml@w3.org
Given that my browser-based processor handled the BOM, in both input and grammar files, with no special cases, perhaps it is part of the ‘implementation framework/platform/environment’ responsibility.

John Lumley
Sent from my iPad

> On 12 Apr 2023, at 17:21, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>
> Er, if UTF-8 files created on Windows will very often have BOMs, then
> it's presumably not enough to have BOM ignored at the beginning of a
> grammar; it will also be necessary to ignore it at the beginning of
> every input grammar as well.
>
> I would rather find some way of saying that this is handled by
> lower-level systems and is invisible to ixml -- a bit like whether the
> machine is big- or little-endian.
>
> If an implementation's I/O routines don't handle BOMs, then surely an
> implementor can work around that with an ad hoc routine when opening a
> stream?
>
> Presumably I'm missing something. What is it?
>
> Michael
>
>
> Norm Tovey-Walsh <norm@saxonica.com> writes:
>
>> [[PGP Signed Part:Undecided]]
>>> We could change the ixml grammar to start:
>>>
>>> ixml: BOM?, s, prolog?, rule++RS, s.
>>> -BOM: -#FEFF.
>>>
>>> so that the processor doesn't complain about them, but I'm less sure
>>> what to do about input.
>>
>> I think we should say that if the grammar is in UTF-8 and begins with a
>> BOM, the BOM must be ignored by the processor. I don’t see any reason to
>> surface this wart in the grammar.
>>
>>> My current feeling is we should warn users that if their inputs are
>>> likely to start with a BOM to add them to the grammar, and that we
>>> don't automatically ignore them.
>>
>> If I understood the Slack discussion, it’s very hard to tell Windows
>> *not* to put the BOM on the front of UTF-8 files, so anyone using
>> Windows is going to have this problem. That means everyone who writes a
>> grammar is going to end up putting the “ignore BOM” wart on the front of
>> it. That strikes me as even worse than putting it in our grammar.
>>
>> The only reservation I have about saying a processor must ignore the BOM
>> on inputs is that there’s nothing preventing someone from writing a
>> grammar to parse binary inputs where that sequence isn’t a BOM.
>>
>> But that seems like something that’s only going to affect the tiniest
>> minority of users, unlike the BOM thing which becomes everyone’s problem
>> as soon as iXML has enough regular users on Windows.
>>
>> Be seeing you,
>> norm
>
>
> --
> C. M. Sperberg-McQueen
> Black Mesa Technologies LLC
> http://blackmesatech.com
>
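
For illustration, the "ad hoc routine when opening a stream" that Michael mentions might look something like the sketch below. It is not taken from any ixml processor; the function names are hypothetical, and it assumes the grammar or input arrives either as UTF-8 bytes or as already-decoded text. Python's standard `utf-8-sig` codec skips a leading UTF-8 BOM (bytes EF BB BF) and leaves any later U+FEFF alone.

```python
import codecs


def decode_utf8_input(data: bytes) -> str:
    """Decode UTF-8 bytes, skipping one leading BOM if present.

    Hypothetical helper: 'utf-8-sig' only strips the BOM at the very
    start of the data; a U+FEFF elsewhere is decoded as-is.
    """
    return data.decode("utf-8-sig")


def strip_bom(text: str) -> str:
    """Drop a single leading U+FEFF from already-decoded text.

    Hypothetical helper for frameworks that hand the processor a
    string with the BOM still attached.
    """
    return text[1:] if text.startswith("\ufeff") else text


if __name__ == "__main__":
    grammar = codecs.BOM_UTF8 + b"a: b."
    print(repr(decode_utf8_input(grammar)))
    print(repr(strip_bom("\ufeffa: b.")))
```

Doing this at the I/O layer keeps the BOM out of both the ixml grammar and user grammars, though, as Norm notes, it would also swallow those three bytes at the start of a binary input where they are not meant as a BOM.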
Received on Wednesday, 12 April 2023 16:26:16 UTC