- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Wed, 12 Apr 2023 10:14:15 -0600
- To: Norm Tovey-Walsh <norm@saxonica.com>
- Cc: Steven Pemberton <steven.pemberton@cwi.nl>, public-ixml@w3.org

Er, if UTF-8 files created on Windows will very often have BOMs, then it's presumably not enough to have the BOM ignored at the beginning of a grammar; it will also be necessary to ignore it at the beginning of every input.

I would rather find some way of saying that this is handled by lower-level systems and is invisible to ixml -- a bit like whether the machine is big- or little-endian. If an implementation's I/O routines don't handle BOMs, then surely an implementor can work around that with an ad hoc routine when opening a stream?

Presumably I'm missing something. What is it?

Michael

Norm Tovey-Walsh <norm@saxonica.com> writes:

>> We could change the ixml grammar to start:
>>
>> ixml: BOM?, s, prolog?, rule++RS, s.
>> -BOM: -#FEFF.
>>
>> so that the processor doesn't complain about them, but I'm less sure
>> what to do about input.
>
> I think we should say that if the grammar is in UTF-8 and begins with a
> BOM, the BOM must be ignored by the processor. I don’t see any reason to
> surface this wart in the grammar.
>
>> My current feeling is we should warn users that if their inputs are
>> likely to start with a BOM to add them to the grammar, and that we
>> don't automatically ignore them.
>
> If I understood the Slack discussion, it’s very hard to tell Windows
> *not* to put the BOM on the front of UTF-8 files, so anyone using
> Windows is going to have this problem. That means everyone who writes a
> grammar is going to end up putting the “ignore BOM” wart on the front of
> it. That strikes me as even worse than putting it in our grammar.
>
> The only reservation I have about saying a processor must ignore the BOM
> on inputs is that there’s nothing preventing someone from writing a
> grammar to parse binary inputs where that sequence isn’t a BOM.
>
> But that seems like something that’s only going to affect the tiniest
> minority of users, unlike the BOM thing, which becomes everyone’s problem
> as soon as iXML has enough regular users on Windows.
>
> Be seeing you,
>   norm

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
Received on Wednesday, 12 April 2023 16:21:51 UTC
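The "ad hoc routine when opening a stream" suggested above might look something like the following minimal Python sketch. The function name `read_ixml_text` and its `strip_bom` flag are hypothetical illustrations, not part of any ixml processor's API. The flag exists because, as Norm points out, a grammar for unusual input might need a leading U+FEFF left intact. Python's built-in `utf-8-sig` codec performs the same single-BOM stripping on its own.

```python
import codecs
import io


def read_ixml_text(data: bytes, strip_bom: bool = True) -> str:
    """Decode UTF-8 bytes, optionally dropping one leading byte order mark.

    Hypothetical helper for illustration only; not an ixml processor API.
    """
    # codecs.BOM_UTF8 is the three-byte sequence EF BB BF (U+FEFF in UTF-8).
    if strip_bom and data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")


def read_ixml_stream(raw: io.BufferedIOBase) -> str:
    """Same effect at the stream level, using Python's built-in utf-8-sig codec."""
    # utf-8-sig removes a single BOM at the start of the stream, if present.
    return io.TextIOWrapper(raw, encoding="utf-8-sig").read()
```

Either way, the grammar or input the parser sees starts after the BOM, which is roughly the "invisible to ixml" handling Michael describes. Only the first BOM is removed; a second U+FEFF is passed through as ordinary data.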