- From: Martin Duerst <duerst@it.aoyama.ac.jp>
- Date: Mon, 25 Jun 2007 17:54:53 +0900
- To: olivier Thereaux <ot@w3.org>
- Cc: www-validator@w3.org
At 14:18 07/06/25, olivier Thereaux wrote:
>Hi Martin,
>
>Thanks for looking into the regexp, and especially for spotting one
>of my mistakes. Much appreciated.
>
>On Jun 23, 2007, at 13:31 , Martin Duerst wrote:
>> Strictly speaking, an XML declaration can go over more than one line.
>> It has to start with an '<' as the very first character of the file,
>> but then it can include linebreaks.
>
>Indeed. Also as Ville noted, there could be a BOM at the beginning too.

I'm a bit sceptical here. The BOM is part of guessing the encoding
family, but the regexp is ASCII-based, so it comes after guessing
the encoding family. At least in theory, at one point, there was
some code that would also have allowed validating EBCDIC-based,
UTF-16-based, or UTF-32-based documents, and the regexp we are
working on here should come at least after we have made a general
attempt at transcoding the start of the document into the relevant
encoding family.

>> - There can be space around the equal sign.
>
>I see in your resulting regexp that you are using
>[\x20|\x9|\xD|\xA]+ = [\x20|\x9|\xD|\xA]+
>whereas I suspect it should be
>[\x20|\x9|\xD|\xA]* = [\x20|\x9|\xD|\xA]*
>since there could also be no space, right?

Yes, sorry, I think I caught some of that before sending out my mail,
but apparently not everything.

Regards,    Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp
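For illustration, the points discussed in this message (the declaration starting at the very first character, linebreaks allowed inside it, and optional whitespace around the equal sign) could be sketched roughly as follows. This is not the validator's actual regexp: the names `XML_DECL_RE` and `declared_encoding` are made up for the example, and it assumes the start of the document has already been transcoded into an ASCII-compatible string, so any BOM has been dealt with earlier. It also writes the whitespace class without `|`, since inside `[...]` a `|` is a literal character, not an alternation.

```python
import re

# XML whitespace (the S production): space, tab, CR, LF.
# No '|' inside the class: within [...] it would match a literal '|'.
_S = r"[\x20\x09\x0D\x0A]"

# Encoding names as in the XML EncName production: a letter, then
# letters, digits, '.', '_' or '-'.
_ENC_NAME = r"[A-Za-z][A-Za-z0-9._\-]*"

# Hypothetical pattern, not the validator's own. \A anchors the '<' at
# the very first character; {_S}* around '=' allows zero or more
# whitespace characters, including linebreaks.
XML_DECL_RE = re.compile(
    rf"""\A<\?xml{_S}+version{_S}*={_S}*(?:"1\.[0-9]+"|'1\.[0-9]+')"""
    rf"""{_S}+encoding{_S}*={_S}*(?:"({_ENC_NAME})"|'({_ENC_NAME})')"""
)


def declared_encoding(head: str):
    """Return the encoding named in the XML declaration, or None.

    `head` is assumed to be the already-decoded start of the document.
    """
    m = XML_DECL_RE.match(head)
    if m is None:
        return None
    # Group 1 is the double-quoted form, group 2 the single-quoted form.
    return m.group(1) or m.group(2)
```

With `*` rather than `+` around the equal sign, both `encoding="UTF-8"` and `encoding = "UTF-8"` are accepted, and a declaration spread over several lines still matches.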
Received on Monday, 25 June 2007 10:11:05 UTC