- From: olivier Thereaux <ot@w3.org>
- Date: Mon, 25 Jun 2007 15:55:40 +0900
- To: "www-validator@w3.org Community" <www-validator@w3.org>
- Cc: Martin Duerst <duerst@it.aoyama.ac.jp>
On Jun 25, 2007, at 14:18 , olivier Thereaux wrote:
> Hence, when allowing no space around equal sign:
>
> /^<\?xml [\x20|\x9|\xD|\xA]+ version
> [\x20|\x9|\xD|\xA]* = [\x20|\x9|\xD|\xA]*
> ("1.0"|"1.1"|'1.0'|'1.1')
> ([\x20|\x9|\xD|\xA]+ encoding
> [\x20|\x9|\xD|\xA]* = [\x20|\x9|\xD|\xA]*
> ("[A-Za-z][a-zA-Z0-9-_]+"|'[A-Za-z][a-zA- Z0-9_]+')
> )?
> ([\x20|\x9|\xD|\xA]+)+ standalone
> [\x20|\x9|\xD|\xA]* = [\x20|\x9|\xD|\xA]*
> ("yes"|"no"|'yes'|'no')
> )?
> [\x20|\x9|\xD|\xA]* \?>
> /x
And after checking that only the BOM (i.e. no whitespace, no comment)
may exist before the XMLDecl, and adding some comments (to be formatted
a little nicer in the final code), it gives us:
/^(?:\xEF\xBB\xBF)?   # we may have a BOM at the beginning before <?xml, nothing else
<\?xml [\x20\x9\xD\xA]+ version   # for documents, version info is mandatory
[\x20\x9\xD\xA]* = [\x20\x9\xD\xA]*   # x20, x9, xD and xA are the allowed "xml white space"
# hardcoding the existing XML versions. Maybe we should use \d\.\d
("1\.0"|"1\.1"|'1\.0'|'1\.1')
([\x20\x9\xD\xA]+ encoding   # encoding info is optional
[\x20\x9\xD\xA]* = [\x20\x9\xD\xA]*
("[A-Za-z][A-Za-z0-9._-]*"|'[A-Za-z][A-Za-z0-9._-]*')
)?
([\x20\x9\xD\xA]+ standalone   # ditto standalone info, optional
[\x20\x9\xD\xA]* = [\x20\x9\xD\xA]*
("yes"|"no"|'yes'|'no')
)?
[\x20\x9\xD\xA]* \?>
/x
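
For anyone wanting to try the idea outside the validator, here is a
rough Python transcription of the pattern, meant as a sketch rather
than the code that will land. It makes a few assumptions of its own:
the white space classes are written without "|" (inside a character
class "|" is a literal character), the BOM is matched as a three-byte
sequence, and the encoding name follows the EncName production of the
XML spec. The names XML_DECL and parse_xml_decl are made up for this
example.

```python
import re

# Byte-oriented pattern: the BOM check only makes sense on raw bytes.
XML_DECL = re.compile(
    rb"""
    ^(?:\xEF\xBB\xBF)?                          # optional UTF-8 BOM, nothing else before <?xml
    <\?xml
    [\x20\x09\x0D\x0A]+ version                 # version info is mandatory
    [\x20\x09\x0D\x0A]* = [\x20\x09\x0D\x0A]*   # XML white space allowed around "="
    (?P<version>"1\.[01]"|'1\.[01]')
    (?:                                         # encoding info is optional
      [\x20\x09\x0D\x0A]+ encoding
      [\x20\x09\x0D\x0A]* = [\x20\x09\x0D\x0A]*
      (?P<encoding>"[A-Za-z][A-Za-z0-9._-]*"|'[A-Za-z][A-Za-z0-9._-]*')
    )?
    (?:                                         # standalone info is optional
      [\x20\x09\x0D\x0A]+ standalone
      [\x20\x09\x0D\x0A]* = [\x20\x09\x0D\x0A]*
      (?P<standalone>"yes"|"no"|'yes'|'no')
    )?
    [\x20\x09\x0D\x0A]* \?>
    """,
    re.VERBOSE,
)


def parse_xml_decl(data: bytes):
    """Return (version, encoding, standalone) as strings, or None.

    Fields missing from the declaration come back as None. The whole
    result is None when the input does not start with an XML declaration.
    """
    m = XML_DECL.match(data)
    if m is None:
        return None

    def unquote(value):
        # Strip the surrounding quote characters from a matched literal.
        return None if value is None else value[1:-1].decode("ascii")

    return tuple(unquote(m.group(name))
                 for name in ("version", "encoding", "standalone"))
```

Calling parse_xml_decl on the first bytes of a document returns None
whenever anything other than a BOM precedes "<?xml", which is the
behaviour described above.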
--
olivier
Received on Monday, 25 June 2007 06:55:50 UTC