Re: Problems validating XML

From: olivier Thereaux <ot@w3.org>
Date: Mon, 25 Jun 2007 15:55:40 +0900
Message-Id: <1785F406-4438-4871-85AA-6A6CD6D66DDF@w3.org>
Cc: Martin Duerst <duerst@it.aoyama.ac.jp>
To: "www-validator@w3.org Community" <www-validator@w3.org>


On Jun 25, 2007, at 14:18 , olivier Thereaux wrote:

> Hence, when allowing no space around equal sign:
>
> /^<\?xml [\x20|\x9|\xD|\xA]+ version
>   [\x20|\x9|\xD|\xA]* = [\x20|\x9|\xD|\xA]*
>   ("1.0"|"1.1"|'1.0'|'1.1')
>   ([\x20|\x9|\xD|\xA]+ encoding
>    [\x20|\x9|\xD|\xA]* = [\x20|\x9|\xD|\xA]*
>    ("[A-Za-z][a-zA-Z0-9-_]+"|'[A-Za-z][a-zA- Z0-9_]+')
>   )?
>   ([\x20|\x9|\xD|\xA]+)+ standalone
>    [\x20|\x9|\xD|\xA]* = [\x20|\x9|\xD|\xA]*
>    ("yes"|"no"|'yes'|'no')
>   )?
>   [\x20|\x9|\xD|\xA]* \?>
> /x

And after checking that only the BOM (i.e. no whitespace, no comment)
may exist before the XMLdecl, and adding some comments (to be formatted
a little nicer in the final code), it gives us:


/^(?:\xEF\xBB\xBF)? # we may have a BOM at the beginning before <?xml, nothing else
   <\?xml [\x20\x9\xD\xA]+ version # for documents, version info is mandatory
   [\x20\x9\xD\xA]* = [\x20\x9\xD\xA]* # x20, x9, xD and xA are the allowed "xml white space"
   ("1\.0"|"1\.1"|'1\.0'|'1\.1') # hardcoding the existing XML versions. Maybe we should use \d\.\d
   ([\x20\x9\xD\xA]+ encoding # encoding info is optional
    [\x20\x9\xD\xA]* = [\x20\x9\xD\xA]*
    ("[A-Za-z][A-Za-z0-9._-]*"|'[A-Za-z][A-Za-z0-9._-]*')
   )?
   ([\x20\x9\xD\xA]+ standalone # ditto standalone info, optional
    [\x20\x9\xD\xA]* = [\x20\x9\xD\xA]*
    ("yes"|"no"|'yes'|'no')
   )?
   [\x20\x9\xD\xA]* \?>
/x
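
For anyone who wants to try such a pattern outside the validator, here is a
minimal sketch transliterated to Python (an assumption made for testing
convenience; it is not the validator's Perl code). It is written against the
XML grammar as I read it: the whitespace class contains no `|`, the BOM is
matched as a three-byte sequence, and the encoding name follows the EncName
production. The names `XML_DECL` and `has_xml_decl` and the `{S}` placeholder
are hypothetical, chosen only for this illustration.

```python
import re

# XML white space (production S in the XML spec): space, tab, CR, LF.
# Python's re needs two hex digits, hence \x09 rather than \x9.
S = rb"[\x20\x09\x0D\x0A]"

# {S} is a placeholder substituted below, only to keep the pattern readable.
PATTERN = rb"""
    \A(?:\xEF\xBB\xBF)?          # optional UTF-8 BOM, nothing else before <?xml
    <\?xml {S}+ version          # version info is mandatory
    {S}* = {S}*
    (?:"1\.[01]"|'1\.[01]')      # the existing XML versions
    (?: {S}+ encoding            # encoding info is optional
        {S}* = {S}*
        (?:"[A-Za-z][A-Za-z0-9._-]*"|'[A-Za-z][A-Za-z0-9._-]*')
    )?
    (?: {S}+ standalone          # standalone info is optional
        {S}* = {S}*
        (?:"yes"|"no"|'yes'|'no')
    )?
    {S}* \?>
""".replace(b"{S}", S)

XML_DECL = re.compile(PATTERN, re.VERBOSE)


def has_xml_decl(data: bytes) -> bool:
    """Return True if data starts with (optional BOM and) an XML declaration."""
    return XML_DECL.match(data) is not None
```

With this sketch, `has_xml_decl(b'<?xml version="1.0"?>')` is true, while a
document with leading whitespace, a missing version, or `standalone` placed
before `encoding` is rejected.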

-- 
olivier
Received on Monday, 25 June 2007 06:55:50 GMT