Re: Problems validating XML

On Jun 25, 2007, at 14:18 , olivier Thereaux wrote:

> Hence, when allowing no space around equal sign:
>
> /^<\?xml [\x20|\x9|\xD|\xA]+ version
>   [\x20|\x9|\xD|\xA]* = [\x20|\x9|\xD|\xA]*
>   ("1.0"|"1.1"|'1.0'|'1.1')
>   ([\x20|\x9|\xD|\xA]+ encoding
>    [\x20|\x9|\xD|\xA]* = [\x20|\x9|\xD|\xA]*
>    ("[A-Za-z][a-zA-Z0-9-_]+"|'[A-Za-z][a-zA- Z0-9_]+')
>   )?
>   ([\x20|\x9|\xD|\xA]+)+ standalone
>    [\x20|\x9|\xD|\xA]* = [\x20|\x9|\xD|\xA]*
>    ("yes"|"no"|'yes'|'no')
>   )?
>   [\x20|\x9|\xD|\xA]* \?>
> /x

After checking that only the BOM (i.e. no whitespace, no comment)
may appear before the XMLdecl, and adding some comments (to be
formatted a little more nicely in the final code), this gives us:


/^(?:\xEF\xBB\xBF)?   # only a (UTF-8) BOM may precede <?xml, nothing else
   # for documents, version info is mandatory
   <\?xml [\x20\x9\xD\xA]+ version
   # x20, x9, xD and xA are the allowed "xml white space"
   [\x20\x9\xD\xA]* = [\x20\x9\xD\xA]*
   # hardcoding the existing XML versions. Maybe we should use \d\.\d
   ("1\.0"|"1\.1"|'1\.0'|'1\.1')
   ([\x20\x9\xD\xA]+ encoding   # encoding info is optional
    [\x20\x9\xD\xA]* = [\x20\x9\xD\xA]*
    # EncName: a letter, then letters, digits, '.', '_' or '-'
    ("[A-Za-z][A-Za-z0-9._-]*"|'[A-Za-z][A-Za-z0-9._-]*')
   )?
   ([\x20\x9\xD\xA]+ standalone   # ditto standalone info, optional
    [\x20\x9\xD\xA]* = [\x20\x9\xD\xA]*
    ("yes"|"no"|'yes'|'no')
   )?
   [\x20\x9\xD\xA]* \?>
/x
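
For a quick sanity check, here is a minimal sketch of how a pattern
along these lines could be exercised in Perl. The helper name
is_xml_decl, the $S shorthand and the sample strings are made up for
illustration and are not taken from the validator code. The sketch
writes the white space class without "|" (inside brackets it would be
a literal character), groups the three BOM bytes as a sequence, and
escapes the dot in the version number.

```perl
use strict;
use warnings;

# XML white space: space, tab, carriage return, line feed
my $S = qr/[\x20\x09\x0D\x0A]/;

# Hypothetical helper: returns 1 if the string starts with an XML
# declaration (optionally preceded by a UTF-8 BOM), 0 otherwise.
sub is_xml_decl {
    my ($bytes) = @_;
    return $bytes =~ /^(?:\xEF\xBB\xBF)?   # optional BOM, nothing else
        <\?xml $S+ version $S* = $S*       # version info is mandatory
        (?:"1\.0"|"1\.1"|'1\.0'|'1\.1')
        (?: $S+ encoding $S* = $S*         # encoding info is optional
            (?:"[A-Za-z][A-Za-z0-9._-]*"|'[A-Za-z][A-Za-z0-9._-]*')
        )?
        (?: $S+ standalone $S* = $S*       # standalone info is optional
            (?:"yes"|"no"|'yes'|'no')
        )?
        $S* \?>
    /x ? 1 : 0;
}

for my $doc (
    q{<?xml version="1.0" encoding="UTF-8"?>},   # expected to be accepted
    q{ <?xml version="1.0"?>},                   # leading space, rejected
) {
    printf "%-45s => %s\n", $doc,
        is_xml_decl($doc) ? "XMLdecl" : "no XMLdecl";
}
```

Non-capturing groups are used here only because the sketch does not
need the matched values; capturing groups would work just as well if
the version or encoding has to be extracted afterwards.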

-- 
olivier

Received on Monday, 25 June 2007 06:55:50 UTC