'Leading White Space' Topic

On 27 June 2008, I wrote to xml-editor@w3.org about how the XML
Recommendation (V1.0, Editions 2-5) treats leading white space in
well-formed documents. I contend that the Recommendation allows leading
white space, that is, white space before the prolog. Yet many
implementations reject an XML document with leading white space as not
well-formed, and claim that productions [22], [23], and [1] completely
describe their behavior [while also relying on the non-normative
Section F]. Section 2.4 clearly describes any white space at the top
level of the document entity as markup, and therefore as allowed.
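
To make the disputed behavior concrete, here is a minimal sketch in
Python, whose xml.etree.ElementTree module is built on expat. The helper
name is_well_formed and the three sample documents are assumptions made
up for illustration; the sketch only reports what one expat-based parser
accepts and says nothing about what the Recommendation requires.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if Python's expat-backed parser accepts the document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, chosen only for illustration.
samples = {
    "declaration first": '<?xml version="1.0"?><a/>',
    "space before declaration": ' <?xml version="1.0"?><a/>',
    "space, no declaration": ' <a/>',
}

for label, doc in samples.items():
    print(label, "->", is_well_formed(doc))
```

With CPython, the second sample is expected to be rejected and the
other two accepted, which suggests the rejection is tied to the position
of the XML declaration rather than to leading white space as such.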
 
There has been no activity on this topic to clarify whether leading
white space in well-formed XML documents is allowed or prohibited. I
think namespaces have everybody's time and attention at the moment.
 
Since there doesn't appear to be any interest in revising the
Recommendation regarding leading white space, and many current
implementations rely on the non-normative description to require that a
well-formed document have no leading white space, I would like to
discuss this here to explore whether the test cases match the
Recommendation regarding leading white space at the top level of the
document entity. I believe they do not adequately test for this, and
that they should include test cases where a document with leading white
space is accepted as well-formed.
 
Here is the text of my email on 27 June, 2008:
 
>> Subject: XML Recommendation Inconsistencies Regarding Leading White
>> Space in Well-Formed Documents
>> 
>> There appears to be some difficulty interpreting the Recommendation's
>> specification on whether leading white space that occurs prior to the
>> XML declaration is prohibited or well-formed. Researching the
>> Internet indicates that leading white space is a frequent error at
>> the application level. In discussions on the expat mailing list, it
>> is claimed that expat, for example, follows the XML Recommendation as
>> specified in that leading white space is not allowed. Typically,
>> productions [22] prolog, and [23] XMLDecl, are cited as the formal
>> specification that prohibits leading white space.
>> 
>> On reviewing the latest XML Recommendation (Fifth Edition), I found
>> this not to be true. Section 2.4 (as far back as the Second Edition)
>> is very clear that any white space at the top level of the document
>> entity can exist in a well-formed XML document. I found other
>> sections that support this. If this email leads to further
>> discussions, I will be happy to enumerate them in detail.
>>  
>> I did find one reference in Section F Autodetection of Character
>> Encodings (Non-Normative), that stated '... the XML encoding
>> declaration is restricted in position and content in order ...', but
>> no such restriction exists anywhere else in the Recommendation,
>> except in Section F.1 Detection Without External Encoding
>> Information, where it states, 'Because each XML entity not
>> accompanied by external encoding information and not in UTF-8 or
>> UTF-16 encoding must begin with an XML encoding declaration, in which
>> the first characters must be '<?xml', ....'. As this is a
>> Non-Normative exception case, I don't interpret it as a restriction
>> on position and content in the normative case.
>>  
>> Depending on whether the Recommendation intends leading white space
>> to be prohibited or well-formed, I would like to contribute
>> suggestions that make this more concise.
Thank you for considering this topic,
 
Steve Fogoros
Manager of Academic Systems and Programming
Academic Information Services
University of North Texas Health Science Center




Received on Thursday, 17 September 2009 18:41:43 UTC