- From: Steve Fogoros <sfogoros@hsc.unt.edu>
- Date: Thu, 17 Sep 2009 13:42:06 -0500
- To: <public-xml-testsuite@w3.org>
- Message-Id: <4AB23C65.C2A1.0037.0@hsc.unt.edu>
On 27 June 2008, I wrote to xml-editor@w3.org about how the XML Recommendation (1.0, Second through Fifth Editions) describes leading white space in well-formed documents. I contend that the Recommendation allows leading white space, that is, white space before the prolog. Yet many implementations do not accept an XML document with leading white space as well-formed. They cite productions [1], [22], and [23] as a complete description of their behavior, while also relying on the non-normative Section F. Section 2.4 clearly defines any white space at the top level of the document entity (outside the document element) as markup, and therefore as allowed.

There has been no activity on this topic to clarify whether leading white space in well-formed XML documents is allowed or prohibited. I suspect namespaces have everybody's time and attention at the moment. There does not appear to be any interest in revising the Recommendation regarding leading white space, and many current implementations follow the non-normative description, under which a well-formed document must not begin with white space. I would therefore like to discuss this here, to explore whether the test cases match the Recommendation regarding leading white space at the top level of the document entity. I believe they do not test for it adequately, and that the suite should contain test cases in which a document with such leading white space is accepted as well-formed.

Here is the text of my email of 27 June 2008:

>> Subject: XML Recommendation Inconsistencies Regarding Leading White Space in Well-Formed Documents
>>
>> There appears to be some difficulty interpreting whether the Recommendation's
>> specification treats leading white space that occurs before the XML
>> declaration as prohibited or as well-formed. Searching the Internet
>> indicates that leading white space is a frequent error at the
>> application level. In discussions on the expat mailing list, it is claimed
>> that expat, for example, follows the XML Recommendation as specified
>> in not allowing leading white space.
>> Typically, productions [22] prolog and [23] XMLDecl are cited as the formal
>> specification that prohibits leading white space.
>>
>> On reviewing the latest XML Recommendation (Fifth Edition), I found
>> this not to be the case. Section 2.4 (as far back as the Second Edition) is
>> very clear that any white space at the top level of the document entity
>> can exist in a well-formed XML document. I found other sections that
>> support this. If this email leads to further discussion, I will be
>> happy to enumerate them in detail.
>>
>> I did find one reference in Section F Autodetection of Character
>> Encodings (Non-Normative), which states "... the XML encoding declaration
>> is restricted in position and content in order ...", but no such
>> restriction appears anywhere else in the Recommendation, except in
>> Section F.1 Detection Without External Encoding Information, which states,
>> "Because each XML entity not accompanied by external encoding
>> information and not in UTF-8 or UTF-16 encoding must begin with an XML
>> encoding declaration, in which the first characters must be '<?xml',
>> ....". As this is a non-normative exception case, I do not interpret it as
>> a restriction on position and content in the normative case.
>>
>> Depending on whether the Recommendation intends leading white space to be
>> prohibited or allowed, I would like to contribute suggestions that make
>> this more precise.

Thank you for considering this topic,

Steve Fogoros
Manager of Academic Systems and Programming
Academic Information Services
University of North Texas Health Science Center
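To make the test-suite question concrete, here is a minimal sketch of prolog-related test cases run against expat, assuming Python's standard `xml.parsers.expat` binding as the parser under test. The `is_well_formed` helper and the case names are hypothetical, and the results describe only what this one expat-based parser accepts, not what the Recommendation requires.

```python
import xml.parsers.expat


def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if expat parses `text` without error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # True marks this as the final (and only) chunk of input.
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


# Illustrative cases around the prolog; the names are made up for this sketch.
CASES = {
    # XML declaration is the very first thing in the entity.
    "decl_first": '<?xml version="1.0"?>\n<doc/>',
    # White space precedes the XML declaration (the case under discussion).
    "space_before_decl": ' <?xml version="1.0"?>\n<doc/>',
    # No XML declaration; white space precedes the document element.
    "space_no_decl": " \n<doc/>",
    # White space after the document element.
    "space_after_root": '<?xml version="1.0"?>\n<doc/>\n ',
}

if __name__ == "__main__":
    for name, text in CASES.items():
        print(f"{name}: {is_well_formed(text)}")
```

With the expat bundled in CPython, only `space_before_decl` should be rejected, reported as a misplaced XML declaration. White space before the document element when there is no XML declaration, and white space after it, is accepted, consistent with the `Misc` (Comment | PI | S) terms in productions [1] and [22]. Keeping the "declaration present" and "declaration absent" cases separate is what lets a test suite distinguish the two readings of the Recommendation.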
Received on Thursday, 17 September 2009 18:41:43 UTC