- From: David Carlisle <davidc@nag.co.uk>
- Date: Wed, 12 Nov 2003 14:17:35 GMT
- To: xml-editor@w3.org
1.3 Rationale for XML 1.1 states:

    In addition, XML 1.0 attempts to adapt to the line-end conventions of various modern operating systems, but discriminates against the conventions used on IBM and IBM-compatible mainframes. As a result, XML documents on mainframes are not plain text files according to the local conventions. XML 1.0 documents generated on mainframes must either violate the local line-end conventions, or employ otherwise unnecessary translation phases before parsing and after generation. Allowing straightforward interoperability is particularly important when data stores are shared between mainframe and non-mainframe systems (as opposed to being copied from one to the other). Therefore XML 1.1 adds NEL (#x85) to the list of line-end characters. For completeness, the Unicode line separator character, #x2028, is also supported.

This rationale fails to mention the simpler and less disruptive alternative that does not require "unnecessary translation phases before parsing and after generation", namely to use a text encoding, specified in the XML or text declaration, that maps NEL in the file to a Unicode newline. This would have avoided the disruptive changes to the XML white space rules.

Even if (as I suspect will happen) the WG decides to keep the addition of NEL to the line-end normalisation characters, I think that the option of using an encoding (and a rationale for why it wasn't used) should be mentioned, or failing that, this rationale ought to be removed from the spec (it's not clear that such discussion belongs in the spec anyway).

The current rules in 2.11 End-of-Line Handling appear to be self-contradictory.
First they say

    the XML processor MUST behave as if it normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating all of the following to a single #xA character:

This would imply that a reasonable strategy would be to run an off-the-shelf line-end normaliser over the file before parsing. However, if you do that you cannot (so easily) comply with the final rule of that section:

    The characters #x85 and #x2028 cannot be reliably recognized and translated until an entity's encoding declaration (if present) has been read. Therefore, it is a fatal error to use them within the XML declaration or text declaration.

If these characters MUST (appear to) have been normalised away before parsing, i.e. before the text declaration is recognised, how can you tell that they appear in a text declaration?

Clearly some form of words could be constructed that says what you mean here, but the fact that the description needs to become more convoluted is perhaps an indication that this change isn't as "straightforward" as the current rationale implies.

As a side remark on "fatal error" as it appears in the text quoted above: is there any chance that this (and other similar terms) could be hyperlinked to the definition of the term in the glossary? Currently there is no typographic or hypertextual indication that this is a defined term. That basically means changing "fatal error" to <termref def="dt-fatal">fatal error</termref> everywhere it appears, if it's not already so marked.

David
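
The ordering problem described above can be illustrated with a minimal Python sketch. It assumes the entity has already been decoded to a Unicode string, and the function names `normalize_line_ends` and `declaration_has_nel_or_ls` are hypothetical helpers invented for this illustration, not part of any XML library. The list of sequences translated to #xA follows section 2.11 of XML 1.1; the declaration check is deliberately simplified.

```python
import re

# Sequences that XML 1.1 section 2.11 translates to a single #xA:
# #xD #xA, #xD #x85, #x85, #x2028, and any remaining lone #xD.
_LINE_BREAKS = re.compile("\r\n|\r\x85|\x85|\u2028|\r")


def normalize_line_ends(text: str) -> str:
    """Hypothetical 'off-the-shelf' normaliser run before parsing."""
    return _LINE_BREAKS.sub("\n", text)


def declaration_has_nel_or_ls(text: str) -> bool:
    """Simplified check: does the XML/text declaration contain #x85 or #x2028?

    Only meaningful on un-normalised input; this is an illustration,
    not a conforming declaration parser.
    """
    # Require "<?xml" followed by some kind of space so that other
    # processing instructions such as "<?xml-stylesheet" are not matched.
    if not text.startswith("<?xml") or text[5:6] not in (" ", "\t", "\r", "\n", "\x85", "\u2028"):
        return False
    end = text.find("?>")
    decl = text if end == -1 else text[:end]
    return "\x85" in decl or "\u2028" in decl


raw = '<?xml version="1.1"\x85encoding="UTF-8"?>\x85<doc/>\r\n'
normalized = normalize_line_ends(raw)

# On the raw text the fatal error is detectable ...
print(declaration_has_nel_or_ls(raw))
# ... but after up-front normalisation the evidence is gone.
print(declaration_has_nel_or_ls(normalized))
```

In this sketch, a processor that wants to report the fatal error has to inspect the declaration before normalising (or track where translations occurred), which is the extra convolution the message points to.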
Received on Wednesday, 12 November 2003 09:21:20 UTC