- From: Norm Tovey-Walsh <norm@saxonica.com>
- Date: Tue, 13 Jun 2023 13:25:04 +0100
- To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Cc: public-ixml@w3.org
- Message-ID: <m2ilbra66p.fsf@saxonica.com>
> Perhaps say something like
>
>   A conformant Invisible XML processor is required to produce
>   well-formed XML output, but its choices in serializing the XML
>   are not otherwise constrained.

Okay.

> - In the section on hints for implementers, I think the reference to
>   "the ability to round-trip XML data", without further elaboration, is
[…]
>   Perhaps it would be better to talk about possible information loss?
>   And mention that sometimes information loss is what is desired?

I reworded your proposed text a bit:

  <p><add>An application has some latitude when serializing XML.
  Particular attention should be paid to serializing whitespace and
  other control characters. Consider, for example, the case where the
  characters <code>#a</code> or <code>#d</code> appear in a value
  serialized as an attribute. When that serialized XML is parsed, the
  XML parser will replace <code>#a</code> and <code>#d</code>
  characters with spaces when it performs whitespace normalization on
  the attribute value. Similarly, the sequence <code>#d#a</code> will
  be translated to a single <code>#a</code> by standard XML parsing.
  If the user of the grammar expects to see the original characters in
  the XML output, it will be necessary to encode them using numeric
  character references when serializing the XML output. If on the
  other hand the user of the grammar does <em>not</em> expect to see
  the original characters in the output, then carefully preserving
  them using numeric character references is likely to be unhelpful.
  See [<a href="#serialization">Serialization</a>] for detailed
  discussions.</add></p>

> - Now that I have started to worry about the applicability of the
>   concept of round-tripping, I'm also uneasy about the sentence
[…]
>   perhaps read
>
>     Some aspects of the serialization will impact whether or not all
>     characters of the input (e.g. #a#d as a line separator, or either
>     of those characters within attribute values) are retained after
>     the serialized XML is parsed with a conforming XML parser.
Okay.

> - We should discuss John Cowan's suggestion, which I understand as an
>   attempt to take the option of preserving details line separators in
>   the input off the table entirely, thus rendering moot almost
>   everything in the current draft but the references to #a in attribute
>   values.
>
>   John's option would work fine for me under normal conditions. I am
>   not quite sure what conditions might make me want something else, so I
>   don't know what would make sense by way of an ability to turn off
>   whitespace normalization.

Right.

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica
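
For illustration only, the information loss described in the proposed
text can be seen with a short Python sketch. It uses the standard
library's xml.etree.ElementTree as the conforming XML parser. The
helpers escape_attr and parsed_attr are hypothetical names invented
for this example; they are not part of any Invisible XML processor.

```python
import xml.etree.ElementTree as ET


def escape_attr(value: str) -> str:
    """Hypothetical helper: escape an attribute value, writing tab,
    #a (LF) and #d (CR) as numeric character references."""
    out = []
    for ch in value:
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == '"':
            out.append("&quot;")
        elif ch in "\t\n\r":
            out.append(f"&#x{ord(ch):X};")
        else:
            out.append(ch)
    return "".join(out)


def parsed_attr(serialized_value: str) -> str:
    """Hypothetical helper: place an already-serialized value in an
    attribute, parse the document, and return what the parser reports."""
    doc = f'<e a="{serialized_value}"/>'
    return ET.fromstring(doc).get("a")


original = "one\ntwo"

# A literal #a in the attribute is normalized to a space on parsing.
print(repr(parsed_attr(original)))               # 'one two'

# Written as a numeric character reference, the #a survives.
print(repr(parsed_attr(escape_attr(original))))  # 'one\ntwo'

# In element content, #d#a is reported as a single #a.
print(repr(ET.fromstring("<e>a\r\nb</e>").text))  # 'a\nb'
```

Whether the escaped or the unescaped form is the right choice depends,
as the proposed text says, on whether the user of the grammar expects
to see the original characters in the output.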
Received on Tuesday, 13 June 2023 12:38:17 UTC