Re: comments on PR 179 from Norm Tovey-Walsh on 2023-06-13 (public-ixml@w3.org from June 2023)

From: Norm Tovey-Walsh <norm@saxonica.com>
Date: Tue, 13 Jun 2023 13:25:04 +0100
To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
Cc: public-ixml@w3.org
Message-ID: <m2ilbra66p.fsf@saxonica.com>

>   Perhaps say something like
>
>       A conformant Invisible XML processor is required to produce
>       well-formed XML output, but its choices in serializing the XML
>       are not otherwise constrained.

Okay.

> - In the section on hints for implementers, I think the reference to
>   "the ability to round-trip XML data", without further elaboration, is
[…]
>   Perhaps it would be better to talk about possible information loss?
>   And mention that sometimes information loss is what is desired?

I reworded your proposed text a bit:

<p><add>An application has some latitude when serializing XML.
Particular attention should be paid to serializing whitespace and
other control characters. Consider, for example, the case where the
characters <code>#a</code> or <code>#d</code> appear in a value
serialized as an attribute. When that serialized XML is parsed, the
XML parser will replace <code>#a</code> and <code>#d</code> characters
with spaces when it performs whitespace normalization on the attribute
value. Similarly, the sequence <code>#d#a</code> will be translated to
a single <code>#a</code> by standard XML parsing. If the user of the
grammar expects to see the original characters in the XML output, it
will be necessary to encode them using numeric character references
when serializing the XML output. If, on the other hand, the user of
the grammar does <em>not</em> expect to see the original characters in
the output, then carefully preserving them using numeric character
references is likely
to be unhelpful. See [<a href="#serialization">Serialization</a>] for
detailed discussions.</add></p>
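
For anyone who wants to see the effect concretely, here is a minimal
sketch in Python using the standard library's xml.etree.ElementTree
(which parses with expat). The helper names escape_attr and
attr_after_parse are hypothetical, made up for this illustration; they
are not part of the spec or of any ixml processor.

```python
import xml.etree.ElementTree as ET


def escape_attr(value: str, preserve_whitespace: bool) -> str:
    """Serialize a string for use inside a double-quoted attribute.

    Hypothetical helper: when preserve_whitespace is true, tab, #a and #d
    are written as numeric character references so they survive parsing.
    """
    out = value.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")
    if preserve_whitespace:
        for ch in "\t\n\r":
            out = out.replace(ch, "&#%d;" % ord(ch))
    return out


def attr_after_parse(serialized_value: str) -> str:
    """Parse <e a="..."/> and return what the XML parser reports for @a."""
    return ET.fromstring('<e a="' + serialized_value + '"/>').get("a")


original = "a\nb\rc"

# Literal #a and #d are replaced by spaces during attribute-value normalization.
lossy = attr_after_parse(escape_attr(original, preserve_whitespace=False))

# Numeric character references keep the original characters.
exact = attr_after_parse(escape_attr(original, preserve_whitespace=True))

# In element content, the sequence #d#a is translated to a single #a.
content = ET.fromstring("<e>x\r\ny</e>").text
```

Which of the two serializations is "right" depends on what the user of
the grammar expects, as the paragraph above says; the sketch only shows
that the choice is visible after a round trip through a parser.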

> - Now that I have started to worry about the applicability of the
>   concept of round-tripping, I'm also uneasy about the sentence
[…]
>   perhaps read 
>
>       Some aspects of the serialization will impact whether or not all
>       characters of the input (e.g. #a#d as a line separator, or either
>       of those characters within attribute values) are retained after
>       the serialized XML is parsed with a conforming XML parser.

Okay.

> - We should discuss John Cowan's suggestion, which I understand as an
>   attempt to take the option of preserving details of line separators in
>   the input off the table entirely, thus rendering moot almost
>   everything in the current draft but the references to #a in attribute
>   values.
>
>   John's option would work fine for me under normal conditions.  I am
>   not quite sure what conditions might make me want something else, so I
>   don't know what would make sense by way of an ability to turn off
>   whitespace normalization.

Right.

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica

Received on Tuesday, 13 June 2023 12:38:17 UTC