- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Mon, 12 Jun 2023 18:24:20 -0600
- To: ixml <public-ixml@w3.org>
Thank you, Norm, for this draft. I have some comments.
- The sentence
A conformant Invisible XML processor is only required to produce
well-formed XML output, it is not otherwise constrained.
makes me nervous.
An ixml processor is not constrained to use any particular set of
serialization options, or even to conform to the serialization spec
(with whatever framing assumptions are needed to make that idea make
sense), but it *is* constrained to produce XML representing a
successful parse of the input, if there is such a parse.
Perhaps say something like
A conformant Invisible XML processor is required to produce
well-formed XML output, but its choices in serializing the XML
are not otherwise constrained.
?
- In the section on hints for implementers, I think the reference to
"the ability to round-trip XML data", without further elaboration, is
unhelpful. In the context of the serialization spec, it's clear (at
least on reflection) that the round trip starts with an XDM instance
and goes through serialization to XML and through XML parsing back
into an XDM instance. In an ixml context, the route back to the input
format is undefined.
Perhaps it would be better to talk about possible information loss?
And mention that sometimes information loss is what is desired?
An application has some latitude when serializing XML. Particular
attention should be paid to serializing whitespace and other control
characters. It should be noted, for example, that if characters #a
and #d appear in a value to be serialized as an attribute and are
serialized normally, the #a and #d characters in the value will be
replaced by spaces when the XML parser performs whitespace normalization
on the attribute value. The sequence #d#a will similarly be
translated to #a by standard XML parsing. If the user of the
grammar expects to see the original characters in the XML output, it
will be necessary to encode them using numeric character references
when serializing the XML output. If on the other hand the user of
the grammar does *not* expect to see the original characters in the
output, then
carefully preserving them using numeric character references is
likely to be unhelpful. See [Serialization] for detailed
discussions.
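
To make the attribute-value case concrete, here is a minimal sketch in Python. It assumes the standard library's xml.etree.ElementTree (which uses expat) as a stand-in for "the XML parser"; the helper name parsed_attr is hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET


def parsed_attr(serialized_value: str) -> str:
    """Parse <e a="..."/> and return the attribute value the parser reports.

    `serialized_value` is inserted verbatim, so it must already be
    escaped the way a serializer would have written it.
    """
    return ET.fromstring(f'<e a="{serialized_value}"/>').get("a")


# A literal line feed (#a) written "normally" into the attribute value.
plain = parsed_attr("one\ntwo")

# The same character written as a numeric character reference.
escaped = parsed_attr("one&#10;two")

print(repr(plain))    # 'one two'   (the line feed did not survive)
print(repr(escaped))  # 'one\ntwo'  (the line feed was retained)
```

Which of the two results is the "right" one depends on what the user of the grammar expects to see, which is the point of the paragraph above.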
- Now that I have started to worry about the applicability of the
concept of round-tripping, I'm also uneasy about the sentence
Some aspects of the serialization will impact whether or not the
document can be perfectly reconstruc[t]ed by the XML parser.
So for
Some aspects of the serialization will impact whether or not the
document can be perfectly reconstruced by the XML parser.
perhaps read
Some aspects of the serialization will impact whether or not all
characters of the input (e.g. #d#a as a line separator, or either
of those characters within attribute values) are retained after
the serialized XML is parsed with a conforming XML parser.
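
The line-separator case in element content can be sketched the same way. This is only an illustration, again assuming Python's xml.etree.ElementTree as the conforming parser; parsed_text is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def parsed_text(serialized: bytes) -> str:
    """Parse a serialized document and return the root element's text."""
    return ET.fromstring(serialized).text


# Carriage return + line feed written literally as a line separator.
literal = parsed_text(b"<e>a\r\nb</e>")

# The carriage return written as a numeric character reference instead.
referenced = parsed_text(b"<e>a&#13;\nb</e>")

print(repr(literal))     # 'a\nb'    (the carriage return is gone)
print(repr(referenced))  # 'a\r\nb'  (both characters are retained)
```

Bytes are passed to the parser here so that no newline translation happens before the XML parser sees the data.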
- For `reconstruced` read `reconstructed`.
- We should discuss John Cowan's suggestion, which I understand as an
attempt to take the option of preserving details of line separators in
the input off the table entirely, thus rendering moot almost
everything in the current draft but the references to #a in attribute
values.
John's option would work fine for me under normal conditions. I am
not quite sure what conditions might make me want something else, so I
don't know what would make sense by way of an ability to turn off
whitespace normalization.
--
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
Received on Tuesday, 13 June 2023 00:44:34 UTC