- From: Graydon <graydonish@gmail.com>
- Date: Sun, 11 Sep 2022 14:47:36 -0400
- To: Norm Tovey-Walsh <norm@saxonica.com>
- Cc: public-ixml@w3.org
On Sun, Sep 11, 2022 at 10:45:12AM +0100, Norm Tovey-Walsh scripsit:
> > I'm not seeing much upside to allowing literal control characters not
> > permitted in XML in the grammar via some additional notational
> > mechanism.
>
> I wasn’t proposing an additional notational mechanism. I can literally
> type a U+0013 character into a string in my editor. I can save that ixml
> file. (I used ^S because I was sure that a literal Control-S character
> wouldn’t survive email transmission; also because my editor renders a
> literal Control-S as a single character marked by two glyphs, ^ followed
> by S.)
I think -- very possibly wrongly! -- that ^S, ^M, etc. are a notational
convention to represent characters with no associated glyph.
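For what it's worth, a minimal sketch of that convention (caret notation),
assuming the usual rule of flipping bit 0x40 of a C0 control code point to
get the letter shown after the caret. The function name is hypothetical.

```python
def caret_notation(ch: str) -> str:
    """Render a C0 control character (or DEL) in caret notation.

    Hypothetical helper for illustration; any other character is
    returned unchanged.
    """
    cp = ord(ch)
    if cp < 0x20 or cp == 0x7F:
        # Flipping bit 0x40 maps U+0013 to "S", U+000D to "M", and so on.
        return "^" + chr(cp ^ 0x40)
    return ch


print(caret_notation("\x13"))  # Control-S
print(caret_notation("\r"))    # carriage return
```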
> Anyway. I can create an iXML file that has a literal U+0013 in it.
I would not dare argue.
> If that’s forbidden, that’s fine. If it’s allowed but not required, I
> think that introduces an interoperability issue. If it’s required,
> that’s kind of a challenge because my parser builds its grammar from the
> XML representation, so it has no way to get from ixml text to parser
> without XML in the middle. (I can work around this problem with some
> clever escaping, but I’m not going to bother if it’s forbidden :-) )
This reminds me of when there were issues with the Microsoft "high
ascii" characters, where (for example) cp1252 0097 was the em-dash
character but the code point wasn't legal in XML until the fifth
edition.
As I recall, pre-fifth-edition, I could have an 0097 codepoint character
in something that looked like an XML file, but it wouldn't parse. I
think this is legitimately the same case; if you've got U+0013 as a code
point in ixml, it shouldn't parse.
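A rough sketch of that kind of check, assuming the XML 1.0 Char production
is the rule being applied to the grammar text. The function names are
hypothetical and not taken from any ixml implementation; whether ixml
should enforce this is exactly the open question.

```python
def is_xml10_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def illegal_xml_chars(text: str) -> list[tuple[int, str]]:
    """List (offset, 'U+XXXX') for every character XML 1.0 would reject."""
    return [
        (i, f"U+{ord(c):04X}")
        for i, c in enumerate(text)
        if not is_xml10_char(ord(c))
    ]


# A grammar fragment with a literal Control-S inside a string.
print(illegal_xml_chars('s: "\x13".'))
```

A processor taking the "it shouldn't parse" position would reject the
grammar whenever that list is non-empty.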
--
Graydon Saunders | graydonish@gmail.com
Þæs oferéode, ðisses swá mæg.
-- Deor ("That passed, so may this.")
Received on Sunday, 11 September 2022 18:47:51 UTC