- From: Graydon <graydonish@gmail.com>
- Date: Sun, 11 Sep 2022 14:47:36 -0400
- To: Norm Tovey-Walsh <norm@saxonica.com>
- Cc: public-ixml@w3.org
On Sun, Sep 11, 2022 at 10:45:12AM +0100, Norm Tovey-Walsh scripsit:
> > I'm not seeing much upside to allowing literal control characters not
> > permitted in XML in the grammar via some additional notational
> > mechanism.
>
> I wasn’t proposing an additional notational mechanism. I can literally
> type a U+0013 character into a string in my editor. I can save that ixml
> file. (I used ^S because I was sure that a literal Control-S character
> wouldn’t survive email transmission; also because my editor renders a
> literal Control-S as a single character marked by two glyphs, ^ followed
> by S.)

I think -- very possibly wrongly! -- that ^S, ^M, etc. are a notational
convention to represent characters with no associated glyph.

> Anyway. I can create an iXML file that has a literal U+0013 in it.

I would not dare argue.

> If that’s forbidden, that’s fine. If it’s allowed but not required, I
> think that introduces an interoperability issue. If it’s required,
> that’s kind of a challenge because my parser builds its grammar from the
> XML representation, so it has no way to get from ixml text to parser
> without XML in the middle. (I can work around this problem with some
> clever escaping, but I’m not going to bother if it’s forbidden :-) )

This reminds me of when there were issues with the Microsoft "high
ascii" characters, where (for example) cp1252 0097 was the em-dash
character but the code point wasn't legal in XML until the fifth edition.

As I recall, pre-fifth-edition, I could have an 0097 codepoint character
in something that looked like an XML file, but it wouldn't parse.

I think this is legitimately the same case; if you've got U+0013 as a
code point in ixml, it shouldn't parse.

-- 
Graydon Saunders | graydonish@gmail.com
Þæs oferéode, ðisses swá mæg.
-- Deor ("That passed, so may this.")
Received on Sunday, 11 September 2022 18:47:51 UTC