- From: Graydon <graydonish@gmail.com>
- Date: Mon, 12 Sep 2022 07:33:30 -0400
- To: Norm Tovey-Walsh <norm@saxonica.com>
- Cc: public-ixml@w3.org
On Mon, Sep 12, 2022 at 09:55:07AM +0100, Norm Tovey-Walsh scripsit:
[snip]
> The discussion here is about U+0013 in an UTF-8 (or US ASCII similarly
> encoded) document. Which I admit, I did not make clear.

I am easily befuddled!

I think there are maybe three questions --

1. does the source document fed to an ixml parser have any constraints on
   its contents beyond all being in some encoding known to the parser?

2. is the ixml grammar document a representation of XML, using the same
   rules as an XML document with respect to what code points are
   permissible in the document?

3. if the ixml grammar document is NOT a representation of XML, are there
   restrictions on the contents?

I think the answers are, respectively, "no", "yes", and "not relevant due
to 2 being yes".

If 3 requires an answer, I get stuck on "the parsed result is XML so we
need mapping rules for what happens when a not-XML character gets used
where it would become an element name" and so on. That seems like a hard
problem, and I don't know of any compelling reason to try to solve it.

If it's just "you can have anything as a terminal symbol in your ixml
grammar", there's still the issue of "and you just created a text node
with that non-XML character in it". Your original example is OK because it
drops U+0013; it wouldn't be if it put that character into a text node.
General case rules for what to do in that case also seem hard.

All of which makes me think I'm missing something. Why would you want to
allow arbitrary literal code points in the ixml grammar?

-- 
Graydon Saunders | graydonish@gmail.com
Þæs oferéode, ðisses swá mæg.
-- Deor ("That passed, so may this.")
Received on Monday, 12 September 2022 11:33:45 UTC