Re: What about this grammar?

I propose that we make an amendment to the spec along these lines:

- An iXML grammar must be serializable as well-formed XML when parsed using the iXML specification grammar.

This would mean that a grammar containing a literal U+0019 control character would be non-conforming, because that character cannot appear in an XML 1.0 document (not even as a character reference). But a grammar using a hex-encoded U+0019 character (i.e. #19) would be fine, because its XML serialization would be well-formed:

match: -#19, 'a'.

<rule name="match">
  <alt>
    <literal tmark="-" hex="19"/>
    <literal string="a"/>
  </alt>
</rule>
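To make the well-formedness argument concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree` parser. It compares the serialization of `match: -#19, 'a'.` with two hypothetical alternatives: one with a raw U+0019 in an attribute value, and one using the character reference `&#x19;`. The names `ENCODED`, `LITERAL`, `CHARREF` and `is_well_formed` are illustrative only and not part of any iXML tooling.

```python
import xml.etree.ElementTree as ET

# Serialization of: match: -#19, 'a'.
# The control character is carried as the hex attribute value "19",
# so the document itself contains only legal XML characters.
ENCODED = ('<rule name="match"><alt>'
           '<literal tmark="-" hex="19"/><literal string="a"/>'
           '</alt></rule>')

# Hypothetical serialization with the raw U+0019 written into an attribute.
LITERAL = ('<rule name="match"><alt>'
           '<literal tmark="-" string="\x19"/><literal string="a"/>'
           '</alt></rule>')

# Hypothetical serialization using a character reference instead;
# XML 1.0 forbids references to U+0019 as well.
CHARREF = ('<rule name="match"><alt>'
           '<literal tmark="-" string="&#x19;"/><literal string="a"/>'
           '</alt></rule>')


def is_well_formed(doc: str) -> bool:
    """Return True if the (XML 1.0) parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(ENCODED))  # True
print(is_well_formed(LITERAL))  # False
print(is_well_formed(CHARREF))  # False
```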

I think it would also be a good idea to add some wording spelling out the implications, such as:

- In an iXML grammar, characters that are not legal in XML must be represented as encoded characters, and must be excluded from the output by being marked with a "-".
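One way to check a grammar against this wording is to test every character against the XML 1.0 `Char` production. The Python sketch below does that on the raw grammar text and reports, for each offending character, the encoded and excluded form (e.g. `-#19`) it would need instead. It is a simplification under stated assumptions: it scans the whole text rather than parsing the grammar, and `is_xml_char` and `find_illegal_chars` are hypothetical helper names.

```python
def is_xml_char(cp: int) -> bool:
    """XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_illegal_chars(grammar: str) -> list[tuple[int, str]]:
    """Return (offset, encoded form) for every character in the grammar
    text that could not be written into its XML serialization.
    The encoded form is the hex-encoded, excluded terminal, e.g. -#19."""
    return [
        (i, f"-#{ord(ch):x}")
        for i, ch in enumerate(grammar)
        if not is_xml_char(ord(ch))
    ]


# A grammar with a raw U+0019 inside a quoted literal, and the encoded version.
print(find_illegal_chars("match: '\x19', 'a'."))  # [(8, '-#19')]
print(find_illegal_chars("match: -#19, 'a'."))    # []
```

Note that the reported form replaces the whole quoted literal, not just the character inside the quotes: `'\x19'` becomes `-#19`.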

I’m not making a pull request for any of this, since I’m not yet clear on what we’re doing towards v-next.

All best,

BTW

> On 12 Sep 2022, at 12:33, Graydon <graydonish@gmail.com> wrote:
> 
> On Mon, Sep 12, 2022 at 09:55:07AM +0100, Norm Tovey-Walsh scripsit:
> [snip]
>> The discussion here is about U+0013 in a UTF-8 (or US ASCII similarly
>> encoded) document. Which, I admit, I did not make clear.
> 
> I am easily befuddled!
> 
> I think there are maybe three questions --
> 
> 1. does the source document fed to an ixml parser have any constraints
> on contents beyond all being in some encoding known to the parser?
> 
> 2. is the ixml grammar document a representation of XML, using the same
> rules as an XML document with respect to what code points are
> permissible in the document?
> 
> 3. if the ixml grammar document is NOT a representation of XML, are
> there restrictions on the contents?
> 
> I think the answers are appropriately "no", "yes", and "not relevant due
> to 2 being yes".
> 
> If 3 requires an answer, I get stuck on "the parsed result is XML so we
> need mapping rules for what happens when a not-XML character gets used
> where it would become an element name" and so on. That seems like a hard
> problem, and I don't know of any compelling reason to try to solve it.
> 
> If it's just "you can have anything as a terminal symbol in your ixml
> grammar", there's still the issue of "and you just created a text node
> with that non-XML character in it". Your original example is OK because
> it drops U+0013; it wouldn't be if it put that character into a text
> node.  General case rules for what to do in that case also seem hard.
> 
> All of which makes me think I'm missing something.  Why would you want
> to allow arbitrary literal code points in the ixml grammar?
> 
> -- 
> Graydon Saunders  | graydonish@gmail.com
> Þæs oferéode, ðisses swá mæg.
> -- Deor  ("That passed, so may this.")
> 

Received on Monday, 12 September 2022 13:41:54 UTC