- From: Norm Tovey-Walsh <norm@saxonica.com>
- Date: Sat, 28 May 2022 14:24:30 +0100
- To: Steven Pemberton <steven.pemberton@cwi.nl>
- CC: public-ixml@w3.org
- Message-ID: <m2bkvhkcec.fsf@Hackmatack-eth.fritz.box>
Hello Steven,

Here are my proposed editorial changes to the draft. (An updated version is attached for your convenience.)

Remove the extra full stop:

! <p>This document describes the final version of ixml version 1.0..</p>

Replace “;” with “,” after “C”:

! <pre>{"temperature": {"scale": "C"; "value": 21}}</pre>

Add “for example” in this paragraph:

  <p>A grammar is used to describe the input format. An input is parsed using this grammar, and the resulting parse tree is serialized as XML. Special marks
! in the grammar affect details of this serialization, for example, excluding parts of the tree, or serializing parts as attributes instead of elements.</p>

Add error S12 for the version declaration:

  <p>The optional prolog declares the version of ixml being used. If absent,
! version 1.0 is assumed. A grammar <span id="ref-s12" class="conform">must</span> conform to
! the syntax and semantics of the version declared or assumed
! (<a class="error" href="#err-s12">error S12</a>).</p>

Remove a few remaining references to “structured” and “unstructured”:

  <p>A mark is one of <code></code><code>^, @</code> or <code>-</code>, and
! indicates whether the item so marked will be serialized as an element
! with its children (<code>^</code>) which is the default, as
! an attribute (<code>@</code>), or deleted, so that only its children are serialized (<code>-</code>).</p>
  <pre class="frag">@mark: ["@^-"].</pre>

Break code examples into separate <code> blocks with commas between them:

  <p>Similarly, a factor repeated one or more times is followed by a plus, or a double plus and a separator, e.g. <code>abc+</code> and <code>abc++","</code>.
! For instance <code>"a"++"#"</code> would match <code>a </code><code>a#a
! a#a#a</code> etc., but not the empty string.</p>
  <pre class="frag">repeat1: factor, (-"+", s; -"++", s, sep).</pre>

And

  <p>Similarly, a factor repeated one or more times is followed by a plus, or a double plus and a separator, e.g. <code>abc+</code> and <code>abc++","</code>.
! For instance <code>"a"++"#"</code> would match <code>a</code>, <code>a#a</code>,
! <code>a#a#a</code>, etc., but not the empty string.</p>
  <pre class="frag">repeat1: factor, (-"+", s; -"++", s, sep).</pre>

And (also note that I propose to replace “E.g” with “For example,”):

! <p>A separator can be any factor. For example, <code>abc**def</code> or <code>abc**(","; ".")</code>. For instance <code>"a"++("#"; "!")</code> would
! match <code>a#a</code>, <code>a!a</code>, <code>a#a!a</code>, <code>a!a#a</code>,
! <code>a#a#a</code>, etc.</p>
  <pre class="frag">sep: factor.</pre>

Similarly, I’ve put the examples of identical strings in separate <code> blocks and added “and” between them:

  <p id="ref-s11">A string cannot extend over a line-break (<a class="error" href="#err-s11">error S11</a>). The enclosing quote is represented in a string
! by doubling it; these two strings are identical: <code>'Isn''t it?'</code> and <code>"Isn't
! it?"</code>, as are these: <code>"He said ""Don't!"""</code> and <code>'He said "Don''t!"'</code>.</p>

More separation of <code> blocks and commas added:

  <p>An inclusion is enclosed in square brackets, and represents the set of characters defined by any combination of literal characters, a range of
! characters, hex encoded characters, or Unicode classes. For example,
! <code>["a"-"z"]</code>, <code>["xyz"]</code>, <code>[Lc]</code>,
! and <code>["0"-"9"; "!@#"; Lc]</code>. Note that
! <code>["abc"]</code>, <code>["a"; "b"; "c"]</code>, <code>["a"-"c"]</code>,
! and <code>[#61-#63]</code> all represent the same set of characters.</p>

Changed “would” to “will”:

! <p>Note that the empty inclusion <code>[]</code> will fail to match any
! character in the input; on the other hand <code>~[]</code> will match any one character, whatever it is.</p>

The section on Parsing seems to have become mangled in editing somehow. There are a couple of stray references to an attribute named “serializ” (sic).
Here’s my rewrite:

<p>Processors <span class="conform">must</span> accept and parse any conforming grammar, and produce at least one parse of supplied input that matches the grammar starting at the root symbol. If more than one parse results, one is chosen; it is not defined how this choice is made, but the resulting serialization <span class="conform">should</span> include the attribute <code>ixml:state</code> on the document element with a value that includes the word <code>ambiguous</code>. Different processors <span class="conform">may</span> vary in whether input is detected as ambiguous or not. The ixml namespace URI is "<code>http://invisiblexml.org/NS</code>". Known algorithms that accept and parse any context-free grammar include [<a href="#earley">Earley</a>], [<a href="#unger">Unger</a>], [<a href="#cyk">CYK</a>], [<a href="#glr">GLR</a>], and [<a href="#gll">GLL</a>]; see also [<a href="#grune">Grune</a>].</p>

<p>If the parse fails, some XML document <span class="conform">must</span> be produced with <code>ixml:state</code> on the document element with a value that includes the word <code>failed</code>. The document <span class="conform">should</span> provide helpful information about where and why it failed; it <span class="conform">may</span> be a partial parse tree that includes parts of the parse that succeeded.</p>

Added “the”:

  <li><strong>Attribute</strong>: the node is serialized as an XML attribute whose name is the name of the node, and whose value is the serialization of
! all non-deleted terminal descendants of the node (regardless of the marking of
! the intermediate nonterminals), if any, in order.</li>

I reworded the note about XML conformance requirements. The prohibition is on an attribute named “xmlns”, not one that has a name that begins with “xmlns”.
I also added an error code for it:

<p>Note: This requirement means for instance that names of serialized elements and attributes <span class="conform">must</span> match the XML requirements, an element <span id="ref-d02" class="conform">must not</span> contain more than one attribute of a given name (<a class="error" href="#err-d02">error D02</a>); an element <span id="ref-d07" class="conform">must not</span> contain an attribute named “xmlns” (<a class="error" href="#err-d07">error D07</a>); the names of all elements and attributes <span id="ref-d03" class="conform">must</span> conform to the requirements for XML names; invalid characters <span id="ref-d04" class="conform">must not</span> be serialized (<a class="error" href="#err-d04">error D04</a>); a nonterminal being serialized as root element <span id="ref-d05" class="conform">must not</span> be marked as an attribute (<a class="error" href="#err-d05">error D05</a>); in order to match the XML requirement of a single-rooted document, if the root rule is marked as hidden, all of its productions <span id="ref-d06" class="conform">must</span> produce exactly one non-hidden non-attribute nonterminal and no non-hidden terminals before or after that nonterminal (<a class="error" href="#err-d06">error D06</a>).</p>

And finally, here is an updated errors section.

Remove S04:

- <dt id="err-s04"><a href="#ref-s04">S04</a></dt>
- <dd>It is an error to mark a terminal as an attribute.</dd>

Add S12:

+ <dt id="err-s12"><a href="#ref-s12">S12</a></dt>
+ <dd>It is an error if the grammar does not conform to the implied or declared version.</dd>

And add D07:

+ <dt id="err-d07"><a href="#ref-d07">D07</a></dt>
+ <dd>It is an error if an attribute named “xmlns” appears on an element.</dd>
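The two attribute rules in the note (D02 and D07) are mechanical enough to sketch. The following is a minimal Python illustration, not part of the proposal: `attribute_errors` is a hypothetical helper that assumes a serialized element's attribute names are available as a plain list, and it checks only D02 and D07. It also shows the distinction drawn in the reworded note: an attribute named exactly “xmlns” is rejected, while one whose name merely begins with “xmlns” is not caught by this rule.

```python
from collections import Counter


def attribute_errors(attribute_names: list[str]) -> list[str]:
    """Return ixml error codes for one element's attribute names.

    Hypothetical helper for illustration: only D02 and D07 are checked.
    """
    errors = []
    counts = Counter(attribute_names)
    # D02: an element must not carry more than one attribute of a given name.
    if any(n > 1 for n in counts.values()):
        errors.append("D02")
    # D07: only an attribute named exactly "xmlns" is forbidden;
    # names that merely begin with "xmlns" are not covered by this rule.
    if "xmlns" in counts:
        errors.append("D07")
    return errors


print(attribute_errors(["scale", "value"]))     # []
print(attribute_errors(["xmlnsfoo", "value"]))  # []
print(attribute_errors(["xmlns", "xmlns"]))     # ['D02', 'D07']
```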
Be seeing you,
norm

--
Norm Tovey-Walsh
Saxonica
Received on Saturday, 28 May 2022 13:36:17 UTC