Editorial remarks on the 2022-05-27 draft from Norm Tovey-Walsh on 2022-05-28 (public-ixml@w3.org from May 2022)

From: Norm Tovey-Walsh <norm@saxonica.com>
Date: Sat, 28 May 2022 14:24:30 +0100
To: Steven Pemberton <steven.pemberton@cwi.nl>
CC: public-ixml@w3.org
Message-ID: <m2bkvhkcec.fsf@Hackmatack-eth.fritz.box>

Hello Steven,

Here are my proposed editorial changes to the draft. (An updated version
is attached for your convenience.)

Remove the extra full stop:

! <p>This document describes the final version of ixml version 1.0..</p>

Replace “;” with “,” after "C":

! <pre>{"temperature": {"scale": "C"; "value": 21}}</pre>
  
Add “for example” in this paragraph:
  
  <p>A grammar is used to describe the input format. An input is parsed using
  this grammar, and the resulting parse tree is serialized as XML. Special marks
! in the grammar affect details of this serialization, for example, excluding parts of the
  tree, or serializing parts as attributes instead of elements.</p>

Add error S12 for the version declaration:
  
  <p>The optional prolog declares the version of ixml being used. If absent,
! version 1.0 is assumed. A grammar <span id="ref-s12" class="conform">must</span> conform to
! the syntax and semantics of the version declared or assumed
! (<a class="error" href="#err-s12">error S12</a>).</p>

Remove a few remaining references to “structured” and “unstructured”:
  
  <p>A mark is one of <code>^</code>, <code>@</code> or <code>-</code>, and
! indicates whether the item so marked will be serialized as an element
! with its children (<code>^</code>) which is the default, as
! an attribute (<code>@</code>), or deleted, so that only its children are
  serialized (<code>-</code>).</p>
  <pre class="frag">@mark: ["@^-"].</pre>

Break code examples into separate <code> blocks with commas between them:
  
  <p>Similarly, a factor repeated one or more times is followed by a plus, or a
  double plus and a separator, e.g. <code>abc+</code> and <code>abc++","</code>.
! For instance <code>"a"++"#"</code> would match <code>a </code><code>a#a
! a#a#a</code> etc., but not the empty string.</p>
  <pre class="frag">repeat1: factor, (-"+", s; -"++", s, sep).</pre>

And
  
  <p>Similarly, a factor repeated one or more times is followed by a plus, or a
  double plus and a separator, e.g. <code>abc+</code> and <code>abc++","</code>.
! For instance <code>"a"++"#"</code> would match <code>a</code>, <code>a#a</code>,
! <code>a#a#a</code>, etc., but not the empty string.</p>
  <pre class="frag">repeat1: factor, (-"+", s; -"++", s, sep).</pre>

And (also note that I propose to replace “E.g” with “For example,”):
  
! <p>A separator can be any factor. For example, <code>abc**def</code> or
  <code>abc**(","; ".")</code>. For instance <code>"a"++("#"; "!")</code> would
! match <code>a#a</code>, <code>a!a</code>, <code>a#a!a</code>, <code>a!a#a</code>,
! <code>a#a#a</code>, etc.</p>
  <pre class="frag">sep: factor.</pre>
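
As a rough illustration of the two repetition examples above (not part of the draft), here is a minimal Python sketch. It assumes that each ixml expression can be approximated by a hand-written regular expression; the names REPEAT1_HASH, REPEAT1_EITHER and matches are hypothetical and exist only for this example.

```python
import re

# Hand-written regex approximations of the two ixml examples (an assumption,
# not output from any ixml processor).
# "a"++"#": one or more "a", separated by "#"
REPEAT1_HASH = re.compile(r"a(?:#a)*")
# "a"++("#"; "!"): one or more "a", separated by either "#" or "!"
REPEAT1_EITHER = re.compile(r"a(?:[#!]a)*")


def matches(pattern, text):
    """Return True if the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["a", "a#a", "a#a#a", "", "a#"]:
        print(repr(candidate), matches(REPEAT1_HASH, candidate))
```

The separator must sit between two occurrences of the factor, so a trailing separator (as in "a#") does not match, and neither does the empty string.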

Similarly, I’ve put the examples of identical strings in separate <code>
blocks and added “and” between them:
 
  <p id="ref-s11">A string cannot extend over a line-break (<a class="error"
  href="#err-s11">error S11</a>). The enclosing quote is represented in a string
! by doubling it; these two strings are identical: <code>'Isn''t it?'</code> and <code>"Isn't
! it?"</code>, as are these: <code>"He said ""Don't!"""</code> and <code>'He said
  "Don''t!"'</code>.</p>

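To make the quote-doubling rule in the paragraph above concrete, here is a small Python sketch (not part of the draft). The function name decode_ixml_string is hypothetical; it assumes the literal is passed in with its enclosing quotes and simply undoes the doubling.

```python
def decode_ixml_string(literal):
    """Decode a quoted string literal, undoing doubled enclosing quotes.

    Hypothetical helper for illustration only; it is not taken from any
    ixml implementation.
    """
    if len(literal) < 2 or literal[0] not in "'\"" or literal[-1] != literal[0]:
        raise ValueError("not a quoted string literal")
    if "\n" in literal or "\r" in literal:
        # Corresponds to the condition reported as error S11.
        raise ValueError("string extends over a line-break")
    quote = literal[0]
    body = literal[1:-1]
    # After removing doubled quotes, no lone enclosing quote may remain.
    if quote in body.replace(quote * 2, ""):
        raise ValueError("undoubled enclosing quote inside string")
    return body.replace(quote * 2, quote)


if __name__ == "__main__":
    print(decode_ixml_string("'Isn''t it?'") == decode_ixml_string('"Isn\'t it?"'))
```
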
More separation of <code> blocks and commas added:
  
  <p>An inclusion is enclosed in square brackets, and represents the set of
  characters defined by any combination of literal characters, a range of
! characters, hex encoded characters, or Unicode classes. For example,
! <code>["a"-"z"]</code>, <code>["xyz"]</code>, <code>[Lc]</code>,
! and <code>["0"-"9"; "!@#"; Lc]</code>. Note that
! <code>["abc"]</code>, <code>["a"; "b"; "c"]</code>, <code>["a"-"c"]</code>,
!  and <code>[#61-#63]</code> all represent the same set of characters.</p>
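
The claim that these four inclusions denote the same set can be checked with a short Python sketch (not part of the draft). The helper char_range and the variable names are hypothetical; each ixml inclusion is translated by hand into a Python set.

```python
def char_range(first, last):
    """Return the set of characters from first to last, inclusive."""
    return {chr(code) for code in range(ord(first), ord(last) + 1)}


# Hand translations of the four inclusions from the paragraph above.
from_string = set("abc")                           # ["abc"]
from_alternatives = {"a"} | {"b"} | {"c"}          # ["a"; "b"; "c"]
from_range = char_range("a", "c")                  # ["a"-"c"]
from_hex = char_range(chr(0x61), chr(0x63))        # [#61-#63]

if __name__ == "__main__":
    print(from_string == from_alternatives == from_range == from_hex)
```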

Changed “would” to “will”:
  
! <p>Note that the empty inclusion <code>[]</code> will fail to match any
! character in the input; on the other hand <code>~[]</code> will match any one
  character, whatever it is.</p>

The section on Parsing seems to have become mangled in editing somehow.
There are a couple of stray references to an attribute named
“serializ” (sic). Here’s my rewrite:

<p>Processors <span class="conform">must</span> accept and parse any conforming
grammar, and produce at least one parse of supplied input that matches the
grammar starting at the root symbol. If more than one parse results, one is
chosen; it is not defined how this choice is made, but the resulting
serialization <span class="conform">should</span> include the attribute
<code>ixml:state</code> on the document element with a value that includes the
word <code>ambiguous</code>. Different processors <span
class="conform">may</span> vary in whether input is detected as ambiguous or
not. The ixml namespace URI is "<code>http://invisiblexml.org/NS</code>". Known
algorithms that accept and parse any context-free grammar include [<a
href="#earley">Earley</a>], [<a href="#unger">Unger</a>], [<a
href="#cyk">CYK</a>], [<a href="#glr">GLR</a>], and [<a href="#gll">GLL</a>];
see also [<a href="#grune">Grune</a>].</p>

<p>If the parse fails, some XML document <span class="conform">must</span> be
produced with <code>ixml:state</code> on the document element with a value that
includes the word <code>failed</code>. The document <span
class="conform">should</span> provide helpful information about where and why
it failed; it <span class="conform">may</span> be a partial parse tree that
includes parts of the parse that succeeded.</p>
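
For readers consuming processor output, a minimal Python sketch of how the state attribute might be inspected follows (not part of the draft). It assumes the attribute is named state in the ixml namespace given above; the function name ixml_states and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI as given in the Parsing section.
IXML_NS = "http://invisiblexml.org/NS"


def ixml_states(xml_text):
    """Return the set of words in the state attribute of the document element."""
    root = ET.fromstring(xml_text)
    # ElementTree addresses namespaced attributes as "{uri}localname".
    return set(root.get("{%s}state" % IXML_NS, "").split())


if __name__ == "__main__":
    # Hypothetical output document; real processors will differ.
    sample = '<fail xmlns:ixml="http://invisiblexml.org/NS" ixml:state="failed"/>'
    print("failed" in ixml_states(sample))
```

Splitting on whitespace matters because the value only has to include the word, so a processor may report more than one state in the same attribute.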

Added “the”:

    <li><strong>Attribute</strong>: the node is serialized as an XML attribute
      whose name is the name of the node, and whose value is the serialization of
!     all non-deleted terminal descendants of the node (regardless of the marking of
!     the intermediate nonterminals), if any, in order.</li>

I reworded the note about XML conformance requirements. The prohibition
is on an attribute named “xmlns”, not one that has a name that begins
with “xmlns”. I also added an error code for it:
  
<p>Note: This requirement means, for instance, that names of serialized elements
and attributes <span class="conform">must</span> match the XML requirements;
an element <span id="ref-d02" class="conform">must not</span> contain
more than one attribute of a given name (<a class="error" href="#err-d02">error
D02</a>); an element
<span id="ref-d07" class="conform">must not</span> contain
an attribute named “xmlns” (<a class="error" href="#err-d07">error
D07</a>);
the names of all elements and attributes <span id="ref-d03"
class="conform">must</span> conform to the requirements for XML names; invalid
characters <span id="ref-d04" class="conform">must not</span> be serialized (<a
class="error" href="#err-d04">error D04</a>); a nonterminal being serialized as
root element <span id="ref-d05" class="conform">must not</span> be marked as an
attribute (<a class="error" href="#err-d05">error D05</a>); in order to match
the XML requirement of a single-rooted document, if the root rule is marked as
hidden, all of its productions <span id="ref-d06" class="conform">must</span>
produce exactly one non-hidden non-attribute nonterminal and no non-hidden
terminals before or after that nonterminal (<a class="error"
href="#err-d06">error D06</a>).</p>

And finally, here is an updated errors section. Remove S04:

-   <dt id="err-s04"><a href="#ref-s04">S04</a></dt>
-     <dd>It is an error to mark a terminal as an attribute.</dd>

Add S12:

+   <dt id="err-s12"><a href="#ref-s12">S12</a></dt>
+     <dd>It is an error if the grammar does not conform to the implied or declared version.</dd>

And add D07:
  
+   <dt id="err-d07"><a href="#ref-d07">D07</a></dt>
+     <dd>It is an error if an attribute named “xmlns” appears on an element.</dd>

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica

Received on Saturday, 28 May 2022 13:36:17 UTC