- From: Tom Hillman <tom@expertml.com>
- Date: Thu, 14 Oct 2021 09:25:22 +0100
- To: Steven Pemberton <steven.pemberton@cwi.nl>, "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Cc: public-ixml@w3.org
- Message-ID: <4e311b99-1a46-444b-882b-0de4924726d6@Spark>
I think this is a great example of a test case, though: I'll have to test JayParser against this to be sure it won't fall over!

_________________
Tomos Hillman
eXpertML Ltd
+44 7793 242058

On 13 Oct 2021, 16:11 +0100, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>, wrote:
> Thank you; I had neglected, overlooked, or forgotten the words "regardless of marking of intermediate nonterminals".
>
> Michael
>
> > On 13 Oct 2021, at 3:16 AM, Steven Pemberton <steven.pemberton@cwi.nl> wrote:
> >
> > It may be just early in the morning and the coffee hasn't yet kicked in, but I don't see the problem.
> >
> > I checked in my implementation, making the grammar unambiguous in the process:
> >
> > S : @able, baker, @charlie.
> > able: string.
> > baker: string.
> > charlie: string.
> > string: ["abc"]*, ".".
> >
> > Input:
> > aaa.bbb.ccc.
> >
> > Result:
> > <S able='aaa.' charlie='ccc.'>
> > <baker>
> > <string>bbb.</string>
> > </baker>
> > </S>
> >
> > Which was what I was expecting.
> >
> > So assuming I'm not missing something obvious, I suspect that you need to reread the serialisation section of the spec:
> >
> > "
> > • A nonterminal attribute is serialised by outputting the name of the node as an attribute, and serialising all non-hidden terminal descendants of the node (regardless of marking of intermediate nonterminals), in order, as the value of the attribute.
> > "
> > which I think covers what you are asking for.
> >
> > The other side of this coin is:
> >
> > "
> > • A nonterminal element is serialised by outputting the name of the node as an XML tag, serialising all exposed attribute descendants, and then serialising all non-attribute children in order. An attribute is exposed if it is an attribute child, or an exposed attribute of a hidden element child (note this is recursive).
> > "
> >
> > Steven
> >
> > On Wednesday 13 October 2021 04:19:52 (+02:00), C. M. Sperberg-McQueen wrote:
> >
> > > Consider the grammar
> > >
> > > S : @able, baker, @charlie.
> > > able: string.
> > > baker: string.
> > > charlie: string.
> > > string: ~[]*.
> > >
> > > Is this grammar OK? (Yes, it's hopelessly ambiguous, but that's beside the point.)
> > >
> > > If we ignored the annotations, a raw parse tree for this grammar might look like this:
> > >
> > > <S>
> > > <able mark="@"><string>aaa</string></able>
> > > <baker><string>aaa</string></baker>
> > > <charlie mark="@"><string>ccc</string></charlie>
> > > </S>
> > >
> > > Note that 'string' is implicitly marked serializable (^).
> > >
> > > When a nonterminal marked to be serialized as an element appears as a child of a nonterminal marked to be serialized as an attribute (as 'string' here appears as a child of @able and @charlie), is the rule
> > >
> > > - Raise an error because the grammar cannot be serialized that way?
> > >
> > > - Omit the content of 'string' from the value of @able and @charlie by analogy with what happens when we calculate the text node children of an element?
> > >
> > > - Ignore the marking on 'string' on the grounds that we have already been told that @able is an attribute. Since elements cannot appear within attributes, the implicit ^ marking on 'string' is ignored.
> > >
> > > The grammar for ixml offers two examples that seem relevant: in a raw parse tree, @name will dominate nodes labeled namestart and namefollower, which are explicitly marked non-serializable (-). @dstring and @sstring similarly dominate nodes labeled dchar and schar, which are implicitly marked ^. The attributes @from and @to directly dominate nodes labeled 'character' (marked -) and indirectly dominate nodes labeled 'dchar' and 'schar' (implicitly ^).
> > >
> > > In the spirit of making things as simple as possible for the grammar authors, I suppose the right rule is "when constructing the value of an attribute, treat nonterminals marked ^ and - the same: recur through them" (the last possibility mentioned above).
> > >
> > > I apologize if this has been discussed before - I have the guilty sensation that it has been, and that I did not retain the answer.
> > >
> > > Michael
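The two serialisation rules Steven quotes can be sketched in Python. This is a minimal, hypothetical illustration, not code from JayParser or any other ixml implementation: the `Terminal`/`Nonterminal` classes, the mark strings `"^"`, `"@"` and `"-"`, and the choice to let a hidden nonterminal contribute its non-attribute children to element content are all assumptions. The attribute rule simply recurs through every intermediate nonterminal whatever its mark, which is the behaviour Michael proposes and the spec wording already requires. Run on a parse tree for Steven's input, it reproduces his result, apart from whitespace and attribute quote style.

```python
from dataclasses import dataclass, field
from typing import List, Union
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Terminal:
    text: str
    hidden: bool = False  # assumption: a terminal marked "-" is dropped from output


@dataclass
class Nonterminal:
    name: str
    mark: str = "^"  # assumed encoding: "^" element, "@" attribute, "-" hidden
    children: List["Node"] = field(default_factory=list)


Node = Union[Terminal, Nonterminal]


def attribute_value(node: Node) -> str:
    """All non-hidden terminal descendants, in order, regardless of the
    marks on intermediate nonterminals."""
    if isinstance(node, Terminal):
        return "" if node.hidden else node.text
    return "".join(attribute_value(c) for c in node.children)


def exposed_attributes(node: Nonterminal) -> List[Nonterminal]:
    """Attribute children, plus (recursively) the exposed attributes
    of hidden nonterminal children."""
    found: List[Nonterminal] = []
    for c in node.children:
        if isinstance(c, Nonterminal):
            if c.mark == "@":
                found.append(c)
            elif c.mark == "-":
                found.extend(exposed_attributes(c))
    return found


def content(node: Nonterminal) -> str:
    """Serialise the non-attribute children of node, in order."""
    out: List[str] = []
    for c in node.children:
        if isinstance(c, Terminal):
            if not c.hidden:
                out.append(escape(c.text))
        elif c.mark == "^":
            out.append(element(c))
        elif c.mark == "-":
            # Assumption: a hidden nonterminal contributes its own content.
            out.append(content(c))
        # Children marked "@" were already emitted as attributes.
    return "".join(out)


def element(node: Nonterminal) -> str:
    """Serialise node as an XML element with its exposed attributes."""
    attrs = "".join(
        f" {a.name}={quoteattr(attribute_value(a))}"
        for a in exposed_attributes(node)
    )
    return f"<{node.name}{attrs}>{content(node)}</{node.name}>"


def string(s: str) -> Nonterminal:
    """Hypothetical helper: a 'string' node with one terminal per character."""
    return Nonterminal("string", "^", [Terminal(ch) for ch in s])


# Parse tree for Steven's input "aaa.bbb.ccc."
tree = Nonterminal("S", "^", [
    Nonterminal("able", "@", [string("aaa.")]),
    Nonterminal("baker", "^", [string("bbb.")]),
    Nonterminal("charlie", "@", [string("ccc.")]),
])

print(element(tree))
# <S able="aaa." charlie="ccc."><baker><string>bbb.</string></baker></S>
```

Because `attribute_value` never inspects `mark`, the element-marked `string` under `@able` and the hidden `namestart` under `@name` in Michael's ixml examples are treated identically: both are recursed through.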
Received on Thursday, 14 October 2021 08:25:45 UTC