Re: Attribute markings - a question

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Thu, 14 Oct 2021 10:19:23 -0600
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, Steven Pemberton <steven.pemberton@cwi.nl>, public-ixml@w3.org
Message-Id: <DC04B749-5975-491E-BDB5-C5392FDBC098@blackmesatech.com>
To: Tom Hillman <tom@expertml.com>
There’s an instance of this in the current ixml grammar for ixml (dchar, schar),
so if JayParser produces a correct parse of that grammar, it passes this test.

Michael

> On 14,Oct2021, at 2:25 AM, Tom Hillman <tom@expertml.com> wrote:
> 
> I think this is a great example of a test case, though: I'll have to test JayParser against this to be sure it won't fall over!
> 
> _________________
> Tomos Hillman
> eXpertML Ltd
> +44 7793 242058
> On 13 Oct 2021, 16:11 +0100, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>, wrote:
>> Thank you; I had neglected, overlooked, or forgotten the words "regardless of marking of intermediate nonterminals".
>> 
>> Michael
>> 
>>> On 13,Oct2021, at 3:16 AM, Steven Pemberton <steven.pemberton@cwi.nl> wrote:
>>> 
>>> It may be just early in the morning and the coffee hasn't yet kicked in, but I don't see the problem.
>>> 
>>> I checked in my implementation, making the grammar unambiguous in the process:
>>> 
>>> S : @able, baker, @charlie.
>>> able: string.
>>> baker: string.
>>> charlie: string.
>>> string: ["abc"]*, ".".
>>> 
>>> Input:
>>> aaa.bbb.ccc.
>>> 
>>> Result:
>>> <S able='aaa.' charlie='ccc.'>
>>> <baker>
>>> <string>bbb.</string>
>>> </baker>
>>> </S>
>>> 
>>> Which was what I was expecting.
>>> 
>>> So assuming I'm not missing something obvious, I suspect that you need to reread the serialisation section of the spec:
>>> 
>>> "
>>> • A nonterminal attribute is serialised by outputting the name of the node as an attribute, and serialising all non-hidden terminal descendants of the node (regardless of marking of intermediate nonterminals), in order, as the value of the attribute.
>>> "
>>> which I think covers what you are asking for.
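
As an illustration only (this is not code from any implementation mentioned in the thread), the attribute rule quoted above can be sketched in Python. The `Terminal` and `Nonterminal` classes and the `attribute_value` function are hypothetical names invented for this sketch; the parse-tree shape is an assumption, with marks modelled as the strings "^", "@" and "-".

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Terminal:
    text: str
    hidden: bool = False  # True models a terminal marked "-"


@dataclass
class Nonterminal:
    name: str
    mark: str = "^"  # "^" element, "@" attribute, "-" hidden
    children: List[Union["Nonterminal", Terminal]] = field(default_factory=list)


def attribute_value(node: Nonterminal) -> str:
    """Concatenate all non-hidden terminal descendants, in order."""
    parts = []
    for child in node.children:
        if isinstance(child, Terminal):
            if not child.hidden:
                parts.append(child.text)
        else:
            # The mark on an intermediate nonterminal is deliberately
            # not consulted: "^", "@" and "-" are all recursed through.
            parts.append(attribute_value(child))
    return "".join(parts)


# @able dominating an implicitly "^"-marked 'string', as in the example grammar.
able = Nonterminal("able", "@", [Nonterminal("string", "^", [Terminal("aaa.")])])
print(attribute_value(able))  # aaa.
```

Under these assumptions the implicit ^ on 'string' has no effect on the value of @able; only a hidden terminal would be left out.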
>>> 
>>> The other side of this coin is:
>>> 
>>> "
>>> • A nonterminal element is serialised by outputting the name of the node as an XML tag, serialising all exposed attribute descendants, and then serialising all non-attribute children in order. An attribute is exposed if it is an attribute child, or an exposed attribute of a hidden element child (note this is recursive).
>>> "
>>> 
>>> Steven
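
As an illustration only (not Steven's implementation, nor any other mentioned in the thread), the two serialisation rules quoted in the message above can be combined in a short Python sketch that reproduces the shape of the result shown for the input aaa.bbb.ccc. All class and function names here are hypothetical. The sketch assumes that a hidden ("-") nonterminal contributes its non-attribute children in place, and it emits double-quoted attributes without indentation, so the output is equivalent to, not byte-identical with, the result shown.

```python
from dataclasses import dataclass, field
from typing import List, Union
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Terminal:
    text: str
    hidden: bool = False  # True models a terminal marked "-"


@dataclass
class Nonterminal:
    name: str
    mark: str = "^"  # "^" element, "@" attribute, "-" hidden
    children: List[Union["Nonterminal", Terminal]] = field(default_factory=list)


def text_of(node):
    """Attribute value: all non-hidden terminal descendants, marks ignored."""
    return "".join(
        (c.text if not c.hidden else "") if isinstance(c, Terminal) else text_of(c)
        for c in node.children
    )


def exposed_attributes(node):
    """Attribute children, plus exposed attributes of hidden children (recursive)."""
    found = []
    for child in node.children:
        if isinstance(child, Nonterminal):
            if child.mark == "@":
                found.append(child)
            elif child.mark == "-":
                found.extend(exposed_attributes(child))
    return found


def serialise_children(node):
    """Serialise the non-attribute children of a node, in order."""
    out = []
    for child in node.children:
        if isinstance(child, Terminal):
            if not child.hidden:
                out.append(escape(child.text))
        elif child.mark == "^":
            out.append(serialise(child))
        elif child.mark == "-":
            out.append(serialise_children(child))  # assumed: no tags, content in place
        # "@" children are emitted by serialise() as attributes, not here
    return "".join(out)


def serialise(node):
    """Serialise an element node: tag, exposed attributes, then children."""
    attrs = "".join(
        f" {a.name}={quoteattr(text_of(a))}" for a in exposed_attributes(node)
    )
    return f"<{node.name}{attrs}>{serialise_children(node)}</{node.name}>"


def string_node(text):
    return Nonterminal("string", "^", [Terminal(text)])


tree = Nonterminal("S", "^", [
    Nonterminal("able", "@", [string_node("aaa.")]),
    Nonterminal("baker", "^", [string_node("bbb.")]),
    Nonterminal("charlie", "@", [string_node("ccc.")]),
])
print(serialise(tree))
# <S able="aaa." charlie="ccc."><baker><string>bbb.</string></baker></S>
```

In this sketch 'string' becomes an element under baker but is flattened to text under @able and @charlie, which is the behaviour the thread settles on.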
>>> 
>>> On Wednesday 13 October 2021 04:19:52 (+02:00), C. M. Sperberg-McQueen wrote:
>>> 
>>>> Consider the grammar
>>>> 
>>>> S : @able, baker, @charlie.
>>>> able: string.
>>>> baker: string.
>>>> charlie: string.
>>>> string: ~[]*.
>>>> 
>>>> Is this grammar OK? (Yes, it’s hopelessly ambiguous, but that’s beside the point.)
>>>> 
>>>> If we ignored the annotations, a raw parse tree for this grammar might look like this:
>>>> 
>>>> <S>
>>>> <able mark="@"><string>aaa</string></able>
>>>> <baker><string>aaa</string></baker>
>>>> <charlie mark="@"><string>ccc</string></charlie>
>>>> </S>
>>>> 
>>>> Note that ‘string’ is implicitly marked serializable (^).
>>>> 
>>>> When a nonterminal marked to be serialized as an element appears as a child of a nonterminal marked to be serialized as an attribute (as ‘string’ here appears as a child of @able and @charlie), is the rule
>>>> 
>>>> - Raise an error because the grammar cannot be serialized that way?
>>>> 
>>>> - Omit the content of ‘string’ from the value of @able and @charlie by analogy with what happens when we calculate the text node children of an element?
>>>> 
>>>> - Ignore the marking on ‘string’ on the grounds that we have already been told that @able is an attribute? Since elements cannot appear within attributes, the implicit ^ marking on ‘string’ would be ignored.
>>>> 
>>>> The grammar for ixml offers two examples that seem relevant: in a raw parse tree, @name will dominate nodes labeled namestart and namefollower, which are explicitly marked non-serializable (-). @dstring and @sstring similarly dominate nodes labeled dchar and schar, which are implicitly marked ^. The attributes @from and @to directly dominate nodes labeled ‘character’ (marked -) and indirectly dominate nodes labeled ‘dchar’ and ‘schar’ (implicitly ^).
>>>> 
>>>> In the spirit of making things as simple as possible for the grammar authors, I suppose the right rule is “when constructing the value of an attribute, treat nonterminals marked ^ and - the same: recur through them” (the last possibility mentioned above).
>>>> 
>>>> I apologize if this has been discussed before - I have the guilty sensation that it has been, and that I did not retain the answer.
>>>> 
>>>> Michael
>>>> 
Received on Thursday, 14 October 2021 16:18:52 UTC