Re: Draft iXML minutes, 4 March 2025 from David Birnbaum on 2025-03-18 (public-ixml@w3.org from March 2025)

From: David Birnbaum <djbpitt@gmail.com>
Date: Tue, 18 Mar 2025 10:55:33 -0400
To: Steven Pemberton <steven.pemberton@cwi.nl>
Cc: public-ixml@w3.org
Message-ID: <CAP4v81rhQc2jo+9aATOwre6Eqi=GXot8dT_V7va9HAytMGrGmw@mail.gmail.com>

Dear All,

Perhaps we can learn a lesson from Relax NG compact syntax about whether we
want to permit productions that license structures that would not be
well-formed. Here are a few thoughts:

A Relax NG production that allows attributes to repeat on an element is
permitted. Relax NG compact syntax also licenses constructions that cannot
be expressed in XML; for example, `text?`, `text+`, and `text*` are all
permitted in Relax NG even though sequential text nodes are not permitted in
XML and even though `text` is defined as zero or more characters (so that
zero textual characters matches `text`, making the occurrence indicator on
`text?` meaningless). It further allows attributes to be represented as
part of mixed content, even though they are not content, e.g., `mixed { x
}` where `x` is later defined as an attribute; this expresses element
content that is text (that is, zero or more textual characters), and not
mixed.

I think of a schema as an expression of a theory of the text that (unlike a
lower level grammar to which it might be compiled) is intended for human
legibility. For that reason, I regret that Relax NG compact syntax allows
productions that license XML that would not be well formed. Yes, we'll
avoid repeating attributes in our XML and well-formedness checking will
catch it if we don't, but allowing a schema rule to say that a
well-formedness violation is allowed compromises the legibility of the
schema.
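
To make "well-formedness checking will catch it" concrete, here is a minimal
Python sketch using the standard library's xml.etree.ElementTree. The helper
name is_well_formed and the sample attribute values are mine, chosen only for
illustration:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # The parser rejects, among other things, a repeated attribute name.
        return False


# Distinct attribute names: well formed.
print(is_well_formed('<event a="2025-03-04" b="someone"/>'))
# The same attribute twice on one element: not well formed.
print(is_well_formed('<event a="2025-03-04" a="2025-03-18"/>'))
```

The point is only that the violation is caught after the fact, by the XML
parser, and not by the schema or grammar that licensed it.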


ixml has different resources at its disposal than Relax NG compact syntax
and I don't mean to suggest that what Relax NG compact syntax permits (or
not, or should permit, or should not) is directly applicable to ixml. But I
wonder whether there might be a common issue: A Relax NG schema or ixml
grammar that says that it allows something that is not possible in XML
compromises how accurately, effectively, and legibly the schema or grammar
models the documents it purports to model. I care about this in the Relax
NG case because it's either completely unnecessary (e.g., Relax NG could
have prohibited repetition indicators on attributes or `text`) or could
have been avoided with a different (and, I think, clearer) design (e.g.,
attributes could have been represented in a way that distinguished them
from element content). I don't know whether it's as avoidable in ixml, but
perhaps 1) acknowledging that a grammar that says X is allowed when X is a
well-formedness violation is less legible than one that doesn't and 2)
distinguishing what can be avoided easily from what could be avoided only
at excessive cost would give us a useful perspective on the question.

Best,

David

On Tue, Mar 18, 2025 at 9:09 AM Steven Pemberton <steven.pemberton@cwi.nl>
wrote:

> Bethan: One of the things I've been thinking about recently is that we
> have a bunch of dynamic errors that are things like there not being a
> single root node, or two attributes with the same name on the same element.
> … Earlier the spec says that conformance is about processors and grammars,
> not the combination of a grammar and an input.
> … That being the case, I think these should be static errors on the
> grammar; not errors that are only thrown if a particular input produces
> XML that is not well-formed.
> … I'm convinced you can do it.
>
> Real-life example: I have an input where repeatedly I get A, B and C, but
> not necessarily in that order.
> I know the input is correct, so I write:
>
> input: event*.
> event: (a; b; c)+, #a.
> @a: date.
> @b: who.
> @c: where.
> date: ... etc
>
> This grammar allows <event a="..." a="..."/>, but it will never happen
> because my input will never generate it.
>
> If this were to be classed as a static error, then I would have to write:
>
> input: event*.
> event: (ABC; ACB; BAC; BCA; CAB; CBA), #a.
> ABC: a, b, c.
> ACB: a, c, b.
> BAC: b, a, c.
> BCA: b, c, a.
> CAB: c, a, b.
> CBA: c, b, a.
> @a: date.
> @b: who.
> @c: where.
> date: ... etc
>
> which I would rather not have to do.
>
> Steven
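
The six ordering rules in the rewritten grammar are exactly the permutations
of a, b, c, so the number of rules grows as n! with the number of attributes.
A small Python sketch that generates them (the function name ordering_rules
is hypothetical, and the output format simply mimics the rules quoted above):

```python
from itertools import permutations


def ordering_rules(names):
    """Return one rule per ordering of names, e.g. 'ABC: a, b, c.'."""
    rules = []
    for order in permutations(names):
        # Rule name is the ordering spelled in upper case, as in the example.
        rule_name = "".join(name.upper() for name in order)
        rules.append(f"{rule_name}: {', '.join(order)}.")
    return rules


for rule in ordering_rules(["a", "b", "c"]):
    print(rule)
```

Three attributes need 6 such rules; four would need 24.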
>
> On Tuesday 18 March 2025 13:50:26 (+01:00), Steven Pemberton wrote:
>
> Looks like you had a great discussion in the status reports section. Sorry
> I missed it.
>
> Bethan says: "What I'm interested in working on are tools that will treat
> your grammar as a generator rather than a recognizer."
>
> I've written several of these in the past. For instance, when I wrote a
> version of Eliza, the Rogerian psychotherapist, I wrote another program to
> generate random paranoid ramblings for Eliza to respond to:
> https://cwi.nl/~steven/Talks/2024/09-oxford/ai.html#L2734
>
> In fact, they are quite easy to write, since it is just a recursive random
> path through the grammar tree. This is the complete code, where 'thing' is
> either a terminal or nonterminal ('choice' returns a random element of a
> sequence, in this case returning a random alternative from a rule):
>
> HOW TO GENERATE thing FROM grammar:
>     SELECT:
>         nonterminal(thing):
>             FOR symbol IN choice grammar[thing]:
>                 GENERATE symbol FROM grammar
>         ELSE:
>             WRITE thing, " "
>
>
> And you generate one rambling with "GENERATE '<sentence>' FROM sentences"
>
> Steven
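
For readers who want to experiment, here is a rough Python analogue of the
quoted HOW TO GENERATE routine. It is a sketch under my own assumptions: the
grammar is a dict mapping each nonterminal to a list of alternatives, each
alternative is a list of symbols, anything that is not a key counts as a
terminal, and the toy grammar is invented for illustration.

```python
import random


def generate(thing, grammar, rng=random):
    """Return the terminals produced by a random walk starting at thing."""
    if thing in grammar:
        # Nonterminal: choose one alternative at random and expand each symbol.
        output = []
        for symbol in rng.choice(grammar[thing]):
            output.extend(generate(symbol, grammar, rng))
        return output
    # Terminal: emit it unchanged.
    return [thing]


# Hypothetical toy grammar; not taken from the program described above.
sentences = {
    "<sentence>": [["<subject>", "<predicate>"]],
    "<subject>": [["someone"], ["everyone"]],
    "<predicate>": [["is watching me"], ["is listening"]],
}

print(" ".join(generate("<sentence>", sentences)))
```

Unlike the original, this version returns the terminals instead of writing
them, which makes the result easier to inspect. With a recursive grammar the
walk is not guaranteed to terminate.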
>
> On Tuesday 04 March 2025 16:42:47 (+01:00), Norm Tovey-Walsh wrote:
>
> > Hi folks,
> >
> > Draft minutes are online:
> >
> > https://www.w3.org/2025/03/04-ixml-minutes.html
> >
> > Be seeing you,
> > norm
> >
> > --
> > Norm Tovey-Walsh
> > Saxonica
> >
> >
>
>
Received on Tuesday, 18 March 2025 14:55:50 UTC