issue #24: does an ixml processor have to match everything?

At the end of the last call, we agreed that at our next call we would
talk about issue #24, "Does ixml have to match the whole input?"

Before that discussion, I would like to ask a procedural question and
lay out a preliminary position.

................................................................

The procedural question is: why are we re-opening this issue?

We discussed this at some length [1] last April and reached a decision.
Old decisions should of course be re-opened when new information comes
to light, so my question is: what do we know now that we did not know
last April, when we decided that, as far as conformance to the ixml spec
is concerned, the answer to the question is "yes"?

[1] https://lists.w3.org/Archives/Public/public-ixml/2021Apr/0007.html

Last April, as the record shows, I argued in favor of prefix-matching
being conformant behavior but did not carry the day; none of the
arguments in favor of prefix-matching were found persuasive by the
majority in the group, and none of the scenarios in which
prefix-matching is useful or necessary was held to be a scenario a
conforming ixml processor needs to support.

What has changed?

What has changed for me is fairly simple: the discussion last April
persuaded me that if we attempt to define prefix-matching as a
conformant behavior, I will not be happy with the definition.  My
perception is that the group proved incapable of finding a coherent
formulation, or even a coherent position to formulate, and I concluded
it would be better to say nothing about prefix-matching than to say
something incoherent.  Recent discussions have not changed my mind; they
are full of suggestions that the ixml spec prescribe all sorts of
behaviors and conditions that I think should not be prescribed, and
that would rule out nearly every use case I mentioned in which prefix
matching makes sense and is useful.

................................................................

So my position on issue #24, going into the discussion, is: yes, a
conforming ixml processor reports either that the input, in its
entirety, is a sentence in the language defined by the specified
grammar, or that it is not.
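To make the rule concrete, here is a minimal Python sketch.  It assumes,
purely for illustration, a toy language (comma-separated runs of "a")
written as a regular expression, because Python's `re` module happens to
offer both behaviors: `re.fullmatch` succeeds only if the whole string
matches, while `re.match` is satisfied by a prefix.  Real ixml grammars
are context-free, and the names `GRAMMAR` and `check` are hypothetical.

```python
import re

# Hypothetical toy language standing in for an ixml grammar: one or more
# runs of "a" separated by commas. A regex can only express a regular
# language; real ixml grammars are context-free.
GRAMMAR = re.compile(r"a+(,a+)*")


def check(text: str) -> tuple[bool, str]:
    """Return (is_sentence, diagnostic) under the whole-input rule."""
    if GRAMMAR.fullmatch(text):
        # The entire input is a sentence: the only case that yields "yes".
        return True, ""
    # re.match accepts a prefix; it returns the first one the regex engine
    # finds (for this pattern, the longest one), or None.
    prefix = GRAMMAR.match(text)
    if prefix:
        # Still "no": the grammatical prefix is reported only as a diagnostic.
        return False, f"not a sentence, but the prefix {prefix.group()!r} is"
    return False, "not a sentence"


print(check("aa,a"))    # (True, '')
print(check("aa,a;b"))  # (False, "not a sentence, but the prefix 'aa,a' is")
```

Under the position above, only the `fullmatch` result determines the
answer; the prefix found by `re.match` may appear in a diagnostic, but it
never turns a "no" into a "yes".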

That has several consequences:

- A processor that wishes to support input streams of indeterminate
  length will do so as a non-standard extension.

  At one point I believe I proposed that conforming processors which
  support extensions to ixml must provide a user option to turn off all
  extensions.  I don't think that made it into the spec, but I still
  think it would be a good idea.
  
- A processor that wishes to offer a mode of operation in which
  successively larger prefixes of the input are identified as sentences
  in the language will do so as a non-standard extension.

- A processor that wishes to inform the user that while the input string
  as a whole is not a sentence in the language, a particular substring
  of the input *is* a sentence in the language can do so as part of its
  error diagnostics.  If it falsely reports that the input as a whole is
  a sentence, it is not conforming.
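The second kind of extension, reporting successively larger grammatical
prefixes, can be sketched as a generator that announces each one as soon
as it has been read.  This is a hypothetical illustration, assuming the
trivial grammar `S: "a"*.` (any run of "a", including the empty string);
`grammatical_prefix_lengths` is an invented name, not part of any ixml
processor.

```python
from itertools import islice, repeat
from typing import Iterable, Iterator


def grammatical_prefix_lengths(stream: Iterable[str]) -> Iterator[int]:
    """Yield the length of every prefix of `stream` that is a sentence.

    Hypothetical grammar  S: "a"*.  (any run of "a", including none).
    """
    length = 0
    yield length  # the empty prefix is already a sentence
    for ch in stream:
        if ch != "a":
            return  # every longer prefix contains this character: stop
        length += 1
        yield length


print(list(grammatical_prefix_lengths("aab")))  # [0, 1, 2]

# An unbounded stream: each prefix is grammatical, so the generator never
# finishes, and islice is needed just to look at the first few answers.
endless = repeat("a")
print(list(islice(grammatical_prefix_lengths(endless), 5)))  # [0, 1, 2, 3, 4]
```

On finite input the generator stops, and its last value is the longest
grammatical prefix; on the unbounded stream it never stops, so a caller
can only ever observe longer and longer prefixes, never a final answer.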

In all three cases, the rationale for the decision is the same: every
attempt the CG has made to describe how conformance would work if the
rule were changed has been unsatisfactory.

................................................................

Since the idea that a processor should "maximally consume" the input
keeps coming up, it may be worth addressing it directly.  I do not think
the phrase has a clear meaning, but from context I think the idea is to
require that a processor identify the longest prefix of the input that
satisfies the grammar.  This proposal is problematic in several ways:

1 It is incoherent in the case of infinite input.  It assumes that there
is a longest prefix of that description, which is not guaranteed to be
the case.  So in what appears to be the single most compelling use case
for prefix matching, it is ill-defined.

2 It is counterproductive for those who want an exhaustive enumeration
of the grammatical prefixes of the input (similar to the behavior of the
Prolog 'phrase' predicate described in the April discussion).

3 It makes the primary use case for ixml (parse this input against this
grammar and give me the parse tree) into a special case that requires
special wording (and so far every attempt at sketching that wording has
been clumsy and inaccurate).
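The difference between an exhaustive enumeration and "maximal
consumption" can be illustrated with a small context-free recognizer
that computes every position at which a sentence can end.  It is only a
sketch, assuming a hypothetical balanced-parentheses grammar, roughly
`S: "(", S, ")", S; .` in ixml notation; `grammatical_splits` and `ends`
are invented names.  Much as Prolog's `phrase/3` enumerates remainders
when its last argument is left unbound, it returns each grammatical
prefix together with the rest of the input.

```python
from functools import lru_cache


def grammatical_splits(text: str) -> list[tuple[str, str]]:
    """Every (prefix, rest) split of `text` whose prefix is a sentence.

    Hypothetical grammar  S: "(", S, ")", S; .  (balanced parentheses).
    """

    @lru_cache(maxsize=None)
    def ends(i: int) -> frozenset[int]:
        # All positions j such that S derives text[i:j].
        result = {i}  # the empty alternative
        if i < len(text) and text[i] == "(":
            for j in ends(i + 1):  # inner S
                if j < len(text) and text[j] == ")":
                    result |= ends(j + 1)  # trailing S
        return frozenset(result)

    return [(text[:j], text[j:]) for j in sorted(ends(0))]


for prefix, rest in grammatical_splits("()(())x"):
    print(repr(prefix), repr(rest))
# '' '()(())x'
# '()' '(())x'
# '()(())' 'x'
```

"Maximal consumption" would keep only the last split and discard the
others, while whole-input recognition asks only whether some split has
an empty rest (here none does, since the input ends in `x`).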

If anyone who has proposed "maximal consumption" of the input meant
something different, then (a) please explain what you meant, and (b)
please note how dismally the phrase failed to convey what you meant.  It
would be helpful if you could find a better way to express your meaning.



-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Thursday, 3 February 2022 18:06:06 UTC