Re: The Science of Insecurity from Steven Pemberton on 2025-01-26 (public-ixml@w3.org from January 2025)

From: Steven Pemberton <steven.pemberton@cwi.nl>
Date: Sun, 26 Jan 2025 12:40:05 +0000
To: Gunther Rademacher <grd@gmx.net>, bytheway@linguacelta.com
Cc: public-ixml@w3.org
Message-Id: <1737891170672.1032220771.3060413609@cwi.nl>

Steven Pemberton writes:
> ixml is deterministic.

I respectfully disagree: while iXML guarantees a parse tree for valid input, it is not deterministic in the strict sense, as ambiguous grammars can produce different results across implementations, or even across multiple executions of the same implementation.
I expect we agree, and that I just didn't explain myself properly. Given a deterministic input, ixml produces a deterministic output. It doesn't add any indeterminacy; ixml is in itself deterministic. However, given a nondeterministic input, it reflects that in the output and warns about it, saying effectively "It's not clear what is intended, but here is one possibility".

> Also rightly, ixml warns you if you design a language that turns out to be ambiguous

I would disagree with this as well: ixml does not reliably warn you at design time about ambiguity; it may only become apparent later, when specific ambiguous input is encountered. ixml cannot warn generally about ambiguous languages, as the problem of detecting ambiguity is undecidable.
True enough, though my implementation does warn you in cases it can detect at 'compile' time.
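A small illustration of why ambiguity may only surface for particular inputs: the Python sketch below (a toy, not how any ixml processor actually works) counts parse trees with a CYK table for the hypothetical ambiguous grammar `E: E, "+", E; "a".`, hand-converted to Chomsky normal form. Although the grammar is ambiguous, an input with a single "+" still gets exactly one parse; only longer inputs expose more than one tree.

```python
from collections import defaultdict

# Hypothetical toy grammar  E: E, "+", E; "a".  rewritten by hand in
# Chomsky normal form (tree counts are preserved one-to-one):
#   E -> E X,  X -> P E,  P -> "+",  E -> "a"
BINARY = [("E", "E", "X"), ("X", "P", "E")]
TERMINAL = [("E", "a"), ("P", "+")]


def count_parses(tokens, start="E"):
    """Return the number of distinct parse trees for tokens (CYK with counts)."""
    n = len(tokens)
    if n == 0:
        return 0
    # table[i][j][A] = number of trees rooted at A that span tokens[i:j]
    table = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tok in enumerate(tokens):
        for lhs, term in TERMINAL:
            if term == tok:
                table[i][i + 1][lhs] += 1
    # Combine shorter spans into longer ones, summing over every split point k.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for lhs, left, right in BINARY:
                    table[i][j][lhs] += table[i][k][left] * table[k][j][right]
    return table[0][n][start]


for text in ["a", "a+a", "a+a+a", "a+a+a+a"]:
    print(text, count_parses(list(text)))  # 1, 1, 2, 5 trees respectively
```

Nothing in the grammar alone triggers a warning in this approach; the ambiguity is only observed once an input such as "a+a+a" is actually parsed.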

Steven

While this applies to arbitrary grammars, there are means to prove the absence of ambiguity for some classes of grammars. As Grune and Jacobs note in Parsing Techniques (2nd ed., Sec. 9.14), "The most effective ambiguity test for a CF grammar we have at present is the construction of the corresponding LR(k) automaton"; if that construction succeeds without conflicts, it proves that the grammar is unambiguous.
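For the simplest case, k = 0, the test can be sketched as follows: build the LR(0) item sets and report every state that has a shift/reduce or reduce/reduce conflict. This is a minimal Python sketch under stated assumptions; the function name `lr0_conflicts`, the grammar encoding (a list of `(lhs, rhs_tuple)` pairs), and the `"$"` end marker are all hypothetical. An empty result proves the grammar unambiguous; a non-empty result only means the grammar is not LR(0), not that it is ambiguous.

```python
def lr0_conflicts(grammar, start):
    """Return a list of (kind, state) conflicts in the LR(0) automaton."""
    # Augment with S' -> start "$"; "$" is an assumed end-marker terminal.
    prods = [("S'", (start, "$"))] + list(grammar)
    nonterms = {lhs for lhs, _ in prods}

    def closure(items):
        # An item is (production index, dot position).
        result = set(items)
        changed = True
        while changed:
            changed = False
            for p, dot in list(result):
                rhs = prods[p][1]
                if dot < len(rhs) and rhs[dot] in nonterms:
                    for q, (lhs, _) in enumerate(prods):
                        if lhs == rhs[dot] and (q, 0) not in result:
                            result.add((q, 0))
                            changed = True
        return frozenset(result)

    def goto(state, sym):
        moved = {(p, dot + 1) for p, dot in state
                 if dot < len(prods[p][1]) and prods[p][1][dot] == sym}
        return closure(moved) if moved else None

    # Explore all reachable states of the LR(0) automaton.
    symbols = {s for _, rhs in prods for s in rhs}
    initial = closure({(0, 0)})
    states, todo = {initial}, [initial]
    while todo:
        st = todo.pop()
        for sym in symbols:
            nxt = goto(st, sym)
            if nxt is not None and nxt not in states:
                states.add(nxt)
                todo.append(nxt)

    # A state conflicts if it can reduce by two productions, or both reduce and shift.
    conflicts = []
    for st in states:
        complete = [p for p, dot in st if dot == len(prods[p][1])]
        can_shift = any(dot < len(prods[p][1]) and prods[p][1][dot] not in nonterms
                        for p, dot in st)
        if len(complete) > 1:
            conflicts.append(("reduce/reduce", st))
        if complete and can_shift:
            conflicts.append(("shift/reduce", st))
    return conflicts


ambiguous = [("E", ("E", "+", "E")), ("E", ("a",))]
layered = [("E", ("E", "+", "T")), ("E", ("T",)), ("T", ("a",))]
print(len(lr0_conflicts(ambiguous, "E")) > 0)  # True: E + E . vs E . + E
print(lr0_conflicts(layered, "E"))             # []: LR(0), hence unambiguous
```

Both example grammars describe the same language; only the layered one passes the test, which is the sense in which a successful construction proves absence of ambiguity while a failed one proves nothing.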

Gunther

Sent: Friday, 24 January 2025 at 15:18
From: "Steven Pemberton" <steven.pemberton@cwi.nl>
To: "Bethan Tovey-Walsh" <bytheway@linguacelta.com>
CC: ixml <public-ixml@w3.org>
Subject: Re: The Science of Insecurity
Patterson suggests that regular or deterministic context-free languages are the only securable input languages. iXML therefore already fails to satisfy her requirements, since it’s non-deterministic.

I would disagree with that characterisation. ixml is deterministic. However, it does allow you to process languages that are non-deterministic, as it must, and should, since some people design such languages.

Also rightly, ixml warns you if you design a language that turns out to be ambiguous, because you should avoid that.

> I think (and better minds may correct me) that implementers might therefore improve the security of iXML if they could provide a “reject ambiguity” mode. This would mean that parsing an input with a grammar would fail if the parse is ambiguous, instead of returning one of the possible parses.

This is not ixml's fault, and I would be against requiring ixml to fail in that case. On the other hand, I feel that the requirement expressed in the spec as "the resulting serialization should include the attribute ixml:state on the document element with a value that includes the word ambiguous." ought to be a *must*, not a *should*. We made it a should as a compromise at Tom Hillman's request, though I no longer remember the reasoning. I still think it's a bad idea.
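Until processors offer a "reject ambiguity" mode, an application can approximate one by inspecting the serialization for the ixml:state attribute. A minimal Python sketch, assuming the ixml namespace URI `http://invisiblexml.org/NS`; the names `reject_if_ambiguous` and `AmbiguousParseError` are hypothetical. It only works if the processor actually emits the attribute, which is exactly why it matters whether that requirement is a *must* or a *should*.

```python
import xml.etree.ElementTree as ET

IXML_NS = "http://invisiblexml.org/NS"   # assumed ixml namespace URI
STATE_ATTR = f"{{{IXML_NS}}}state"       # ElementTree's Clark notation for ixml:state


class AmbiguousParseError(ValueError):
    """Hypothetical error raised when the processor flags its output as ambiguous."""


def reject_if_ambiguous(serialization: str) -> ET.Element:
    """Parse an ixml serialization; raise if the document element reports ambiguity."""
    root = ET.fromstring(serialization)
    # The attribute value may contain several words; look for "ambiguous" among them.
    words = root.get(STATE_ATTR, "").split()
    if "ambiguous" in words:
        raise AmbiguousParseError("processor reported an ambiguous parse")
    return root


ok = '<expr xmlns:ixml="http://invisiblexml.org/NS">a</expr>'
print(reject_if_ambiguous(ok).tag)  # expr
```

A processor that silently omits the attribute would pass this check unchanged, so the guardrail is only as strong as the spec's wording.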

Steven

This wouldn’t be enough to make iXML deterministic, of course, since there’s still no guaranteed way to decide whether a given grammar is ambiguous. But it would, at least, prevent the use of ambiguous grammars if and when their ambiguity is exposed by the processor.

Regarding pragmas, I think Patterson’s work suggests we should give users a guardrail, so that they can ensure that pragmas aren’t going to affect their use of iXML in unexpected ways. This would, I think, involve simply requiring in the spec that all iXML processors permit users to disable pragmas. Users could then permit the use of arbitrary iXML grammars as inputs, and know that they will be treated as though they contained no pragmas at all.

If anyone’s interested in Patterson’s work, but isn’t a fan of videos, this paper covers some of the same content: https://digitalcommons.dartmouth.edu/cgi/viewcontent.cgi?article=1337&context=cs_tr

BTW

****************************************************

Dr. Bethan Tovey-Walsh

linguacelta.com <http://linguacelta.com/>

Golygydd | Editor geirfan.cymru <http://geirfan.cymru/>

Croeso i chi ysgrifennu ataf yn y Gymraeg | You are welcome to write to me in Welsh

On 15 Jan 2025, at 11:48, Steven Pemberton <steven.pemberton@cwi.nl> wrote:

I was recommended this talk, and I think it exposes some of the issues around pragmas.

The talk is 38 minutes long.

https://media.ccc.de/v/28c3-4763-en-the_science_of_insecurity

Steven

Received on Sunday, 26 January 2025 12:40:22 UTC