Re: Line/string triage? from Graydon Saunders on 2025-03-23 (public-ixml@w3.org from March 2025)

From: Graydon Saunders <graydonish@fastmail.com>
Date: Sat, 22 Mar 2025 21:41:21 -0400
To: "Bethan Tovey-Walsh" <bytheway@linguacelta.com>, "David Birnbaum" <djbpitt@gmail.com>
Cc: public-ixml@w3.org
Message-Id: <88406383-4ff5-4892-950d-4052a878de0c@app.fastmail.com>

I find myself feeling like it isn't necessarily a good idea to mix in regular expression concepts.

An ixml grammar is obliged to relate the whole input to a tree. Constraints on the mapping -- not that, not there, this only when -- might be more usefully expressed as constraints on the tree, by feeding in a schema of some kind, rather than by trying to express them in the grammar.

(When going from flat content to hierarchical content in regular XML processing it seems to work a lot better to give everything its name and then apply hierarchy. Maybe that kind of separation of function would be useful in an ixml grammar. And there's already a "don't put this name in the result" concept, so it might make sense to have the schema use all the defined names whether or not they're going in the final result tree.)

I have no sense of whether this takes a hard problem and makes it intractable, but it seems like a more "this is a tree, it's just hiding" approach to the problem.
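
To make the "constrain the tree, not the grammar" idea concrete, here is a rough sketch in Python. It is not an ixml feature or anyone's implementation: the element names (doc, line, word, number), the reserved-word list, and both rules are invented for illustration, and the XML string stands in for whatever a permissive grammar would produce.

```python
import xml.etree.ElementTree as ET

# Hypothetical output of a permissive grammar; all names are made up.
PARSED = (
    "<doc>"
    "<line><word>let</word><number>7</number></line>"
    "<line><number>7</number><word>if</word></line>"
    "</doc>"
)

RESERVED = {"if", "else"}  # assumed list, for illustration only


def violations(root):
    """Return one message per tree-level constraint that fails."""
    problems = []
    for line_no, line in enumerate(root.iter("line"), start=1):
        children = list(line)
        # "not there": a number may not be the first child of a line
        if children and children[0].tag == "number":
            problems.append(f"line {line_no}: number in first position")
        # "not that": a word may not be a reserved word
        for word in line.findall("word"):
            if word.text in RESERVED:
                problems.append(f"line {line_no}: reserved word {word.text!r}")
    return problems


print(violations(ET.fromstring(PARSED)))
```

The grammar stays simple and only names things; the "not that, not there" rules live in a separate pass over the named tree, which is the role a schema would play.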

On Sat, Mar 22, 2025, at 21:24, Bethan Tovey-Walsh wrote:
>> Anchoring (matching without consuming) does feel different, but don’t we already have ~[L] to match and consume one non-letter character?
> 
> Yes, true. The discussion about a negation operator has, so far, mainly focused on negating longer strings - helping to solve the kind of problem with which you started this thread. I'd expect ~[L] to be equivalent to ![L], therefore, if ! were to be permitted as an operator on things other than strings. 
> 
> There's a case, I think, for limiting a negation operator to strings, since otherwise the negated part can become arbitrarily complex, which I guess could be challenging to implement. Then again, I can also imagine that users might find it very convenient to be able to negate nonterminals or other constructs. 
> 
> If we want to talk about adding lookahead, I do feel that it should be a separate question from that of negation. If nothing else, having negative lookahead without positive lookahead feels rather haphazard. 
> 
> BTW
> 
> ****************************************************
> 
> Dr. Bethan Tovey-Walsh
> 
> _linguacelta.com_
> 
> Golygydd | Editor geirfan.cymru
> 
> Croeso i chi ysgrifennu ataf yn y Gymraeg
> 
> 
>> On 22 Mar 2025, at 08:34, David Birnbaum <djbpitt@gmail.com> wrote:
>> 
>>>> I see another nice use case for a not construct:
>>>> 
>>>> identifier: [L]+, ![L].
>>>> 
>>>> This would be useful to terminate an identifier only at the end of the string of letters. The ![L] will not match anything, and it will not generate any DOM/XML. It merely checks the next character.
>>> 
>>> That feels like it would be a significant departure from the current semantics where I would expect ![L] to match and consume one non-letter character.
>> 
>> Anchoring (matching without consuming) does feel different, but don’t we already have ~[L] to match and consume one non-letter character?
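
For what it's worth, the difference between the two readings discussed above can be sketched in a few lines of Python. This is only an illustration under assumptions: str.isalpha() stands in for [L], the function names are hypothetical, and the behaviour of a non-consuming ![L] at end of input (treated here as a successful check) is a guess, since no such operator is defined in ixml.

```python
def letter_prefix_ends(text, pos=0):
    """All end offsets at which [L]+ could stop, starting from pos."""
    ends = []
    end = pos
    while end < len(text) and text[end].isalpha():
        end += 1
        ends.append(end)
    return ends


def consuming(text, pos=0):
    """[L]+, ~[L]: the trailing non-letter is matched and consumed."""
    return [e + 1 for e in letter_prefix_ends(text, pos)
            if e < len(text) and not text[e].isalpha()]


def lookahead(text, pos=0):
    """[L]+ followed by a non-consuming check that no letter comes next.

    Assumption: the check also succeeds at end of input.
    """
    return [e for e in letter_prefix_ends(text, pos)
            if e == len(text) or not text[e].isalpha()]


print(consuming("abc de"))  # [4]: the space is part of the match
print(lookahead("abc de"))  # [3]: the space is left for the next rule
print(consuming("abc"))     # []: nothing left to consume
print(lookahead("abc"))     # [3]
```

Both readings prune the shorter prefixes "a" and "ab"; they differ in where the match ends and in what happens when the letters run to the end of the input.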

Received on Sunday, 23 March 2025 01:41:46 UTC