Re: Thoughts on pragmas and iXML from Bethan Tovey-Walsh on 2025-02-14 (public-ixml@w3.org from February 2025)

From: Bethan Tovey-Walsh <bytheway@linguacelta.com>
Date: Fri, 14 Feb 2025 18:32:45 +0000
To: Graydon Saunders <graydonish@gmail.com>
Cc: ixml <public-ixml@w3.org>
Message-Id: <EC204CD5-D9AD-45E3-B438-AF8B42F492A7@linguacelta.com>

> I would contend that the syntax of pragmas, on the other hand, needs to
> allow someone reading a grammar to know which pragmas pertain to which
> grammar constructs.

I agree, but I don't think this extends to knowing which pragmas *actually have an effect on the processing of* a given construct. It just means knowing which construct the pragma is attached to, and which construct should therefore provide the context for that pragma's semantics to be interpreted.

> To extend this a bit; the user also needs to know something about the
> rules for multiple adjectives; can the tove be slithy AND mimsy at one
> time? is a slithy slithy tove more slithy than a slithy tove, not slithy
> (because the sentence must stop for repairs), or only so slithy as a
> tove declared slithy once?

That, I would argue, is no longer a syntactic question, but one which is answered by considering semantic information about the adjectives. (Bear with me: this comes back around to pragmas in the end!)

Can you have a "woody, woody hillside"? Yeah, that seems okay - the hillside isn't just woody, it's woody to the max. The repetition carries the semantics of intensification. 

Can you have a "wooden, wooden chair"? Okay, I don't know about that. Intensification doesn't seem to apply to an adjective describing the material constituent of the noun. Can a chair be extremely made-of-wood? Seems strange, and I can't come up with any alternative semantics for the repetition.

However, a "wooden, wooden actor" is fine: this dude has less acting ability than Roger Moore's left eyebrow. We're back to intensification.

Can you have "eleven, eleven chairs"? Nah, that's super weird, and I can't work out what the repetition might mean. The chairs aren't extremely eleven, so is this twenty-two chairs? Or a hundred and twenty-one chairs?

But what about "twenty two chairs"? In writing, you'd ideally put a hyphen between "twenty" and "two", to show that this is really a compound adjective; in speech, there's no indication that "twenty two chairs" and "two twenty chairs" are different structures. We need semantic rules to understand that "two twenty chairs" is like "eleven eleven chairs", whereas "twenty two chairs" is like "many chairs".

And what if we are combining different, non-numeral adjectives? Is a "happy tired child" the same as a "tired happy child"? Yeah, probably. Is a "proud former vegetarian" the same as a "former proud vegetarian"? No, probably not. Could you be a "proud, proud former vegetarian"? Sure. How about a "former, former, proud vegetarian"? That sounds questionable to me.

None of this is controlled by the syntactic rules of English. English says that "adjective, adjective noun" is syntactically allowable, and defers more granular acceptability rules to the semantics.

What you're suggesting would be like English syntax adding the following rules, given adjectives X, Y, and noun N:

1. "X, X N" is always allowable, and always carries the semantics of intensification of the adjective X. 

2. (pick one of)
 a) "X, Y N" is always allowable, and has the same meaning as "Y, X N".
or
 b) "X, Y N" is always allowable, but never has the same meaning as "Y, X N".
or
 c) "X, Y N" is allowable, but "Y, X N" is not.

Placing these extra rules into the syntax has numerous effects. We have to resolve the semantic incongruity of "eleven, eleven chairs" and "wooden, wooden chairs" somehow, so maybe we make a new word class, which has all the same properties as adjectives, but cannot participate in "X, X N" constructions. Similarly, we need to deal with our intuition that "proud former vegetarian" and "former proud vegetarian" mean different things but "happy tired child" and "tired happy child" do not, given that our syntactic rules now insist that the order cannot change the semantics (or, alternatively, that it must change the semantics, or that only one of the orderings is allowable). And we may have to permit both "a red plastic bottle" and "a plastic red bottle", for the same reason. 

These are harder problems to solve. We could choose to end up with a lot of new word classes, defined in terms like "can be the first noun-modifier, but cannot be followed by modifiers of categories C, D, K, or N"; "can only be the last modifier; can't follow a modifier from categories Q, T, or Y, unless that modifier is preceded by a modifier of category D". These classes would replace the current class "adjective". Alternatively, we just give up on the notion that adjectives face any constraints whatever on ordering, which means that neat, efficient expressions of different meanings like "proud former vegetarian" and "former proud vegetarian" aren't available, and we need to use more verbose constructions to get shades of meaning across.

The semantic rules for adjectives are much harder to learn than the syntactic ones, both for children learning English as a primary language and (especially) for adults learning it as a new language. And yet, if a separation of concerns between syntax and semantics had no advantage for language, it wouldn't be a key feature of every natural language we know. This separation, ultimately, makes the most of the trade-off between expressive power and systemic complexity. Making semantic rules into syntactic ones means that we must either complicate the system considerably, or sacrifice expressive power.

Grammar authors will need to become familiar with the semantics of the pragmas they are using (just as language learners must become familiar with the semantics of types of English adjectives) - that stands to reason. But keeping syntactic rules to a minimum means that they will not be forced to write pragmas in ways that violate their intuitive sense of semantic logic, or just find themselves entirely unable to write all of the pragmas they need, because the spec says (for example) that a pragma always overrides any previous ones which have the same element in scope, or that a pragma on a child element is always subordinate to a pragma on its parent.

> But if absolutely everything is up to the
> implementer, I then have to figure out which processor version I'm using
> and go read some documentation; is this an implementation which applies
> the last associated pragma? the first? 

That's not what I'm proposing, though. I'm saying that the semantics of the pragma itself must define its behaviour in interaction with other pragmas. An implementation could not decide unilaterally that *all* pragmas follow its own arbitrary rules. An implementer might decide that all *pragmas she defines* will follow some arbitrary rules of precedence (she'd be something of an idiot if she did, but there's nothing stopping her). But if she proposes to recognise pragmas from other implementers, she must obey their semantics. 
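
As a rough illustration only: the idea that each pragma's own definition decides how it combines might be sketched as below. Everything here is hypothetical (the names PragmaDef, REGISTRY and resolve, the three combination policies, and the choice to skip unrecognised pragmas); none of it comes from the iXML spec or from any real processor, and it only covers repetition of the same pragma on one construct.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


def accumulate(values: List[str]) -> List[str]:
    """Every occurrence counts, in written order."""
    return list(values)


def last_wins(values: List[str]) -> List[str]:
    """Only the final occurrence has any effect."""
    return values[-1:]


def reject_repeats(values: List[str]) -> List[str]:
    """Repetition is an error for this pragma."""
    if len(values) > 1:
        raise ValueError("this pragma may appear only once per construct")
    return list(values)


@dataclass(frozen=True)
class PragmaDef:
    # Hypothetical: a pragma definition carries its own combination rule.
    name: str
    combine: Callable[[List[str]], List[str]]


# Hypothetical registry: each pragma's author chose its policy.
REGISTRY: Dict[str, PragmaDef] = {
    "A": PragmaDef("A", accumulate),
    "B": PragmaDef("B", last_wins),
    "C": PragmaDef("C", reject_repeats),
}


def resolve(attached: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group the pragmas attached to one construct by name, then let each
    pragma's own definition decide what its repeated occurrences mean."""
    grouped: Dict[str, List[str]] = {}
    for name, data in attached:
        grouped.setdefault(name, []).append(data)
    # Assumption: pragmas this processor does not recognise are skipped.
    return {
        name: REGISTRY[name].combine(values)
        for name, values in grouped.items()
        if name in REGISTRY
    }
```

In this sketch the processor supplies no precedence rule of its own: a construct carrying pragmas A, A and B is resolved by asking A's definition and B's definition separately, so a processor recognising someone else's pragma would have to carry that pragma's rule along with it.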

This also means that the grammar author may encounter some limitations; some pragmas may not play well with others in particular ways, just as some adjectives cannot be used before or after or alongside some others. This seems unlikely to be a major issue in reality, since I doubt anyone's going to be stuffing hundreds of pragmas into their grammars. But it will be part of learning the skill of more advanced grammar-writing, and that's the price one pays for a technology that can do more of what you want, more flexibly.

BTW


___________________________________________________ 
Dr. Bethan Tovey-Walsh 

linguacelta.com

Golygydd | Editor geirfan.cymru

Croeso i chi ysgrifennu ataf yn y Gymraeg.

> On 14 Feb 2025, at 16:25, Graydon <graydonish@gmail.com> wrote:
> 
> On Fri, Feb 14, 2025 at 01:40:21AM +0000, Bethan Tovey-Walsh scripsit:
>>> Which makes me think there will need to be some requirement to
>>> resolve the meaning of overlapping scope
>> 
>> Right, I see. (I say that with the same inevitable caveat as before.)
> 
> Talking about the requirements of an already abstract unnatural grammar
> does not come naturally to anyone, at least not so far as I've observed.
> 
> Communication is observed to occur; I think this is success.
> 
>> I still believe that this must be managed by the semantics of the
>> pragma.
> 
> What the processor does, yes, that goes into the semantics of the
> pragma, is implementation-specific, and up to the implementer. 
> 
> I would contend that the syntax of pragmas, on the other hand, needs to
> allow someone reading a grammar to know which pragmas pertain to which
> grammar constructs. It should be possible to know by inspection which
> pragmas pertain to a construct even while it should NOT be possible to
> know what they're going to do.
> 
> As a user, if I am reading a grammar which contains pragmas and I see
> something that could be abstracted as
> 
> pragma{A} pragma{A} pragma{B} LHS = RHS
> 
> how do I read it?
> 
> I ought to have a pretty good idea that all three instances of pragmas
> apply to that left hand side. But if absolutely everything is up to the
> implementer, I then have to figure out which processor version I'm using
> and go read some documentation; is this an implementation which applies
> the last associated pragma? the first? is it going to throw an error?
> how many times is pragma A going to be applied? A user should not, by
> design, be that confused; they should be able to learn some few simple
> rules about the pragma part of speech and be able to apply those during
> inspection.
> 
> [snip]
>> On a more practical level, it's much easier to have a requirement that
>> pragma-data (and thus semantics) are not part of the specification,
>> than to say "pragma semantics  should usually be down to the
>> implementer, except in these four cases..." Slippery slopes, and all
>> that.
> 
> Don't want to specify the semantics; do want sufficiently specific
> syntax that I can tell by inspection (for some "in principle" value of
> inspection if necessary) what pragmas pertain to what grammar
> constructs.
> 
>>> I think an analogy from natural language may apply; one may not know
>>> what the word means, but one would still know where to put it in a
>>> sentence if one knew what part of speech it was.
>> 
>> Ooh, I like this. What are slithy toves? Who knows! But "slithy" is an
>> adjective and "toves" is a plural noun, and you can therefore have "a
>> slithy tove", and make it the subject of a verb if you feel like it.
> 
> To extend this a bit; the user also needs to know something about the
> rules for multiple adjectives; can the tove be slithy AND mimsy at one
> time? is a slithy slithy tove more slithy than a slithy tove, not slithy
> (because the sentence must stop for repairs), or only so slithy as a
> tove declared slithy once?
> 
> Much appreciated,
> Graydon
> 
> 
> --  
> Graydon Saunders  | graydonish@fastmail.com
> Þæs oferéode, ðisses swá mæg.
> -- Deor  ("That passed, so may this.")
>
Received on Friday, 14 February 2025 18:33:04 UTC