[csswg-drafts] [css-values] ambiguity when matching syntax to value (#6695)

idoros has just created a new issue for https://github.com/w3c/csswg-drafts:

== [css-values] ambiguity when matching syntax to value ==

I can't find anywhere in the spec, or in any discussion, a description of how an ambiguous syntax definition is handled when it is matched against a value. It may be that there is no definite answer and that formal syntax is reviewed for such edge cases before approval, but I haven't found those criteria either.

For example, when one data type ends with an optional part that can also match the beginning of another data type:

```
syntax = <A> || <B>
<A>    = <number>+
<B>    = <number>
```

For the value "1 2 3", does `<A>` match all of "1 2 3"? It seems that `<B>` would then never be matched (which is allowed, since it is optional).

What if the syntax used all-of instead, `<A> && <B>`? Should the matcher use some kind of negative lookahead to stop `<A>` at "1 2"? Or should it fail when `<A>` consumes the whole value, and then swap the order so that `<B>` matches "1" and `<A>` matches "2 3"?
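
One way to make the question concrete is a backtracking matcher. The following is a minimal Python sketch, not anything the spec prescribes: every helper (`numbers`, `in_order`, `all_of`, `any_of`, `parse`) is hypothetical, values are assumed to be whitespace-separated tokens, and each matcher yields all of its possible end positions, longest first, so a greedy `<A>` can be backed off when a later component fails.

```python
from itertools import permutations
from typing import Callable, Iterator, List, Optional, Tuple

Capture = Tuple[str, Tuple[str, ...]]     # (production name, tokens it took)
Result = Tuple[int, Tuple[Capture, ...]]  # (end index, captures in match order)
Matcher = Callable[[List[str], int], Iterator[Result]]

def is_number(tok: str) -> bool:
    try:
        float(tok)
        return True
    except ValueError:
        return False

def numbers(name: str, at_least: int, at_most: float) -> Matcher:
    """<number>{at_least,at_most}: yields the longest run first, then shorter ones."""
    def match(tokens: List[str], i: int) -> Iterator[Result]:
        j = i
        while j < len(tokens) and j - i < at_most and is_number(tokens[j]):
            j += 1
        for end in range(j, i + at_least - 1, -1):
            yield end, ((name, tuple(tokens[i:end])),)
    return match

def in_order(ms: List[Matcher], tokens: List[str], i: int) -> Iterator[Result]:
    """Match ms one after another, backtracking into earlier matchers on failure."""
    if not ms:
        yield i, ()
        return
    for mid, caps in ms[0](tokens, i):
        for end, rest in in_order(ms[1:], tokens, mid):
            yield end, caps + rest

def all_of(*ms: Matcher) -> Matcher:
    """`a && b`: every component exactly once; the written order is tried first."""
    def match(tokens: List[str], i: int) -> Iterator[Result]:
        for order in permutations(ms):
            yield from in_order(list(order), tokens, i)
    return match

def any_of(*ms: Matcher) -> Matcher:
    """`a || b`: one or more components; combinations with more components come first."""
    def match(tokens: List[str], i: int) -> Iterator[Result]:
        for r in range(len(ms), 0, -1):
            for order in permutations(ms, r):
                yield from in_order(list(order), tokens, i)
    return match

def parse(m: Matcher, value: str) -> Optional[Tuple[Capture, ...]]:
    """Captures of the first match that consumes the whole value, or None."""
    tokens = value.split()
    for end, caps in m(tokens, 0):
        if end == len(tokens):
            return caps
    return None

A = numbers("A", 1, float("inf"))  # <A> = <number>+
B = numbers("B", 1, 1)             # <B> = <number>
print(parse(any_of(A, B), "1 2 3"))  # (('A', ('1', '2')), ('B', ('3',)))
print(parse(all_of(B, A), "1 2 3"))  # (('B', ('1',)), ('A', ('2', '3')))
```

With this particular search order, `<A> || <B>` on "1 2 3" gives `<A>` = "1 2" and `<B>` = "3", only because combinations with more components are tried first; a matcher that tried `<A>` alone first would return `<A>` = "1 2 3". For `&&`, the result depends on which component order is tried first: `all_of(A, B)` gives `<A>` = "1 2", `<B>` = "3", while `all_of(B, A)` gives `<B>` = "1", `<A>` = "2 3". All of these are valid full matches, which is exactly the ambiguity in question.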

-----

The other ambiguity involves `<custom-ident>`. The spec [states](https://www.w3.org/TR/css-values-4/#custom-idents) that "When parsing positionally-ambiguous keywords in a property value, a <custom-ident> production can only claim the keyword if no other unfulfilled production can claim it."

For example:
```
syntax  = <C> && <D>
<C>     = <custom-ident>
<D>     = abc
```

For the value "abc ccc", `<C>` would match "ccc" and `<D>` would match "abc", since `<D>` can claim "abc" and therefore has priority. I assume that if the syntax were defined with juxtaposition, `<C> <D>`, then for the same value "abc ccc" `<D>` could not take the first part of the value, and the match would fail.
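
Both cases can be sketched in a few lines. In this minimal Python sketch (the helpers `match_all_of` and `match_juxtaposed` are hypothetical, each production is assumed to consume exactly one whitespace-separated token, and `<custom-ident>` is approximated by a rough identifier regex), a `<custom-ident>` inside `&&` skips any token that a still-pending keyword production could claim, while juxtaposition assigns tokens strictly by position.

```python
import re
from typing import Dict, List, Optional, Tuple

Production = Tuple[str, Optional[str]]    # ("keyword", "abc") or ("custom-ident", None)
IDENT = re.compile(r"-?[A-Za-z_][\w-]*")  # rough stand-in for CSS identifier syntax

def matches(prod: Production, tok: str) -> bool:
    kind, word = prod
    if kind == "keyword":
        return tok == word
    return IDENT.fullmatch(tok) is not None  # custom-ident

def match_all_of(prods: Dict[str, Production], tokens: List[str]) -> Optional[Dict[str, str]]:
    """`p1 && p2 && ...`: each production once, in any order, with the claim rule."""
    def go(i: int, pending: List[str], taken: Dict[str, str]) -> Optional[Dict[str, str]]:
        if i == len(tokens):
            return taken if not pending else None
        tok = tokens[i]
        for name in pending:
            others = [n for n in pending if n != name]
            if not matches(prods[name], tok):
                continue
            if prods[name][0] == "custom-ident" and any(
                    prods[n] == ("keyword", tok) for n in others):
                continue  # an unfulfilled keyword production can claim tok
            result = go(i + 1, others, {**taken, name: tok})
            if result is not None:
                return result
        return None
    return go(0, list(prods), {})

def match_juxtaposed(prods: Dict[str, Production], tokens: List[str]) -> Optional[Dict[str, str]]:
    """`p1 p2 ...`: productions in written order, so position alone decides."""
    if len(tokens) != len(prods):
        return None
    taken = {}
    for (name, prod), tok in zip(prods.items(), tokens):
        if not matches(prod, tok):
            return None
        taken[name] = tok
    return taken

syntax = {"C": ("custom-ident", None), "D": ("keyword", "abc")}
print(match_all_of(syntax, "abc ccc".split()))      # {'D': 'abc', 'C': 'ccc'}
print(match_juxtaposed(syntax, "abc ccc".split()))  # None
```

In the juxtaposed case the sketch lets `<C>` take "abc", since nothing else can compete for the first position, and the match then fails on `<D>`, which agrees with the assumption above.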

To emphasize how the order of (mis)matching matters:

```
syntax  = [<E> | <F>] && [<G> | <H>]
<E>     = <custom-ident>
<F>     = <number>
<G>     = abc
<H>     = <number>
```

For the value "abc 1", what order of matches and claims should the matcher follow?

Thinking about how this might work, perhaps the matcher could use the following concepts while matching:

* register claim ambiguity - push the current match state onto a stack of match states (used when `<custom-ident>` matches a keyword)
* force claim - roll back to the previous match state and force the next match option
* next match option - move to the next match option, if one is available

A claim could then be handled in several ways:

- defer claim - iterate through the syntax options that don't match a (taken?) claim, and only after all other non-claim matches have failed, call "force claim" to roll back the match.
    - `<E>`=="abc" `<custom-ident>` - "register claim ambiguity"
    - `<G>`!="1" - not claimed, since there is another option: "next match option" in `[<G>|<H>]`
    - `<H>`=="1" `<number>`
- immediate claim - immediately call "force claim" to roll back the match to a different path in which the keyword is not taken before it is claimed.
    - `<E>`=="abc" `<custom-ident>` - "register claim ambiguity"
    - `<G>`!="1" - call "force claim" to roll back, then take the next match option, `<F>`
    - `<F>`!="abc" - next match option: reorder the top-level all-of to `[<G>|<H>] && [<E>|<F>]`
    - `<G>`=="abc" (abc keyword)
    - `<E>`!="1" - "next match option" in `[<E>|<F>]`
    - `<F>`=="1" (number)
- other approaches are probably possible too
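
The two strategies can be compared with a small backtracking search. This is a minimal Python sketch under several assumptions: the names (`match_groups`, the `strategy` flag) are hypothetical, each production consumes exactly one token, the top-level `&&` tries the written component order first and then reorderings, and "immediate" is modelled by refusing a `<custom-ident>` claim whenever a still-unfulfilled group has a keyword alternative for the same token. It reproduces the end results of the two traces above, not every intermediate step.

```python
import re
from itertools import permutations
from typing import Dict, List, Optional, Tuple

Leaf = Tuple[str, str, Optional[str]]     # (kind, production name, keyword text)
IDENT = re.compile(r"-?[A-Za-z_][\w-]*")  # rough stand-in for CSS identifiers

def leaf_matches(leaf: Leaf, tok: str) -> bool:
    kind, _, word = leaf
    if kind == "number":
        try:
            float(tok)
            return True
        except ValueError:
            return False
    if kind == "keyword":
        return tok == word
    return IDENT.fullmatch(tok) is not None  # "custom-ident"

def match_groups(groups: List[List[Leaf]], tokens: List[str],
                 strategy: str) -> Optional[Dict[str, str]]:
    """`[..|..] && [..|..] && ...`: returns {production name: token} or None."""
    def assign(order, i, taken):
        if not order:
            return taken if i == len(tokens) else None
        if i == len(tokens):
            return None
        group, rest = order[0], order[1:]
        for leaf in group:  # one-of: alternatives in written order
            if not leaf_matches(leaf, tokens[i]):
                continue
            claimable = any(alt[0] == "keyword" and alt[2] == tokens[i]
                            for g in rest for alt in g)
            if strategy == "immediate" and leaf[0] == "custom-ident" and claimable:
                continue  # an unfulfilled group could claim this keyword
            result = assign(rest, i + 1, {**taken, leaf[1]: tokens[i]})
            if result is not None:
                return result
        return None
    for order in permutations(groups):  # all-of: written order first, then reorderings
        result = assign(list(order), 0, {})
        if result is not None:
            return result
    return None

E, F = ("custom-ident", "E", None), ("number", "F", None)
G, H = ("keyword", "G", "abc"), ("number", "H", None)
syntax = [[E, F], [G, H]]
print(match_groups(syntax, "abc 1".split(), "defer"))      # {'E': 'abc', 'H': '1'}
print(match_groups(syntax, "abc 1".split(), "immediate"))  # {'G': 'abc', 'F': '1'}
```

Because this sketch backtracks exhaustively, "defer claim" reduces to accepting the first full match in search order, so `<E>` keeps "abc"; only "immediate claim" lets the keyword production win.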

<details>

<summary>
more matching examples:

</summary>

```
syntax  = <I> || <K> || <N>
<I>     = <custom-ident>
<K>     = abc
<N>     = <number>
```

For the value "abc 1" (note the difference in how the claim is handled):

- defer claim:
    - `<I>` -> "abc"
    - `<N>` -> "1"
- immediate claim:
    - `<K>` -> "abc"
    - `<N>` -> "1"

```
syntax  = <I> || <M> | <K> || <N>
<I>     = <custom-ident>
<M>     = <number>
<K>     = abc
<N>     = <number>
```

For the value "abc 1", the matcher first matches `<I> || <M>`, and then prefers `<K> || <N>` in the re-claim process:

- defer claim:
    - `<K>` -> "abc"
    - `<N>` -> "1"
- immediate claim:
    - `<K>` -> "abc"
    - `<N>` -> "1"

</details>
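
The second example mixes `||` and `|` without brackets. In css-values, juxtaposition binds tightest, then `&&`, then `||`, then `|`, so it reads as `[<I> || <M>] | [<K> || <N>]`, which is the grouping the "match first `<I>||<M>`" step relies on. A minimal Python sketch that makes this grouping explicit for flat, bracket-free syntax (the functions `group` and `show` are hypothetical helpers):

```python
from typing import List, Tuple, Union

Tree = Union[str, Tuple[str, list]]
LOOSEST_FIRST = ["|", "||", "&&"]  # juxtaposition (a plain space) binds tightest

def group(tokens: List[str]) -> Tree:
    """Split on the loosest combinator present, then recurse into each part."""
    for comb in LOOSEST_FIRST:
        if comb in tokens:
            parts: List[List[str]] = [[]]
            for tok in tokens:
                if tok == comb:
                    parts.append([])
                else:
                    parts[-1].append(tok)
            return (comb, [group(p) for p in parts])
    return tokens[0] if len(tokens) == 1 else (" ", tokens)

def show(tree: Tree, top: bool = True) -> str:
    """Render the tree with explicit [ ] around every nested combinator group."""
    if isinstance(tree, str):
        return tree
    comb, parts = tree
    sep = " " if comb == " " else f" {comb} "
    text = sep.join(show(p, top=False) for p in parts)
    return text if top else f"[{text}]"

print(show(group("<I> || <M> | <K> || <N>".split())))  # [<I> || <M>] | [<K> || <N>]
```
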

-----

So I have some questions:

- Does the order of a syntax definition change the matching process? Would reordering the components within a group (one-of, any-of, all-of) change the meaning of a syntax?
- The order in which a matcher goes through the possibilities can change the result. Are there rules for the order of matching (and mismatching)?
- Are there any other positionally-ambiguous cases like `<custom-ident>`, i.e. matches that require registering a match state that can later be reclaimed in some way?
- Can there be more than one `<custom-ident>` in a syntax?
- Regarding the rule that a `<custom-ident>` can only take a keyword that is not otherwise claimed: does it take effect if the potential keyword in the syntax is never checked against? For example, when the keyword is at the end of some optional group and the value is matched completely before reaching it.

Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/6695 using your GitHub account



Received on Wednesday, 29 September 2021 14:28:41 UTC