[svgwg] Clarify systemLanguage parsing and evaluation algorithm (#1089) from Karl Dubost via GitHub on 2026-04-13 (public-svg-issues@w3.org from April 2026)

From: Karl Dubost via GitHub <noreply@w3.org>
Date: Mon, 13 Apr 2026 12:38:23 +0000
To: public-svg-issues@w3.org
Message-ID: <issues.opened-4254679080-1776083898-noreply@w3.org>
karlcow has just created a new issue for https://github.com/w3c/svgwg:

== Clarify systemLanguage parsing and evaluation algorithm ==
The `systemLanguage` attribute ([5.6.5](https://w3c.github.io/svgwg/svg2-draft/struct.html#ConditionalProcessingSystemLanguageAttribute)) references the HTML "set of comma-separated tokens" microsyntax but does not define a processing algorithm for the parsed result. This leads to ambiguity and interoperability issues.

### 1. No explicit parsing or evaluation algorithm

The spec says the value is a "[set of comma-separated tokens](https://html.spec.whatwg.org/multipage/common-microsyntaxes.html#set-of-comma-separated-tokens), each of which must be a Language-Tag value, as defined in BCP 47." It then describes the evaluation result in prose:

> Evaluates to "true" if one of the language tags indicated by user preferences is a case-insensitive match of one of the language tags given in the value of this parameter [...]

But there is no step-by-step algorithm connecting the parsing to the evaluation. Implementors must infer what to do with the parsed list.

### 2. Empty tokens are unaddressed

The HTML comma-separated token spec explicitly allows empty tokens. For example, `" a ,b,,d d "` produces four tokens: `"a"`, `"b"`, `""`, `"d d"`. The SVG spec says tokens "must be a Language-Tag value", but does not say what happens when a token is the empty string:

- Should empty tokens be silently removed after parsing?
- Should they cause the entire attribute to evaluate to false?
- Should they be treated as invalid but harmless (won't match any language)?
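
To make the empty-token case concrete, here is a minimal sketch of the comma-splitting step as I read it from the HTML/Infra definition. The function name `split_on_commas` is an assumption made for illustration, not something taken from the SVG or HTML specifications.

```python
# ASCII whitespace as defined by the Infra standard: tab, LF, FF, CR, space.
ASCII_WHITESPACE = " \t\n\f\r"

def split_on_commas(value: str) -> list:
    """Split on U+002C and strip ASCII whitespace from each token.

    Empty tokens are kept, which is the behavior the issue is asking
    the SVG spec to address.
    """
    tokens = []
    position = 0
    while position < len(value):
        # Collect everything up to the next comma (or the end of input).
        end = value.find(",", position)
        if end == -1:
            end = len(value)
        tokens.append(value[position:end].strip(ASCII_WHITESPACE))
        # Skip past the comma, if there was one.
        position = end + 1
    return tokens

print(split_on_commas(" a ,b,,d d "))  # ['a', 'b', '', 'd d']
print(split_on_commas(""))             # []
```

Note that an empty attribute value yields an empty list, while a value such as `"fr,,en"` yields a list that contains an empty string, so the two cases are distinguishable after parsing.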


### 3. "Null string or empty string" is ambiguous

The spec says:

> If a null string or empty string value is given to attribute 'systemLanguage', the attribute evaluates to "false".

This appears to mean the entire attribute value is `""`. But it could be read as referring to individual tokens within the parsed list. The spec should clarify this refers to the unparsed attribute value.

### 4. Invalid BCP 47 tags are unaddressed

The spec says tokens "must be" BCP 47 Language-Tags, but does not define what happens with invalid tags. Are they:

- Silently dropped during parsing?
- Stored but ignored during evaluation (they won't match any user language)?
- A conformance error that makes the whole attribute evaluate to false?

Current browser behavior: invalid tokens are stored and simply don't match any user language.

### 5. Missing code examples

The section has only one inline code example (`<text systemLanguage="mi, en">`). It would benefit from:

- A `<switch>` example showing language fallback with a catch-all
- A BCP 47 subtag example demonstrating the prefix matching rule

## Proposed algorithm

To make this section unambiguous, we propose adding an explicit algorithm:

> **To evaluate the `systemLanguage` attribute** with value *V*:
>
> 1. Let *tags* be the result of parsing *V* as a [set of comma-separated tokens](https://html.spec.whatwg.org/multipage/common-microsyntaxes.html#set-of-comma-separated-tokens).
> 2. Remove from *tags* any token that is the empty string.
> 3. If *tags* is empty, return false.
> 4. Let *user languages* be the user's ordered list of preferred languages.
> 5. For each *tag* in *tags*, for each *user-lang* in *user languages*:
>    1. If *tag* is an [ASCII case-insensitive](https://infra.spec.whatwg.org/#ascii-case-insensitive) match for *user-lang*, return true.
>    2. If *user-lang* is an ASCII case-insensitive prefix of *tag* and the code point in *tag* immediately following the prefix is U+002D HYPHEN-MINUS (-), return true.
> 6. Return false.

This algorithm:
- Explicitly uses the HTML comma-separated token parsing
- Handles empty tokens (step 2)
- Covers the "empty string evaluates to false" rule (step 3)
- Preserves the existing prefix matching semantics (step 5.2)
- Does not require BCP 47 validation — invalid tags simply won't match
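
The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the names `split_on_commas`, `ascii_lower`, and `evaluate_system_language` are hypothetical, and the user's preferred languages are passed in as a plain list.

```python
ASCII_WHITESPACE = " \t\n\f\r"
# ASCII-only lowercasing table, so non-ASCII characters are left untouched.
_ASCII_LOWER = str.maketrans("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                             "abcdefghijklmnopqrstuvwxyz")

def split_on_commas(value):
    # Step 1 helper: split on commas, strip ASCII whitespace, keep empty tokens.
    tokens, position = [], 0
    while position < len(value):
        end = value.find(",", position)
        if end == -1:
            end = len(value)
        tokens.append(value[position:end].strip(ASCII_WHITESPACE))
        position = end + 1
    return tokens

def ascii_lower(text):
    return text.translate(_ASCII_LOWER)

def evaluate_system_language(value, user_languages):
    # Steps 1 and 2: parse, then drop empty tokens.
    tags = [t for t in split_on_commas(value) if t != ""]
    # Step 3: nothing left to match against.
    if not tags:
        return False
    # Steps 4 and 5: compare every tag with every preferred language.
    for tag in tags:
        tag_l = ascii_lower(tag)
        for user_lang in user_languages:
            user_l = ascii_lower(user_lang)
            # Step 5.1: exact ASCII case-insensitive match.
            if tag_l == user_l:
                return True
            # Step 5.2: user-lang is a prefix of tag, followed by "-".
            if tag_l.startswith(user_l + "-"):
                return True
    # Step 6: no match found.
    return False

print(evaluate_system_language("mi, en", ["en"]))          # True
print(evaluate_system_language(" , ,", ["en"]))            # False
print(evaluate_system_language("not a tag!!, fr", ["fr"])) # True
```

No BCP 47 validation happens anywhere in this sketch: the malformed token in the last call is kept in the list and simply fails both comparisons.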

## Proposed examples

### `<switch>` with language fallback and catch-all

```xml
<switch>
  <text systemLanguage="fr" x="10" y="30">Bonjour</text>
  <text systemLanguage="ja" x="10" y="30">こんにちは</text>
  <text x="10" y="30">Hello</text>
</switch>
```
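
To show how the catch-all works, the sketch below simulates which child a `<switch>` would pick: the first direct child whose `systemLanguage` evaluates to true, where a child without the attribute always qualifies. The helper names `language_matches` and `pick_switch_child` are hypothetical, the matcher is a compact stand-in for the proposed algorithm, and `str.lower()` is used as an approximation of ASCII lowercasing.

```python
import xml.etree.ElementTree as ET

SWITCH_SVG = """<switch>
  <text systemLanguage="fr" x="10" y="30">Bonjour</text>
  <text systemLanguage="ja" x="10" y="30">こんにちは</text>
  <text x="10" y="30">Hello</text>
</switch>"""

def language_matches(value, user_languages):
    # Split on commas, strip ASCII whitespace, ignore empty tokens.
    tags = [t.strip(" \t\n\f\r").lower() for t in value.split(",")]
    users = [u.lower() for u in user_languages]
    # Exact match, or user language is a prefix of the tag followed by "-".
    return any(
        tag == user or tag.startswith(user + "-")
        for tag in tags if tag
        for user in users
    )

def pick_switch_child(switch_xml, user_languages):
    """Return the text of the first child that passes the language test."""
    switch = ET.fromstring(switch_xml)
    for child in switch:
        value = child.get("systemLanguage")
        # A child without systemLanguage acts as the catch-all.
        if value is None or language_matches(value, user_languages):
            return child.text
    return None

print(pick_switch_child(SWITCH_SVG, ["fr"]))  # Bonjour
print(pick_switch_child(SWITCH_SVG, ["de"]))  # Hello
```

Because the catch-all child has no `systemLanguage` attribute, it must come last; placed earlier, it would always be selected before the language-specific children are considered.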

### BCP 47 subtags and prefix matching

```xml
<switch>
  <text systemLanguage="zh-Hans" x="10" y="30">简体中文</text>
  <text systemLanguage="zh-Hant" x="10" y="30">繁體中文</text>
  <text systemLanguage="zh" x="10" y="30">中文</text>
</switch>
```

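Under step 5.2 of the proposed algorithm, a user with language preference `zh` would match the first child, because `zh` is a case-insensitive prefix of `zh-Hans` followed by `-`. A preference of `zh-Hans-CN` would not match any child under that step, because the rule tests whether the user's language is a prefix of the attribute's tag, not the reverse.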


Please view or discuss this issue at https://github.com/w3c/svgwg/issues/1089 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config
Received on Monday, 13 April 2026 12:38:24 UTC