- From: Karl Dubost via GitHub <noreply@w3.org>
- Date: Mon, 13 Apr 2026 12:38:23 +0000
- To: public-svg-issues@w3.org
karlcow has just created a new issue for https://github.com/w3c/svgwg:

== Clarify systemLanguage parsing and evaluation algorithm ==

The `systemLanguage` attribute ([5.6.5](https://w3c.github.io/svgwg/svg2-draft/struct.html#ConditionalProcessingSystemLanguageAttribute)) references the HTML "set of comma-separated tokens" microsyntax but does not define a processing algorithm for the parsed result. This leads to ambiguity and interoperability issues.

### 1. No explicit parsing or evaluation algorithm

The spec says the value is a "[set of comma-separated tokens](https://html.spec.whatwg.org/multipage/common-microsyntaxes.html#set-of-comma-separated-tokens), each of which must be a Language-Tag value, as defined in BCP 47."

It then describes the evaluation result in prose:

> Evaluates to "true" if one of the language tags indicated by user preferences is a case-insensitive match of one of the language tags given in the value of this parameter [...]

But there is no step-by-step algorithm connecting the parsing to the evaluation. Implementors must infer what to do with the parsed list.

### 2. Empty tokens are unaddressed

The HTML comma-separated token spec explicitly allows empty tokens. For example, `" a ,b,,d d "` produces four tokens: `"a"`, `"b"`, `""`, `"d d"`.

The SVG spec says tokens "must be a Language-Tag value", but does not say what happens when a token is the empty string:

- Should empty tokens be silently removed after parsing?
- Should they cause the entire attribute to evaluate to false?
- Should they be treated as invalid but harmless (won't match any language)?

### 3. "Null string or empty string" is ambiguous

The spec says:

> If a null string or empty string value is given to attribute 'systemLanguage', the attribute evaluates to "false".

This appears to mean the entire attribute value is `""`. But it could be read as referring to individual tokens within the parsed list. The spec should clarify this refers to the unparsed attribute value.

### 4. Invalid BCP 47 tags are unaddressed

The spec says tokens "must be" BCP 47 Language-Tags, but does not define what happens with invalid tags. Are they:

- Silently dropped during parsing?
- Stored but ignored during evaluation (they won't match any user language)?
- A conformance error that makes the whole attribute evaluate to false?

Current browser behavior: invalid tokens are stored and simply don't match any user language.

### 5. Missing code examples

The section has only one inline code example (`<text systemLanguage="mi, en">`). It would benefit from:

- A `<switch>` example showing language fallback with a catch-all
- A BCP 47 subtag example demonstrating the prefix matching rule

## Proposed algorithm

To make this section unambiguous, we propose adding an explicit algorithm:

> **To evaluate the `systemLanguage` attribute** with value *V*:
>
> 1. Let *tags* be the result of parsing *V* as a [set of comma-separated tokens](https://html.spec.whatwg.org/multipage/common-microsyntaxes.html#set-of-comma-separated-tokens).
> 2. Remove from *tags* any token that is the empty string.
> 3. If *tags* is empty, return false.
> 4. Let *user languages* be the user's ordered list of preferred languages.
> 5. For each *tag* in *tags*, for each *user-lang* in *user languages*:
>     1. If *tag* is an [ASCII case-insensitive](https://infra.spec.whatwg.org/#ascii-case-insensitive) match for *user-lang*, return true.
>     2. If *user-lang* is an ASCII case-insensitive prefix of *tag* and the code point in *tag* immediately following the prefix is U+002D HYPHEN-MINUS (-), return true.
> 6. Return false.
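
The steps above could be sketched roughly as follows. This is only an illustrative Python rendering of the proposed algorithm, not code from any browser or from the spec. The function names `split_on_commas`, `ascii_lower`, and `evaluate_system_language` are hypothetical, and the user's language list is assumed to be passed in as a plain list of strings.

```python
import string

# ASCII whitespace as used by the HTML "split on commas" microsyntax.
ASCII_WHITESPACE = "\t\n\x0c\r "

# Lowercase only A-Z, so the comparison is ASCII case-insensitive
# rather than Unicode case-insensitive.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(value: str) -> str:
    return value.translate(_ASCII_LOWER)


def split_on_commas(value: str) -> list[str]:
    """Step 1: split on commas, stripping ASCII whitespace from each token."""
    tokens = []
    position = 0
    while position < len(value):
        end = value.find(",", position)
        if end == -1:
            end = len(value)
        tokens.append(value[position:end].strip(ASCII_WHITESPACE))
        position = end + 1  # skip the comma, if there was one
    return tokens


def evaluate_system_language(value: str, user_languages: list[str]) -> bool:
    # Step 2: drop empty tokens.
    tags = [token for token in split_on_commas(value) if token != ""]
    # Step 3: nothing left means false (covers the empty attribute value).
    if not tags:
        return False
    # Step 5: exact match, or user language is a prefix of the tag
    # followed by "-". No BCP 47 validation is performed.
    for tag in tags:
        tag_lower = ascii_lower(tag)
        for user_lang in user_languages:
            user_lower = ascii_lower(user_lang)
            if tag_lower == user_lower:
                return True
            if tag_lower.startswith(user_lower + "-"):
                return True
    # Step 6.
    return False
```

In this sketch the prefix test of step 5.2 runs from the user language to the tag, so a preference of `zh` matches the tag `zh-Hans`, while a preference of `zh-Hans-CN` does not match the tag `zh-Hans`. Invalid tokens such as `"d d"` are kept after parsing and simply never match.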

This algorithm:

- Explicitly uses the HTML comma-separated token parsing
- Handles empty tokens (step 2)
- Covers the "empty string evaluates to false" rule (step 3)
- Preserves the existing prefix matching semantics (step 5.2)
- Does not require BCP 47 validation; invalid tags simply won't match

## Proposed examples

### `<switch>` with language fallback and catch-all

```xml
<switch>
  <text systemLanguage="fr" x="10" y="30">Bonjour</text>
  <text systemLanguage="ja" x="10" y="30">こんにちは</text>
  <text x="10" y="30">Hello</text>
</switch>
```

### BCP 47 subtags and prefix matching

```xml
<switch>
  <text systemLanguage="zh-Hans" x="10" y="30">简体中文</text>
  <text systemLanguage="zh-Hant" x="10" y="30">繁體中文</text>
  <text systemLanguage="zh" x="10" y="30">中文</text>
</switch>
```

A user with language preference `zh-Hans-CN` would match the first child, because `zh-Hans` is a case-insensitive prefix of `zh-Hans-CN` followed by `-`.

Please view or discuss this issue at https://github.com/w3c/svgwg/issues/1089 using your GitHub account

--
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config
Received on Monday, 13 April 2026 12:38:24 UTC