[csswg-drafts] [css-selectors] `:lang` for documents without content language and for elements of unknown language; consider `:lang("")` over `:not(:lang("*"))` (#6915)

myfonj has just created a new issue for https://github.com/w3c/csswg-drafts:

== [css-selectors] `:lang` for documents without content language and for elements of unknown language; consider `:lang("")` over `:not(:lang("*"))` ==
### Brief questions to answer

1. Is `:lang("*")` really a valid selector?  (Safari supports it, and Chrome accepted [Issue 1281157](https://bugs.chromium.org/p/chromium/issues/detail?id=1281157) to implement it.)
2. How can one address a document that fails to define its content language?
3. Should HTML `<el lang=""></el>`, or the equivalent `<el lang></el>`, be matched by CSS `:lang("")` or even `:lang()`?  (Neither proposed nor implemented yet.)
4. Is an erroneous "undefined" document language equivalent to `<html lang="">`, or is it something different?


### Trivia

1. It seems that there has never been a way for the `:lang()` functional pseudo-class to precisely target a document without a defined content language, or an element sub-tree marked as such with the `lang=""` attribute.
2. [The Selectors-4 draft introduces wildcard support](https://drafts.csswg.org/selectors-4/#the-lang-pseudo) ([md](https://github.com/w3c/csswg-drafts/blob/da9f4c11ca40e850c22dd8387e4069cb818e702e/selectors-4/Overview.bs#L1896)) in the string argument, for matching "any" language of a given script or region group (like `"*-Latn"` or `"*-ch"`). This opens the possibility (loophole?) of matching *any **specified** language* value with `:lang("*")` and, combined with the negation pseudo-class, of using `:not(:lang("*"))` to target elements that belong to the "no specified language" group from the previous point.  While a plain `"*"` value is not explicitly mentioned in the draft, it reportedly already works in current Safari.
3. An undetermined language occurs in any HTML document that has no explicit `lang` attribute and does not come with a `content-language` HTTP header or its "deprecated nasty" `<meta http-equiv>` counterpart.
4. For marking non-language content, it is advised to use the `lang=""` attribute. While it is possible to target it with an attribute selector, that does not seem like the right tool and introduces nesting / inheritance problems.  The `:lang()` selector was created to solve exactly this kind of problem.
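
The nesting / inheritance problem from point 4 can be sketched against the markup of Sample 2 below. This is only an illustration of the attribute-selector workaround, not a recommended pattern; the `outline` declarations are placeholders chosen for the sketch.

```CSS
/* Sketch of the attribute-selector workaround; the outline styling is only illustrative */

/* Matches <samp lang=""> itself, but none of its descendants */
[lang=""] { outline: 1px dotted; }

/* Extends the match to descendants, which wrongly includes <var lang="xy"> */
[lang=""] * { outline: 1px dotted; }

/* Undoes it for re-tagged subtrees; wrong again once lang="" is nested inside one of them */
[lang=""] [lang]:not([lang=""]),
[lang=""] [lang]:not([lang=""]) * { outline: none; }
```

Each extra nesting level needs another corrective rule. Attribute selectors also see only attributes, so something like `html:not([lang])` cannot tell whether the language was supplied by an HTTP header or a `<meta>` element instead.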

### Pseudo-code samples / introductory use cases

#### Sample 1: Document without content language

```
HTTP/2 200 OK
[no `content-language: xy,zz` here]

<html [no `lang="xy"` here]>
 <head>
  [no `<meta http-equiv="content-language" content="xy,zz">` here]
 </head>
 <body>
  [I want to target this document.]
 </body>
</html>
```

#### Sample 2: element with unknown language content

```
<html lang="xy">
<body>
 <p>xyx:
  <samp lang="">
   000<em>111<var lang="xy">x</var></em>000
  </samp>
 </p>
 [I want to target elements with digits and omit elements with letters.]
</body>
</html>
```
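
For Sample 2, the two candidate selectors discussed in this issue might be written as follows. This is a sketch under the assumption that `:lang("*")` does not match elements whose language is empty or unknown; the `font-variant-numeric` declaration is only a placeholder.

```CSS
/* Sketch against Sample 2; the declarations are only illustrative */

/* Reported to work where a bare "*" range is accepted (Safari, per this issue) */
:not(:lang("*")) { font-variant-numeric: tabular-nums; }

/* Proposed in this issue; not implemented at the time of writing */
:lang("") { font-variant-numeric: tabular-nums; }
```

Under that assumption, either rule would be expected to match `<samp lang="">` and the `<em>` inside it, but not `<var lang="xy">` or the surrounding `<p>`. The rules are kept separate so that an engine rejecting one selector drops only that rule.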

### Problems

The inability to "legally" target a document or element of specifically undetermined language seems to be quite a minor issue, since most CSS approaches start with "common" defaults and progress to language-specific rules using selectors of higher specificity.  For example:

```CSS
/* pseudo example of common approach for setting language related styles */
/* hand-waving default */ :root { quotes: '"' '"' "'" "'"; }
/* specific known language */ :lang(x-whatever) { quotes: "→" "←" "☛" "☚"; }
```

However, this approach falls short if the CSS author needs to specifically address a poorly marked-up document lacking any content language hint:

```CSS
/* Pseudo example "let's put a country flag representing the language of the document at its beginning" */
/* known languages */
body:lang(en-gb)::before { content: url(./flags-gb.svg) / "British English document: "; }
body:lang(en-us)::before { content: url(./flags-us.svg) / "American English document: "; }
/* [etc] */

/* default for "unknown" */
body::before { content: url(./flags-specified-but-unknown.svg) / "Content language of this document is specified but not included in style sheets - please contact style maintainer. "; font-size: small; color: GrayText; }

/* "error message" for "unspecified": the topic of this issue */
body:not(:lang("*"))::before { content: url(./flags-unspecified-error.svg) / "This document appears to have no content language specified. If you are its maintainer, fix it ASAP, please. "; }
body:not(:lang("*")) { background-color: var(--bg-error, Canvas); color: var(--fg-error, GrayText); font-family: cursive; }
```

(think Sample 1 above)


### Links, resources and notes

- my initial [SO question](https://stackoverflow.com/questions/70354616/css-lang-selector-for-elements-in-documents-of-undetermined-language) (thanks to onkar ruikar for the answer, analysis and discussion), bugreport, and [tweet](https://twitter.com/myfonj/status/1472362381338628099)
- https://www.w3.org/International/questions/qa-css-lang
- https://www.w3.org/International/questions/qa-no-language
  - "unknown": `lang=""` / `xml:lang="und"`
  - "known but non-linguistic": `lang="zxx"`
- https://www.w3.org/International/articles/language-tags/
- Internationalization of the Hypertext Markup Language: https://www.ietf.org/rfc/rfc2070.txt


Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/6915 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Tuesday, 28 December 2021 01:19:05 UTC