Re: [csswg-drafts] Consider Canonicalization of language tags in :lang() selector matching (#4154)

Section 4.5 talks about canonicalising items that are marked _in the registry_ as deprecated (e.g. grandfathered tags) or that contain extlang subtags (e.g. `zh-yue` -> `yue`). These are things that can be determined automatically by using the information in the registry. But that doesn't cover `zh-HK`.

Section 3.2 mentions `zh-CN` as a possible equivalent to `zh-Hans`, but I read this as a configuration option offered to the user, not an assumption that the two mean the same thing in all contexts (which is what canonicalisation means) or that the author intended one while writing the other. And I think assuming that `zh-HK` means `yue` is more of a stretch than the `zh-CN`/`zh-Hans` assumption.

I think we need to be careful about making inappropriate assumptions on behalf of the content author. `zh-HK` may be used to mean _Mandarin_ Chinese, but with traditional script, or possibly written text that includes the few additional characters that are used in Hong Kong. In fact, I thought that the predominant legacy usage for `zh-HK` arose from an early workaround before `zh-Hant` existed, not for `yue`. Of course, it could also be used for Cantonese. But it's possible that it's even used for Hakka or Minnan Chinese as spoken in Hong Kong. It depends, really, on what the author needed and intended. It may also depend on whether it is used for a spoken or written phrase.

So I think it makes sense to assume equivalence for language tags that can be automatically paired using information in the registry, but not for things like `zh-HK`.
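
To make the distinction concrete, here is a rough sketch of what i mean by "automatically paired using information in the registry". It is only an illustration: the two tables are a tiny hand-copied excerpt of registry Preferred-Value data, and the names `GRANDFATHERED`, `EXTLANG` and `canonicalize` are made up for this example. A real implementation would load the full registry and would also handle casing and other deprecated subtags.

```python
# Hypothetical excerpt of registry data, for illustration only.
# Grandfathered tags mapped to their Preferred-Value.
GRANDFATHERED = {
    "i-klingon": "tlh",
    "art-lojban": "jbo",
    "zh-min-nan": "nan",
    "zh-hakka": "hak",
}

# Extlang subtag -> (required prefix, Preferred-Value).
EXTLANG = {
    "yue": ("zh", "yue"),
    "cmn": ("zh", "cmn"),
    "hak": ("zh", "hak"),
    "nan": ("zh", "nan"),
}


def canonicalize(tag: str) -> str:
    """Rewrite a tag only when the registry data above says how to."""
    lowered = tag.lower()

    # Whole-tag replacement for grandfathered tags.
    if lowered in GRANDFATHERED:
        return GRANDFATHERED[lowered]

    # Replace "prefix-extlang" with the extlang's preferred value,
    # keeping any remaining subtags (region, script, ...) untouched.
    subtags = tag.split("-")
    if len(subtags) >= 2:
        entry = EXTLANG.get(subtags[1].lower())
        if entry is not None and subtags[0].lower() == entry[0]:
            return "-".join([entry[1]] + subtags[2:])

    # No registry rule applies: leave the author's tag alone.
    return tag
```

With this, `zh-yue` becomes `yue` and `i-klingon` becomes `tlh`, because the registry says so. `zh-HK` comes back unchanged, since nothing in the registry maps the region subtag `HK` to a particular language.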

-- 
GitHub Notification of comment by r12a
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/4154#issuecomment-517239167 using your GitHub account

Received on Thursday, 1 August 2019 11:13:45 UTC