- From: Dael Jackson <daelcss@gmail.com>
- Date: Tue, 3 Dec 2019 18:14:20 -0500
- To: www-style@w3.org
- Cc: www-international@w3.org
=========================================
These are the official CSSWG minutes.
Unless you're correcting the minutes,
Please respond by starting a new thread
with an appropriate subject line.
=========================================
Joint Meeting with Internationalization
+++++++++++++++++++++++++++++++++++++++
Selectors
---------
- There is complex interaction between the different historic sets
of :lang() tags and subtags which makes resolving issue #4154
(Canonicalization of :lang() selectors) complex. The PR doesn't
capture all the complexity so florian will work with addison to
see if a safe subset can be defined. florian will also look at
the impacts of his proposal on various languages to ensure it's
safe.
CSS Text
--------
- RESOLVED: The presence of soft break opportunities between spans
which change soft breaking rules is undefined
(Issue #3897: Breaking Rules at inline element
boundaries)
- The i18n group will look further into issue #3481 (Remove
collapsible line breaks adjacent to word separators) especially
around cases such as the Ogham space mark.
===== FULL MINUTES BELOW =======
Agenda: https://wiki.csswg.org/planning/tpac-2019#agenda
Scribe: heycam
Joint Meeting with Internationalization
+++++++++++++++++++++++++++++++++++++++
Selectors
=========
Canonicalization of :lang() selectors
-------------------------------------
github: https://github.com/w3c/csswg-drafts/issues/4154
florian: The :lang selector lets you select pieces of the DOM for
styling based on the language
florian: It's already somewhat smart, since lang tags are structured
florian: If you select zh and the document says zh-Hant, it will do
the right thing and match it
florian: that logic is already built in
florian: The IANA maintains a registry of the languages that exist
and what they mean
florian: tags and subtags
florian: and in addition to just listing them, there is logic in
that registry. Some languages are a deprecated version of
some other languages
florian: Cantonese used to be zh-yue. That is deprecated and
replaced with yue
florian: The lang selector does not take that logic into account
florian: So if you have a document marked as lang="yue", and you are
matching :lang(zh) or :lang(zh-yue), it won't match
florian: We may want to use the registry definitions of how to match
florian: I propose we do that
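A minimal sketch of the matching florian describes, in Python. The function name `lang_matches` is made up for this illustration, and it simplifies what Selectors specifies (no wildcards, no canonicalization); it only shows why zh matches zh-Hant while neither zh nor zh-yue matches yue.

```python
def lang_matches(selector_range: str, element_lang: str) -> bool:
    """True if element_lang equals selector_range or extends it with more subtags."""
    wanted = selector_range.lower().split("-")
    actual = element_lang.lower().split("-")
    # Compare whole subtags, so "zh" does not accidentally match "zhx".
    return actual[:len(wanted)] == wanted


print(lang_matches("zh", "zh-Hant"))   # True
print(lang_matches("zh", "yue"))       # False
print(lang_matches("zh-yue", "yue"))   # False
```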
addison: Some tag canonicalization is defined by BCP 47 to consume
some of the information in the registry
addison: You've been corresponding on the IETF languages list and I
think some of your questions have been about handling
macro-languages -- zh-yue is a macro language
florian: zh-yue is a macro language, zh is a macro language
addison: There's a separate thing. Previous to the current BCP 47,
there was a mechanism for registering whole tags
addison: that's grandfathered now
addison: Some of them match subtags, some don't
addison: [...] is replaced by xtg
addison: Ignoring grandfathered tags, they all map to something. The
ones you're referring to are structurally identical, the
tags are composed of subtags
addison: like zh-yue
florian: The way I'm looking at this, there are a variety of reasons
for why certain languages might be the same
florian: there is a defined canonicalization that handles some of
them
addison: For the BCP 47 canonicalization, that will do away with the
grandfathered ones and other structural weirdness
florian: It won't deal with the two types of Norwegian
florian: This is a complicated topic with many weird variants
addison: There's a subset there that's well defined
addison: There's a second set of rules, which are in CLDR
addison: UTR 35
addison: for handling some additional cases around Chinese, where
you have different script subtags that you want to appear
or not in some circumstances
addison: some of those may be of interest, but it's more complicated
addison: I don't want to pretend those cases don't exist; they do
florian: If you have a link, please drop it
addison: Defining matching, if you're just using BCP 47 "lookup" IINM
florian: Extended filtering
florian: the text for extended filtering says you should canonicalize
addison: Yes you should
florian: Thanks for bringing up that the topic is broader
addison: If you do the minimum set, it'll make it the most
predictable. the other aspects are worth studying
addison: there are some annoying corner cases in Chinese
florian: I hear support for the current proposal, and complicated
problems to think about in addition to that
addison: Yes I agree with your current proposal and then do further
study, and track the other standards happening in that space
florian: There is a PR for this
addison: Should we review that?
<fantasai> https://github.com/frivoal/csswg-drafts/commit/3cff5d844b6415ef30d3e2dac221f9479e0ec7aa
florian: If you haven't I suggest you do
AmeliaBR: The other question on the topic, do we have implementor
commitments?
r12a: The current text I'm looking at says "... must be converted to
x-lang form"
r12a: that's a slightly different discussion from what you
canonicalize it as
r12a: zh-yue would become yue
florian: I had that discussion on the list as well
florian: This is the right direction
florian: zh doesn't match yue so if you canonicalize both to x-lang
format, it'll match
florian: I raised this on the mailing list, and they agreed it was
the right form to canonicalize it to
addison: Some people on the list did
addison: The challenge is that this will bring you more promiscuous
matching than the author may have intended
addison: It'll make Cantonese match Mandarin Chinese in some cases
florian: If you want to match Mandarin specifically that's also
possible
addison: Normally Mandarin is tagged just as zh
r12a: For all the macro languages there's usually a preferred
language
fantasai: If the author cares that much, they can put the
information there
addison: That's right
addison: you don't want to have them with a correctly tagged
document, have the :lang match things they were [...]
<addison> http://www.unicode.org/reports/tr35/#Canonical_Unicode_Locale_Identifiers
duerst: That mailing list is no longer a WG
duerst: so people can give you opinions and background knowledge,
but no formal resolutions
<AmeliaBR> So, two cases: (A) author used zh in stylesheet and yue in
HTML; doesn't expect a match. (B) author used zh in
stylesheet and zh-yue in HTML; does expect a match.
Canonicalizing both yue and zh-yue to the same value will
break one or the other.
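The two cases can be tried out with a small Python sketch. Everything here is an assumption made for illustration: the three-entry `EXTLANG_PREFIX` table is a hand-copied excerpt rather than the full IANA registry, the function names are invented, and "x-lang form" in these minutes is taken to mean what BCP 47 calls "extlang form" (yue written as zh-yue), as opposed to its canonical form (zh-yue written as yue).

```python
# Hand-copied excerpt (assumption): extlang subtag -> its macrolanguage prefix.
EXTLANG_PREFIX = {"yue": "zh", "cmn": "zh", "hak": "zh"}


def prefix_match(wanted: str, actual: str) -> bool:
    w, a = wanted.split("-"), actual.split("-")
    return a[:len(w)] == w


def to_extlang_form(tag: str) -> str:
    """yue -> zh-yue: put the macrolanguage prefix in front."""
    subtags = tag.lower().split("-")
    prefix = EXTLANG_PREFIX.get(subtags[0])
    return "-".join([prefix] + subtags) if prefix else "-".join(subtags)


def to_canonical_form(tag: str) -> str:
    """zh-yue -> yue: drop the macrolanguage prefix."""
    subtags = tag.lower().split("-")
    if len(subtags) > 1 and EXTLANG_PREFIX.get(subtags[1]) == subtags[0]:
        subtags = subtags[1:]
    return "-".join(subtags)


def cases(canon):
    case_a = prefix_match(canon("zh"), canon("yue"))     # stylesheet zh, HTML yue
    case_b = prefix_match(canon("zh"), canon("zh-yue"))  # stylesheet zh, HTML zh-yue
    return case_a, case_b


print("extlang form:  ", cases(to_extlang_form))
print("canonical form:", cases(to_canonical_form))
```

Under these assumptions, extlang form makes both cases match (so A gets a match the author did not expect), while canonical form makes neither match (so B loses the match the author relied on).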
florian: I agree that the problem can exist in both directions, too
much or not enough, I think since we're doing it for
typographical purposes, and the languages are related, most
of the time if you have zh styles you want it to match
Cantonese too
florian: It's possible to style Mandarin differently from Cantonese,
Hakka, etc., but that's rare
<addison> http://www.unicode.org/reports/tr35/#Likely_Subtags
r12a: It's not just Chinese we're talking about
r12a: There are other languages that have much more differentiation
between the language depending on which of the subtags you
choose
r12a: The point I wanted to make was that we said that let's go
ahead with the proposal at the moment
r12a: Looking at the issue, there was a proposal you wrote, I
responded saying you had to modify that
r12a: the PR doesn't say much
r12a: not sure what the exact proposal is
r12a: I think this information we're talking about now should also
be part of that
florian: The earlier proposal that you rightfully pointed out I
wrote too much, including making zh-HK match yue and things
like this, that's not defined in the repo I'm referring to
florian: I'm just saying, just the canonicalization to x-lang form
as defined by BCP 47
florian: and as supported by the mailing list that used to be the WG
that used to define that document
florian: but whichever way we go, including no change at all, has a
risk of mismatching things in some cases
addison: Not all tags match all values, otherwise what's the point
addison: The problem is to arrive at something that authors
understand how to get the results they want
addison: we'll make some compromises, the question is which ones
fantasai: Based on the conversation so far, it seems like I don't
think canonicalizing yue to zh-yue is going to be good.
Either we don't canonicalize, or in a direction where zh
encompasses Cantonese
fantasai: I am sure there are style sheets that just use :lang(zh),
and they'll expect it to match
addison: The other possibility is that the inclusion or
non-inclusion of the enclosing subtag -- in this case zh --
is a choice the author is making deliberately. if they've
made that choice deliberately, if we mess with their tags
when doing matching it may produce results they don't expect
addison: Most of the matching algorithms are strict "remove from
right" subtag matching
addison: to make it obvious what's happening
addison: Once you start adding or subtracting subtags in ways
other than the deprecation/renaming, I think that has more
risk to it in your space
addison: since it's not obvious what's going to happen
addison: I would support doing the mappings that's in the registry,
since that's where if you have multiple variations, because
people have older documents and style sheets, they'll get
the right answer
addison: That's different than adding or subtracting subtags
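The "remove from right" style of matching addison mentions can be sketched as follows, in the spirit of the BCP 47 (RFC 4647) lookup scheme. The function name `lookup` and its parameters are invented for this illustration, and the sketch leaves out details such as default values and wildcard ranges.

```python
def lookup(language_range, available, default=None):
    """Truncate language_range from the right until it equals an available tag."""
    by_lowercase = {tag.lower(): tag for tag in available}
    subtags = language_range.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lowercase:
            return by_lowercase[candidate]
        subtags.pop()
        # A single-character subtag left dangling at the end (e.g. "x") goes too.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


print(lookup("zh-Hant-HK", ["zh", "zh-Hant"]))  # zh-Hant
print(lookup("yue-HK", ["zh"]))                 # None
```

Nothing is ever added or rewritten, only dropped from the end, which is why yue-HK never reaches zh here.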
AmeliaBR: We covered a lot of what I was going to say, but with a
different conclusion
AmeliaBR: It's important that when matching a style sheet and a
document that we respect the way that the author matched
it, don't want to introduce spurious matching from
canonicalization
AmeliaBR: also don't want to break matching
AmeliaBR: From the examples brought up, it's obvious that any
canonicalization may end up breaking one site or the other
AmeliaBR: The question is then how do we make it easier in the
general case for having new style sheets or new UA style
rules deal with all these deprecated synonyms
AmeliaBR: At the UA style sheet, that can just be an advice to UAs
to look up the BCP deprecation list
AmeliaBR: then also include the deprecated synonyms
AmeliaBR: That doesn't work for things like a style sheet that is
coming from a library or CSS reset
AmeliaBR: or the case of newer code, writing a new style sheet,
but still apply to the old pages with the older language
tags
AmeliaBR: One approach that might address that use case is something
like what we do with case insensitive selector matching
AmeliaBR: a flag in the selector that means "this value or any
synonyms"
florian: So an opt in for canonicalization
addison: There are three sets
addison: the grandfathered list is permanently fixed and has been
for 10 years
addison: all those tags have explicit mappings, you can safely map
them to modern equivalents or vv
addison: Individual subtags that have mappings, it's mostly about
countries going out of business
addison: yiddish has two subtags, hebrew has two subtags, there's a
canonical one
addison: The third thing is the x-lang thing, which is inconvenient
addison: because there's two ways to say things. With or without the
enclosing subtag
addison: The canonicalization rule in BCP 47 says you can drop the
primary language subtag and use the x-lang by itself
addison: it's permissible for implementations to do that
addison: I don't recall it says you can put it back
florian: There are 2 sets of rules
florian: one that just strips it off. The other says when you're
done stripping it off, put it back
r12a: It says you could consider doing that
addison: The first two are completely safe
r12a: You want to do those
r12a: for interop
r12a: The x-lang thing, I think you can choose
r12a: whether to put the enclosing subtag on
r12a: The challenge is that Chinese you'd want to do that, but some
of the other macro languages are not as crisp. Arabic is one
of these, Malaysian
<r12a> https://r12a.github.io/app-subtags/
r12a: Omani Arabic and Moroccan Arabic, which treat certain things
differently, may have different font requirements
r12a: but they both resolve to "ar" if we follow this PR
r12a: but that's used for standard Arabic
<myles> thanks for the link, r12a is the best
<fantasai> +1
florian: I think we're not ready to merge the PR
florian: Action items: the safe subset of canonicalization, I don't
think it's defined as a canonicalizing operation separately
from the x-lang thing
florian: Action on me to find out if we can
addison: This is an area that probably deserves better documentation
from us
addison: We can go offline and make sure we get the right answer
addison: We can go back and talk to the locale folks at Unicode and
the languages list and make sure we're capturing the sense
of this
florian: One, figure it out if the safe subset exists as a standard
operation
florian: Two, if we do what I'm proposing, look at the affected
languages and see if it's good for them
CSS Text 3
==========
Breaking Rules at inline element boundaries
-------------------------------------------
github: https://github.com/w3c/csswg-drafts/issues/3897
fantasai: There was an issue raised about what happens when you have
two inline elements that have different breaking rules
fantasai: 3 properties control this. white-space, word-break,
line-break
fantasai: Looking at an example (in the GitHub issue)
fantasai: at the boundaries of the span, which line breaking rules
applies when it has a different word-break prop value to
the rest of the text
fantasai: for white-space, the nearest common ancestor is used
fantasai: The complication for word-break and line-break is that the
determining rules for where you're allowed to break
requires running an analysis on a lot of text
fantasai: and every time that value changes you have to do another
run, so impl wise it's a bit awkward
fantasai: There's been some discussion about what's the best
behavior here
fantasai: I wanted to ask i18n if you have feedback on this issue
fantasai: and ask the WG if this proposal to leave this undefined
for L3, give impl time to experiment
fantasai: Doesn't seem to be a terribly high importance case to
solve at the moment
florian: I think one of the more interesting cases -- and I support
making it undefined -- is if the parent div allows a break
between every letter, and two spans next to each other
which don't
florian: Can you break between the spans or not
florian: Current spec says yes, but it's hard-ish to implement
florian: If we need time to think about this, undefine it for a
while, seems reasonable
florian: but that's the kind of case this brings to the surface
Rossen: Any objections to leaving it undefined?
r12a: we should look at it as a group offline
r12a: It's quite a long thread. I seem to remember someone brought
up an example that didn't work
nmccully: Are there layout engines working on this that would
benefit from the extra time?
fantasai: Part of the issue is the ICU APIs make it awkward
for the rules to change in the middle of the line
fantasai: so impl wise it's awkward
fantasai: Could be factorial if you're changing it every other
letter in the line
fantasai: so there's some hesitancy to impl that given the current
infrastructure
fantasai: but there don't seem to be great solutions
fantasai: Some of the behaviors you'd get from doing an easy thing
would be non-symmetric
fantasai: you'd be switching slightly less if you use the current
rule in the spec, but that's all
fantasai: There's not a high pressure to solve this and get interop
fantasai: Look at it again in L4
myles: Tangential comment, the general thing we're discussing is
styling element boundaries
myles: This is something letter-spacing also does
myles: The spec says something that all browsers disagree with
myles: If we do come up with a good way to describe boundary
behavior, we should try to use this system to describe
letter-spacing too
fantasai: I think the spec is right on letter-spacing
nigel: I think it would be good to have a general way to handle this
florian: we have a current generalized rule, that is general, and
does the right thing, and is painful to impl
RESOLVED: It's undefined
<myles> the presence of soft break opportunities between spans which
change soft breaking opportunities is undefined heycam
RESOLVED: i.e., the presence of soft break opportunities between
spans which change soft breaking opportunities is undefined
Remove collapsible line breaks adjacent to word separators
----------------------------------------------------------
github: https://github.com/w3c/csswg-drafts/issues/3481
fantasai: We generally have this concept in CSS and HTML that you
can use white space to format your source, and we collapse
white space, including line breaks, down to a single space
fantasai: essentially unbreaking the source lines to create a
paragraph
fantasai: For Chinese and Japanese which don't use spaces, we have
some rules to remove the space; otherwise you will be
forced to put entire paragraphs on one line always
fantasai: There are some rules for doing that based on character
classes
fantasai: What we didn't consider thoroughly is languages that use a
word separator that's not a space
fantasai: We do special case ZWSP, for Thai and other languages
fantasai: but we don't have something similar for Ethiopic word space
fantasai: Probably don't also want a regular space added there
fantasai: Proposal is when there's a word separator character
adjacent to a line break, the line break just goes away
fantasai: I think the characters that are affected here are Ogham
space mark and Ethiopic word space and the Tibetan tsek
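A minimal Python sketch of the proposal as stated here, assuming the affected set is exactly the three characters fantasai lists. The name `unbreak` is invented, and the sketch ignores the existing CJK and ZWSP rules as well as the rest of white space processing.

```python
# Assumed set: Ogham space mark, Ethiopic wordspace, Tibetan tsek.
WORD_SEPARATORS = {"\u1680", "\u1361", "\u0F0B"}


def unbreak(source: str) -> str:
    """Join source lines; a break next to a word separator adds no space."""
    # Strip only space and tab: a bare strip() would also eat U+1680,
    # which Python treats as whitespace.
    lines = [line.strip(" \t") for line in source.split("\n")]
    result = lines[0]
    for line in lines[1:]:
        if not line:
            continue  # skip blank lines in this simplified sketch
        next_to_separator = (
            result[-1:] in WORD_SEPARATORS or line[:1] in WORD_SEPARATORS
        )
        result += line if next_to_separator else " " + line
    return result


print(unbreak("hello\nworld"))                 # hello world
print(unbreak("a\u1361\nb") == "a\u1361b")     # True
```

The separator character itself is kept; only the collapsed space that the line break would otherwise become is dropped.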
<koji> https://drafts.csswg.org/css-text-3/#word-separator
AmeliaBR: Does this map to something in Unicode? or do we need to
maintain this list?
r12a: I think there is something, not sure if it's fit for this
purpose
r12a: archaic scripts have other examples
fantasai: [reads definition in the spec right now for word-spacing]
florian: We need to maintain a list
myles: Let's ask Unicode to do it
myles: If there is such a facility for these character lists, hard
to believe it's specific for the web platform
myles: and not needed in text editors for example
myles: I don't think the web specs should maintain this list
florian: I agree with part of your statement, should try to work
this out with Unicode
florian: This one specifically maybe, but some are specifically web
platform related
florian: since this is relevant to turning HTML markup into text
myles: There are many different markup languages...
fantasai: There are 2 questions
fantasai: if we want to do this, and then whether we maintain the
list or if Unicode should
addison: I think we want to do some research
addison: space or no space is a classic problem
addison: I would be surprised if there weren't something, but don't
know off the top of my head
addison: would be happy to engage
myles: If this is a classical problem, it's been solved, and we
should figure out how it's been solved in the past and re-use
that solution
fantasai: looking at some of the stuff in css-text, we have a
concept of word separators
fantasai: and it includes a set of code points
fantasai: It excludes Ogham space mark
fantasai: since it would cause text to not join any more
[word-spacing has different considerations than white space
collapsing]
fantasai: So general usage in Unicode is text processing
segmentation is not going to account for that concern,
since they don't deal with typesetting
fantasai: So there's gonna be some aspects of how we're using
Unicode codepoints with specific requirements that haven't
come up in Unicode's context so far
fantasai: Unbreaking lines is something that's been hard to explain
to them
myles: Maybe we shouldn't be unbreaking them?
fantasai: Too late for that!
fantasai: HTML has been unbreaking lines for as long as it has
existed, we want to make that ability available to more
languages
addison: fwiw I've had to write this code in the past, and it's not
any fun
addison: It may have been individually solved but not written down
r12a: Like with the other issues, we need to look in more detail
r12a: the Tsek is a syllable separator, not the same as a word joiner
r12a: You could end a line with a Tsek, then start with more Tibetan
on the next line, with indentation, and no real reason to join
those together necessarily
fantasai: You wouldn't make the Tsek go away, just avoid the extra
space going in there
ACTION: i18n to look into this issue of word separators next to newlines
ACTION: addison: ensure we respond to css 3481
Received on Tuesday, 3 December 2019 23:15:19 UTC