
Re: [csswg-drafts] [css-text] Writing System prose is currently unimplementable on ICU (#4445)

From: Florian Rivoal via GitHub <sysbot+gh@w3.org>
Date: Thu, 24 Oct 2019 07:06:33 +0000
To: public-css-archive@w3.org
Message-ID: <issue_comment.created-545777613-1571900792-sysbot+gh@w3.org>
> ICU does not abide by the above quoted prose, and erroneously uses Japanese-style line breaking.

Right, it feels like an ICU bug, although the difference between a missing feature and a bug is a subjective one, and I'd rather call this a missing feature.

ICU is smart enough to take the content language into account, and will do line breaking differently on Japanese vs English, as css-text expects it to. So far so good. However, it is insufficiently subtle about how it handles this language sensitivity.

This test checks that the rules of https://drafts.csswg.org/css-text-3/#script-tagging, whose intro you quoted in your opening comment, are respected. This specific test exercises the first sentence of the normative statements later in that section, which @AmeliaBR quoted:

> UAs should assume the most common writing system of the specified content language when choosing typographic behaviors such as line-breaking or justification strategies, **but must not assume that writing system if the author has explicitly indicated a different one**. 

In this test, the language is set to "ja-Latn". ICU uses Japanese-style line breaking, even though the writing-system part of the language tag explicitly sets the writing system to something else. Per the sentence above, the explicitly indicated writing system is what ought to drive which type of line breaking is used.
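To make the expected behavior concrete, here is a rough sketch in Python of the decision the quoted sentence describes: use the explicitly indicated script subtag if there is one, and only fall back to the language's most common writing system otherwise. This is not ICU's API or any browser's implementation. The function names and the small `DEFAULT_SCRIPT` table are assumptions made up for illustration (a real implementation would presumably rely on likely-subtags data instead), and the subtag parsing is a simplification of the BCP 47 grammar.

```python
from typing import Optional

# Hypothetical, deliberately tiny table: most common script per primary language.
DEFAULT_SCRIPT = {"ja": "Jpan", "zh": "Hani", "ko": "Kore", "en": "Latn"}

# Hypothetical set of scripts for which CJK-style line breaking would be chosen.
CJK_SCRIPTS = {"Jpan", "Hani", "Hans", "Hant", "Hira", "Kana", "Kore", "Hang"}


def effective_script(lang_tag: str) -> Optional[str]:
    """Return the explicit script subtag if present, else the assumed default."""
    subtags = lang_tag.split("-")
    primary = subtags[0].lower()
    for sub in subtags[1:]:
        if not (sub.isascii() and sub.isalpha()):
            break  # digits etc.: region or variant, so no script subtag follows
        if len(sub) == 3:
            continue  # extended language subtag, e.g. "cmn" in "zh-cmn-Hant"
        if len(sub) == 4:
            return sub.title()  # explicit script subtag, e.g. "Latn"
        break  # region, singleton, or anything else: stop looking
    return DEFAULT_SCRIPT.get(primary)


def wants_cjk_line_breaking(lang_tag: str) -> bool:
    """True only when the effective writing system is a CJK one."""
    return effective_script(lang_tag) in CJK_SCRIPTS
```

With this sketch, `wants_cjk_line_breaking("ja")` is true, while `wants_cjk_line_breaking("ja-Latn")` is false, which is the distinction the test is checking for.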

I am not an expert on ICU, but my sense is that while ICU currently doesn't handle this case as css-text-3 expects it to, handling it correctly would not go against the goals of ICU.

ICU already goes further than what UAX14 strictly requires. A large part of UAX14 consists of "shoulds" rather than "musts", which are explicitly called out as being tailorable. A number of differences between line-break loose/normal/strict are implemented in ICU based on what CSS says, while UAX14 merely allows for that sort of variation, without calling for these specific variations. In particular, the specific differences in line breaking when the language is different aren't part of UAX14 (though allowed by it). Paying attention to the writing-system subtag of the language tag as well as the primary language subtag falls within that same kind of tailoring, and so while I agree it should be discussed and fixed in ICU, I don't think it needs any fix in Unicode-the-spec itself.

----

My recommendation would therefore be to work upstream with ICU to improve their behavior to handle this sort of case.

In the meantime, what do we do about the spec and the test(s)? Possibly nothing.

I would expect the failing test to help remind us that this is something we should fix eventually, independently of where the fix belongs, which it seems to have done in this case. I fully expect browsers to set their own priorities independently, and accept that certain tests will keep on failing for a while if you choose not to fix this right now (or if fixing it in ICU means the fix won't land in the browser for a while).

If we're trying to push this spec out of CR but haven't got around to fixing this yet, changing the MUST to a SHOULD (to be restored to a MUST in the following level), or marking it at risk, to avoid blocking on it could make sense to me, but so far we're not even in CR yet (although I think we're pretty close).

Maybe we could already mark it as a SHOULD right now, in recognition that this is a thing that makes sense to do but isn't done yet, but I worry that letting test failures go under the radar will decrease the chances of ever getting to the bottom of it.



-- 
GitHub Notification of comment by frivoal
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/4445#issuecomment-545777613 using your GitHub account
Received on Thursday, 24 October 2019 07:06:34 UTC
