Re: [csswg-drafts] [css-text-3] Segment Break Transformation Rules for East Asian Width property of A (#337) from Florian Rivoal via GitHub on 2018-12-20 (public-css-archive@w3.org from December 2018)

From: Florian Rivoal via GitHub <sysbot+gh@w3.org>
Date: Thu, 20 Dec 2018 05:12:54 +0000
To: public-css-archive@w3.org
Message-ID: <issue_comment.created-448876510-1545282773-sysbot+gh@w3.org>

> I have asked a couple of Unicode experts who are better versed than me about properties related to segmentation and line breaking

This is **not** about line breaking though. This is about processing U+000A that are present in the source code of a document, to decide if a space should be inserted in its place (as would be appropriate for English, where words are space separated), or not (as would be appropriate for Chinese or Japanese, where they are not). This is not about line breaking in the rendered layout.
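To illustrate the kind of source-level processing meant here, the following is a minimal Python sketch, not the spec text. It assumes a simplified rule: a U+000A is dropped only when the characters on both sides have East Asian Width F, W, or H, and is otherwise replaced by a space. The names `transform_segment_breaks` and `NO_SPACE_WIDTHS` are hypothetical, and the sketch leaves out the EAW=A and Emoji tailoring as well as any other conditions in the draft.

```python
import unicodedata

# Assumption: only these East Asian Width values count as "space-less" context.
NO_SPACE_WIDTHS = {"F", "W", "H"}


def transform_segment_breaks(text: str) -> str:
    """Replace each U+000A with a space, or drop it between F/W/H characters."""
    out = []
    for i, ch in enumerate(text):
        if ch != "\n":
            out.append(ch)
            continue
        before = text[i - 1] if i > 0 else ""
        after = text[i + 1] if i + 1 < len(text) else ""
        both_wide = (
            before != ""
            and after != ""
            and unicodedata.east_asian_width(before) in NO_SPACE_WIDTHS
            and unicodedata.east_asian_width(after) in NO_SPACE_WIDTHS
        )
        if both_wide:
            continue  # space-less context on both sides: remove the break
        out.append(" ")  # otherwise behave as for space-separated languages
    return "".join(out)
```

With this simplification, `"hello\nworld"` keeps a space between the two words, while a break between two Wide characters such as `"日本\n語"` disappears. When in doubt the sketch inserts a space, which matches the notion of "safe" used in this comment.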

Finding a rule that would work for all languages in all situations is unrealistic, and is not needed because authors can just avoid using line breaks in their source code to guard against such space insertion. However, finding a subset where we can reliably determine that inserting a space would be the wrong thing to do enables authors of space-less languages to format their source code freely in more situations, and enjoy some of the benefits that users of space-separated languages already enjoy.

Due to the languages from which EAW=F/W/H characters come, I believe the proposed rules safely identify such a subset.

By safely, I mean "will not fail to insert a space where one is expected".

> I still think EAW=F/W does a reasonable job, but if people disagrees[...]

I agree that this rule (the second bullet point in [section 4.1.2](https://drafts.csswg.org/css-text-3/#line-break-transform)) does a reasonable job indeed. I won't die on this hill if the consensus is that doing more than that is overkill.

I do think we could do a better job, and that the additional rule about EAW=A (the third bullet point) brings significant benefits (see the situation discussed in the initial comment on this issue) at moderate cost. I'd prefer if we did it. I won't object if we don't.

Whether we do that part or not, Emoji will be inconsistent anyway. That is annoying, so I think it would be nice to fix, and I believe the proposed tailoring to do so is safe. But I'm ok if that's where you want to draw the line.

To me, these are the "appropriate tailoring" that UAX11 is calling for. 

I would not be opposed to using some other Unicode property (or combination of properties) than EAW (with the proposed tailoring) if we had a realistic candidate for another classification that reliably identifies a safe subset, but I don't think it is helpful to drop EAW-based rules for either of the following reasons:
* Even if it provides the information we want, it wasn't designed for that.
* Even if none exist, we can imagine that there should be a theoretical classification that would cover a larger safe subset.

If the result is wrong (as in "inserts spaces where there shouldn't be"), or if there's an easier way to get an equally good (or better) result, then sure. But let's keep in mind [the priority of constituencies](https://www.w3.org/TR/html-design-principles/#priority-of-constituencies):
> consider [...] authors [...] over theoretical purity

----
Digression:

> EAW is about resolving character width as a binary condition [...] half-width or full-width

Even if its goal is to classify things between narrow and wide, given that EAW is a property with 6 values (F/H/W/Na/A/N), describing it as binary undersells the amount of information it carries.
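For reference, the six values can be observed with Python's standard `unicodedata` module. This is only an illustrative sketch: the sample characters and the name `EAW_SAMPLES` are my own choices, picked as one representative per value.

```python
import unicodedata

# One hand-picked sample character per East Asian Width value (illustrative only).
EAW_SAMPLES = {
    "F": "\uFF21",   # FULLWIDTH LATIN CAPITAL LETTER A
    "H": "\uFF71",   # HALFWIDTH KATAKANA LETTER A
    "W": "\u4E2D",   # CJK ideograph
    "Na": "A",       # LATIN CAPITAL LETTER A
    "A": "\u03B1",   # GREEK SMALL LETTER ALPHA
    "N": "\u05D0",   # HEBREW LETTER ALEF
}

for expected, ch in EAW_SAMPLES.items():
    actual = unicodedata.east_asian_width(ch)
    print(f"U+{ord(ch):04X} expected={expected} actual={actual}")
```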

-- 
GitHub Notification of comment by frivoal
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/337#issuecomment-448876510 using your GitHub account

Received on Thursday, 20 December 2018 05:12:56 UTC