- From: fantasai <fantasai.lists@inkedblade.net>
- Date: Thu, 23 Oct 2014 01:17:51 -0400
- To: Asmus Freytag <asmusf@ix.netcom.com>, Richard Ishida <ishida@w3.org>, Koji Ishii <kojiishi@gluesoft.co.jp>
- CC: "Phillips, Addison" <addison@lab126.com>, www-style@w3.org, www International <www-international@w3.org>
On 10/22/2014 06:49 PM, Asmus Freytag wrote:
> On 10/22/2014 3:12 PM, fantasai wrote:
>>
>> If you're asking about the BA category, in order to safely
>> make a normative requirement, I need it split into two sets:
>
> BA Category 1
>>   - characters after which a break is always permissible
>>     and recommended, such as the visible word separators
>
> BA Category 2
>>   - characters after which a break is sometimes a good
>>     idea but not always, such as hyphens and slashes
>
> Are there any other members of Category 2?

I am unsure and don't have the time to solve this particular problem
within the next 2 weeks. If you or the i18nwg would like to go through
the entire list and annotate it over the next couple weeks, then
perhaps we could ask the CSSWG to reconsider this issue.

Personally I don't see why we are so concerned. UAX14 is already
referenced normatively for all the non-tailorable categories and
informatively for all the rest. I am sure that any implementer would
be happy to accept bugs filed against their implementation for
specific cases where it is clearly better than the line-breaking
behavior they have now.

I am not in favor of normatively requiring all of UAX14 because I
don't want anyone to go filing bugs against implementers where they
violate UAX14's tailorable rules and say "you should follow these
rules because they're required [unless you can justify otherwise]".
If we're filing line-breaking bugs, I want them to be argued on
correctness for the particular characters that are not compliant.
I want UAX14 to be used as a source of information, not as a source
of rules, and for that an informative reference is the right approach.

UAX14 line breaking is great *iff* you have a more sophisticated
algorithm that is not simply a pairs table, that has some level of
prioritization-by-distance or perhaps some other kind of heuristics.
It is not, in its current state, suitable for compliance by a
pairwise implementation.

> Is the issue "generic" to all kinds of hyphens and slashes,
> or is it "specific" to special strings like dates, path names
> or identifiers?

It's fairly broad. E-mail, for example, shouldn't be broken at the
hyphen. Neither should :-) nor -x. And of course, as you mention,
neither should dates.

>> I will not issue a normative recommendation to honor BA
>> behavior of the second category. This will result in bad
>> line-breaking when implementations try to comply without
>> performing a thoughtful survey of each individual case
>> and what contextual information the line break may need
>> to consider. Please note that this is not a theoretical
>> concern: we have already run into this exact problem.
>
> I suspect that the issue is more about substrings that represent
> some special context, rather than the generic occurrence of
> these in running text.

It was both.

When unsure, it is safer to not break than to break. Knowing that
the UAX14 pairs table is insufficient for acceptable line breaking,
and that UAs attempting to "improve" their implementation by
following it will regress, I cannot in good conscience require it
as a baseline. I believe, based on past experience of doing exactly
that, that this approach will result in problems for our implementers.

I stand by my answer in
  http://lists.w3.org/Archives/Public/www-style/2014Jul/0500.html
and I think the existing references to UAX14 are sufficient given
the current situation.

Which doesn't mean we can't work on creating a safer pairs table
that is suitable for dumb line-breaking implementations applied to
Web content, and require that in the future. But as Koji and I keep
re-iterating, that is a significantly larger project than is in-scope
for us right now.

~fantasai
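
The hyphen cases named in the message (E-mail, :-), -x, dates) can be illustrated with a minimal sketch. It is not the UAX14 pairs table and not any implementation's actual behavior: `naive_hyphen_breaks` is a deliberately simplistic "always break after a hyphen" rule, and `cautious_hyphen_breaks` with its `min_run` parameter is a hypothetical heuristic invented here only to show the "when unsure, do not break" idea.

```python
def naive_hyphen_breaks(text):
    """Context-free rule (illustrative only): allow a break after every
    hyphen that is not the last character."""
    return [i + 1 for i, ch in enumerate(text)
            if ch == "-" and i + 1 < len(text)]


def cautious_hyphen_breaks(text, min_run=2):
    """Hypothetical heuristic: allow a break after a hyphen only when it
    sits between two runs of at least `min_run` letters. Anything else
    (digits, punctuation, a single letter) is treated as 'unsure', so
    no break opportunity is reported."""
    breaks = []
    for i, ch in enumerate(text):
        if ch != "-":
            continue
        # Count the letters immediately before the hyphen.
        before = 0
        j = i - 1
        while j >= 0 and text[j].isalpha():
            before += 1
            j -= 1
        # Count the letters immediately after the hyphen.
        after = 0
        k = i + 1
        while k < len(text) and text[k].isalpha():
            after += 1
            k += 1
        if before >= min_run and after >= min_run:
            breaks.append(i + 1)
    return breaks


if __name__ == "__main__":
    for sample in ["E-mail", ":-)", "-x", "2014-10-23", "line-breaking"]:
        print(sample, naive_hyphen_breaks(sample), cautious_hyphen_breaks(sample))
```

The naive rule reports a break opportunity in every sample, including the four the message says should stay intact. The cautious variant reports one only for "line-breaking". A real implementation would need far more context than this, which is the larger project the message describes.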
Received on Thursday, 23 October 2014 05:18:27 UTC