- From: Doug Cooper <doug@th.net>
- Date: Fri, 24 Mar 2000 11:53:09 +0700
- To: www-international@w3.org
- Cc: Chris Lilley <chris@w3.org>
At 15:37 23/3/00 +0100, Chris Lilley wrote:

>Have you tried XML browsers, or just HTML ones?

HTML.

>OK, so you are saying that the expectation is that explicit segmentation
>using a zero-width space is acceptable to the community of SEA
>non-segmented language users? Certainly it is a lot more tractable than
>per-language dictionary lookup, for implementors.

Unfortunately, this isn't an either/or situation. An explicit zero-width space is necessary because:

-- names, loanwords, neologisms, misspellings, etc. create situations in which standard approaches to word breaking produce errors,
-- since their bounds are not easily identified, these unknown areas can make much longer sequences unsegmentable, or lead to incorrect segmentation,
-- you can't assume that reliable dictionaries (or national interchange standards, for that matter) are available for render-time breaking.

On the other hand, because a) &#8203; more than doubles a doc's 'text payload' size, and b) most apps that generate HTML do not insert breaks, some mechanism for breaking at render-time is needed.

However, the solution is certainly _not_ to enshrine one particular approach, especially if that approach (dictionary-based maximal matching) is known to be flaky.

IMHO, a better way is to provide a hook, called just before the standard line-breaking code, that lets a local app insert zero-width spaces as needed. Maximal matching can be provided as a default local app, but there are other, lighter-weight approaches to weak segmentation for less-well-documented languages (Burmese, say), as well as more robust methods for better-studied systems like Thai.

>> I'm raising this issue now both in the hope of resurrecting <wbr>,
>Unlikely ...

Yet, hope springs eternal ;-). I just got this from ftang@netscape.com:

>We are going to release the beta of Netscape 6 in early April
>I believe <wbr> works in Netscape 6 beta 1.

But it's the bigger picture that I want to address.
If a tag of this importance (e.g., it's ballpark 50% of the text volume of many Thai HTML pages) can disappear, then either:

a) somebody is not articulating SEA needs clearly, or
b) somebody is not listening.

If anybody can point me to an archive in which this issue was discussed and resolved on technical merits, I will happily shut up. Otherwise, though, it seems to me that the system is broken, and I'd like very much to figure out which side needs to be fixed.

Best,
Doug Cooper
__________________________________________________
1425 VP Tower, 21/45 Soi Chawakun
Rangnam Road, Rajthevi, Bangkok, 10400
doug@th.net  (662) 246-8946  fax (662) 246-8789

Southeast Asian Software Research Center, Bangkok
http://seasrc.th.net --> SEASRC Web site
http://seasrc.th.net/sealang --> SEALANG Web site
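
The render-time hook proposed in the message can be illustrated with a minimal sketch in Python. Everything named here is hypothetical and invented for illustration: `prepare_for_line_breaking` stands in for whatever a browser would call just before its standard line-breaking code, and `make_maximal_match_hook` builds one possible default hook using greedy longest-match against a word list. No real browser API is implied, and the toy English word list only stands in for a real dictionary.

```python
from typing import Callable, Iterable, List, Optional

ZWSP = "\u200b"  # zero-width space, the explicit break opportunity

# Hypothetical hook signature: text in, text with extra ZWSPs out.
BreakHook = Callable[[str], str]


def _segment(chunk: str, lexicon: set, longest: int) -> List[str]:
    """Greedy longest-match segmentation of one chunk of text."""
    pieces: List[str] = []
    unknown = ""  # run of characters not covered by any lexicon word
    i = 0
    while i < len(chunk):
        match = ""
        # Try the longest candidate first, then shorter ones.
        for n in range(min(longest, len(chunk) - i), 0, -1):
            if chunk[i:i + n] in lexicon:
                match = chunk[i:i + n]
                break
        if match:
            if unknown:
                pieces.append(unknown)
                unknown = ""
            pieces.append(match)
            i += len(match)
        else:
            # Names, loanwords, misspellings: keep as one unbroken run.
            unknown += chunk[i]
            i += 1
    if unknown:
        pieces.append(unknown)
    return pieces


def make_maximal_match_hook(words: Iterable[str]) -> BreakHook:
    """Build a default hook from a word list (assumed, not a real API)."""
    lexicon = {w for w in words if w}
    longest = max(map(len, lexicon), default=0)

    def hook(text: str) -> str:
        # Author-supplied ZWSPs are kept; only the text between them
        # is segmented.
        chunks = text.split(ZWSP)
        return ZWSP.join(ZWSP.join(_segment(c, lexicon, longest)) for c in chunks)

    return hook


def prepare_for_line_breaking(text: str, hook: Optional[BreakHook] = None) -> str:
    """Stand-in for the call made just before standard line breaking."""
    return text if hook is None else hook(text)


if __name__ == "__main__":
    hook = make_maximal_match_hook(["the", "then", "cat", "sat"])
    # Show break opportunities as "|" for readability.
    print(prepare_for_line_breaking("thexyzcat", hook).replace(ZWSP, "|"))
```

Because the hook is just a function from text to text, a lighter-weight segmenter or a more robust one could be swapped in per language without touching the line breaker itself, which is the point of not enshrining a single approach.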
Received on Thursday, 23 March 2000 23:59:37 UTC