- From: Doug Cooper <doug@th.net>
- Date: Fri, 24 Mar 2000 11:53:09 +0700
- To: www-international@w3.org
- Cc: Chris Lilley <chris@w3.org>
At 15:37 23/3/00 +0100, Chris Lilley wrote:
>Have you tried XML browsers, or just HTML ones?
HTML.
>OK, so you are saying that the expectation is that explicit segmentation
>using a zero-width space is acceptable to the community of SEA
>non-segmented language users? Certainly it is a lot more tractable than
>per-language dictionary lookup, for implementors.
Unfortunately, this isn't an either/or situation. An explicit zero-width
space is necessary because:
-- names, loanwords, neologisms, misspellings, etc. create situations in
which standard approaches to word breaking produce errors,
-- since their bounds are not easily identified, these unknown areas can
make much longer sequences unsegmentable, or lead to incorrect
segmentation,
-- you can't assume that reliable dictionaries (or national interchange
standards, for that matter) are available for render-time breaking.
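To make the failure modes above concrete, here is a minimal sketch of greedy longest matching. It is a simpler relative of the maximal matching mentioned below: it commits to the longest dictionary word at each position instead of comparing whole alternative segmentations. Romanized English stands in for unsegmented Thai, and the dictionary, the test strings and the function name `max_match` are all hypothetical.

```python
def max_match(text, dictionary):
    """Greedy left-to-right longest-match segmentation (toy sketch).

    If no dictionary word starts at the current position, a single
    character is emitted as its own segment, which is one common fallback.
    """
    max_len = max(map(len, dictionary), default=1)
    segments = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink until a dictionary hit.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                segments.append(text[i:j])
                i = j
                break
        else:
            # Unknown material (a name, a loanword, a typo): the matcher cannot
            # tell where it ends, so it falls back to one character at a time.
            segments.append(text[i])
            i += 1
    return segments


words = {"the", "cat", "sat", "on", "mat", "them", "at"}
print(max_match("thecatsatonthemat", words))   # greedy error: "them", "at"
print(max_match("thecatsatonbobsmat", words))  # unknown name "bobs" shatters
```

The first call shows greedy matching choosing "them" + "at" over "the" + "mat". The second shows an unknown name breaking into single characters, which are break opportunities a renderer should never use.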
On the other hand, because a) marking every break with &#8203; more than
doubles a document's 'text payload' size, and b) most apps that generate
HTML do not insert breaks, some mechanism for breaking at render-time is needed.
However, the solution is certainly _not_ to enshrine one particular
approach -- especially if that approach (dictionary-based maximal
matching) is known to be flaky.
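The payload claim in point a) can be checked with rough arithmetic. The sketch assumes two things. First, each break is written as the seven-byte ASCII reference &#8203;. Second, the Thai text is stored in a single-byte encoding such as TIS-620. In UTF-8 each Thai character takes three bytes, so the relative overhead would be smaller. Joining n words totalling T bytes adds 7(n-1) bytes, giving a size ratio of 1 + 7(n-1)/T, which exceeds 2 whenever the words average fewer than about seven characters. The helper name `payload_ratio` is hypothetical.

```python
ZWSP_REF = "&#8203;"  # 7-byte ASCII numeric character reference for U+200B


def payload_ratio(words, encoding="tis-620"):
    """Hypothetical helper: encoded size of the words joined by ZWSP
    references, divided by the encoded size of the words run together."""
    plain = "".join(words).encode(encoding)
    marked = ZWSP_REF.join(words).encode(encoding)
    return len(marked) / len(plain)


# "ภาษา" (4 characters) + "ไทย" (3 characters): 7 bytes become 14.
print(payload_ratio(["ภาษา", "ไทย"]))  # 2.0
```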
IMHO, a better way is to provide a hook, called just before the standard
line-breaking code, that lets a local app insert zero-width spaces as needed.
Maximal matching can be provided as a default local app, but there are
other, lighter-weight approaches to weak segmentation for less-well-documented
languages (Burmese, say), as well as more robust methods
for better-studied systems like Thai.
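One way to picture the proposed hook is as a per-language registry that the renderer consults just before its ordinary line-breaking pass. This is a sketch under stated assumptions, not an existing browser API. `BreakHookRegistry`, `longest_match_segmenter`, the toy dictionary and the identity segmenter registered for "my" are all hypothetical. The identity segmenter is a placeholder, not a real Burmese method. Two design choices are also assumptions: author-supplied zero-width spaces are kept, with only the runs between them segmented, and the default segmenter keeps unknown material together as one unbreakable run.

```python
from typing import Callable, Dict

ZWSP = "\u200b"

# A segmenter takes a run of text and returns it with U+200B inserted
# at the break opportunities it has found.
Segmenter = Callable[[str], str]


def longest_match_segmenter(dictionary) -> Segmenter:
    """Hypothetical default: greedy longest dictionary match."""
    max_len = max(map(len, dictionary), default=1)

    def segment(text: str) -> str:
        out, unknown, i = [], "", 0
        while i < len(text):
            for j in range(min(len(text), i + max_len), i, -1):
                if text[i:j] in dictionary:
                    break
            else:
                unknown += text[i]  # keep unknown material unbroken
                i += 1
                continue
            if unknown:
                out.append(unknown)
                unknown = ""
            out.append(text[i:j])
            i = j
        if unknown:
            out.append(unknown)
        return ZWSP.join(out)

    return segment


class BreakHookRegistry:
    """Hypothetical hook registry, called before standard line breaking."""

    def __init__(self, default: Segmenter):
        self._default = default
        self._by_lang: Dict[str, Segmenter] = {}

    def register(self, lang: str, segmenter: Segmenter) -> None:
        self._by_lang[lang] = segmenter

    def prepare(self, text: str, lang: str) -> str:
        segment = self._by_lang.get(lang, self._default)
        # Explicit ZWSPs from the author always survive; only the runs
        # between them are handed to the local segmenter.
        return ZWSP.join(segment(chunk) for chunk in text.split(ZWSP))


registry = BreakHookRegistry(longest_match_segmenter({"the", "cat", "sat", "on", "mat"}))
registry.register("my", lambda s: s)  # placeholder lighter-weight segmenter
print(registry.prepare("thecatsatonbobsmat", "th").split(ZWSP))
```

Keeping the registry separate from the segmenters is what lets a dictionary-based default coexist with cheaper or more robust local methods, which is the point of the hook proposal.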
>> I'm raising this issue now both in the hope of resurrecting <wbr>,
>Unlikely ...
Yet, hope springs eternal;-). I just got this from ftang@netscape.com:
>We are going to release the beta of Netscape 6 in early April
>I believe <wbr> works in Netscape 6 beta 1.
But it's the bigger picture that I want to address. If a tag of this
importance (e.g., it accounts for roughly 50% of the text volume of many
Thai HTML pages) can disappear, then either:
a) somebody is not articulating SEA needs clearly, or
b) somebody is not listening.
If anybody can point me to an archive in which this issue was
discussed and resolved on technical merits, I will happily shut up.
Otherwise, though, it seems to me that the system is broken, and
I'd like very much to figure out which side needs to be fixed.
Best,
Doug Cooper
__________________________________________________
1425 VP Tower, 21/45 Soi Chawakun
Rangnam Road, Rajthevi, Bangkok, 10400
doug@th.net (662) 246-8946 fax (662) 246-8789
Southeast Asian Software Research Center, Bangkok
http://seasrc.th.net --> SEASRC Web site
http://seasrc.th.net/sealang --> SEALANG Web site
Received on Thursday, 23 March 2000 23:59:37 UTC