Re: too late for <wbr>, too soon for &#x200B; ?

From: Doug Cooper <doug@th.net>
Date: Fri, 24 Mar 2000 11:53:09 +0700
Message-Id: <>
To: www-international@w3.org
Cc: Chris Lilley <chris@w3.org>
At 15:37 23/3/00 +0100, Chris Lilley wrote:
>Have you tried XML browsers, or just HTML ones?

>OK, so you are saying that the expectation is that explicit segmentation
>using a zero-width space is acceptable to the community of SEA
>non-segmented language users? Certainly it is a lot more tractable than
>per-language dictionary lookup, for implementors.

  Unfortunately, this isn't an either/or situation.  An explicit zero-width
space is necessary because:
  -- names, loanwords, neologisms, misspellings, etc. create situations in 
     which standard approaches to word breaking produce errors,
  -- since their bounds are not easily identified, these unknown areas can 
     make much longer sequences unsegmentable, or lead to incorrect segmentation,
  -- you can't assume that reliable dictionaries (or national interchange
     standards, for that matter) are available for render-time breaking.
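To make the first point concrete, here is a minimal sketch of dictionary-based segmentation. It uses one common formulation of maximal matching, which picks the segmentation with the fewest dictionary words. The function name, the toy English dictionary standing in for a Thai one, and the rule that uncovered characters become single-character "unknown" tokens are assumptions for illustration only.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def maximal_matching(text: str, dictionary: set[str], max_word_len: int = 20) -> list[str]:
    """Split text into the fewest dictionary words (dynamic programming).

    Hypothetical rule for this sketch: a character no dictionary word covers
    becomes a one-character "unknown" token, and segmentations with fewer
    unknown tokens always win before total token count is compared.
    """
    n = len(text)
    worst = (float("inf"), float("inf"))
    # best[i] = (unknown_tokens, total_tokens) for the best split of text[:i]
    best = [worst] * (n + 1)
    back = [0] * (n + 1)  # back[i] = start index of the last token ending at i
    best[0] = (0, 0)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_word_len), i):
            unknown, count = best[j]
            piece = text[j:i]
            if piece in dictionary:
                candidate = (unknown, count + 1)
            elif i - j == 1:
                candidate = (unknown + 1, count + 1)
            else:
                continue
            if candidate < best[i]:
                best[i], back[i] = candidate, j
    # Walk the back-pointers to recover the tokens in order.
    tokens = []
    i = n
    while i > 0:
        tokens.append(text[back[i]:i])
        i = back[i]
    return tokens[::-1]


words = maximal_matching("wemetsomchaiathome", {"we", "met", "at", "home"})
print(words)
print(ZWSP.join(words))  # every token boundary becomes a legal line break
```

The name "somchai" is not in the dictionary, so it falls apart into seven single-character tokens. A renderer relying only on this output would permit a line break anywhere inside the name, which is the kind of error the list above describes.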

   On the other hand, because a) &#x200B; more than doubles a doc's
'text payload' size, and b) most apps that generate HTML do not
insert breaks, some mechanism for breaking at render-time is needed.
However, the solution is certainly _not_ to enshrine one particular 
approach -- especially if that approach (dictionary-based maximal 
matching) is known to be flaky.
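The payload figure is simple arithmetic. A minimal sketch, assuming a single-byte TIS-620 page: TIS-620 has no code point for U+200B, so the break must be spelled as the 8-byte reference "&#x200B;". The helper name and the sample words are illustrative only, not measured data.

```python
# Hypothetical helper: ratio of a pre-segmented text's size with an explicit
# break marker between words to its size without any markers.
def markup_ratio(words: list[str], marker: str, encoding: str = "tis_620") -> float:
    plain = "".join(words).encode(encoding)
    marked = marker.join(words).encode(encoding)
    return len(marked) / len(plain)


# Illustrative sample; every Thai character is one byte in TIS-620.
words = ["ภาษา", "ไทย", "ไม่", "มี", "ช่องว่าง"]
print(markup_ratio(words, "&#x200B;"))  # 8-byte entity between words
print(markup_ratio(words, "<wbr>"))     # 5-byte tag between words
```

For n words totalling L bytes and an m-byte marker, the ratio is 1 + m(n-1)/L. The 8-byte entity therefore more than doubles the payload whenever words average under roughly eight bytes. By the same arithmetic, the 5-byte <wbr> makes up about half the volume when words average about five bytes.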

   IMHO, a better way is to provide a hook, called just before the standard
line-breaking code, that lets a local app insert zero-width spaces as needed.
Maximal matching can be provided as a default local app, but there are 
other, lighter-weight approaches to weak segmentation for less-well-
documented languages (Burmese, say), as well as more robust methods
for better-studied systems like Thai.
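The proposed hook might look roughly like this. It is a minimal sketch under stated assumptions: BreakHookRegistry, register and insert_breaks are hypothetical names, not an existing browser API; segmenters are plain functions from a run of text to a list of words, keyed by language tag; and explicit author-supplied zero-width spaces are kept, with only the runs between them handed to the local segmenter.

```python
from __future__ import annotations

from typing import Callable, Dict, List, Optional

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE
# A segmenter maps one run of unbroken text to its words.
Segmenter = Callable[[str], List[str]]


class BreakHookRegistry:
    """Hypothetical per-language hook, run just before standard line breaking."""

    def __init__(self, default: Optional[Segmenter] = None) -> None:
        self._by_lang: Dict[str, Segmenter] = {}
        self._default = default  # e.g. a maximal-matching segmenter

    def register(self, lang: str, segmenter: Segmenter) -> None:
        self._by_lang[lang] = segmenter

    def insert_breaks(self, text: str, lang: str) -> str:
        segmenter = self._by_lang.get(lang, self._default)
        if segmenter is None:
            return text  # no local segmenter: leave text to the standard breaker
        # Keep the author's explicit breaks; segment only the runs between them.
        runs = text.split(ZWSP)
        return ZWSP.join(
            ZWSP.join(word for word in segmenter(run) if word) for run in runs
        )


# A deliberately naive stand-in segmenter (fixed two-character chunks) for a
# lighter-weight method; a real one would be language-specific.
hooks = BreakHookRegistry()
hooks.register("my", lambda run: [run[i:i + 2] for i in range(0, len(run), 2)])
print(repr(hooks.insert_breaks("abcd\u200befg", "my")))
```

Keeping author breaks means explicit markup still wins where it exists; a stricter variant could skip any text that already carries explicit breaks.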

>>   I'm raising this issue now both in the hope of resurrecting <wbr>,
>Unlikely ...

  Yet, hope springs eternal;-).  I just got this from ftang@netscape.com:

>We are going to release the beta of Netscape 6 in early April.
>I believe <wbr> works in Netscape 6 beta 1.

  But it's the bigger picture that I want to address.  If a tag of this 
importance (e.g., it accounts for roughly 50% of the text volume of many
Thai HTML pages) can disappear, then either:

  a) somebody is not articulating SEA needs clearly, or
  b) somebody is not listening.

   If anybody can point me to an archive in which this issue was 
discussed and resolved on technical merits, I will happily shut up.
Otherwise, though, it seems to me that the system is broken, and 
I'd like very much to figure out which side needs to be fixed.

  Doug Cooper
          1425 VP Tower, 21/45 Soi Chawakun
        Rangnam Road, Rajthevi, Bangkok, 10400
    doug@th.net (662) 246-8946  fax (662) 246-8789 

  Southeast Asian Software Research Center, Bangkok
  http://seasrc.th.net         -->  SEASRC Web site
  http://seasrc.th.net/sealang --> SEALANG Web site
Received on Thursday, 23 March 2000 23:59:37 UTC