Re: [css-text] Shaping for break-all/word-break from Behdad Esfahbod on 2016-03-09 (www-style@w3.org from March 2016)

From: Behdad Esfahbod <behdad@behdad.org>
Date: Tue, 8 Mar 2016 19:15:06 -0800
To: Koji Ishii <kojiishi@gmail.com>
Cc: "www-style@w3.org" <www-style@w3.org>
Message-ID: <CAF63+7UqbEXG4tKN2W_Zqgs7zdQAsjr525XxXhojYw=oW4bMTg@mail.gmail.com>

On Tue, Mar 8, 2016 at 7:06 PM, Koji Ishii <kojiishi@gmail.com> wrote:

> I'm slow to understand, and I appreciate your expertise.
>

Happy to discuss as long as it takes to clear up :).

> On Wed, Mar 9, 2016 at 11:31 AM, Behdad Esfahbod <behdad@behdad.org>
> wrote:
>
>>
>> Not necessarily.  And, that is even if you *can* map.  For example, if
>> you have a 'ffi' ligature, there's no way to break in between it without
>> shaping.
>>
>
> Yeah, let's go with the assumption we can map. That part I can handle.
>
>
>> But even if it was possible, the results are not necessarily the same.
>> That holds true for all scripts and languages, not just Arabic.  It's a
>> property of how OpenType works.  Fonts have rules that match arbitrary
>> sequences.  For example, a font can have a rule such that if there are five
>> "x" glyphs after each other, then it will replace the middle one with an
>> alternate form.  This might not be a real-world example, but that's what
>> fonts can do, and there definitely are fonts that do similar things, in
>> their 'calt', Contextual Alternates, feature.  When you get to script
>> styles like Nastaliq, it happens ALL the time.  But then again, break-all
>> and calligraphy is a combination we don't have to fully support.  However, I
>> think pretty much any script-style Latin font will also be broken.
>>
>
> Let me rephrase my question.
>
> When break-all breaks a word of 10 chars at position 3:
> 1. Shape the 3 with the rest as text-after.
> 2. Shape the 3 without text-after.
> 3. Shape the 10, find glyphs that map to the 3 chars and use them.
>
> I think you're talking about the diff between 1 and 2, correct? Does 3
> still differ from 1?
>

All three can be different.
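
To make that concrete, here is a toy sketch in Python.  It is not HarfBuzz
and not a real font: toy_shape and the two rules in it (an "ffi" ligature,
and the five-"x" contextual alternate from the example above) are
hypothetical, and the way option 1 treats text-after (visible to contextual
rules, but never merged into a ligature) is an assumption made only for
illustration.

```python
from typing import List, Optional, Tuple

# A glyph is (name, index of first character covered, number of characters covered).
Glyph = Tuple[str, int, int]


def toy_shape(text: str, start: int = 0, end: Optional[int] = None) -> List[Glyph]:
    """Shape text[start:end] with two made-up rules.

    Characters outside [start, end) act as context: the contextual rule may
    look at them, but no glyph is emitted for them and the ligature rule
    never consumes them (an assumption of this toy model).
    """
    if end is None:
        end = len(text)
    glyphs: List[Glyph] = []
    i = start
    while i < end:
        # Ligature rule: "ffi" becomes one glyph if it fits inside the item.
        if text.startswith("ffi", i) and i + 3 <= end:
            glyphs.append(("f_f_i", i, 3))
            i += 3
            continue
        # Contextual rule: the middle one of five consecutive "x" gets an alternate.
        if i >= 2 and text[i - 2:i + 3] == "xxxxx":
            glyphs.append(("x.alt", i, 1))
        else:
            glyphs.append((text[i], i, 1))
        i += 1
    return glyphs


def option1(word: str, k: int) -> List[Glyph]:
    """Shape the first k characters with the rest as text-after."""
    return toy_shape(word, 0, k)


def option2(word: str, k: int) -> List[Glyph]:
    """Shape the first k characters with no text-after."""
    return toy_shape(word[:k])


def option3(word: str, k: int) -> List[Glyph]:
    """Shape the whole word and keep the glyphs that start before k."""
    return [g for g in toy_shape(word) if g[1] < k]


def names(glyphs: List[Glyph]) -> List[str]:
    return [g[0] for g in glyphs]
```

With the 10-character word "xxxxxabcde" broken at 3, options 1 and 3 give
x, x, x.alt while option 2 gives x, x, x.  With "officework" broken at 3,
options 1 and 2 give o, f, f while option 3 gives o, f_f_i, and that
ligature glyph also covers the "i" that belongs on the next line.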

> If 1 and 3 are the same, it helps our efficiency a bit.
>

I've maintained text layout engines all my professional life, so I fully
understand how it helps efficiency :).

For that reason, I'm going to implement a piece of additional API in
HarfBuzz, called "safe-to-break", which tells the client at which points in
the text it is safe to break and shape the two sides separately while still
getting the same results.  With that kind of API, you can break the line,
then walk outwards from the break location, find the first safe-to-break
point on each side, and reshape just the slice in between, which for most
simple cases will be empty.
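
A rough sketch of that client-side walk, in Python.  Everything here is
hypothetical: toy_shape is a stand-in shaper with two made-up rules (an
"ffi" ligature and the five-"x" alternate), and is_safe defines
"safe-to-break" by brute force, which a real shaper would instead report as
a by-product of shaping.  None of these names are HarfBuzz API.

```python
from typing import List, Tuple

# A glyph is (name, index of first character covered, number of characters covered).
Glyph = Tuple[str, int, int]


def toy_shape(text: str, offset: int = 0) -> List[Glyph]:
    """Toy shaper; offset is added to the character indices it reports."""
    glyphs: List[Glyph] = []
    i = 0
    while i < len(text):
        if text.startswith("ffi", i):
            # Made-up ligature rule.
            glyphs.append(("f_f_i", offset + i, 3))
            i += 3
        elif i >= 2 and text[i - 2:i + 3] == "xxxxx":
            # Made-up contextual rule: middle of five "x" in a row.
            glyphs.append(("x.alt", offset + i, 1))
            i += 1
        else:
            glyphs.append((text[i], offset + i, 1))
            i += 1
    return glyphs


def is_safe(text: str, p: int) -> bool:
    """Brute-force definition: shaping both sides separately gives the same glyphs."""
    return toy_shape(text[:p]) + toy_shape(text[p:], p) == toy_shape(text)


def break_line(text: str, full: List[Glyph], k: int):
    """Break at k, reusing glyphs from the full shaping result where possible."""
    a = k
    while a > 0 and not is_safe(text, a):          # walk left to a safe point
        a -= 1
    b = k
    while b < len(text) and not is_safe(text, b):  # walk right to a safe point
        b += 1
    # Only text[a:k] and text[k:b] are reshaped; both are empty when k is safe.
    line1 = [g for g in full if g[1] < a] + toy_shape(text[a:k], a)
    line2 = toy_shape(text[k:b], k) + [g for g in full if g[1] >= b]
    return line1, line2, (a, b)
```

For "officework" broken at 3 the walk stops at 1 and 4, so only "ff" and
"i" are reshaped.  For a break that is already safe, such as position 5 in
"hello world", both slices are empty and every glyph is reused.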

You can track the safe-to-break API progress here:

   https://github.com/behdad/harfbuzz/issues/224

behdad

-- 
behdad
http://behdad.org/

Received on Wednesday, 9 March 2016 03:15:35 UTC