Re: SVG 1.2 Comments: I18N comments on section 4.12 and its friends... from Thomas DeWeese on 2005-01-03 (www-svg@w3.org from January 2005)

From: Thomas DeWeese <Thomas.DeWeese@Kodak.com>
Date: Mon, 03 Jan 2005 09:04:29 -0500
To: aphillips@webmethods.com
CC: www-svg@w3.org, mark.davis@jtcsv.com, w3c-i18n-ig@w3.org
Message-ID: <41D950ED.9060304@Kodak.com>

Hi Addison,

    I am happy to have the I18N WG's feedback on flowing text.
I apologize for taking a while to respond.  My comments are
inline below.

Addison Phillips [wM] wrote:

>>The conclusion we came to is that I18N WG needs to submit a 
>>formal comment (or set of comments) on this topic. The basic idea 
>>that we discussed in our meeting is:
>>
>>1. SVG will provide two line breaking modes. 

    I would agree with this.

>>2. The wrapping algorithm currently in 4.12 must be scrapped, 
>>since it proceeds from (numerous, fatal) false assumptions about 
>>the layout of text. I have included below a prototype for a new 
>>algorithm, which must be substantially fleshed out. Comments are 
>>very welcome. Vertical layouts have issues left undiscussed here. 

    I am curious what the 'fatal assumptions' are.  I ask because
I suspect that the fatal assumptions are in fact misunderstandings
(see below).

> Special considerations:
> 1. If soft hyphens are used to form breaks, then implementers should 
> specifically consider UAX#14 section 5.2 "Use of soft hyphen". In 
> particular, breaking on a soft hyphen may result in spelling or form 
> changes in certain languages and scripts.

    Is there an algorithmic way to determine this or must you use a
dictionary?  Reading the text of this section it seems to require a
dictionary.  What is the I18N WG's opinion on the specification of a
"reasonable" line breaking alg independent of dictionaries?

> 2. Reshaping in Unicode does not cross directional boundaries, so 
> this can be used to optimize performance in some cases.

    Yes, the current alg does not consider ligatures across directional
changes.

> 3. Some characters in Unicode take their shape from their current 
> directionality. For example, opening and closing parenthesis change 
> the direction in which they point based on their context. See TUS 
> 4.0 section 4.7 for a discussion of mirroring. Note that mirroring
> can produce different advance widths or heights as a result.

    Absolutely, the glyphs used for layout need to be the proper
glyphs.

> 4. Text at the end of the line renders differently than text in 
> the middle of a line. For example, spaces are generally not rendered 
> at the end of a line. Implementations should be careful of 
> "optimizations" that do not layout the entire line again and just 
> concatenates segments of glyphs. (Note that shaping of characters 
> may be affected in some scripts when the text doesn't occur at the end).

    So I suspect that this is the apparent major flaw in the layout
alg in SVG 1.2.  In fact this behavior is considered in the layout
alg as presented, although not very explicitly.  Step 5 of the
algorithm says:

        Each Glyph Group has two extents calculated: its normal
        extent, and its last in text region extent.

    This is to deal with exactly this issue.  The last in text
region extent will not include spaces, but would include a soft
hyphen, for example.
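
    To illustrate how the two extents could be used, here is a minimal
sketch in Python.  It is not text from the SVG 1.2 draft: the names
GlyphGroup, normal_extent, last_extent and fill_region are hypothetical,
and the extents are assumed to be precomputed advances along the line.

```python
from dataclasses import dataclass

@dataclass
class GlyphGroup:
    text: str
    normal_extent: float  # advance when more text follows in the region
    last_extent: float    # advance when this group is last in the region

def fill_region(groups, width):
    """Return how many leading glyph groups fit in a region of this width."""
    used = 0.0   # sum of normal extents of the groups accepted so far
    count = 0
    for g in groups:
        # Test the candidate as if it were the last group in the region.
        if used + g.last_extent > width:
            break
        # Once accepted, it contributes its normal extent to what follows.
        used += g.normal_extent
        count += 1
    return count
```

    With groups "flow " (5, 4), "text " (5, 4) and "now" (3, 3) and a
region of width 9, two groups fit, because the trailing space of "text "
is not counted when that group ends the line.  A group ending in a soft
hyphen would instead have a last extent larger than its normal extent.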

> 5. "Emergency breaking" may be required if some line of text is too 
> long to fix any of the remaining strips. The form this takes is ?????

    The WG originally decided that they would not consider emergency
breaking.  I didn't agree with this but...  Since one of the targets
for this is small devices (where dictionaries really aren't
appropriate) I would suggest simply adding as many glyphs from the
next glyph group as possible and carrying the rest of the glyph
group over to the next line.

    Just for background, part of the reason the WG didn't want to do
this was that when you consider layout in arbitrary shapes, if you
'space down' a bit you may find a region that _is_ wide enough for
the word to fit unbroken.  Perhaps some middle ground can be found
where the alg will 'look ahead' a bit for a suitable location for
the line (a few lines perhaps, or anywhere within the current flow
region) and only if one is not found back up and hard break the word.
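
    A rough sketch of that middle ground, in Python.  This is only an
illustration under assumptions of mine: place_word and its lookahead
parameter are hypothetical names, strips are reduced to plain widths,
and a word is reduced to a list of glyph advances.

```python
def place_word(glyph_widths, strip_widths, start, lookahead=2):
    """Find a strip for a word, or fall back to a hard break.

    Returns ("fit", strip_index) if the whole word fits in the current
    strip or one of the next `lookahead` strips, otherwise
    ("break", glyph_count) with the number of glyphs that fit in the
    current strip.
    """
    word_width = sum(glyph_widths)
    # Look ahead a few strips for one wide enough for the unbroken word.
    last = min(start + lookahead + 1, len(strip_widths))
    for i in range(start, last):
        if word_width <= strip_widths[i]:
            return ("fit", i)
    # None found: back up to the current strip and hard break the word,
    # adding as many glyphs as possible.
    used = 0.0
    count = 0
    for w in glyph_widths:
        if used + w > strip_widths[start]:
            break
        used += w
        count += 1
    return ("break", count)
```

    For a word of three glyphs of width 3 and strips of width 5, 6 and
10, this 'spaces down' to the third strip.  With strips of width 5, 6
and 7 nothing is wide enough, so one glyph is placed in the first strip
and the rest is left for the next line.  A caller would still need to
decide what to do when not even one glyph fits.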

> 6. When a word is added the line height may increase, it can never 
> decrease from the first glyph rendered. An increase in the line 
> height can only reduce the space available for text placement in 
> the span. In the algorithm described above, the line height must 
> be calculated on the text actually inserted (i.e. between starting 
> and current position) and *not* be based on the line height of the 
> last layout pass in step 5.

     I don't follow you here.  Why are you worried about this, based
on the text in the current alg?

> 7. In (5) note that rendering is done on a line oriented to the current 
> and base directionality. For example, vertical rendering is done on a 
> vertical line.

     Yes, absolutely.  Note that step 5 does not actually place
any glyphs; it just figures out what fits where.  This is based on
extents (which are described later and I think do include appropriate
wording for the vertical case).

> 8. Note that in (3) spans of text may be labeled with a different 
> language or use scripts to which different breaking options may apply. 
> Options selected should be applied as appropriate for each span of text.

     Sure, is there some reason this can't be handled by UAX#14
tagging the character spans with break opportunities?

> -- [[ A Rough-and-Ready Prototype]]--
> 1. Each paragraph is processed according to the Unicode Bidirectional 
>    Algorithm in Unicode Standard Annex #9 [UAX#9] in order to determine 
>    directionality and embedding levels for each character. Base 
>    directionality may be defined by the containing document.
> 2. Each paragraph is then processed in logical order to determine line 
>    breaking opportunities between characters, according to Unicode Standard 
>    Annex #14 [UAX#14]. The specific options for the paragraph's script and 
>    language are applied here as appropriate. This results in "break segments", 
>    which consist of character strings [see CharMod Part1: Fundamentals, 
>    section 6.1] that are bounded on both ends by a line breaking opportunity 
>    (or the start or end of the paragraph).
> 3. The "starting position", "next pointer" and "current pointer" are each 
>    set to the (logical) start of the next paragraph in the text.
> 4. The "next pointer" is set to the character that represents the next 
>    break opportunity following the "current pointer's" position.
> 5. Text layout is performed on a single line of the all of the text between 
>    the "starting position" and the "next pointer". 

      This needs more work.  It at least needs to specify that the text
layout is performed on the text in 'display' direction.  There is also
a question of how lines (strips) that contain multiple spans are to be
handled.  In the current proposal chars/glyphs from the same logical
word that are not co-incident in display order can also be split
across spans (hence the notion of glyph groups).

      To my mind this is the only substantially different step, and I
wonder if there is truly a difference if you consider that each glyph
group will have two versions (normal and last in region).

> 6. If the text in (5) does not exceed the size of the current strip and text
>    remains in the paragraph, set the current pointer = next pointer and go 
>    to (4).
> 7. Otherwise place the rendered text into the strip, set "starting position" 
>    = "current pointer" and "next pointer" = "current pointer" and increment 
>    the strip.
> 8. If text remains in the paragraph, go to (4).
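
      For reference, steps 3 to 8 of the prototype above can be sketched
roughly as follows in Python.  This is a simplification under stated
assumptions: bidi processing (step 1) and UAX#14 segmentation (step 2)
are assumed to have happened already, strips are plain widths, and
fill_strips and the measure callback are hypothetical names.  The whole
candidate line is re-measured each time rather than summing segments.

```python
def fill_strips(segments, strip_widths, measure):
    """Greedy line filling over break segments.

    segments: strings, each ending at a break opportunity (step 2).
    strip_widths: available width of each strip, in order.
    measure: callable returning the laid-out width of one whole line.
    Returns one string per strip used (possibly empty).
    """
    lines = []
    start = 0   # "starting position", as an index into segments
    strip = 0
    while start < len(segments) and strip < len(strip_widths):
        current = start  # "current pointer"
        # Steps 4-6: advance to the next break opportunity while the
        # line from the starting position still fits the strip.
        while current < len(segments):
            candidate = "".join(segments[start:current + 1])
            if measure(candidate) > strip_widths[strip]:
                break
            current += 1
        # Step 7: place what fit, then move on to the next strip.
        lines.append("".join(segments[start:current]))
        start = current
        strip += 1
    return lines
```

      With a toy measure such as lambda s: len(s.rstrip()), which
ignores spaces at the end of a line, the segments "the ", "quick ",
"brown ", "fox" fill two strips of width 9.  A segment too wide for a
strip leaves that strip empty and is tried again in the next one, so no
emergency breaking is attempted here.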

Received on Monday, 3 January 2005 14:04:33 UTC