SVG 1.2 Comments: I18N comments on section 4.12 and its friends...

Dear SVG WG:

This email is sent on behalf of the Internationalization Working Group.

As discussed with a few of your WG members at the recent AC meeting, we feel that the line breaking algorithm in section 4.12 of SVG 1.2 Full is problematic. In the email below, I have attempted to describe the problem we discussed in that meeting and the possible solutions that I18N has considered since. (Please note that it has been slightly edited to correct some errors in the version originally sent to the I18N-IG list.)

We would like to figure out the best method of working with you to resolve this problem. Would cross-posting email or forming a Task Force make the most sense to you? Or do you have other preferences? We would be happy to send representatives to work out the details with you, if that makes sense.

Best Regards,

Addison (for I18N WG)

Addison P. Phillips
Director, Globalization Architecture
http://www.webMethods.com

Chair, W3C Internationalization Working Group
http://www.w3.org/International

Internationalization is an architecture. 
It is not a feature.

> -----Original Message-----
> From: Addison Phillips [wM] [mailto:aphillips@webmethods.com]
> Sent: 7 December 2004 14:46
> To: w3c-i18n-ig@w3.org
> Cc: mark.davis@jtcsv.com
> Subject: SVG 1.2 and line breaking...
> 
> 
> At the recent W3C AC meeting, Richard Ishida and I took the time 
> to have a meeting with Chris Lilley and others from the SVG WG. 
> We discussed the problems with SVG 1.2 and the way in which it 
> implements line breaking and line wrapping. Richard and I gave 
> the impression that multiline bidi layout was imperfectly 
> documented, but that appears not to be the case. UAX#9 seems to 
> document the really tough bidi stuff.
> 
> The conclusion we came to is that I18N WG needs to submit a 
> formal comment (or set of comments) on this topic. The basic idea 
> that we discussed in our meeting is:
> 
> 1. SVG will provide two line breaking modes. 
> 
>   a. The default will be 'auto', which MAY be implementation 
> defined and SHOULD be conformant with UAX#9 and UAX#14 (i.e. the 
> idea is that 'auto' should be more, rather than less, capable). 
> Auto mode will not guarantee consistent line breaking across 
> implementations, or even across differently configured instances 
> of the same implementation. But it may provide a higher level of 
> language awareness, etc.
> 
>   b. The "other" mode (which needs a name) will be precisely 
> specified by SVG 1.2. There must be an option that allows for 
> strict UAX#14/UAX#9-based breaking that produces the same layout 
> across implementations when SVG fonts are used. This mode should 
> also offer language-specific tailoring and/or options. For 
> example, for Korean text one might choose space-based or 
> character-based breaking. We note that UAX#14 leaves some leeway 
> for certain operations to the implementation.
> 
> 2. The wrapping algorithm currently in 4.12 must be scrapped, 
> since it proceeds from (numerous, fatal) false assumptions about 
> the layout of text. I have included below a prototype for a new 
> algorithm, which must be substantially fleshed out. Comments are 
> very welcome. Vertical layout raises further issues that are not 
> discussed here; see, for example, 
> http://fantasai.inkedblade.net/style/discuss/vertical-text/#css3-text 
> for just how much fun we are in for.

3. Richard suggested, in fact, that our ongoing discussion with the CSS WG concerning CSS3 (most notably a thread with "fantasai", the author of the above link) should form a basis for SVG's design. See http://lists.w3.org/Archives/Member/w3c-i18n-wg/2004Oct/thread.html, whose first message is http://lists.w3.org/Archives/Member/w3c-i18n-wg/2004Oct/0002.html (but follow the thread).


-- [[ A Rough-and-Ready Prototype]]--
1. Each paragraph is processed according to the Unicode Bidirectional Algorithm in Unicode Standard Annex #9 [UAX#9] in order to determine directionality and embedding levels for each character. Base directionality may be defined by the containing document.
2. Each paragraph is then processed in logical order to determine line breaking opportunities between characters, according to Unicode Standard Annex #14 [UAX#14]. The specific options for the paragraph's script and language are applied here as appropriate. This results in "break segments", which consist of character strings [see CharMod Part1: Fundamentals, section 6.1] that are bounded on both ends by a line breaking opportunity (or the start or end of the paragraph).
3. The "starting position", "next pointer" and "current pointer" are each set to the (logical) start of the next paragraph in the text.
4. The "next pointer" is set to the character that represents the next break opportunity following the "current pointer's" position.
5. Text layout is performed on a single line containing all of the text between the "starting position" and the "next pointer".
6. If the text in (5) does not exceed the size of the current strip and text remains in the paragraph, set the current pointer = next pointer and go to (4).
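7. Otherwise, place text into the strip: if the text in (5) exceeded the strip, place only the text between the "starting position" and the "current pointer"; if the paragraph is exhausted, place all of the text in (5). Then set "starting position" = "current pointer" and "next pointer" = "current pointer", and advance to the next strip.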
8. If text remains in the paragraph, go to (4).
--
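The loop in steps 3 to 8 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a proposed implementation: it works in logical order only (the bidi reordering from step 1 would be applied to each finished line), `space_breaks` is a hypothetical and deliberately crude stand-in for a UAX#14 implementation, `measure_monospace` is a hypothetical stand-in for real text layout that treats every character as one unit wide and ignores trailing spaces, and each strip is reduced to a single width. Because the form of emergency breaking is still open, the sketch simply raises an error when one break segment does not fit.

```python
from typing import Callable, List, Sequence


def measure_monospace(text: str) -> int:
    """Hypothetical stand-in for real text layout: every character is one
    unit wide, and trailing spaces are not counted because they are
    generally not rendered at the end of a line."""
    return len(text.rstrip(" "))


def space_breaks(text: str) -> List[int]:
    """Hypothetical, crude stand-in for UAX#14: a break opportunity after
    each space."""
    return [i + 1 for i, ch in enumerate(text) if ch == " "]


def fill_strips(
    paragraph: str,
    breaks: Sequence[int],
    strip_widths: Sequence[int],
    measure: Callable[[str], int] = measure_monospace,
) -> List[str]:
    """Greedy line filling along the lines of steps 3-8 of the prototype.

    `breaks` are logical offsets of break opportunities (the output of
    step 2); the end of the paragraph always counts as one.
    """
    ends = sorted({b for b in breaks if 0 < b < len(paragraph)})
    ends.append(len(paragraph))

    lines: List[str] = []
    strip = 0    # index of the current strip
    start = 0    # "starting position"
    current = 0  # "current pointer"
    i = 0        # ends[i] is the "next pointer"
    while start < len(paragraph):
        if strip >= len(strip_widths):
            raise ValueError("ran out of strips")
        nxt = ends[i]                                   # step 4
        # Step 5: lay out (here: measure) the whole candidate line again.
        fits = measure(paragraph[start:nxt]) <= strip_widths[strip]
        if fits and nxt < len(paragraph):
            current, i = nxt, i + 1                     # step 6: try a longer line
            continue
        if fits:
            lines.append(paragraph[start:nxt])          # paragraph exhausted
            break
        if current == start:
            # One break segment is wider than the strip. The proposal leaves
            # "emergency breaking" open, so this sketch just reports it.
            raise ValueError("segment too long; emergency breaking needed")
        lines.append(paragraph[start:current])          # step 7: place start..current
        start = current
        # ends[i] is already the first break after the new start (step 4).
        strip += 1                                      # advance to the next strip
    return lines


text = "The quick brown fox jumps"
print(fill_strips(text, space_breaks(text), [10, 10, 10]))
# → ['The quick ', 'brown fox ', 'jumps']
```

Because `measure` is called on the whole candidate line each time, the sketch re-lays-out the line rather than concatenating pre-measured segments, in keeping with special consideration 4.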

Special considerations:
1. If soft hyphens are used to form breaks, then implementers should specifically consider UAX#14 section 5.2 "Use of soft hyphen". In particular, breaking on a soft hyphen may result in spelling or form changes in certain languages and scripts.

2. Shaping in Unicode does not cross directional boundaries, so text on either side of such a boundary can be shaped independently; this can be used to optimize performance in some cases.

3. Some characters in Unicode take their shape from their resolved directionality. For example, opening and closing parentheses change the direction in which they point based on their context. See TUS 4.0 section 4.7 for a discussion of mirroring. Note that a mirrored glyph can have a different advance width or height.

4. Text at the end of a line renders differently from text in the middle of a line. For example, spaces are generally not rendered at the end of a line, and in some scripts the shaping of characters may differ when they do not occur at the end. Implementations should therefore be careful of "optimizations" that do not lay out the entire line again but simply concatenate segments of glyphs.

5. "Emergency breaking" may be required if a single break segment is too long to fit in any of the remaining strips. The form this takes is ?????

6. When a word is added, the line height may increase; it can never decrease below that of the first glyph rendered. An increase in the line height can only reduce the space available for text placement in the span. In the algorithm described above, the line height must be calculated on the text actually inserted (i.e. between the "starting position" and the "current pointer") and *not* on the line height of the last layout pass in step 5.

7. In (5) note that rendering is done on a line oriented to the current and base directionality. For example, vertical rendering is done on a vertical line.

8. Note that in (2) spans of text may be labeled with a different language, or may use scripts to which different breaking options apply. The options selected should be applied as appropriate to each span of text.
--
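For the mirroring in special consideration 3, UAX#9 rule L4 depicts a character with a mirrored glyph when its resolved directionality is right-to-left (an odd embedding level) and it has the Bidi_Mirrored property. The Python sketch below illustrates only that rule. It uses the standard `unicodedata.mirrored` function, but the `MIRROR_PAIRS` table is a small hand-picked subset of Unicode's BidiMirroring.txt data, and the embedding levels are assumed to come from step 1 of the prototype.

```python
import unicodedata
from typing import List

# A few mirror pairs from Unicode's BidiMirroring.txt; a real implementation
# would load the complete file rather than this hand-picked subset.
MIRROR_PAIRS = {
    "(": ")", ")": "(",
    "[": "]", "]": "[",
    "{": "}", "}": "{",
    "<": ">", ">": "<",
    "\u00ab": "\u00bb", "\u00bb": "\u00ab",  # left/right double angle quotes
}


def display_char(ch: str, level: int) -> str:
    """Character whose glyph depicts `ch` at resolved embedding `level`.

    Odd levels are right-to-left; UAX#9 rule L4 mirrors Bidi_Mirrored
    characters there.
    """
    if level % 2 == 1 and unicodedata.mirrored(ch):
        # Characters missing from the subset table are left unchanged.
        return MIRROR_PAIRS.get(ch, ch)
    return ch


def display_run(text: str, levels: List[int]) -> str:
    """Apply mirroring to a run, one resolved level per character.
    The text stays in logical order; visual reordering is a separate step."""
    return "".join(display_char(ch, lvl) for ch, lvl in zip(text, levels))


# In logical order the glyphs become ")a("; reversing the run for
# right-to-left display then shows "(a)".
print(display_run("(a)", [1, 1, 1]))  # → )a(
```

Because the substituted glyph may have a different advance width or height, any measurement in step 5 should be made after this substitution.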

Received on Thursday, 9 December 2004 23:21:02 UTC