Re: SVG 1.2 Comments: I18N comments on section 4.12 and its friends...

On Friday, December 10, 2004, 12:19:28 AM, Addison wrote:


APw> Dear SVG WG:

APw> This email is sent on behalf of the Internationalization Working Group.

APw> In the message below and as discussed with a few of your WG
APw> members at the recent AC meeting, we feel that the line breaking
APw> algorithm in section 4.12 of SVG 1.2 Full is problematic. In the
APw> email below (which is slightly edited to correct some errors from
APw> the one originally sent to the I18N-IG list, please note), I have
APw> attempted to describe the problem we discussed in that meeting and
APw> possible solutions that we have considered in I18N subsequently.

Many thanks for that.

APw> We would like to figure out the best method of working with
APw> you to resolve this problem. Would cross-posting email or forming a
APw> Task Force make the most sense to you?

Either of those would work, depending on the anticipated scope and
duration.

APw> Or do you have other
APw> preferences? We would be happy to send representatives to discuss
APw> working out the details with you, if that makes sense.

I must admit that I was very surprised to hear that this was not already
well documented, and am pleased that an existing UAX covers this.


>> -----Original Message-----
>> From: Addison Phillips [wM] [mailto:aphillips@webmethods.com]
>> Sent: 7 December 2004 14:46
>> To: w3c-i18n-ig@w3.org
>> Cc: mark.davis@jtcsv.com
>> Subject: SVG 1.2 and line breaking...
>> 
>> 
>> At the recent W3C AC meeting, Richard Ishida and I took the time 
>> to have a meeting with Chris Lilley and others from the SVG WG. 
>> We discussed the problems with SVG 1.2 and the way in which it 
>> implements line breaking and line wrapping. Richard and I gave 
>> the impression that multiline bidi layout was imperfectly 
>> documented, but that appears not to be the case. UAX#9 seems to 
>> document the really tough bidi stuff.
>> 
>> The conclusion we came to is that I18N WG needs to submit a 
>> formal comment (or set of comments) on this topic. The basic idea 
>> that we discussed in our meeting is:
>> 
>> 1. SVG will provide two line breaking modes. 
>> 
>>   a. The default will be 'auto', which MAY be implementation 
>> defined and SHOULD be conformant with UAX#9 and UAX#14 (i.e. the 
>> idea is that it should be more, rather than less, capable to use 
>> 'auto').

I agree that 'more rather than less' is key here. I would like to see
wording stating that, if nothing better can be provided, 'auto' is the
same as the 'other' (reproducible graphics) mode. I don't want it to be
used as a loophole for doing less.

I agree, after discussions, that both modes have their use cases.

>> Auto mode will not guarantee consistent line breaking 
>> across implementations or within differently configured 
>> implementations. But it may provide a higher level of language 
>> awareness, etc.
>> 
>>   b. The "other" mode (which needs a name) will be closely 
>> described by SVG 1.2. There must be an option that allows for 
>> strict UAX#14/UAX#9 based breaking that will be consistent in 
>> layout result across implementations given SVG fonts. This mode 
>> should also offer language specific tailoring and/or options. For 
>> example, for Korean text one might choose space or character 
>> based breaking. We note that UAX#14 leaves some leeway for 
>> certain operations to the implementation.
>> 
>> 2. The wrapping algorithm currently in 4.12 must be scrapped, 
>> since it proceeds from (numerous, fatal) false assumptions about 
>> the layout of text. I have included below a prototype for a new 
>> algorithm, which must be substantially fleshed out. Comments are 
>> very welcome. Vertical layouts have issues left undiscussed here. 
>> See for example 
APw> http://fantasai.inkedblade.net/style/discuss/vertical-text/#css3-text
APw> for just how much fun we are in for.

That sounds pretty suboptimal. However,

>>> CSS3 Text maps vertical scripts' character directionality based on
>>> the paragraph's block progression.

SVG has included vertical as well as horizontal text from the beginning
and has therefore modelled its properties on the XSL values, with
before/after and start/end. As a consequence, it does not have legacy
'left means down except when it means up' type issues.



APw> 3. Richard suggested, in fact, that our on-going discussion
APw> with the CSS WG concerning CSS3 (most notably a thread with
APw> "fantasai", the author of the above link) form a basis for SVG's
APw> design. See
APw> http://lists.w3.org/Archives/Member/w3c-i18n-wg/2004Oct/thread.html
APw> whose first message is:
APw> http://lists.w3.org/Archives/Member/w3c-i18n-wg/2004Oct/0002.html
APw> (but follow the thread).

Yes, that is helpful discussion.


APw> -- [[ A Rough-and-Ready Prototype]]--
APw> 1. Each paragraph is processed according to the Unicode
APw> Bidirectional Algorithm in Unicode Standard Annex #9 [UAX#9] in
APw> order to determine directionality and embedding levels for each
APw> character. Base directionality may be defined by the containing
APw> document.
APw> 2. Each paragraph is then processed in logical order to
APw> determine line breaking opportunities between characters, according
APw> to Unicode Standard Annex #14 [UAX#14]. The specific options for
APw> the paragraph's script and language are applied here as
APw> appropriate. This results in "break segments", which consist of
APw> character strings [see CharMod Part1: Fundamentals, section 6.1]
APw> that are bounded on both ends by a line breaking opportunity (or
APw> the start or end of the paragraph).
APw> 3. The "starting position", "next pointer" and "current
APw> pointer" are each set to the (logical) start of the next paragraph
APw> in the text.
APw> 4. The "next pointer" is set to the character that represents
APw> the next break opportunity following the "current pointer's"
APw> position.
APw> 5. Text layout is performed on a single line of all of
APw> the text between the "starting position" and the "next pointer".
APw> 6. If the text in (5) does not exceed the size of the current
APw> strip and text remains in the paragraph, set the current pointer =
APw> next pointer and go to (4).
APw> 7. Otherwise place the rendered text into the strip, set
APw> "starting position" = "current pointer" and "next pointer" =
APw> "current pointer" and increment the strip.
APw> 8. If text remains in the paragraph, go to (4).

That sounds good. I will forward it to a developer who is implementing
this; hopefully we can have running code to test it out.
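
As a starting point for such a test, steps (3) to (8) can be sketched in
Python. This is only a minimal sketch under loud assumptions, not the
algorithm as it would appear in the specification: the function names
are hypothetical, the break opportunities come from a toy rule (break
after each run of spaces) standing in for UAX#14, no UAX#9 processing is
done, each strip is reduced to a single width, and measure() is whatever
the caller supplies to lay out and measure one complete line. Emergency
breaking is undefined in the prototype, so an overlong segment is simply
placed whole.

```python
from typing import Callable, List, Sequence


def space_break_opportunities(text: str) -> List[int]:
    """Toy stand-in for UAX#14: allow a break after each run of spaces.

    Returns offsets into `text`; the end of the text is always included.
    """
    breaks = [i for i in range(1, len(text))
              if text[i - 1] == " " and text[i] != " "]
    breaks.append(len(text))
    return breaks


def fill_strips(text: str,
                breaks: Sequence[int],
                strip_widths: Sequence[float],
                measure: Callable[[str], float]) -> List[str]:
    """Greedy line filling following steps (3)-(8) of the prototype.

    Every candidate line is laid out again as a whole via `measure`,
    with trailing spaces dropped, rather than by adding segment widths.
    Text that does not fit in the available strips is left unplaced.
    """
    lines: List[str] = []
    start = current = 0            # step 3: "starting position", "current pointer"
    i = 0                          # index of the "next pointer" in `breaks`
    while i < len(breaks) and len(lines) < len(strip_widths):
        nxt = breaks[i]            # step 4: next break opportunity
        width = strip_widths[len(lines)]
        candidate = text[start:nxt].rstrip(" ")    # step 5: lay out one line
        if measure(candidate) <= width:
            current = nxt          # step 6: it fits, so try a longer line
            i += 1
            if i < len(breaks):
                continue
        elif current == start:
            # A single segment overflows the strip. Emergency breaking is
            # not defined in the prototype; this sketch places it whole.
            current = nxt
            i += 1
        # step 7: place the text that fitted and move to the next strip
        lines.append(text[start:current].rstrip(" "))
        start = current
    return lines
```

For example, with len() as a fixed-pitch stand-in for real glyph
measurement, fill_strips("the quick brown fox jumps",
space_break_opportunities("the quick brown fox jumps"), [10, 10, 10], len)
places "the quick", "brown fox" and "jumps" in successive strips.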

APw> --

APw> Special considerations:
APw> 1. If soft hyphens are used to form breaks, then implementers
APw> should specifically consider UAX#14 section 5.2 "Use of soft
APw> hyphen". In particular, breaking on a soft hyphen may result in
APw> spelling or form changes in certain languages and scripts.

In the 'auto' mode, or in both modes? (This is about ß line breaking to
s s, for example?)

APw> 2. Reshaping in Unicode does not cross directional
APw> boundaries, so this can be used to optimize performance in some
APw> cases.

Yes, we already have this notion in SVG 1.1 text chunks.

APw> 3. Some characters in Unicode take their shape from their
APw> current directionality. For example, opening and closing
APw> parenthesis change the direction in which they point based on their
APw> context. See TUS 4.0 section 4.7 for a discussion of mirroring.
APw> Note that mirroring can produce different advance widths or heights
APw> as a result.

APw> 4. Text at the end of the line renders differently than text
APw> in the middle of a line. For example, spaces are generally not
APw> rendered at the end of a line. Implementations should be careful of
APw> "optimizations" that do not lay out the entire line again and just
APw> concatenate segments of glyphs. (Note that shaping of characters
APw> may be affected in some scripts when the text doesn't occur at the
APw> end).

APw> 5. "Emergency breaking" may be required if some line of text
APw> is too long to fit any of the remaining strips. The form this takes
APw> is ?????

procrustean????

APw> 6. When a word is added the line height may increase; it can
APw> never decrease from the first glyph rendered. An increase in the
APw> line height can only reduce the space available for text placement
APw> in the span. In the algorithm described above, the line height must
APw> be calculated on the text actually inserted (i.e. between starting
APw> and current position) and *not* be based on the line height of the
APw> last layout pass in step 5.

APw> 7. In (5) note that rendering is done on a line oriented to
APw> the current and base directionality. For example, vertical
APw> rendering is done on a vertical line.

APw> 8. Note that in (3) spans of text may be labeled with a
APw> different language or use scripts to which different breaking
APw> options may apply. Options selected should be applied as
APw> appropriate for each span of text.

Thanks, these concrete and specific suggestions are very helpful.
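
On consideration 3, the set of characters affected can be read from the
Unicode Character Database. A small sketch using Python's standard
unicodedata module (the helper name mirrored_chars is hypothetical, for
illustration only). It only identifies characters that carry the
Bidi_Mirrored property; which glyph is substituted, and what advance it
has, still depends on the font and the resolved direction.

```python
import unicodedata
from typing import List


def mirrored_chars(text: str) -> List[str]:
    """Return the characters in `text` that have the Bidi_Mirrored property."""
    return [ch for ch in text if unicodedata.mirrored(ch)]
```

For instance, mirrored_chars("f(x) < [y]") picks out the parentheses,
the less-than sign and the square brackets, which are the characters
whose advance would need re-measuring in a right-to-left run.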

APw> --

APw> Addison P. Phillips
APw> Director, Globalization Architecture
APw> http://www.webMethods.com

APw> Chair, W3C Internationalization Working Group
APw> http://www.w3.org/International

APw> Internationalization is an architecture. 
APw> It is not a feature.

-- 
 Chris Lilley                    mailto:chris@w3.org
 Chair, W3C SVG Working Group
 Member, W3C Technical Architecture Group

Received on Friday, 10 December 2004 03:25:41 UTC