RE: use case for font-dependent default orientation

Eric,
  I completely agree that option 2, “font-based rules”, should not be used. In addition to the reasons you give, there is the problem of adjacent characters coming from different fonts because the CSS font fallback rules have come into play. This happens at the user’s site, not the authoring site, so without markup, the author is never quite sure what he will get. Thus, it is unreasonable to promise anything based on font data.

I can live with option 1 (and it may indeed be the best solution). But I would like to propose a very simple context rule that may often do better than using only character-based rules. There are two parts to the proposal. The first part is that there need to be three major character classes, not just two. These are: S for sideways, U for upright, and C for contextual. The latter class would contain things like some punctuation and symbols: things without an obvious orientation in vertical text.

The contextual rule is very simple. If a C class character is preceded by an S class character, it is set sideways and its class becomes S; if it is preceded by a U class character, it is set upright and its class becomes U. There is no further textual analysis. This does not solve the matched quotes problem, nor is it intended to. It is only intended to help get the orientation of characters within a run of text to be consistent. If the first character in the intended run is a C class character, then, without markup, its orientation will be wrong; it will not correspond to that run. My guess is, however, that most runs of a given orientation, say S, will begin with characters that are in the S class and will, using this context rule, not need markup to get the orientation to be correct. Such “default” behavior has been considered desirable by the CSS authors. (Note that this behavior is quite predictable, so an author can tell when he will need markup to get the result he desires.)
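
To make the rule concrete, here is a minimal Python sketch. It assumes each character has already been assigned a class of S, U, or C. The function name and the `initial` fallback (used when a C class character has no predecessor) are illustrative assumptions, not part of the proposal.

```python
def resolve_orientations(classes, initial="U"):
    """Resolve C (contextual) classes by looking only at the preceding character.

    classes: sequence of "S", "U", or "C", one entry per character.
    initial: hypothetical fallback class for a C character with no predecessor.
    """
    resolved = []
    previous = initial
    for cls in classes:
        if cls == "C":
            # A C character takes the (already resolved) class of its predecessor,
            # so a run of C characters after an S character all become S.
            cls = previous
        resolved.append(cls)
        previous = cls
    return resolved


# A sideways run followed by punctuation, then an upright run with punctuation.
print(resolve_orientations(["S", "C", "C", "U", "C"]))
# A leading C character falls back to `initial`, which may not match the run.
print(resolve_orientations(["C", "S", "S"]))
```

The second call shows the case described above: the leading C character does not pick up the orientation of the run it belongs to, so the author would need markup there.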

If you accept this context rule, then the problem to solve is which characters should be marked class “C”. My guess (and it is only a guess) is that it is primarily the characters that are marked Common in the data for UAX #24 of Unicode[1]. If this is too great a problem, then we should go with a character-only solution as you suggest.
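
As a rough illustration of how a candidate “C” set could be pulled from that data, the Python sketch below parses lines in the `range ; Script # comment` format used by Scripts.txt and keeps the ranges marked Common. The sample lines are only a tiny illustrative excerpt in that format, the function names are hypothetical, and treating every Common character as class C is just the guess stated above, not a settled rule.

```python
SAMPLE = """\
0020          ; Common # Zs       SPACE
0021..0023    ; Common # Po   [3] EXCLAMATION MARK..NUMBER SIGN
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
"""


def parse_common_ranges(text):
    """Return (start, end) code point ranges whose script is Common."""
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank or comment-only line
        code_field, script = (part.strip() for part in data.split(";"))
        if script != "Common":
            continue
        first, _, last = code_field.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point or range
        ranges.append((start, end))
    return ranges


def is_candidate_c(ch, ranges):
    """True if the character falls in one of the Common ranges."""
    cp = ord(ch)
    return any(start <= cp <= end for start, end in ranges)


ranges = parse_common_ranges(SAMPLE)
print(is_candidate_c("!", ranges), is_candidate_c("A", ranges))
```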

Steve Zilles

[1] http://www.unicode.org/Public/UNIDATA/Scripts.txt


From: www-style-request@w3.org [mailto:www-style-request@w3.org] On Behalf Of Eric Muller
Sent: Tuesday, September 13, 2011 11:06 AM
To: www-style
Subject: Re: use case for font-dependent default orientation

I think there are three fundamental approaches. In all cases, there will be scenarios in which the determination is not what the user wants, and consequently, we need some kind of markup to impose an orientation; so we can focus on the determination in the absence of markup:

1. the character alone determines the default orientation

2. the character and the font used to display the character determines the default orientation

3. the character and its context (neighboring characters) determine the default orientation

In my experience, the character context is extremely difficult to use. Consider the quotes (U+201C “ LEFT DOUBLE QUOTATION MARK and friends); if they are used to bracket sideways text, they should probably go sideways, while if they bracket upright text, they should probably go upright. The problem is that it is difficult to reliably determine mechanically what is bracketed, because the same character can be used to start bracketing in some cases and to end bracketing in others; and there are also cases where this same character is used for other purposes than bracketing. Layout is just too low-level (i.e. not enough is known about the text) to make the proper analysis of the text.

The font context is also difficult to use. The problem here is that we have to use circumstantial evidence of the font content; there is no data in the font that specifically answers our question. I have seen a variety of circumstantial evidence being used (for this and other problems): which cmap subtables are present, which characters have glyphs (other than .notdef), which OS/2 ulRange bits are set, whether there are vertical metrics, whether a glyph width is 1em, whether the GSUB 'vert' feature is present, whether GSUB 'vert' changes the glyph, the CID of the glyph for CFF/CID-Keyed/well-known ROS fonts, etc. At the end of the day, all those methods have proven to be fragile (15 years later, we are still tweaking the heuristics in our products, and we still get complaints), and they are not surviving the web world very nicely (e.g. runtime font fallback, font subsetting, etc.). They also reflect the primary use that the font designer had in mind, rather than what the document author has in mind. And in addition, we have the considerations that John mentioned, i.e. the complexity of layout engines.

That's why I think we should go with 1: the character alone determines the default orientation. It is simple, it is robust. I also think that it is adequate, i.e. we will need little markup if any in the vast majority of documents. I agree with Koji that if "[upright] has priority on compatibility with existing documents rather than multi-lingual capability, I believe it can solve most of unified punctuation issues." Don't be too concerned by the values I did put in my proposal, in particular for the punctuation; they were just to get a starting point. By the way, the logic I used was broadly aligned with what seems agreeable to you: only characters which are definitely not part of the Japanese writing system, in a broad sense, are sideways.

A fourth alternative has been mentioned: the character and its locale. This does not have the problem of 3, as there is no analysis of the text. The big question in my mind is whether that buys enough to warrant the complexity. I think we need very specific scenarios before we can decide that. I also share some of the concerns expressed by Koji, in particular the overload of functionality (layout, spell checking, speech synthesis) on a single thing.

I am not worried about a mismatch with existing authoring applications, such as InDesign or Word. They can do whatever they want to determine the orientation, and at the time they generate HTML, they can compare the orientation they determined with the orientation mandated by CSS, and insert markup as needed. In fact, the simpler the CSS determination, the more robust this is.
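
A minimal Python sketch of that comparison step follows, assuming the authoring application and the CSS rule each yield one orientation ("S" or "U") per character. The function name and the run representation are hypothetical, and the actual markup to emit for each run is left open.

```python
def override_runs(tool, css):
    """Return (start, end, orientation) runs where markup would be needed.

    tool: per-character orientations chosen by the authoring application.
    css:  per-character default orientations mandated by CSS.
    Each run covers characters start..end-1 that share one wanted orientation
    differing from the CSS default.
    """
    if len(tool) != len(css):
        raise ValueError("both sequences must cover the same characters")
    runs = []
    start = None  # index where the current mismatching run began, if any
    for i, (wanted, default) in enumerate(zip(tool, css)):
        mismatch = wanted != default
        if mismatch and start is not None and tool[start] == wanted:
            continue  # still inside the same run
        if start is not None:
            runs.append((start, i, tool[start]))  # close the open run
            start = None
        if mismatch:
            start = i  # open a new run
    if start is not None:
        runs.append((start, len(tool), tool[start]))
    return runs


# The application wants characters 2 and 3 sideways; CSS defaults to upright.
print(override_runs("UUSSU", "UUUUU"))
```

The simpler the CSS determination, the easier it is for the application to compute the `css` sequence reliably, which is the robustness point made above.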

Eric.

Received on Tuesday, 13 September 2011 22:25:49 UTC