[css3-writing-modes] transcript of text orientation discussion from John Daggett on 2011-07-13 (www-style@w3.org from July 2011)

From: John Daggett <jdaggett@mozilla.com>
Date: Wed, 13 Jul 2011 16:21:57 -0700 (PDT)
To: www-style@w3.org
Message-ID: <303024703.509186.1310599317721.JavaMail.root@zimbra1.shared.sjc1.mozilla.com>

I thought it was important to capture the full discussion of text orientation during the CSS WG call today, so I recorded the call and wrote up this transcript of the discussion. Note that I didn't include IRC comments; those will be in the minutes.

Topic: text-orientation and Elika's proposal for bidi-style resolution of punctuation orientation

http://lists.w3.org/Archives/Public/www-style/2011Jul/0004.html

[jdaggett]
So this is actually Elika's proposal, this is an interpretation of what I proposed. I don't really see the need to compare it to the bidi algorithm because the bidi algorithm is far more complicated but the concept is basically that for the text orientation, even without this proposal, you need a way of distinguishing for different Unicode codepoints which ones are going to rotate right naturally, or are going to rotate in some direction naturally for different vertical writing modes and ones that are going to remain upright. So that's the key problem.

Illustration of this, Figure 7 in the Writing Modes spec:
http://dev.w3.org/csswg/css3-writing-modes/#vertical-intro

Within the text you'll see "Office for Mac 2011". This is the problem that text-orientation is trying to address, where you have content that is rendered in different directions depending upon context. So you see that "2011" is rendered upright but it's also rendered rotated right. One of the problems with this is that if you classify codepoints based on their codepoints, no matter what, you end up with something that's not going to work in one situation or another. In other words, if you have normal ASCII punctuation characters for example, it's not clear which direction they should go in all cases. Well, maybe that's not true for ASCII punctuation but there are a number of punctuation characters where it's ambiguous. Another ambiguous example is emoticons: if someone is writing a message in Japanese and they're quoting someone who is tweeting in English where they ended their sentence with a smiley face, then the question is should that smiley face be rendered vertically in that Japanese text or horizontally [i.e. rotated right] with the other English text [when displayed in a vertical writing mode]. So what I originally proposed was that characters that can take either orientation follow the surrounding script. And that's what led Elika to talk about the use of the bidi algorithm, to propose something that was analogous to the bidi algorithm.
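
The "follow the surrounding script" idea can be sketched roughly as below. This is only an illustration under stated assumptions, not the algorithm from either proposal: the function names are hypothetical, the "strong" classification (East Asian wide/full-width characters upright, other letters and digits sideways) is an assumption, and looking at the preceding strong character before the following one is an arbitrary choice.

```python
import unicodedata

UPRIGHT, SIDEWAYS = "upright", "sideways"

def strong_orientation(ch):
    """Return an orientation for characters assumed to be unambiguous, else None.

    Assumption for this sketch: East Asian wide/full-width characters are
    upright, other letters and digits are sideways, everything else
    (punctuation, symbols, spaces) is treated as ambiguous.
    """
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return UPRIGHT
    if unicodedata.category(ch)[0] in ("L", "N"):
        return SIDEWAYS
    return None

def resolve_orientations(text, default=UPRIGHT):
    """Give ambiguous characters the orientation of the nearest strong neighbour.

    The preceding strong character wins; if there is none, the following
    one is used; if the text has no strong characters, `default` applies.
    """
    strong = [strong_orientation(ch) for ch in text]
    resolved = []
    for i, orientation in enumerate(strong):
        if orientation is None:
            before = next((s for s in reversed(strong[:i]) if s), None)
            after = next((s for s in strong[i + 1:] if s), None)
            orientation = before or after or default
        resolved.append(orientation)
    return resolved
```

With this sketch a smiley after English text comes out sideways with that text, while the same smiley directly after Japanese text comes out upright, which is the context-dependent behavior being described.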

[fantasai]
I don't understand how your proposal is different from what I wrote.

[Florian]
I believe if you flesh out the corner cases of what you're saying you end up with something very similar to what Elika said.

[jdaggett]
I know that but if you put it in the context of bidi that's a much more complicated algorithm. It seems like you obfuscate what you're doing, you don't make it clear.

[Florian]
Don't you need to say this is somewhat similar to?

[jdaggett]
That's fine, I have no problem with that but look at the leadoff description, it goes into all the bidi details. Bidi is a much more complicated problem, there's direction involved whereas this seems much simpler.

[Florian]
Somewhat simpler, I'm not sure I can go as far as "much simpler". I think if you want to drop the introduction, that's fine but I think it only refers to bidi in the intro and after that you stop talking about bidi?

[fantasai]
Yeah, pretty much.

[jdaggett]
Right, but I just don't see the need, bidi has nesting behavior does it not?

[fantasai]
It does, but the point is that your suggestion is to use the context, and the context is in many cases ambiguous, and the way to figure out that ambiguity, that is a problem that bidi already solves, so that's why I'm looking at the bidi algorithm to see what they do. If you want me to never mention the word bidi in the description then I can rewrite the entire thing and it will come out exactly the same and we will not talk about bidi.

[jdaggett]
No, it will come out differently. The reason it will come out differently is not the behavior but you won't confuse people by bringing in an algorithm that is sufficiently complex to cause a lot of people to not understand what you're saying, there's a big difference there.

[fantasai]
You're contradicting yourself, you're saying it would have the same behavior but it would be different. If it has the same behavior then it's the same.

[jdaggett]
No, I'm saying the explanation is different. You shouldn't explain something that's relatively simple with something that's relatively complex.

[fantasai]
Ok, I will rewrite this so that it doesn't mention the words bidi anywhere. And that would solve your concern, right?

[stevez]
No.

[jdaggett]
Elika, you're not listening. I'm not saying you shouldn't use the word bidi, I'm saying that if you start your explanation by going into the details of the bidi algorithm, it's obtuse. It's just not a good way of explaining it. That's all I'm saying. I'm fine saying "it's very similar to the way the bidi algorithm works" but that's not really relevant to this problem.

ACTION fantasai: rewrite proposal to not reference bidi because it's confusing

[fantasai]
Are we done? Is the action item satisfactory?

[jdaggett]
Elika, this whole subject, we talk about it multiple times but we don't get it resolved in the spec. Text orientation has to be defined, you have to define default behavior and that hasn't been done.

[fantasai]
And that's what we're trying to do. I don't understand what you want from me other than to rewrite this thing so that I don't confuse people by talking about bidi, that I can do. But now what are you complaining about?

[daniel]
I can almost hear John shaking his head. I don't think it's related to the word "bidi", Elika. It's related to the algorithm.

[fantasai]
You want me to change the algorithm, which changes the behavior, but you said you wanted the same behavior, I don't understand. You want me to write it as a functional programming language? I don't understand what you want me to do. Because you're telling me "I want the same behavior you defined but I don't want you to talk about bidi" and then you're saying "that's not what I said I wanted". I'm not going to talk anymore, you guys can assign my action items and I'll perform them.

[Florian]
I think a lot of people are confused but I'm on the Elika side of being confused.

[jdaggett]
Were you going to make another point?

[stevez]
I have a couple points to make. I sent a message last night which suggested that maybe using the script TR which is 24 if I remember instead of the bidi TR as the model might make more sense. a) it's simpler and b) it covers a number of the cases that John wanted to cover although I'm not totally sure on that. It's a simpler description, it's not algorithmic unfortunately, it points out that there are problems with matching codes. So that was one thing I suggested. The second one was, and this one I'm less sure of, although it was suggested to me by Eric, that since there are a number of full-width characters that cover the common cases like punctuation and common symbols that basically require that if you want the unrotated version that you use the full-width character for that particular symbol. Well, basically the strict Latin alphabet without diacritics and the numbers and a set of symbols and punctuation marks come with a particular full-width code, so if you want the unrotated one then you use the full-width code which would be upright by definition by the width rule and you don't try and play games with the others. I'm less certain of that because I thought I remember John saying there was existing usage where people were not careful about what code they used.

[jdaggett]
Hmm, I'm not sure what context that was from.

[stevez]
Ok, I may be remembering wrong. Basically what Eric was trying to convince me was that we should have simple rules that didn't try to be clever if we didn't need to be.

[jdaggett]
One of the problems and I think this is a fundamental question we either have to specifically address or not address and that is: what is the expected use of full-width codepoints in this model? What Steve, you're sort of suggesting is that full-width codepoints be used as a way of guaranteeing things are upright.

[stevez]
Correct. If you want them upright you use the full-width codepoints and the other ones would be rotated.
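
As a rough illustration of the rule being described here, the sketch below maps ASCII to the corresponding full-width codepoints and picks an orientation from the character's East Asian Width property alone. The function names are hypothetical, and treating wide ("W") characters as upright alongside full-width ("F") ones is an assumption made so that ideographs also come out upright; it is not a statement of what any implementation did.

```python
import unicodedata

# The full-width forms U+FF01..U+FF5E mirror ASCII U+0021..U+007E.
FULLWIDTH_OFFSET = 0xFEE0

def to_fullwidth(text):
    """Replace printable ASCII (except space) with its full-width counterpart."""
    return "".join(
        chr(ord(ch) + FULLWIDTH_OFFSET) if "!" <= ch <= "~" else ch
        for ch in text
    )

def orientation_by_width(ch):
    """Width-only rule: full-width and wide characters upright, the rest sideways."""
    if unicodedata.east_asian_width(ch) in ("F", "W"):
        return "upright"
    return "sideways"
```

Under such a rule an author who wants "NASA" upright would have to store `to_fullwidth("NASA")` in the content, so the choice of codepoints rather than any style property decides the presentation.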

[jdaggett]
That breaks things like in-browser search: if you're searching on "NASA" and somebody has used the upright codepoints but you're searching with the normal ASCII codepoints for N-A-S-A, then you're not going to find it.

[fantasai]
That's a brokenness in the browser, they should be using the NFK decomposition when they're doing searching because there's a lot of things you won't catch if you don't do that.
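
The effect of compatibility normalization on such a search can be shown with Python's standard `unicodedata` module. This is a minimal sketch; the `contains` helper is hypothetical and says nothing about how any browser implements find-in-page.

```python
import unicodedata

def search_key(text):
    """Fold compatibility variants (e.g. full-width letters) to their base forms."""
    return unicodedata.normalize("NFKD", text)

def contains(haystack, needle):
    """Substring search that ignores compatibility differences."""
    return search_key(needle) in search_key(haystack)

# Page text written with full-width letters U+FF2E U+FF21 U+FF33 U+FF21.
page = "\uff2e\uff21\uff33\uff21 launched a probe"

plain_match = "NASA" in page                  # raw codepoint comparison
normalized_match = contains(page, "NASA")     # comparison after NFKD
```

Here `plain_match` is false because the codepoints differ, while `normalized_match` is true because NFKD maps each full-width letter to its ASCII counterpart.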

[jdaggett]
It still seems to me like you're changing the content to influence the presentation. If that's the way things work, then that's the way things work but I think we should be explicitly saying that as opposed to defining it indirectly.

If you have something like N-A-S-A, then there are two different codepoints for each of those letters. One is the normal ASCII version and one is the version that's referred to as the full-width version, and the distinction in vertical text is that the natural text orientation of ASCII is to be rotated to the right and the natural orientation of the full-width characters is to be left upright. And so what we're talking about, these are presentational attributes, they have nothing to do with the content, but what we're talking about is using different codepoints, i.e. changing the content to fit the presentation. I'm not sure it's said directly but in the JLREQ they seem to want to ignore the full-width codepoints, they want to get away from using the full-width codepoints. So I think there's a contradiction with that document in some sense.

[stevez]
I'm not disagreeing with anything you say, I said I wasn't convinced about it but I wanted to bring it up because the case was made to me that at least it provides a simple consistent rule. The other point I was going to make is that was my understanding of what the rule was in the original Microsoft implementation.

[jdaggett]
Right, I think that's an historical way of doing things so the question is do we want to perpetuate that.

[stevez]
And I guess the third topic this discussion brings up is what is the mechanism for the person to control it if the default rule doesn't do what you want.

[jdaggett]
And that's sort of the reason I think we got here was that, okay, we need a default rule, so every codepoint has some natural orientation whether it rotates or it stays upright and how do you control for characters that need to go both ways and how do you allow authors to say, okay, this is the default that I want. Both of those things are slightly different. But I guess part of the problem I don't see the spec as having a normative way of defining these defaults so it's sort of hard to talk about these proposals.

[Florian]
I'm getting confused now, I was under the impression that even though it's currently defined with references to bidi and that might not be ideal so it should be rewritten to explain them without depending upon people to understand bidi first I think Elika's proposal is just that and if it does just that, what's wrong with it.

[jdaggett]
Sorry, I didn't understand what you're saying.

[Florian]
I'm saying the proposal from Elika is now defined in terms of bidi and that is unfortunate, but it can be rewritten to say essentially the same thing without mentioning bidi, and once that rewriting is done, isn't that proposal essentially solving the problem we're discussing right now? It seems to me that it is.

[fantasai]
Either that or Appendix Q, whatever that is. Both of them have definitions that if you take them normatively will give you a definition but currently what John is saying is that there's nothing in the spec that is normative right now.

[jdaggett]
Right. And why is it Appendix Q?

[fantasai]
I didn't feel like renumbering it at the time and I figured I'd do it later, I need to renumber the appendices.

[jdaggett]
I actually don't think those should necessarily be in appendices. Specifically, Appendix C, Vertical Typesetting Synthesis, I think if you're just writing a classification scheme that can be defined in a normative appendix, but if you're describing the behavior difference, that should be in the body of the text. If it's just a list, these codepoints have this behavior, then there's no reason to have that in the body of the text, but if you're explaining it then that text should be in the body where text orientation is defined and it should be normative.

[jdaggett]
Sort of getting back to what Florian was saying, my original reason for proposing this was that this makes the default a little bit of a better default.

[Florian]
I believe we need something like this. I agree completely that it needs to be normative. Your idea that the explanation should be in the body and if there is a list it can be appendix, I'm fine with that, so if we take Elika's mail, the part that refers to bidi, you rewrite that to not explain things in terms of bidi and explain them in either something simpler or explain from scratch and then you still have to list which character fits in which category affected by the algorithm in an appendix, is that something we all agree we want or am I missing something wrong with that?

[stevez]
So, yes, I don't agree that the current algorithm is what we want even if rewritten, and secondly I think we need to answer the bigger questions that John raised about treatment of full-width characters in general, the issue of whether we should be using codepoints in the decision at all, I was basically arguing for a script-based solution. And there's one final one, which is I don't think shaped scripts should ever be upright.

[daniel]
Only two more minutes for this topic.

[stevez]
We need a way to solve the issues that are still on the floor, because it isn't just the description of the algorithm.

[jdaggett]
I can take as an action item to write up what I think are the questions, and not necessarily what are the solutions and then we can talk on the list about possible solutions there are.

[stevez]
I think that's good, we also have a F2F coming up, I know you can't make it.

[jdaggett]
I can dial in in the afternoon.

Topic: CSS3 Writing Modes schedule

[fantasai]
So, for writing modes I have 2 things to discuss. Right now we have the ability to say whether some text is upright or sideways, you need to put some markup around that text and then you can say this is all sideways or this is all upright. There was a suggestion that we have some kind of @rule or other syntax to say which codepoints are upright or sideways. I think we should not try to address that for this level because otherwise we're not going to finish this spec.

[jdaggett]
Sorry but I think that's dependent on how we answer the other questions.

[fantasai]
The second point is that right now the spec is written so that the default orientation is given informatively, and the normative text says that if the font has orientation information about the characters we use it, and if it doesn't then you must synthesize that using some appropriate set of rules such as the ones in this informative appendix. That's what the normative text says right now, and the result of that is that the author cannot depend on a particular orientation. So they will not get the exact result they wanted. However, the reader, assuming that the user agent does something reasonable, is probably not going to notice that there's anything wrong with the text: most of the characters are going to be oriented the way they're supposed to be, there might be one or two that are a little bit off. But in most cases that's not going to happen.

That is the current situation right now. If we want to define this thing in full detail of exactly how it's supposed to work and discuss whether we use this algorithm or that algorithm or a third algorithm or a fourth algorithm with automatic determination and all this kind of stuff and we want to design a new set of controls that say that these unicode codepoints should be upright and these codepoints should be sideways it is going to take us six months to a year before this spec can reach CR. In the meantime, the epub specs are done and people are writing content using the epub specs, they are going to be reciting out implementations that do not implement this correctly even to the point that we have defined so far they are not going to implement correctly and these are all going to be released with the epub prefix. The epub spec is designed such that it will track our behavior if not our syntax so what will happen is that we will have an entire set, an ecosystem of content and implementations that is using epub prefixed syntax but has none of the benefits of the prefixing, it is not stable because, it's not marked as experimental because it's standardized and they're encouraging the deployment of content and not saying this is experimental or it might change or it might break...

[jdaggett]
Elika, I'm sorry but I don't think there has been anything defined so far in such a way that epub is going down one road, it's essentially undefined right now, and what you're talking about in terms of non-normative wording, that's essentially undefined, so I'm not sure I see a huge issue with epub schedule here.

[fantasai]
It's not about this, it's about the rest of the spec. If the text orientation is slightly off the page isn't going to break, it's not going to look quite right, it's going to be readable, it's going to look the way it should be. But if you mess up with the writing-mode property or some of these other things where the layout changes or you don't implement the layout algorithms correctly, stuff will break. So what I'm pointing out is that we're basically asking epub to have a completely different syntax and implementations that support that syntax to have a different syntax while we figure out this one thing which in my opinion is not that critical and so...

[jdaggett]
Elika, you're asserting that we should lock in things that were not really discussed and haven't been worked out, that was the whole problem with epub depending upon a working draft.

[daniel]
Elika, stop, this is going nowhere and we have other agenda items on the radar today. We are going to take this offline by email or F2F. We can't spend the whole conference call on this.

[sylvaing]
I have a very quick request for Elika and John. One reason, personally, I'm unable to really have an opinion on this is that, maybe I missed this, is there anyplace that defines, okay, here's a piece of markup and the author has to be able to render it this way, that way, vertical, horizontal and this is what it should look like. In other words, what we should be able to achieve, do we have that? If we have that I would love to look at it because right now I'm hearing this discussion, this in-depth discussion of solutions and frankly I have no idea what to say about them. I'm sensing I'm not the only one here. We can keep some subset of experts talking about it and disagreeing for the next 6 months and that just seems silly to me. If I missed it, the place where we define requirements that we're trying to fix, please share the info, but until we agree on what we're trying to solve we can go on like this for a long time, and it has been going on for a long time, so any help you can provide would be great.

[stevez]
As I understand the problem, we're trying to define the behavior of the default case, i.e. no markup. That you can't do experimentally. There isn't any markup to control it.

[sylvaing]
No, there's content, there are basic use cases, you're trying to enable something we don't have today, either Japanese content with embedded Western content, there's stuff you cannot achieve today. I just would like somebody to be able to explain to me, okay, here is the markup an author has and here is what they want to achieve that they can't do today, and I don't even get that, and I think that might be helpful, at least to the rest of us.

[stevez]
All I will say is that the original requirements came from Michel Suignard from Microsoft, long ago.

[daniel]
Email please. I agree with all of what sylvain said but email.

Received on Wednesday, 13 July 2011 23:22:27 UTC