Re: Ruby attribute vs. ruby element? from Felix Sasaki on 2006-03-03 (public-i18n-its@w3.org from January to March 2006)

From: Felix Sasaki <fsasaki@w3.org>
Date: Fri, 03 Mar 2006 23:05:18 +0900
To: "Farwell, Kevin" <Kevin.Farwell@lionbridge.com>
Cc: Don Day <dond@us.ibm.com>, gershon@tech-tav.com, JoAnn Hackos <joann.hackos@comtech-serv.com>, public-i18n-its@w3.org
Message-ID: <44084D1E.4010303@w3.org>

Hi Kevin, all,

This is a mail from Martin Duerst, who is also a member of the ITS working
group, and - among others - one of the editors of the W3C Ruby TR. I
think he is making some valid points on the topic.

I will write another mail on this in a few hours (I'm still busy at the
W3C technical plenary).

Regards, Felix.

Hello Richard, everybody else,

[Richard or somebody else, can you please make sure that
my comments get back to all the previous participants in
this discussion?]

Richard and Christian are absolutely correct.

http://www.w3.org/International/draft-duerst-ruby-01 is
*totally* outdated. The reason it is in a somewhat prominent
position is that in the very early days of the W3C Internationalization
Activity (before I joined the W3C Team), a lot of internationalization-
related documents were collected and dumped in the activity top level
directory. They are still kept there only because W3C has a policy
of not breaking links. But the draft itself also explicitly says
"Expires 30 February 1997".

What should be used is definitely http://www.w3.org/TR/ruby/. Not only
is this used in XHTML 1.1 and XHTML 2.0, with data and implementations
around (although not terribly numerous), it is also much more thought
through in all respects, and was carefully coordinated with a Japanese
expert group working on markup and typography.

As for the claim that ruby are a presentation issue, that's also
clearly false. In standard usage, ruby are used for indicating
pronunciation exactly in those cases where the reader cannot
figure it out. In those cases, software is also usually useless.
In other cases, ruby are added to provide additional information,
including puns and jokes. In all these cases, ruby are clearly
an authoring issue, not a presentation issue.

There are some cases where we get closer to presentation, e.g.
when taking an existing text with no or few ruby and adding more
ruby to help readability for children, for accessibility, or for
foreigners learning Japanese. But adding ruby cannot be completely
automated, because there are always ambiguous cases. Also, adding
ruby in these cases requires a large dictionary and some language
processing, which is quite different from the usual presentation
operations.
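
To illustrate why adding ruby cannot be fully automated, here is a
minimal sketch in Python. The dictionary `READINGS` and the function
`annotate` are hypothetical and exist only for this example; a real
tool would need a large dictionary and proper language processing. The
point is only that a word with more than one reading has to be left
for a human to decide.

```python
# Toy reading dictionary (hypothetical, for illustration only).
# The second entry has two common readings, so it is ambiguous
# without context.
READINGS = {
    "漢字": ["かんじ"],
    "今日": ["きょう", "こんにち"],
}

def annotate(word):
    """Return (base, reading, needs_review) for one word.

    reading is None when the word is unknown or ambiguous;
    needs_review is True only when several readings compete.
    """
    readings = READINGS.get(word, [])
    if len(readings) == 1:
        return (word, readings[0], False)
    return (word, None, len(readings) > 1)

unambiguous = annotate("漢字")
ambiguous = annotate("今日")
unknown = annotate("ABC")
```

Here only the first word can be annotated automatically; the second is
flagged for review, and the third is simply left alone.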

There is another presentational aspect in ruby. It's the choice
of element name. The name refers to the most usual form of presentation
(alongside the main text), or more specifically, to the font size
most often used in that case. Markup in the W3C Recommendation
(http://www.w3.org/TR/ruby/) has been designed to work with other
ways of presenting the same information (in particular, for using
parentheses as a fallback), but we retained the name of the most
used presentation form because everybody was used to it. Other,
less specific names, such as <gloss>, have been suggested.
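
As a rough illustration of the parentheses fallback, the following
Python sketch builds the simple ruby markup from the Recommendation
(<ruby>, <rb>, <rt>, with <rp> carrying the parentheses) and shows
what a processor that ignores the ruby elements would display. The
helper names `simple_ruby` and `fallback_text` are made up for this
example and are not part of any specification.

```python
import xml.etree.ElementTree as ET

def simple_ruby(base, annotation):
    """Build simple ruby markup with <rp> fallback parentheses."""
    ruby = ET.Element("ruby")
    ET.SubElement(ruby, "rb").text = base        # base text
    ET.SubElement(ruby, "rp").text = "("         # fallback open parenthesis
    ET.SubElement(ruby, "rt").text = annotation  # ruby text
    ET.SubElement(ruby, "rp").text = ")"         # fallback close parenthesis
    return ruby

def fallback_text(ruby):
    """Text as seen by a processor that ignores the ruby tags."""
    return "".join(ruby.itertext())

elem = simple_ruby("WWW", "World Wide Web")
markup = ET.tostring(elem, encoding="unicode")
plain = fallback_text(elem)
```

A ruby-aware renderer would hide the <rp> content and place the ruby
text alongside the base; everything else sees the base followed by the
annotation in parentheses.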


To summarize:

- Use the markup in http://www.w3.org/TR/ruby/.
- Make ruby available to authors.


Hope this helps. Please feel free to send any additional questions
to me directly.


Regards,    Martin.




Farwell, Kevin wrote:
> Hi,
> 
> If DITA is to be used for textbooks, then yes, Ruby should be included. You're right that the localization process issues would be irrelevant because the text would be authored in Japanese and the author would just add it where it was needed. Because of cultural differences and the obvious alphabet differences, I can't see a children's textbook getting translated from English to Japanese. Localization process problems would not come into the picture even within textbooks, where Ruby is commonly used.
> 
> Not to sound cavalier about this, but it seems Ruby should be tossed in. Nobody can come up with a solid reason to have it, but nobody has come up with a solid reason not to either. We have come up with many scenarios where it would be handy, and the cost seems pretty small. It also seems like the discussion has been much more about whether it should be there and less about how it should be there. I came to this late, as you know, but it felt to me that whether had been decided and we were just to talk about how.
> 
> I'll also double back briefly on my element/attribute opinion. That's based on making the implementation easier in localization and authoring, but it certainly isn't based on history. It's based on imagining how I'd most like to do it if I had to. If I had to deal with the table-like grid in the W3C recommendation, I would. I really don't feel strongly enough that the process part of this should become more important than the markup part, but I realize that has been discussed a lot in my emails. If it's in the way, we should drop it. Localization vendors have learned to deal with much worse over the years. 
> 
> Kevin
>    
> 
> -----Original Message-----
> From: Don Day [mailto:dond@us.ibm.com] 
> Sent: Thursday, March 02, 2006 1:45 PM
> To: Farwell, Kevin
> Cc: Lieske, Christian; Felix Sasaki; gershon@tech-tav.com; Richard Ishida; JoAnn Hackos
> Subject: RE: Ruby attribute vs. ruby element?
> 
> I inquired about the potential for Ruby usage within IBM and got quite similar assessments, Kevin.  The most likely use case in our business would be a Japanese translator wanting to enhance the translation results, but we still face the same issues of how to make those language- and situation-dependent changes permanent in the management of sources and translation memories.  This being a very edge case for us, the business case for developing worldwide source management just for this feature probably can't pay off the occasional costs of applying the editorial change after translation.
> 
> Technical writing is not the only application for DITA, however.  Is the use-case of children's textbooks compelling enough to make Ruby a regular feature of the DITA language? In those cases, the content is likely written directly in Japanese or another East Asian language, and having the Ruby markup provided as a domain would make sense for any situation close to this. But I am not seeing Ruby as a requirement in the far more general case of content that will require translation, and which therefore introduces the late-in-process editorial considerations of Ruby.
> 
> Regards,
> --
> Don Day
> Chair, OASIS DITA Technical Committee
> IBM Lead DITA Architect
> Email: dond@us.ibm.com
> 11501 Burnet Rd. MS9033E015, Austin TX 78758
> Phone: +1 512-838-8550
> T/L: 678-8550
> 
> "Where is the wisdom we have lost in knowledge?
>  Where is the knowledge we have lost in information?"
>    --T.S. Eliot
> 
> 
> From: "Farwell, Kevin" <Kevin.Farwell@lionbridge.com>
> Sent: 03/02/2006 01:07 PM
> To: Don Day/Austin/IBM@IBMUS, "Lieske, Christian" <christian.lieske@sap.com>
> Cc: "Felix Sasaki" <fsasaki@w3.org>, <gershon@tech-tav.com>, "Richard Ishida" <ishida@w3.org>, "JoAnn Hackos" <joann.hackos@comtech-serv.com>
> Subject: RE: Ruby attribute vs. ruby element?
> 
> Hi,
> 
> I considered this question as well, and you're right that, assuming English development, this only matters during or after translation. Translators are not generally relied upon to add XML markup, so it would fall to a localization engineer in cooperation with a translator to add the markup.
> This gets more complex when translation memory is tossed in, so whatever the standard becomes, any localization outfit will need to settle on a process to get the Ruby in and keep it for the next time the phrase is used.
> 
> Since this email arrived, Gershon has asked the question of whether it's important to have Ruby support in DITA. As I said yesterday, the odds are that it will never be used. I got the impression that the issue was only opened as an effort at I18N thoroughness. If that's the goal, Ruby should be added. If the goal is to provide something that will be useful to the maximum number of people in a given profession, that being technical writers, Ruby would never be missed. If the target market is all writers of any kind of content, then it might be.
> 
> Kevin
> 
> 
> -----Original Message-----
> From: Don Day [mailto:dond@us.ibm.com]
> Sent: Thursday, March 02, 2006 11:04 AM
> To: Lieske, Christian
> Cc: Felix Sasaki; gershon@tech-tav.com; Richard Ishida; JoAnn Hackos; Farwell, Kevin
> Subject: RE: Ruby attribute vs. ruby element?
> 
> I've finally come to understand what I think is the critical question to be answered first:
> 
> Where is the editorial requirement for Ruby annotation actually applied?
> American authors have zero awareness of how or whether to indicate that a term requires annotation, therefore the source that is received by a translation service is likely to be inadequately marked for Ruby interests to begin with.
> 
> Everything else seems to flow from that decision--for example, whether to return that edited source to the content owner so that the editorial work does not have to be repeated later, or whether the ruby comments for Japanese audiences will even be the same for Korean or Chinese readers.  If there are locale differences in the applied Ruby, isn't this more of a CM problem, one that is true of any XML language?
> 
> Practically, how are existing XML languages like DocBook or TEI doing for internationalization support? Are there best practices or lessons learned from these that should inform this research?
> 
> Regards,
> --
> Don Day
> Chair, OASIS DITA Technical Committee
> IBM Lead DITA Architect
> Email: dond@us.ibm.com
> 11501 Burnet Rd. MS9033E015, Austin TX 78758
> Phone: +1 512-838-8550
> T/L: 678-8550
> 
> "Where is the wisdom we have lost in knowledge?
>  Where is the knowledge we have lost in information?"
>    --T.S. Eliot
> 
> 
> 
> From: "Lieske, Christian" <christian.lieske@sap.com>
> Sent: 03/02/2006 11:11 AM
> To: <gershon@tech-tav.com>
> Cc: "JoAnn Hackos" <joann.hackos@comtech-serv.com>, "Farwell, Kevin" <Kevin.Farwell@lionbridge.com>, "Richard Ishida" <ishida@w3.org>, Don Day/Austin/IBM@IBMUS, "Felix Sasaki" <fsasaki@w3.org>
> Subject: RE: Ruby attribute vs. ruby element?
> 
> Hi Gershon,
> 
> Thanks a lot for the effort you put into this. From what I can see, your work as well as the input from the colleagues who receive a carbon copy of this mail, will help to create a good proposal for the DITA TC.
> Unfortunately, I won't be able to participate in next week's activities since I will be out of office.
> 
> Best regards,
> Christian
> -----Original Message-----
> From: Gershon L Joseph [mailto:gershon@tech-tav.com]
> Sent: Thursday, March 2, 2006 15:39
> To: Lieske, Christian; 'Don Day'
> Cc: 'JoAnn Hackos'; 'Farwell, Kevin'; 'Richard Ishida'
> Subject: RE: Ruby attribute vs. ruby element?
> 
> Hi all,
> 
> My understanding is that ruby and dir will be used in the authoring stage in some use cases, and will definitely be used in the translation stage. I don't think they apply only at the publishing stage.
> 
> After researching Ruby and text flow direction over the past few days, and taking Felix's input on the translate attribute into account, I'm leaning more towards adopting the W3C markup. My main reasoning is to keep in sync with the ITS spec that requires translatable text to be in elements, and not in attributes. Since we want DITA to comply with the ITS recommendations, moving in the direction of the ruby attribute would result in us having to retract it in the future when we become fully compliant with the ITS translatability requirements. I think we all prefer to get it right the first time.
> 
> I am investigating further to see whether we really need the full W3C element structure or not, and how best we should specialize the base DITA to provide ruby markup.
> 
> Richard and Felix won't be available for next Monday's conf call, but they will be available the following Monday. I'll post my ruby and dir proposals to this list as soon as they are ready, and we'll see how far we get by Monday. I suspect we'll need another week to get to a final proposal, due to the scope of this project...
> 
> Best Regards,
> Gershon
> 
> -----Original Message-----
> From: Lieske, Christian [mailto:christian.lieske@sap.com]
> Sent: Thursday, March 02, 2006 3:55 PM
> To: Don Day
> Cc: gershon@tech-tav.com; JoAnn Hackos; Farwell, Kevin; Richard Ishida
> Subject: RE: Ruby attribute vs. ruby element?
> 
> Hi Don and everyone,
> 
> Possibly we currently have slightly different conceptions about Ruby. Mine is taken from http://www.w3.org/TR/ruby/ and reads as follows:
> 
>              "Ruby" are short runs of text alongside the base text, typically used in East Asian documents
>              to indicate pronunciation or to provide a short annotation.
> 
> Accordingly, I do not see Ruby as belonging only to the "output" space.
> From my understanding, Ruby usually carries some additional information (quite often presumably explicitly created by a human) which might be of interest to many consumers/readers of a document.
> 
> However, I am not an expert on this. Thus, I copy Richard, who is far more knowledgeable than I am.
> 
> Best regards,
> Christian
> -----Original Message-----
> From: Don Day [mailto:dond@us.ibm.com]
> Sent: Wednesday, March 1, 2006 19:14
> To: Farwell, Kevin
> Cc: Lieske, Christian; gershon@tech-tav.com; JoAnn Hackos
> Subject: RE: Ruby attribute vs. ruby element?
> 
> You inferred correctly, Kevin, although on reflection I can see all kinds of problems this itself would engender. Despite the attraction of the idea of One DTD that Rules Them All, I appreciate the freedom we get from splitting the authoring/processing concerns between different DTDs, particularly if the "processing DTDs" are effectively end nodes, in the sense that the translated information does not come back into the authoring process.  I know that there are feelings that authoring/presentation DTDs are tantamount to sedition, so let me affirm that I am neutral on the issue, just pointing out the architecture as a way to isolate certain localization concerns from authoring concerns if indeed they are separate.
> 
> One thing in favor of a single architecture for both authoring and output is the easier maintenance of all the parts that are typical of DITA.  Maybe instead of output DTDs, we can think about a localization domain that could be slotted into any of the existing authoring DTDs.  This would be far simpler maintenance and would work well to introduce ruby and other phrase-like structures into textual contexts in DITA.  Often we discover as we write--and I'm learning that I like this idea better than maintaining a bunch of top-down specializations.
> 
> Regards,
> --
> Don Day
> Chair, OASIS DITA Technical Committee
> IBM Lead DITA Architect
> Email: dond@us.ibm.com
> 11501 Burnet Rd. MS9033E015, Austin TX 78758
> Phone: +1 512-838-8550
> T/L: 678-8550
> 
> "Where is the wisdom we have lost in knowledge?
>  Where is the knowledge we have lost in information?"
>    --T.S. Eliot
> 
> 
> 
> From: "Farwell, Kevin" <Kevin.Farwell@lionbridge.com>
> Sent: 03/01/2006 11:56 AM
> To: Don Day/Austin/IBM@IBMUS, <gershon@tech-tav.com>
> Cc: "Lieske, Christian" <christian.lieske@sap.com>, "JoAnn Hackos" <joann.hackos@comtech-serv.com>
> Subject: RE: Ruby attribute vs. ruby element?
> 
> Hi,
> 
> I need a clarification about something you said below. Does "the 'rendering DTD' that would take the Ruby addition might be specifically for the output merge back into DITA for processing" mean the XML would be valid against one DTD before translation and another after translation? I'm stuck on "specifically for the output merge back into DITA."
> 
> Thanks,
> Kevin
> 
> -----Original Message-----
> From: Don Day [mailto:dond@us.ibm.com]
> Sent: Wednesday, March 01, 2006 3:52 AM
> To: gershon@tech-tav.com
> Cc: 'Lieske, Christian'; 'JoAnn Hackos'; Farwell, Kevin
> Subject: Re: Ruby attribute vs. ruby element?
> 
> Just a thought about the separation of authoring concerns from production and rendering concerns--
> 
> Often there are tensions between authoring requirements and display requirements.  Some SGML and XML architectures are explicitly designed with "authoring DTDs" and "rendering DTDs" in mind that represent the different requirements for those application domains.  Introducing Ruby might be such a case, and its impact for DITA would be the same as for any other general purpose authoring language with multiple output requirements, such as DocBook or TEI.  In this case, the "rendering DTD" that would take the Ruby addition might be specifically for the output merge back into DITA for processing, wherein translators can properly identify those terms that particular languages require to be given an appropriate Ruby annotation.
> 
> Paul Prescod made a very key observation, back before DITA was even around, in an xsl-list posting from 1998:
> Re: Why Transformation?
> (http://www.biglist.com/lists/xsl-list/archives/199809/msg00206.html)
> 
> ...The fundamental point is that you cannot predict how your data will be used in the future, so you cannot decide on the "optimal" encoding for it. Even if you knew exactly how it was going to be used, the needs of document renditions and data storage are often different.
> 
> In a rendition, redundancy is your friend. In document maintenance, it is your enemy. Actually, redundancy is probably the most important point. Often you want to get rid of redundant markup ("Why should I always wrap this series of elements when the wrapper can be logically implied?"). Often you want to get rid of redundant text ("Why should I type titles for these columns, when I use the same column titles for every table of this type?") Sometimes you want to get rid of completely redundant elements: ("Why should I the chapter title both in the document, and in the table of contents, and in a dozen cross references")?
> 
> In a rendition, data should often be sorted according to some rule that will help human navigation. In your document database, you probably want to allow authors to enter it in any order. You may even need to sort the same data according to different rules according to the rendering.
> 
> Transformations are the basis of all XML processing. I expect that within a few years all XML-processing applications will have transformation engines built in. Style application are just the start.
> 
> Paul's point back then certainly applies to Ruby just as it also happens to apply to features already designed into DITA, such as conref and xref behaviors, the separation of navigation from content via ditamaps, and the generation of accessible navigation links in tables (done automatically in DITA output processing rather than as an author's concern).
> 
> Getting back to Ruby, I think it fits in with a class of other best practices for XML that have been made in good faith, but not always first discerning whether the practice is one that an author should enforce, or one that tools or other backend business practices can enforce.  It might be useful to borrow the programmer's convention of a state transition diagram to indicate the interfaces between authors, translators, and end users as the data progresses in an end-to-end workflow. The first line of this chart attempts to show some interactions and the following lines suggest other dimensions of interaction that can be added to the diagram.
> This diagram could show feedback loops and data reuse loops, for example.
> Then it should become more clear where a concern such as Ruby or table linking for accessibility should be applied (and I wish to heaven that authors who create Section 508-compliant HTML pages could see what DITA could do for them in the processing stage of this diagram).
> 
> author ---> | --> translator --> | --> processing tools --> | --> publish <--  | end user
> (editing)   | (separation/merge) | (NLS-enabled output)     | (locale servers) | (browser)
> (guides)    | (ITS rules)        | (specs, technologies)    | (updates)        | (notifications)
> (files,CMS) | (CMS/TMX/XLIFF)    | (file-based build)       | (database/files) | (URLs, bookmarks)
> (UA focus)  | (terminology)      | (certification stds)     | (look and feel)  | (expectations)
> and so on...
> 
> 
> Regards,
> --
> Don Day
> Chair, OASIS DITA Technical Committee
> IBM Lead DITA Architect
> Email: dond@us.ibm.com
> 11501 Burnet Rd. MS9033E015, Austin TX 78758
> Phone: +1 512-838-8550
> T/L: 678-8550
> 
> "Where is the wisdom we have lost in knowledge?
>  Where is the knowledge we have lost in information?"
>    --T.S. Eliot
> 
> 
> 
> 
> From: "Gershon L Joseph" <gershon@tech-tav.com>
> Sent: 03/01/2006 03:40 AM
> To: "'Lieske, Christian'" <christian.lieske@sap.com>, <kevin.farwell@lionbridge.com>
> Cc: "'JoAnn Hackos'" <joann.hackos@comtech-serv.com>, Don Day/Austin/IBM@IBMUS
> Reply-To: <gershon@tech-tav.com>
> Subject: Ruby attribute vs. ruby element?
> 
> Hi Christian and Kevin,
> 
> I've started researching my action item from Monday's meeting. I've found the following resources to use as the basis of our proposal:
> 
> - early proposal on the ruby attribute
>     http://www.ifi.unizh.ch/groups/mml/people/mduerst/rubi.html
> 
> - later, more mature proposal (based on the above) on the ruby attribute, submitted to W3C
>     http://www.w3.org/International/draft-duerst-ruby-01
>     This is probably the one we should use as the basis for our Translation SC work. (In fact, it's what I'm using as the basis for my proposal.)
> 
> - W3C ruby recommendation as it relates to HTML and XHTML
>     Uses elements, not attributes. Allows for more complex ruby markup including
>     having 2 ruby texts associated with a base text; also provides finer control
>     as to what base text the ruby text is attached to.
>     http://www.w3.org/TR/ruby/
> 
> My guess is we don't want to go the ruby element route for DITA at this time, but I'd like your input before I totally rule out this option.
> It's definitely more flexible and more powerful than the ruby attribute.
> We could allow <ruby> as a child in every element that we would allow the ruby attribute, so the ruby-specific markup would be defined once in the DTD and documented once. However, the work on the DITA-OT (and any other conforming processor) would be much more complex. Also, authoring would be much more complex, due to the additional elements required (I'm not sure how much, if any, authoring tools could hide this complexity, because the authors really need to mark up the ruby and base texts carefully and accurately).
> 
> My feeling is we should allow the ruby attribute (or element) on selected inline elements. I'll go through the table Robert and Chris prepared for the translate attribute and add a column for ruby (I'll do the same for the dir attribute). Ruby should be allowed on most (if not all) inline elements.
> 
> I'd appreciate your feedback on the ruby attribute vs. ruby element question, and any other input you'd like to provide at this time.
> 
> My proposal currently assumes we're going with the ruby attribute; I'll extend it to cover the ruby element as per W3C if your feedback recommends it.
> 
> Thanks in advance,
> Gershon
> 
> ---
> Gershon L Joseph
> Member, OASIS DITA and DocBook Technical Committees
> Director of Technology and Single Sourcing
> Tech-Tav Documentation Ltd.
> office: +972-8-974-1569
> mobile: +972-57-314-1170
> http://www.tech-tav.com
> 
Received on Friday, 3 March 2006 14:05:51 UTC