Re: FW: Ruby attribute vs. ruby element? from Martin Duerst on 2006-03-03 (public-i18n-its@w3.org from January to March 2006)

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Fri, 03 Mar 2006 15:50:17 +0900
To: "Richard Ishida" <ishida@w3.org>, <public-i18n-its@w3.org>
Message-Id: <6.0.0.20.2.20060303104215.05d51060@localhost>
Hello Richard, everybody else,

[Richard or somebody else, can you please make sure that
my comments get back to all the previous participants in
this discussion?]

Richard and Christian are absolutely correct.

http://www.w3.org/International/draft-duerst-ruby-01 is
*totally* outdated. The reason it is in a somewhat prominent
position is that in the very early days of the W3C Internationalization
Activity (before I joined the W3C Team), a lot of internationalization-
related documents were collected and dumped in the activity top level
directory. They are still kept there only because W3C has a policy
of not breaking links. But the draft itself also explicitly says
"Expires 30 February 1997".

What should be used is definitely http://www.w3.org/TR/ruby/. Not only
is this used in XHTML 1.1 and XHTML 2.0, with data and implementations
around (although not terribly numerous), it is also much more thought
through in all respects, and was carefully coordinated with a Japanese
expert group working on markup and typography.

As for the claim that ruby are a presentation issue, that's also
clearly false. In standard usage, ruby are used for indicating
pronunciation exactly in those cases where the reader cannot
figure it out. In those cases, software is also usually useless.
In other cases, ruby are added to provide additional information,
including puns and jokes. In all these cases, ruby are clearly
an authoring issue, not a presentation issue.

There are some cases where we get closer to presentation, e.g.
when taking an existing text with no or few ruby and adding more
ruby to help readability for children, for accessibility, or for
foreigners learning Japanese. But adding ruby cannot be completely
automated, because there are always ambiguous cases. Also, adding
ruby in these cases requires a large dictionary and some language
processing, which is much different from the usual presentation
operations.

There is another presentational aspect to ruby: the choice of
element name. The name refers to the most usual form of presentation
(alongside the main text), or more specifically, to the font size
most often used in that case. Markup in the W3C Recommendation
(http://www.w3.org/TR/ruby/) has been designed to work with other
ways of presenting the same information (in particular, for using
parentheses as a fallback), but we retained the name of the most
used presentation form because everybody was used to it. Other,
less specific names, such as <gloss>, have been suggested.
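To make the parentheses fallback concrete, here is a minimal Python sketch. It assumes simple ruby markup as in http://www.w3.org/TR/ruby/ (rb for the base text, rt for the ruby text, rp for the fallback parentheses) and leaves out the XHTML namespace for brevity; the helper names fallback_text and ruby_pairs are hypothetical. A renderer without ruby support that just shows the text content displays the parentheses, while a ruby-aware processor can skip rp and keep base text and annotation paired.

```python
import xml.etree.ElementTree as ET

# Simple ruby (namespace omitted for brevity, an assumption of this sketch):
# base text in <rb>, ruby text in <rt>, and <rp> holding the fallback
# parentheses that a ruby-capable renderer would hide.
SIMPLE_RUBY = "<ruby><rb>漢字</rb><rp>(</rp><rt>かんじ</rt><rp>)</rp></ruby>"


def fallback_text(markup: str) -> str:
    """Hypothetical helper: approximate what a renderer without ruby
    support shows, i.e. all text content in document order, so the
    <rp> parentheses end up around the ruby text."""
    root = ET.fromstring(markup)
    return "".join(root.itertext())


def ruby_pairs(markup: str) -> list[tuple[str, str]]:
    """Hypothetical helper: extract (base, annotation) pairs the way a
    ruby-aware processor might, ignoring the <rp> fallback entirely."""
    root = ET.fromstring(markup)
    bases = [rb.text or "" for rb in root.iter("rb")]
    texts = [rt.text or "" for rt in root.iter("rt")]
    return list(zip(bases, texts))


if __name__ == "__main__":
    print(fallback_text(SIMPLE_RUBY))  # 漢字(かんじ)
    print(ruby_pairs(SIMPLE_RUBY))     # [('漢字', 'かんじ')]
```

In both cases the annotation stays in the content as marked-up text written by the author; only the way it is displayed differs.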


To summarize:

- Use the markup in http://www.w3.org/TR/ruby/.
- Make ruby available to authors.


Hope this helps. Please feel free to ask any additional questions
to me directly.


Regards,    Martin.



At 01:32 06/03/03, Richard Ishida wrote:
 >
 >FYI
 >
 >
 >
 >============
 >Richard Ishida
 >Internationalization Lead
 >W3C (World Wide Web Consortium)

 >-----Original Message-----
 >From: Richard Ishida [mailto:ishida@w3.org]
 >Sent: 02 March 2006 14:46
 >To: 'Lieske, Christian'; 'Don Day'
 >Cc: 'gershon@tech-tav.com'; 'JoAnn Hackos'; 'Farwell, Kevin'
 >Subject: RE: Ruby attribute vs. ruby element?
 >
 >It looks like my input is needed here.  Unfortunately I'm being kept really
 >busy by the W3C Technical Plenary in France at the moment.  I'll try to
 >connect with this thread asap.  In the meantime, it appears from this
 >message that the following might prove useful:
 >
 >FAQ: What is ruby?
 >http://www.w3.org/International/questions/qa-ruby
 >
 >Tutorial: Ruby Markup and Styling
 >http://www.w3.org/International/tutorials/ruby/
 >
 >Ruby used in the typical East Asian context (e.g. furigana) would typically
 >be used by East Asian content authors during the authoring process.  Ruby
 >may also be used by translators translating into East Asian languages.
 >
 >Some people may also use ruby for inline annotations, unrelated to typical
 >East Asian use.
 >
 >I haven't researched it, but I'm pretty certain that
 >http://www.w3.org/International/draft-duerst-ruby-01 has probably been
 >superseded by http://www.w3.org/TR/ruby/ .  (Martin, if you are reading this
 >please confirm.)
 >
 >I suggest that you incorporate http://www.w3.org/TR/ruby/ since it is what
 >is available in XHTML 1.1 and will be used for XHTML 2, and will probably be
 >what we use for ITS.  Note that there are two conformance levels specified -
 >'simple ruby' and 'complex ruby'.
 >
 >Note also that putting ruby text in attributes runs counter to the idea that
 >attributes are a bad place for content.  Attribute text cannot be annotated
 >to properly support bidirectional text, language changes, etc.
 >
 >Hope that helps.
 >RI
 >
 >
 >============
 >Richard Ishida
 >Internationalization Lead
 >W3C (World Wide Web Consortium)
 >
 >http://www.w3.org/People/Ishida/
 >http://www.w3.org/International/
 >http://people.w3.org/rishida/blog/
 >http://www.flickr.com/photos/ishida/
 >
 >
 >> -----Original Message-----
 >> From: Lieske, Christian [mailto:christian.lieske@sap.com]
 >> Sent: 02 March 2006 13:55
 >> To: Don Day
 >> Cc: gershon@tech-tav.com; JoAnn Hackos; Farwell, Kevin; Richard Ishida
 >> Subject: RE: Ruby attribute vs. ruby element?
 >>
 >> Hi Don and everyone,
 >>
 >> Possibly we currently have slightly different conceptions about Ruby.
 >> Mine is taken from http://www.w3.org/TR/ruby/ and reads as follows:
 >>
 >>      "Ruby" are short runs of text alongside the base text, typically
 >>      used in East Asian documents to indicate pronunciation or to
 >>      provide a short annotation.
 >>
 >> Accordingly, I do not see Ruby as belonging only in the "output"
 >> space. From my understanding, Ruby usually carries some additional
 >> information (quite often presumably explicitly created by a human)
 >> which might be of interest to many consumers/readers of a document.
 >>
 >> However, I am not an expert on this. Thus, I copy Richard, who is far
 >> more knowledgeable than me.
 >>
 >> Best regards,
 >> Christian
 >> -----Original Message-----
 >> From: Don Day [mailto:dond@us.ibm.com]
 >> Sent: Wednesday, 1 March 2006 19:14
 >> To: Farwell, Kevin
 >> Cc: Lieske, Christian; gershon@tech-tav.com; JoAnn Hackos
 >> Subject: RE: Ruby attribute vs. ruby element?
 >>
 >> You inferred correctly, Kevin, although on reflection I can see all
 >> kinds of problems this itself would engender. Despite the attraction
 >> of the idea of One DTD that Rules Them All, I appreciate the freedom
 >> we get from splitting the authoring/processing concerns between
 >> different DTDs, particularly if the "processing DTDs" are effectively
 >> end nodes, in that the translated information does not come back into
 >> the authoring process.  I know that there are feelings that
 >> authoring/presentation DTDs are tantamount to sedition, so let me
 >> affirm that I am neutral on the issue; I am just pointing out the
 >> architecture as a way to isolate certain localization concerns from
 >> authoring concerns, if indeed they are separate.
 >>
 >> One thing in favor of a single architecture for both authoring and
 >> output is the easier maintenance of all the parts that are typical of
 >> DITA.  Maybe instead of output DTDs, we can think about a localization
 >> domain that could be slotted into any of the existing authoring DTDs.
 >> This would be far simpler maintenance and would work well to introduce
 >> ruby and other phrase-like structures into textual contexts in DITA.
 >> Often we discover as we write--and I'm learning that I like this idea
 >> better than maintaining a bunch of top-down specializations.
 >>
 >> Regards,
 >> --
 >> Don Day
 >> Chair, OASIS DITA Technical Committee
 >> IBM Lead DITA Architect
 >> Email: dond@us.ibm.com
 >> 11501 Burnet Rd. MS9033E015, Austin TX 78758
 >> Phone: +1 512-838-8550
 >> T/L: 678-8550
 >>
 >> "Where is the wisdom we have lost in knowledge?
 >>  Where is the knowledge we have lost in information?"
 >>    --T.S. Eliot
 >>
 >> From: "Farwell, Kevin" <Kevin.Farwell@lionbridge.com>
 >> Sent: 03/01/2006 11:56 AM
 >> To: Don Day/Austin/IBM@IBMUS, <gershon@tech-tav.com>
 >> Cc: "Lieske, Christian" <christian.lieske@sap.com>, "JoAnn Hackos"
 >>     <joann.hackos@comtech-serv.com>
 >> Subject: RE: Ruby attribute vs. ruby element?
 >>
 >> Hi,
 >>
 >> I need a clarification about something you said below. Does "the
 >> 'rendering DTD' that would take the Ruby addition might be
 >> specifically for the output merge back into DITA for processing" mean
 >> the XML would be valid against one DTD before translation and another
 >> after translation? I'm stuck on "specifically for the output merge
 >> back into DITA."
 >>
 >> Thanks,
 >> Kevin
 >>
 >> -----Original Message-----
 >> From: Don Day [mailto:dond@us.ibm.com]
 >> Sent: Wednesday, March 01, 2006 3:52 AM
 >> To: gershon@tech-tav.com
 >> Cc: 'Lieske, Christian'; 'JoAnn Hackos'; Farwell, Kevin
 >> Subject: Re: Ruby attribute vs. ruby element?
 >>
 >> Just a thought about the separation of authoring concerns from
 >> production and rendering concerns--
 >>
 >> Often there are tensions between authoring requirements and display
 >> requirements.  Some SGML and XML architectures are explicitly designed
 >> with "authoring DTDs" and "rendering DTDs" in mind that represent the
 >> different requirements for those application domains.  Introducing
 >> Ruby might be such a case, and its impact for DITA would be the same
 >> as for any other general purpose authoring language with multiple
 >> output requirements, such as DocBook or TEI.  In this case, the
 >> "rendering DTD" that would take the Ruby addition might be
 >> specifically for the output merge back into DITA for processing,
 >> wherein translators can properly identify those terms that
 >> particular languages require to be given an appropriate Ruby
 >> annotation.
 >>
 >> Paul Prescod made a very key observation, back before DITA was even
 >> around, in an xsl-list posting from 1998:
 >> Re: Why Transformation?
 >> (http://www.biglist.com/lists/xsl-list/archives/199809/msg00206.html)
 >>
 >> ...The fundamental point is that
 >> you cannot predict how your data will be used in the future, so you
 >> cannot decide on the "optimal" encoding for it. Even if you knew
 >> exactly how it was going to be used, the needs of document renditions
 >> and data storage are often different.
 >>
 >> In a rendition, redundancy is your friend. In document maintenance, it
 >> is your enemy. Actually, redundancy is probably the most important
 >> point. Often you want to get rid of redundant markup ("Why should I
 >> always wrap this series of elements when the wrapper can be logically
 >> implied?"). Often you want to get rid of redundant text ("Why should I
 >> type titles for these columns, when I use the same column titles for
 >> every table of this type?") Sometimes you want to get rid of completely
 >> redundant elements: ("Why should I the chapter title both in the
 >> document, and in the table of contents, and in a dozen cross
 >> references")?
 >>
 >> In a rendition, data should often be sorted according to some rule
 >> that will help human navigation. In your document database, you
 >> probably want to allow authors to enter it in any order. You may even
 >> need to sort the same data according to different rules according to
 >> the rendering.
 >>
 >> Transformations are the basis of all XML processing. I expect that
 >> within a few years all XML-processing applications will have
 >> transformation engines built in. Style application are just the start.
 >>
 >> Paul's point back then certainly applies to Ruby just as it also
 >> happens to apply to features already designed into DITA, such as
 >> conref and xref behaviors, the separation of navigation from content
 >> via ditamaps, and the generation of accessible navigation links in
 >> tables (done automatically in DITA output processing rather than as an
 >> author's concern).
 >>
 >> Getting back to Ruby, I think it fits in with a class of other best
 >> practices for XML that have been proposed in good faith, but without
 >> first discerning whether the practice is one that an author should
 >> enforce, or one that tools or other backend business practices can
 >> enforce.  It might be useful to borrow the programmer's convention of
 >> a state transition diagram to indicate the interfaces between authors,
 >> translators, and end users as the data progresses in an end-to-end
 >> workflow. The first line of this chart attempts to show some
 >> interactions, and the following lines suggest other dimensions of
 >> interaction that can be added to the diagram. This diagram could show
 >> feedback loops and data reuse loops, for example. Then it should
 >> become clearer where a concern such as Ruby or table linking for
 >> accessibility should be applied (and I wish to heaven that authors
 >> who create Section 508-compliant HTML pages could see what DITA could
 >> do for them in the processing stage of this diagram).
 >>
 >> author ---> | --> translator --> | --> processing tools --> | --> publish <--  | end user
 >> (editing)   | (separation/merge) | (NLS-enabled output)     | (locale servers) | (browser)
 >> (guides)    | (ITS rules)        | (specs, technologies)    | (updates)        | (notifications)
 >> (files,CMS) | (CMS/TMX/XLIFF)    | (file-based build)       | (database/files) | (URLs, bookmarks)
 >> (UA focus)  | (terminology)      | (certification stds)     | (look and feel)  | (expectations)
 >> and so on...
 >>
 >>
 >> Regards,
 >> --
 >> Don Day
 >>
 >> From: "Gershon L Joseph" <gershon@tech-tav.com>
 >> Sent: 03/01/2006 03:40 AM
 >> To: "'Lieske, Christian'" <christian.lieske@sap.com>,
 >>     <kevin.farwell@lionbridge.com>
 >> Cc: "'JoAnn Hackos'" <joann.hackos@comtech-serv.com>,
 >>     Don Day/Austin/IBM@IBMUS
 >> Please respond to: <gershon@tech-tav.com>
 >> Subject: Ruby attribute vs. ruby element?
 >>
 >> Hi Christian and Kevin,
 >>
 >> I've started researching my action item from Monday's meeting. I've
 >> found the following resources to use as the basis of our proposal:
 >>
 >> - early proposal on the ruby attribute
 >>     http://www.ifi.unizh.ch/groups/mml/people/mduerst/rubi.html
 >>
 >> - later, more mature proposal (based on the above) on the ruby
 >>   attribute, submitted to W3C
 >>     http://www.w3.org/International/draft-duerst-ruby-01
 >>     This is probably the one we should use as the basis for our
 >>     Translation SC work. (In fact, it's what I'm using as the basis
 >>     for my proposal.)
 >>
 >> - W3C ruby recommendation as it relates to HTML and XHTML
 >>     Uses elements, not attributes. Allows for more complex ruby markup,
 >>     including having 2 ruby texts associated with a base text; also
 >>     provides finer control as to what base text the ruby text is
 >>     attached to.
 >>     http://www.w3.org/TR/ruby/
 >>
 >> My guess is we don't want to go the ruby element route for DITA at
 >> this time, but I'd like your input before I totally rule out this
 >> option. It's definitely more flexible and more powerful than the ruby
 >> attribute. We could allow <ruby> as a child in every element where we
 >> would allow the ruby attribute, so the ruby-specific markup would be
 >> defined once in the DTD and documented once.
 >> However, the work on the DITA-OT (and any other conforming
 >> processor) would be much more complex. Also, authoring would be much
 >> more complex, due to the additional elements required (I'm not sure
 >> how much of this complexity, if any, authoring tools could hide,
 >> because the authors really need to mark up the ruby and base texts
 >> carefully and accurately).
 >>
 >> My feeling is we should allow the ruby attribute (or element) on
 >> selected inline elements. I'll go through the table Robert and Chris
 >> prepared for the translate attribute and add a column for ruby (I'll
 >> do the same for the dir attribute). Ruby should be allowed on most
 >> (if not all) inline elements.
 >>
 >> I'd appreciate your feedback on the ruby attribute vs. ruby element
 >> question, and any other input you'd like to provide at this time.
 >>
 >> My proposal currently assumes we're going with the ruby attribute;
 >> I'll extend it to cover the ruby element as per W3C if your feedback
 >> recommends it.
 >>
 >> Thanks in advance,
 >> Gershon
 >>
 >> ---
 >> Gershon L Joseph
 >> Member, OASIS DITA and DocBook Technical Committees
 >> Director of Technology and Single Sourcing
 >> Tech-Tav Documentation Ltd.
 >> office: +972-8-974-1569
 >> mobile: +972-57-314-1170
 >> http://www.tech-tav.com
 >>

----
Martin J. Dürst                Associate Professor
Aoyama Gakuin University       mailto:duerst@it.aoyama.ac.jp
5-10-1 Fuchinobe, Sagamihara   http://www.sw.it.aoyama.ac.jp
229-8558 Japan                 tel:+81-42-759-6329
Received on Friday, 3 March 2006 09:45:12 UTC