Re: Re[2]: FW: acronym in title... from Mark Davis on 2003-03-14 (w3c-wai-gl@w3.org from January to March 2003)

From: Mark Davis <mark.davis@jtcsv.com>
Date: Fri, 14 Mar 2003 07:35:14 -0800
To: "Roberto Scano - IWA/HWG" <rscano@iwa-italy.org>, <ishida@w3.org>, <w3c-i18n-ig@w3.org>, <public-i18n-geo@w3.org>, "Al Gilman" <asgilman@iamdigex.net>
Cc: <w3c-wai-gl@w3.org>
Message-ID: <004501c2ea3f$4f179520$7900a8c0@DAVIS1>
I wonder how realistic these scenarios are. The principal motivation for a
fine-grained language tagging of individual words or phrases appears to be
for text-to-speech, primarily for the blind. But the goal for
text-to-speech, except in very rare cases, will be to read the customary,
most-well-understood pronunciation of the phrase in the end-user's language.
Rarely will that precisely match the exact pronunciation of the word in the
foreign language. More typically it will be modified to fit within an
allowable sequence of phonemes in the user's language, and often it will be
given a "spelling pronunciation". In that, it is much more like an equally
obscure non-foreign word in the user's language. For example, an English
text-to-speech system should be more likely to find a relatively common
Yiddish term like "schlemiel" than to find an uncommon "native" word such as
"phthisic" in English.

If the goal is to provide special pronunciations for unusual phrases, or
phrases that are pronounced unusually in the given context, then it may be
appropriate to have annotations: but annotations of the common
pronunciation, not of the language. So for that reason, it may be reasonable
to have fine-grained annotations. But it would be most useful to explore
first (a) what the actual requirements for text-to-speech are, and (b)
whether people find it, in practice, relatively easy to work around the
restrictions of attributes or elements that don't allow substructure.

[That being said, I have always found the attributes that end up being
displayed to the user, such as alt and title, very confining. It would be
better to have elements that correspond to them, so that the text display
can be richer, such as italicising or bolding a word within them.]

Mark
________
mark.davis@jtcsv.com
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799

----- Original Message -----
From: "Al Gilman" <asgilman@iamdigex.net>
To: "Roberto Scano - IWA/HWG" <rscano@iwa-italy.org>; <ishida@w3.org>;
<w3c-i18n-ig@w3.org>; <public-i18n-geo@w3.org>
Cc: <w3c-wai-gl@w3.org>
Sent: Friday, March 14, 2003 05:30
Subject: Re: Re[2]: FW: acronym in title...


>
> At 04:29 AM 2003-03-14, Roberto Scano - IWA/HWG wrote:
>
> >So, at least, for the "previous" version all we can do is to "hope" to
> >have - for example - intelligent text readers that read the words in the
> >natural language...
>
> Not at all.
>
> The approach that John Foliot laid out
>
>   http://lists.w3.org/Archives/Public/w3c-wai-ig/2003JanMar/0800.html
>
> will produce graceful and communicative results in the vast majority of
cases.
>
> You just realize that you can't indicate more than one natural language
for
> the attributes of one element.  You edit the hypertext so as not to mix
> natural languages within the attributes of one element and you go outside
> the element to indicate the natural language for as small a context as
> necessary, but not less than that one element.
>
> Further, you don't use foreign terms or phrases abruptly.  Assuming that
you
> want to use 'Piazza San Marco' as the ALT for an image, but the bulk
> language of the text is not Italian, introduce the foreign phrase
> conversationally in the text before it is used in an epigrammatic way in
the
> exhibit.
>
> If you can't find it in your heart to take the text space to introduce the
> foreign phrase in the text, don't use it in the attribute.  Very rarely
will
> these guidelines leave you with nothing good to do.
>
> On the other hand, hoping for the smarter tools is all we can do in any
case.
> For the old version of the language or the new improved version of the
> language.  There is a reasonable chance that switching language models on
> the fly will come to be the norm for TTS systems in the next few years,
but
> it is certainly not the norm now.  So content has to be prepared to
survive
> as communication with the screen reader which reads the whole page in one
> language anyway.
>
> Yes, we want to clean this up in XHTML 2.0.
>
> But our change proposals have to be very well thought out (by the time
they
> hit the HTML user community) and our follow-up promotion during "life
after
> Rec" for the reformed version of the language will have to be sensitive to
> the problems faced by our customers, the content developers, in managing
the
> transition across incompatible versions.
>
> The whole value of the Web is its universal interoperability.  And we are
> threatening to mess with that in one dimension to improve it in another.
> This has to be done carefully, and diplomatically.
>
> Al
>
> PS:
>
> On a personal level, all that I said above is really going to cramp *my*
> style.  My personal version of English contains a lot of latin, French,
and
> Chinese idioms.  Maybe a little Yiddish.  I use these as if they were
> English without thinking.  But I eventually learned from sad experience in
> working in the WAI that foreign phrases are not second-language-safe
> vocabulary.  My French colleagues who are so good in English that I forget
> it is not their natural language do not know the same Yiddish phrases that
I
> do, and, I was shocked to realize, not the Latin borrowings either such as
> "i.e., e.g." and so forth.
>
> So for an international readership in *any* natural language, there are
> reasons to avoid sprinkling in borrowed terms from other languages.
Because
> there is a large audience for whom the dominant language of the current
text
> is usable as a second language, but where the third-language vocabulary
will
> be opaque, the sense inaccessible.
>
> We (PF) would like to migrate Web media to a language-use and
> language-usage-explanation platform that improves on this situation for
> foreign language borrowings, for Terms of Art in technical literature,
etc.
> WCAG 4.3 where it says "explain at first use" is a very print-oriented
rule.
> It is not the most implementable form of an effective guideline, from the
> AT perspective.  We should probably come up with something better which
> combines new markup language design, processing patterns, and authoring
> guidance.  If we try to change in these three areas piecemeal, the changes
> are likely simply never to take hold in general practice.
>
> We will have more on "how this could work" shortly.  Sorry about the
> promises.  But bear with us, please.
>
> Al
>
> >----- Original Message -----
> >From: "Al Gilman" <asgilman@iamdigex.net>
> >To: "Roberto Scano - IWA/HWG" <rscano@iwa-italy.org>; <ishida@w3.org>;
> ><w3c-i18n-ig@w3.org>; <public-i18n-geo@w3.org>
> >Cc: <w3c-wai-gl@w3.org>
> >Sent: Friday, March 14, 2003 12:10 AM
> >Subject: Re: Re[2]: FW: acronym in title...
> >
> > >I would approach the assistive technology developers and ask what
> > >kind of an indirect relationship they could most readily see
> > >implementing.  Will they come to the W3C DOM for this information?
> > >Do we need to get it into MSAA?
> >
> >I think that IBM and other W3C member that create these applications
could
> >reply to this.
> >
> > >It would not be hard to write a tool that reads a glossary and
> > >adds things like the language information to HTML attributes where
> > >they appear in the DOM.  Then the assistive technology could know that
> > >'Piazza San Marco' was Italian.
> >
> >Hum... i think it would be difficoult to have a full dictionary for all
the
> >possible words.
> >But - at now - i have no proposal for solution in my mind :-/
> >
> > >The problem is that if the HTML Working Group were to introduce an
> > >incompatible change in a minor release like that, who would implement
> > >it?  The conventional wisdom is "nobody."  And I am not inclined to
> > >second-guess the experts on that point.
> > >
> > >Incompatible changes in HTML are generally not going to be considered
> > >outside the XHTML 2.0 activity, as it is risky to think with the heavy
use
> > >of HTML all the time that any incompatible changes will be taken up in
> > >practice, even with the best efforts of the HTML WG.
> > >
> > >XHTML 2.0 is the version that HTML WG is working on.  We would have to
have
> > >a flaming disaster going on to get an incompatible change released as
some
> > >sort of an interim patch, and it is not clear who would implement it.
> > >
> > >Besides, there are too many, too good, ways to do this in ways that
> > >interoperate with HTML 2.0.
> >
> >[cut]
> >
> > >There is also a plugin option for the browser extension.  Also an
> > >independent screen scraper like Atomica.
> >
> >So, at least, for the "previous" version all we can do is to "hope" to
> >have - for example - intelligent text readers that read the words in the
> >natural language...
> >
> >Roberto Scano
> >IWA/HWG EMEA Coordinator
> >W3C Advisory Committee Representative for IWA/HWG
> >International Webmasters Association / HTML Writers Guild
> >http://www.iwanet.org - http://www.hwg.org
> >E-Mail: emea@iwanet.org - w3c-rep@iwanet.org
> >--------------------------------------------------------------------
>
>
Received on Friday, 14 March 2003 10:35:38 UTC