W3C home > Mailing lists > Public > w3c-wai-gl@w3.org > October to December 2003

RE: Examples of language changes in websites

From: Richard Ishida <ishida@w3.org>
Date: Fri, 5 Dec 2003 09:40:03 -0000
To: "'Matt May'" <mcmay@w3.org>
Cc: "'WAI GL'" <w3c-wai-gl@w3.org>
Message-ID: <000201c3bb13$c2c63c10$6501a8c0@w3c40upc3ma3j2>

I'm wondering whether the best approach is to say:

1.	if this is a passage or phrase that is clearly in another
language or uses a different script, use lang or xml:lang to indicate
that 
	(for a different script this can sometimes be useful for
rendering visually - eg. browsers sometimes automatically apply Japanese
vs. Chinese fonts to the same character based on the language info)

2.	if this is a borrowed word or phrase, use your judgement to
guess what a voice browser will do with it in the language of the page,
then mark it as in a foreign language if you feel it is appropriate.
	you might not mark it up if you think 
	a. the tts process will do a pretty good job anyway in getting
the gist across (either because it may be in the dictionary or the
guessed pronunciation will be close enough to be understandable, or
because the pronunciation has been adapted to the phones of the native
language (eg. voir-dire) ), or 
	b. if you think that the tts process will just bypass a word in
another language (which might be worse than a bad pronunciation).

RI


PS:  I think the Japanese Macudonarudo would come out fine in a tts
process, because the Japanese script used (katakana) is close to
phonetic.  (Fortunately, because if you were to worry about that you'd
have serious problems, since there are hundreds of everyday words and
many more trendy borrowings in Japanese.  (naifu, fo-ku, chi-zu,
compyutaa, eskareetaa, erebeetaa, pen, doaa, etc, etc are all standard
words for everyday items in Japan, and they continue to assimilate such
words, but if they are written in japanese script its easy to word out
the pronunciation, which generally becomes Japanised too in the
process))





============
Richard Ishida
W3C

contact info: http://www.w3.org/People/Ishida/ 

http://www.w3.org/International/ 
http://www.w3.org/International/geo/ 

W3C Internationalization FAQs
http://www.w3.org/International/questions.html
RSS feed: http://www.w3.org/International/questions.rss



> -----Original Message-----
> From: Matt May [mailto:mcmay@w3.org] 
> Sent: 04 December 2003 20:33
> To: ishida@w3.org
> Cc: WAI GL
> Subject: Re: Examples of language changes in websites
> 
> 
> On Dec 4, 2003, at 5:04 AM, Richard Ishida wrote:
> > When tagging language variations in text, where do you draw the line
> > between what is clearly a foreign language phrase, what is 
> a neologism 
> > drawn from a foreign source, and what is one of the latter that has 
> > become an adopted part of the language?
> >
> > For (another) example, in English we talk about 'a certain 
> je ne sais
> > quoi'.  I would think that a large proportion of people in the UK 
> > would understand this half English half French phrase these 
> days, due 
> > to the influence of recent advertising and cooking programs, etc., 
> > though I'd say that is a fairly recent phenomenon.  Is it 
> English or 
> > French?
> 
> My colleague has hit on the subject I wanted to bring up. (Good job, 
> Richard! How'd you like it if I started talking about 
> internationalization, huh? :)
> 
> One of my favorite words in the English language is résumé. I 
> say it's 
> English, even with the accents, for two reasons. First, it's 
> pronounced 
> "reh-zuh-may" by all but the most pretentious Americans, and some 
> people who can actually speak French. A screen reader which 
> accurately 
> reads this in French may even be unintelligible to someone 
> who doesn't 
> understand the language. Second, the term in English doesn't 
> mean what 
> it means in French: In France, a résumé is a summary, including such 
> things as sporting event wrap-ups. In American English, a résumé is 
> what the French (and British) would call a "curriculum vitae". (Which 
> is actually something different to us. Go figure.) So we have a word 
> which is neither pronounced nor intended like the word it's derived 
> from. Should it be marked as French anyway? (I should also note that 
> résumé does appear in my English dictionary, with accents, as an 
> English word.)
> 
> Another example: "voir dire", which in French would mean 
> something like 
> "to see (one) say", but is a legal term for the jury 
> selection process. 
> It's also horribly mangled in English ("vohr die-er", versus 
> the French 
> "vwahr-deer"). So, not a French word, not pronounced like one, and 
> still pretty likely to get destroyed in a screen reader.
> 
> What about garage, or queue, or the thousands of other 
> cognates English 
> and French have? After what period does a word no longer have to be 
> marked? What about loanwords, taken from various languages and made 
> part of the English lexicon? Is there any real value in marking up 
> individual words in otherwise monolingual text, except for citations 
> (The expulsion of the last Moorish rulers was the culmination of the 
> "Reconquista"), or obscure uses for affectation ("arriviste", "moi? 
> jamais!")?
> 
> What about other languages, such as Japanese? The katakana 
> transliteration of the McDonald's restaurant is 
> "ma-ku-do-na-ru-do". Is 
> that English? Would Japanese people understand it if it were 
> spoken as 
> English? This stuff has to apply across all languages.
> 
> I'm starting to believe that we have to look at the real 
> problem, which 
> is in larger blocks of clearly foreign-language text without markup. 
> It's perfectly reasonable to ask authors to mark up sentences or 
> paragraphs that are in a foreign language. What I don't want 
> is to have 
> WCAG set standards that only linguists or polyglots with a 
> whole lot of 
> time on their hands are able to legitimately conform to. The vast 
> majority of the usefulness of this guideline is in the big stuff, not 
> in marking up microscopic bits.
> 
> -
> m
> 
Received on Friday, 5 December 2003 04:40:23 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Monday, 7 December 2009 10:47:26 GMT