Re: W3C I18N & Accessibility; ISO 639 language codes

Colleagues:

Those of you joining our meeting tomorrow on sign language and AAC
designations by telephone should follow the remote participation teleconference
directions at:

https://www.w3.org/WAI/APA/wiki/Meetings/TPAC_2019

Resource: Webex & Teleconference Logistics
Webex Best Practices: https://www.w3.org/2006/tools/wiki/WebExBestPractices

W3C uses IRC to capture minutes and otherwise manage our discussions.

IRC Logistics
IRC: server: irc.w3.org, channel: #APA

IMPORTANT: Upon joining IRC, please identify yourself:
present+ [your_name]
Ex: present+ Janina_Sajka

To raise your hand to speak, enter q+


Janina Sajka writes:
> Hi, Addison:
> 
> Let's then do it Thursday at 5PM.
> 
> Does that work for others on this thread?
> 
> I'll adjust our APA planning accordingly, and I'll ask our staff contact
> Michael Cooper to set up a one-time Webex we can share with our non W3C
> colleagues who have raised some of the questions in this thread.
> 
> I may not have mentioned it previously, but part of APA's direct
> interest is in identifying AAC languages appropriately. Our
> Personalization TF will have a demo in hand during TPAC of web content
> auto transformed for Bliss symbol users. The technology they're
> prototyping should allow the AAC user to specify their preferred AAC
> lang and get similar results.
> 
> Our Personalization TF Co-Facilitators will be in Japan and will want to
> participate in our conversation.
> 
> Janina
> 
> Phillips, Addison writes:
> > Hi Janina,
> > 
> > Thanks for the note.
> > 
> > I personally can't do Friday at 5 PM, since my flight to Tokyo is at 4:00 PM. I could do Thursday. I'm also happy to do some other evening or to host a call as part of the I18N teleconference outside of TPAC. Others in the I18N WG might be able to accommodate different days or times.
> > 
> > How do you want to resolve this?
> > 
> > Addison
> > 
> > > -----Original Message-----
> > > From: janina@rednote.net [mailto:janina@rednote.net]
> > > Sent: Thursday, September 05, 2019 12:39 PM
> > > To: Phillips, Addison <addison@lab126.com>
> > > Cc: ishida@w3.org; atsushi@w3.org; xfq@w3.org; W3C WAI Accessible
> > > Platform Architectures <public-apa@w3.org>; public-i18n-core@w3.org;
> > > Fourney, David <david.fourney@usask.ca>; Christian Galinski
> > > <christian.galinski@chello.at>; 'klaus.miesenberger'
> > > <klaus.miesenberger@jku.at>; hoeckner@hilfsgemeinschaft.at; shadi@w3.org;
> > > alejandro.moledo@edf-feph.org; lisa.seeman@zoho.com; 'Kasinskaite,
> > > Irmgarda' <I.Kasinskaite@unesco.org>; drude@xs4all.nl; stevelee@w3.org;
> > > 'FERRES Mercè' <FERRES@iso.org>; Charles LaPierre <charlesl@benetech.org>;
> > > p13n@rednote.net
> > > Subject: Re: W3C I18N & Accessibility; ISO 639 language codes
> > > 
> > > Thank you, Addison, for the very prompt and positive response. And thank you
> > > for offering to make room on your Monday-Tuesday agenda. However, I will be
> > > wearing a different badge representing a different contracted consulting
> > > interest Monday-Tuesday, and I hesitate to step away on those days for APA
> > > agenda.
> > > 
> > > I believe many of the people cc'd who have raised these questions with us are in
> > > Europe. So, if we're to offer them a reasonable opportunity to dial in, the very
> > > end of the day is likely the most congenial opportunity, though admittedly
> > > horrible for North Americans.
> > > 
> > > What if we took some time at the very end of the week? Say starting at 5PM
> > > Friday? I believe that would be 9AM for our friends in Europe.
> > > 
> > > Would that work for I18N? For whoever is still at TPAC?
> > > 
> > > 
> > > Janina
> > > 
> > > 
> > > Phillips, Addison writes:
> > > > <chair hat on>
> > > > > I would be happy to meet with our A11Y colleagues during a portion of
> > > > > the I18N meeting Monday/Tuesday. I would also be glad to meet with A11Y
> > > > > folks on Thursday or part of Friday (speaking personally) and I'm sure
> > > > > others in our group who are present would also attend.
> > > >
> > > > <chair hat off>
> > > > > This thread seems confused? BCP 47 includes support for ISO 639, parts 1,
> > > > > 2, and 3, including a large number of sign languages. Alpha2 subtags are
> > > > > used for languages that have alpha2 codes assigned by ISO 639-1. Languages
> > > > > that have no 639-1 code but which are assigned codes by 639-2/3 use the
> > > > > alpha3 subtag to form language tags. These subtags are widely and
> > > > > thoroughly supported in HTML, CSS and other Web standards. Some other
> > > > > standards (in the structured data space and notably related to DC) have
> > > > > not fully embraced BCP47, which is a source of woe for them. Some of the
> > > > > other considerations, such as length, are dealt with already by BCP47 and
> > > > > in actual fact the use and adoption of Unicode Locale Identifiers have
> > > > > placed truly huge language tags into production.
> > > >
> > > > > I'd be glad to discuss the details here. A more thorough reading and/or
> > > > > in depth response is probably warranted on my part. Please let me know
> > > > > how best to meet.
> > > >
> > > > Addison
> > > >
> > > > Addison Phillips
> > > > Sr. Principal SDE – I18N (Amazon)
> > > > Chair (W3C I18N WG)
> > > > Editor (IETF BCP 47)
> > > >
> > > > Internationalization is not a feature.
> > > > It is an architecture.
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: janina@rednote.net [mailto:janina@rednote.net]
> > > > > Sent: Thursday, September 05, 2019 11:13 AM
> > > > > To: Phillips, Addison <addison@lab126.com>; ishida@w3.org;
> > > > > atsushi@w3.org; xfq@w3.org
> > > > > Cc: W3C WAI Accessible Platform Architectures <public-apa@w3.org>;
> > > > > > public-i18n-core@w3.org; Fourney, David <david.fourney@usask.ca>;
> > > > > Christian Galinski <christian.galinski@chello.at>; 'klaus.miesenberger'
> > > > > <klaus.miesenberger@jku.at>; hoeckner@hilfsgemeinschaft.at;
> > > > > shadi@w3.org; alejandro.moledo@edf-feph.org; lisa.seeman@zoho.com;
> > > > > 'Kasinskaite, Irmgarda' <I.Kasinskaite@unesco.org>; drude@xs4all.nl;
> > > > > stevelee@w3.org; 'FERRES Mercè' <FERRES@iso.org>; Charles LaPierre
> > > > > <charlesl@benetech.org>; p13n@rednote.net
> > > > > Subject: W3C I18N & Accessibility; ISO 639 language codes
> > > > >
> > > > > Dear W3C I18N Colleagues:
> > > > >
> > > > > With a growing list of cc's accumulated from email exchanged in the
> > > > > past few days ...
> > > > >
> > > > > APA would like an opportunity to explore what actions W3C can and
> > > > > should take toward more useful language specification in web content.
> > > > >
> > > > > Unfortunately, we meet on different days at TPAC. Also, our TPAC
> > > > > calendar has become a little crowded. However, we still have some
> > > > > remaining open slots where we might have a preliminary conversation,
> > > > > should any I18N people still be in Fukuoka and available later in
> > > > > > the week. APA will have dial-in capability, should a conversation
> > > > > > during TPAC prove possible:
> > > > >
> > > > > https://www.w3.org/WAI/APA/wiki/Meetings/TPAC_2019
> > > > >
> > > > > Or, it may be simpler to say we should take this topic up post TPAC,
> > > > > as a number of the principals with specific knowledge of the
> > > > > accessibility issues we want to discuss will NOT be in Japan.
> > > > >
> > > > > I will defer to your judgement whether a brief introductory
> > > > > conversation in Fukuoka makes sense given limited availability.
> > > > >
> > > > > However we calendar the conversation, I would request, on behalf of
> > > > > APA and particularly our Personalization Task Force that we look for
> > > > > an opportunity to address the issues detailed in the email thread
> > > > > > forwarded here. Our TF is moving forward with technology that should
> > > > > significantly improve the web experience of many people living with
> > > > > various cognitive and learning disabilities. APA also continues to
> > > > > have an interest in uptake of the work we began during the
> > > > > > development of HTML 5.0 on media accessibility, which brings in our
> > > > > > interest in correctly identifying sign language videos.
> > > > >
> > > > > > The above is the simplest agenda description I can come up with at the
> > > > > > moment.
> > > > > Below are some interesting details that should help better explain
> > > > > the concern and hope for improved content markup.
> > > > >
> > > > > Looking forward to greeting many of you in person in Fukuoka,
> > > > >
> > > > > Janina
> > > > >
> > > > > Fourney, David writes:
> > > > > > Hi Janina,
> > > > > >
> > > > > > With respect to standardizing lang codes for AAC (i.e.,
> > > > > > > Augmentative and alternative communication), Christian is better
> > > > > > able to update you on status and timelines.
> > > > > >
> > > > > > I am responding to your question because I wanted to point out
> > > > > > that this proposal (or at least answering the question of whether
> > > > > > 3-letter support is sufficiently in place) solves several issues relating to AAC.
> > > > > >
> > > > > > For example, the ability to use the ISO 639-3 language code for
> > > > > > Blissymbols (lang="zbl") would be possible / better supported on
> > > > > > the web if we can be certain that both HTML and user agents
> > > > > > support such 3-letter encoding. (There remains, of course, the
> > > > > > issue of getting Blissymbolic script into the ISO script code
> > > > > > and/or Unicode so they are properly displayed.)
> > > > > >
> > > > > > On the issue of scripts, as I said earlier, it would be useful for
> > > > > > users to be able to specify (either as the creator of the content
> > > > > > or its user) any preferred scripts. My example below is Russian
> > > > > > presented in a different script, but the issue also applies to specific AAC.
> > > > > > > (e.g., This issue would aid the arguments supporting the
> > > > > > > development of standards for Blissymbolic script and adding
> > > > > > > appropriate script codes.)
> > > > > >
> > > > > > As for the signed modality (including sign languages, but also
> > > > > > other manual-visual systems), this proposal tries to capture this
> > > > > > AAC technique by using language codes for the natural sign
> > > > > > > languages (e.g., lang="ase") and the more generic "sgn" for all others.
> > > > > >
> > > > > > As I mentioned to Christian, the current implementation of HTML5
> > > > > > may already address some of these issues. As mentioned below,
> > > > > > > BCP47 may need to be expanded to support a longer length, which will
> > > > > > > impact HTML.
> > > > > > Further BCP47 (and HTML) could eventually specify a minimum 3
> > > > > > character length.
> > > > > >
> > > > > > > Thus the need for user agent support for three-character codes
> > > > > > > (status unknown) and the need for W3C to begin transitioning to the wider
> > > > > > use of the 3-character code (i.e., lang="eng" rather than
> > > > > > lang="en") is the main meat of the discussion/proposal. Updating
> > > > > > W3C documentation will impact all examples currently using
> > > > > > lang="xx" (e.g., this will impact the supporting documents of WCAG 2.1).
> > > > > >
> > > > > > I hope this further information helps. Please feel free to contact
> > > > > > me if you have any questions or concerns.
> > > > > >
> > > > > > Thanks,
> > > > > > David Fourney
> > > > > >
> > > > > >
> > > > > > On 2019-09-04 3:23 p.m., Christian Galinski wrote:
> > > > > > > Hi, Janina,
> > > > > > >
> > > > > > > Thank you for your positive reply. I am sorry that I cannot
> > > > > > > > attend the TPAC meeting – unless there is the possibility to
> > > > > > > attend through teleconferencing.
> > > > > > >
> > > > > > > This would also be the ideal way to participate for David
> > > > > > > Fourney, who could represent ISO/IEC-JTC 1/SC 35 in this matter.
> > > > > > >
> > > > > > > Please be so kind as to put the issue of language
> > > > > > > identifiers/codes for sign languages explained below on the
> > > > > > > > agenda of the upcoming TPAC meeting in Japan and discuss how it
> > > > > > > > could be solved, duly taking into account that language codes
> > > > > > > > increasingly (for a variety of purposes) have to be combined with other
> > > > > > > > coding schemes.
> > > > > > >
> > > > > > > Below please find a summary of the discussion concerning (1) alpha-2 vs.
> > > > > > > alpha-3 language identifiers for sign languages in video
> > > > > > > programs and apps and (2) the combination of codes to further
> > > > > > > specify the language used, the regional and other language
> > > > > > > variety and the script in which a written file is rendered.
> > > > > > >
> > > > > > > Technically speaking there may be more complexity or deeper
> > > > > > > issues behind the questions raised. There may also be new needs
> > > > > > > for coordination. We are looking forward to your comments. If
> > > > > > > there would be a slot for the discussion of the issues at the
> > > > > > > > TPAC meeting, David Fourney and I could join by calling in.
> > > > > > >
> > > > > > > Best regards
> > > > > > >
> > > > > > > Christian
> > > > > > >
> > > > > > > *1 Background:*
> > > > > > >
> > > > > > > The issue at hand is a technical problem that occurs when you
> > > > > > > want to assign language identifiers to sign languages, if the
> > > > > > > code length of the identifier is limited to alpha-2. However,
> > > > > > > > ISO 639-1:2002 “Codes for the representation of names of languages –
> > > > > > > > Part 1: Alpha-2 code” does not provide identifiers for sign languages.
> > > > > > > There are estimates of the number of sign languages between more
> > > > > > > than 300 and up to 500. About 150 are assigned 3-letter language
> > > > > > > identifiers in ISO 639-3 “Codes for the representation of names
> > > > > > > of languages – Part 3: Alpha-3 code for comprehensive coverage
> > > > > > > of languages”. In this connection, David Fourney also referred
> > > > > > > to 2019 as UN's International Year of Indigenous Languages – in
> > > > > > > some indigenous language communities sign languages exist. ‘Sign
> > > > > > > languages’ differ from ‘signed languages’ insofar as they are
> > > > > > > the main language for Deaf and Hard of Hearing persons to
> > > > > > > express themselves and largely differ from the language
> > > > > > > > spoken/written by the language community in which the respective Deaf
> > > > > > > > and Hard of Hearing persons are living.
> > > > > > > Compared to ‘sign languages’, ‘signed language’ is a language
> > > > > > > modality largely representing the spoken or written form of a
> > > > > > > language (e.g. “Signed Exact English”) – thus any language can
> > > > > > > be signed in this way which can be identified by adding the
> > > > > > > > identifier “sgn” to the respective language identifier.
> > > > > > >
> > > > > > > > *2 Request to W3C/TPAC:*
> > > > > > >
> > > > > > > The issue was raised at the ISO/IEC-JTC 1/SC 35 meeting in 2018
> > > > > > > in Okayama “User interfaces” where I reported on standardizing
> > > > > > > activities of ISO/TC 37 “Language and terminology” referring to
> > > > > > > > language coding.
> > > > > > > David Fourney made TC 37 aware of the fact that there is a “deficiency”
> > > > > > > in the ISO 639 series when it comes to the coding of sign
> > > > > > > languages in video technology. The issue was taken up by two WGs
> > > > > > > in ISO/TC 37 working on the fundamental terminology of language
> > > > > > > coding and language varieties in a coordinated way. Out of the
> > > > > > > discussions emerged the clarification of the above-mentioned
> > > > > > > > distinction of ‘sign language’ and ‘signed language’. The WGs
> > > > > > > formulated a request to ISO/IEC-JTC 1/SC 35 to clarify the
> > > > > > > matter and formulate a recommendation to ISO/IEC-JTC 1/SC 35. At
> > > > > > > its last meeting ISO/IEC-JTC 1/SC 35 in Shanghai on 2 August
> > > > > > > unanimously approved
> > > > > > >
> > > > > > > *Resolution 2019-69: Requests that Alpha-3 codes be used and
> > > > > > > recommended *
> > > > > > >
> > > > > > > ISO/IEC JTC1/SC35
> > > > > > >
> > > > > > >   * recognizes that the application of the 2-letter (alpha-2) code today
> > > > > > >     is not sufficient for use in programs and apps related to user
> > > > > > >     interfaces which is particularly detrimental when needed for
> > > > > > >     identifying individual languages (including individual sign
> > > > > > >     languages) in user interfaces.
> > > > > > >   * resolves to recommend the use of 3-letter codes for language
> > > > > > >     identification, wherever they can be applied
> > > > > > >   * requests its chair to contact W3C to ask that they recommend the use
> > > > > > >     of 3-letter identifiers for the names of languages wherever used
> > > > > > >     according to:
> > > > > > >       o ISO 639-2 "Codes for the representation of names of languages
> > > > > > >         -Part 2: Alpha-3 code" and
> > > > > > >       o ISO 639-3 "Codes for the representation of names of languages -
> > > > > > >         Part 3: Alpha-3 code for comprehensive coverage of languages"
> > > > > > >         (which includes additional languages beyond those in ISO
> > > > > > > 639-2)
> > > > > > >
> > > > > > > These can be recommended either in addition to or in replacement
> > > > > > > for the 2-letter language identifiers as defined in ISO 639-1
> > > > > > > "Codes for the representation of names of languages - Part 1: Alpha-2
> > > code".
> > > > > > >
> > > > > > > Here the issue as explained by David Fourney:
> > > > > > >
> > > > > > > The technical issue lies primarily with the HTML5 <video>
> > > > > > > element and how it supports the HTML lang attribute.
> > > > > > >
> > > > > > > A <video> allows for one or more <source> files (which can be
> > > > > > > audio and or video tracks) as well as one or more <track> files
> > > > > > > > (for subtitles, captions, transcripts, etc.). As a developer, I
> > > > > > > > want to specify the language of the captions, audio, and video
> > > > > > > > so I can meet WCAG's SCs. (WCAG SC 3.1.1 and SC 3.1.2 require the
> > > > > > > > specification of the language of content.)
> > > > > > >
> > > > > > > HTML allows the specification of the language of content on
> > > > > > > pretty much any element using HTML5's lang attribute. This means
> > > > > > > that I can specify the language of a caption file, an audio
> > > > > > > > track, or (presumably) a video track.
> > > > > > >
> > > > > > > As a user, if my media player supports it, I can select an audio
> > > > > > > track in one language (e.g., French) and a caption track in
> > > > > > > another (e.g., Norwegian). Theoretically, I can also select a
> > > > > > > video track in whatever language I want.
> > > > > > >
> > > > > > > *That's where the problem lies*. If the audio is embedded in the
> > > > > > > video file, then obviously the language of the video is the
> > > > > > > language of the audio. This can be any spoken language.
> > > > > > > Typically, this is indicated with a two-character code. (This is
> > > > > > > > also true with audio sources and captioning.)
> > > > > > >
> > > > > > > Many languages do NOT have a two-character code. (Many many
> > > > > > > > languages face this issue. The SIL code tables provide a list
> > > > > > > of languages that have one or both types of codes:
> > > > > > > https://iso639-3.sil.org/code_tables/639/data)
> > > > > > >
> > > > > > > But, what if there is no audio in the video? What if the
> > > > > > > > language of the video is in fact a visual language? What if it is a
> > > > > > > > sign language?
> > > > > > >
> > > > > > > I should be able to specify the language of the content (e.g.,
> > > > > > > lang="ase"). Since no sign languages have a two-character code,
> > > > > > > this must be a three-character code.
> > > > > > >
> > > > > > > *3 Combinations of codes:*
> > > > > > >
> > > > > > > Increasingly a higher degree of granularity is becoming
> > > > > > > necessary for identifying not only languages and their regional
> > > > > > > varieties, but also other dimensions of language variation, such
> > > > > > > as a speaker’s language register or communication anomaly. So
> > > > > > > > far ISO 639 series deals with combinations of the language
> > > > > > > > identifiers with the country (or major subdivision) code acc. to
> > > > > > > > ISO 3166 series and script code acc. to ISO 15924.
> > > > > > >
> > > > > > > Here again David Fourney’s explanation:
> > > > > > >
> > > > > > > With respect to the size of the string used to fully specify
> > > > > > > > languages, I recommend looking at IETF's BCP47
> > > > > > > > https://tools.ietf.org/html/bcp47.
> > > > > > > BCP47 is the document HTML seems to rely upon as well.
> > > > > > >
> > > > > > > W3C could ask the authors of BCP47 to require a new minimum
> > > > > > > string size (if it is not already large enough) and recommend
> > > > > > > the expected use of separators. I suggest using a larger string
> > > > > > > than 12 characters to future proof this decision.
> > > > > > >
> > > > > > > I recommend W3C provide examples in all of their discussions on
> > > > > > > the use of the lang attribute. These examples should all start
> > > > > > > with the 3-character code as its base. All examples using the
> > > > > > > 2-character code should be updated.
> > > > > > >
> > > > > > > With respect to scripts, as I recall, HTML relies entirely on
> > > > > > > the specification of the character set. Typically, this is now
> > > > > > > set to Unicode which is thought to provide the necessary
> > > > > > > characters to write in various languages. As I understand the
> > > > > > > situation (and I could be wrong), authors do not have the
> > > > > > > > ability to specify the script of their content.
> > > > > > >
> > > > > > > You are correct that it would be exceedingly useful to be able
> > > > > > > to deliberately specify a script (rather than a character set).
> > > > > > > I envisioned this when I wrote ISO/IEC 24756:2009 and, to a
> > > > > > > lesser extent, ISO/IEC 20071-23. For example, in languages that
> > > > > > > have more than one script, it would be useful for users to be
> > > > > > > able to specify that they want captions in one preferred script
> > > > > > > > (e.g., a user might want Russian captions to be presented in Roman
> > > > > > > > script rather than Cyrillic).
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Janina Sajka <janina@rednote.net>
> > > > > > > > Sent: Thursday, 29 August 2019 18:17
> > > > > > > > To: lisa.seeman <lisa.seeman@zoho.com>
> > > > > > > > Cc: christian.galinski@chello.at; W3C WAI Accessible Platform
> > > > > > > > Architectures <public-apa@w3.org>
> > > > > > > > Subject: Re: Language codes and iso639 series
> > > > > > >
> > > > > > > Hi, Lisa, Christian, All:
> > > > > > >
> > > > > > > It's unclear to me what kind of assistance you're seeking, and
> > > > > > > specifically what agendum we might propose for a joint meeting
> > > > > > > during TPAC. Christian, are you planning to attend TPAC? It
> > > > > > > > would be helpful, as I don't see us effectively carrying your concerns
> > > > > > > > second hand.
> > > > > > >
> > > > > > > I'm aware, at least to a degree, of ISO and IETF standardization
> > > > > > > on language coding to include support for specifying sign
> > > > > > > language usage,[1] but those are not activities directly in
> > > > > > > > W3C's I18N remit,[2] though working in coordination with those groups
> > > > > > > > clearly is.
> > > > > > >
> > > > > > > Is there a W3C i18n document Christian is looking to affect? Or
> > > > > > > perhaps you're proposing something W3C might publish? APA would
> > > > > > > clearly be interested, but the specifics just aren't in your
> > > > > > > email so I'm left guessing.
> > > > > > >
> > > > > > > We were certainly aware of the multiplicity of sign languages
> > > > > > > when we created our "Media Accessibility User Requirements
> > > > > > > (MAUR)"[3] document during the process of defining HTML 5.0, and
> > > > > > > > I believe HTML 5 supports that well for alternative media. But, I
> > > > > > > > don't think we've done anything specifically beyond that activity in
> > > > > > > > this space.
> > > > > > >
> > > > > > > PS: Any news on standardizing lang codes for AAC?
> > > > > > >
> > > > > > > Please feel free to say more. I'd like to be helpful if I can.
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Janina
> > > > > > >
> > > > > > > [1] https://www.evertype.com/standards/iso639/sgn.html
> > > > > > >
> > > > > > > [2] https://www.w3.org/i18n
> > > > > > >
> > > > > > > [3] http://www.w3.org/TR/media-accessibility-reqs/
> > > > > > >
> > > > > > > Lisa Seeman writes:
> > > > > > >
> > > > > > >> Hi Janina
> > > > > > >
> > > > > > > >> Christian, who is cc'd, is working on improving language code
> > > > > > > >> support so that it works for sign language and the combinations.
> > > > > > > >> For example, English sign language with Canadian dialect.
> > > > > > >
> > > > > > >>
> > > > > > >
> > > > > > >> Can we bring this up at TPAC with internationalisation?
> > > > > > >
> > > > > > >>
> > > > > > >
> > > > > > >> All the best
> > > > > > >
> > > > > > >>
> > > > > > >
> > > > > > >> Lisa Seeman
> > > > > > >
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Fourney, David <david.fourney@usask.ca>
> > > > > > > > Sent: Monday, 19 August 2019 13:20
> > > > > > > > To: christian.galinski@chello.at <christian.galinski@chello.at>
> > > > > > > > Cc: klaus.miesenberger <klaus.miesenberger@jku.at>
> > > > > > > > Subject: Re: Re: HTML etc. and ISO 639-1 2-letter code
> > > > > > >
> > > > > > > Hi Christian,
> > > > > > >
> > > > > > > With respect to the size of the string used to fully specify
> > > > > > > languages, I recommend looking at IETF's BCP47
> > > > > > >
> > > > > > > https://tools.ietf.org/html/bcp47
> > > > > > >
> > > > > > > BCP47 is the document HTML seems to rely upon as well.
> > > > > > >
> > > > > > > W3C could ask the authors of BCP47 to require a new minimum
> > > > > > > string size (if it is not already large enough) and recommend
> > > > > > > the expected use of separators. I suggest using a larger string
> > > > > > > than 12 characters to future proof this decision.
> > > > > > >
> > > > > > > I recommend W3C provide examples in all of their discussions on
> > > > > > > the use of the lang attribute. These examples should all start
> > > > > > > with the 3-character code as its base. All examples using the
> > > > > > > 2-character code should be updated.
> > > > > > >
> > > > > > > With respect to scripts, as I recall, HTML relies entirely on
> > > > > > > the specification of the character set. Typically, this is now
> > > > > > > set to Unicode which is thought to provide the necessary
> > > > > > > characters to write in various languages. As I understand the
> > > > > > > situation (and I could be wrong), authors do not have the
> > > > > > > > ability to specify the script of their content.
> > > > > > >
> > > > > > > You are correct that it would be exceedingly useful to be able
> > > > > > > to deliberately specify a script (rather than a character set).
> > > > > > > I envisioned this when I wrote ISO/IEC 24756:2009 and, to a
> > > > > > > lesser extent, ISO/IEC 20071-23. For example, in languages that
> > > > > > > have more than one script, it would be useful for users to be
> > > > > > > able to specify that they want captions in one preferred script
> > > > > > > > (e.g., a user might want Russian captions to be presented in Roman
> > > > > > > > script rather than Cyrillic).
> > > > > > >
> > > > > > > Finally, on the choice of codes. I strongly recommend that ISO
> > > > > > > > and W3C set an explicit recommendation on exactly which code set to
> > > > > > > > use.
> > > > > > > The existence of multiple 3-character sets will add to the
> > > > > > > problem rather than solve anything. ISO will need to unify this
> > > > > > > work to help ease the confusion.
> > > > > > >
> > > > > > > David.
> > > > > > >
> > > > > > > ________________________________________
> > > > > > >
> > > > > > > > From: christian.galinski@chello.at <christian.galinski@chello.at>
> > > > > > >
> > > > > > > Sent: Monday, August 19, 2019 3:06 AM
> > > > > > >
> > > > > > > To: Fourney, David
> > > > > > >
> > > > > > > Cc: klaus.miesenberger
> > > > > > >
> > > > > > > Subject: Fwd: Re: HTML etc. and ISO 639-1 2-letter code
> > > > > > >
> > > > > > > Hi David,
> > > > > > >
> > > > > > > Great thanks to you for this excellent clarification!
> > > > > > >
> > > > > > > The recommendation to use only the 3-letter code for languages
> > > > > > > obviously is only one step in the direction of handling language
> > > > > > > codes in various combinations with other codes and thus
> > > > > > > indicating language varieties to some extent. At present
> > > > > > > language varieties can only be indicated in a rudimentary form.
> > > > > > > ISO/TR 21636 "Indication and description of language varieties"
> > > > > > > will pave the way for a future much more detailed coding of varieties.
> > > > > > >
> > > > > > > At present we have at our disposal for coding languages
> > > > > > > (disregarding the 2-letter code according to ISO 639-1):
> > > > > > >
> > > > > > > > - 3-letter language codes (all lowercase) according to ISO
> > > > > > > 639-2 and 639-3
> > > > > > >
> > > > > > > - 3-letter codes for countries and their subdivisions (all
> > > > > > > capitalized) according to ISO 3166-1 and 3166-2
> > > > > > >
> > > > > > >    (I think we should recommend also here the use of the
> > > > > > > 3-letter
> > > > > > > code)
> > > > > > >
> > > > > > > - 4-letter code for scripts /and script variants/ (first letter
> > > > > > > capitalized) With 10 digits (12 - if separators are added) we
> > > > > > > can thus cope with a lot of variation, under given limitations.
> > > > > > >
> > > > > > > In the case of sign languages (being true sign languages - i.e.
> > > > > > > > mother tongues for the Deaf and Hard-of-Hearing) we have at our
> > > > > > > > disposal:
> > > > > > >
> > > > > > > > - 3-letter language code (all lowercase) according to ISO 639-3
> > > > > > >
> > > > > > >    (to be extended towards including further sign languages)
> > > > > > >
> > > > > > > > - 3-letter codes for countries and their subdivisions (all
> > > > > > > > capitalized) according to ISO 3166-1 and 3166-2
> > > > > > > >
> > > > > > > > With 6 digits (7 - if separators are added) we can thus cope with some
> > > > > > > > variation, under given limitations.
> > > > > > >
> > > > > > > > In the case of the language variety "signed language" (e.g. Signed
> > > > > > > > Exact English) we have at our disposal:
> > > > > > >
> > > > > > > - "sgn" as indicator for "signed language"
> > > > > > >
> > > > > > > > - 3-letter language codes (all lowercase) according to ISO
> > > > > > > 639-2 and 639-3
> > > > > > >
> > > > > > > > - 3-letter codes for countries and their subdivisions (all
> > > > > > > > capitalized) according to ISO 3166-1 and 3166-2
> > > > > > > >
> > > > > > > > With 9 digits (11 - if separators are added) we can cope with a lot of
> > > > > > > > variation, under given limitations. sgn-eng-AUS would refer to the
> > > > > > > > Australian variety of Signed Exact English.
> > > > > > >
> > > > > > > Would this mean that we should recommend, under given
> > > > > > > circumstances and as a step in the direction of further
> > > > > > > necessary varieties in the future, a minimum of 12 characters
> > > > > > > (incl. separators) for coding languages (incl. sign languages
> > > > > > > and signed language)? Is this realistic, and if so, is it
> > > > > > > sufficient?
> > > > > > >
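The length arithmetic in the three cases above can be checked with a minimal Python sketch. The subtag lengths are the ones proposed in this message; the function name `tag_length` and the constants are hypothetical, chosen only for illustration, and nothing here reflects what BCP 47 itself requires.

```python
# Subtag lengths as proposed in the message above (not taken from BCP 47):
# 3-letter language, 3-letter country, 4-letter script, 3-letter "sgn" marker.
LANGUAGE, COUNTRY, SCRIPT, SGN = 3, 3, 4, 3


def tag_length(*subtag_lengths: int, separators: bool = True) -> int:
    """Return the total character count of a tag built from the given subtags."""
    total = sum(subtag_lengths)
    if separators and subtag_lengths:
        total += len(subtag_lengths) - 1  # one hyphen between adjacent subtags
    return total


# The three cases discussed above, with and without hyphens.
spoken_or_written = tag_length(LANGUAGE, COUNTRY, SCRIPT)
sign_language = tag_length(LANGUAGE, COUNTRY)
signed_language = tag_length(SGN, LANGUAGE, COUNTRY)
```

Under these assumptions the longest case, language plus country plus script, is the one that yields the 12-character figure asked about.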
> > > > > > > Best regards
> > > > > > >
> > > > > > > Christian
> > > > > > >
> > > > > > >  > ---------- Original Message ----------
> > > > > > >  > From: "Fourney, David" <david.fourney@usask.ca>
> > > > > > >  > To: "christian.galinski@chello.at" <christian.galinski@chello.at>
> > > > > > >  > Cc: "klaus.miesenberger" <klaus.miesenberger@jku.at>, hoeckner
> > > > > > >  > <hoeckner@hilfsgemeinschaft.at>
> > > > > > >  > Date: 17 August 2019 at 02:00
> > > > > > >  > Subject: Re: HTML etc. and ISO 639-1 2-letter code
> > > > > > >  >
> > > > > > >  > Hi Christian,
> > > > > > >  >
> > > > > > >  > To answer your specific question: There is no connection to CSS.
> > > > > > >  > Cascading Style Sheets are used only for the styling and presentation
> > > > > > >  > of content. For example, I would use CSS to indicate the font I want,
> > > > > > >  > whether to make the text bold, and where to put it on the screen. CSS
> > > > > > >  > is not for specifying languages; that is the role of HTML.
> > > > > > >  >
> > > > > > >  > The technical issue lies primarily with the HTML5 <video> element and
> > > > > > >  > how it supports the HTML lang attribute.
> > > > > > >  >
> > > > > > >  > A <video> allows for one or more <source> files (which can be audio
> > > > > > >  > and/or video tracks) as well as one or more <track> files (for
> > > > > > >  > subtitles, captions, transcripts, etc.).
> > > > > > >  >
> > > > > > >  > As a developer, I want to specify the language of the captions, audio,
> > > > > > >  > and video so I can meet WCAG's SCs. (WCAG SC 3.1.1 and SC 3.1.2
> > > > > > >  > require the specification of the language of content.)
> > > > > > >  >
> > > > > > >  > HTML allows the specification of the language of content on pretty
> > > > > > >  > much any element using HTML5's lang attribute. This means that I can
> > > > > > >  > specify the language of a caption file, an audio track, or
> > > > > > >  > (presumably) a video track.
> > > > > > >  >
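To make the markup described above concrete, here is a minimal Python sketch that reads a hypothetical <video> fragment and lists every language declaration it finds. The file names, the language choices, and the names `LangCollector` and `collect_languages` are assumptions for illustration only. Note that, in HTML, the <track> element declares the language of its text data with `srclang` rather than `lang`, so the sketch collects both attributes.

```python
from html.parser import HTMLParser

# Hypothetical markup: file names and language choices are illustrative only.
SAMPLE = """
<video controls lang="ase">
  <source src="movie.mp4" type="video/mp4" lang="ase">
  <track kind="captions" src="captions.vtt" srclang="no">
</video>
"""


class LangCollector(HTMLParser):
    """Collect (element, attribute, value) for every lang/srclang attribute."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name in ("lang", "srclang"):
                self.found.append((tag, name, value))


def collect_languages(markup):
    """Parse the markup and return all language declarations in document order."""
    collector = LangCollector()
    collector.feed(markup)
    collector.close()
    return collector.found


declarations = collect_languages(SAMPLE)
```

This only shows what an author can declare; whether a browser, media player, or screen reader acts on these values is a separate question.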
> > > > > > >  > As a user, if my media player supports it, I can select an audio track
> > > > > > >  > in one language (e.g., French) and a caption track in another (e.g.,
> > > > > > >  > Norwegian). Theoretically, I can also select a video track in whatever
> > > > > > >  > language I want.
> > > > > > >  >
> > > > > > >  > That's where the problem lies. If the audio is embedded in the video
> > > > > > >  > file, then obviously the language of the video is the language of the
> > > > > > >  > audio. This can be any spoken language. Typically, this is indicated
> > > > > > >  > with a two-character code. (This is also true with audio sources and
> > > > > > >  > captioning.)
> > > > > > >  >
> > > > > > >  > Many languages do NOT have a two-character code. (Many many languages
> > > > > > >  > face this issue. The SIL code tables provide a list of languages that
> > > > > > >  > have one or both types of codes:
> > > > > > >  > https://iso639-3.sil.org/code_tables/639/data)
> > > > > > >  >
> > > > > > >  > (A reminder that 2019 is the UN's International Year of Indigenous
> > > > > > >  > Languages.)
> > > > > > >  >
> > > > > > >  > But, what if there is no audio in the video? What if the language of
> > > > > > >  > the video is in fact a visual language? What if it is a sign language?
> > > > > > >  > I should be able to specify the language of the content (e.g.,
> > > > > > >  > lang="ase"). Since no sign languages have a two-character code, this
> > > > > > >  > must be a three-character code.
> > > > > > >  >
> > > > > > >  > So the first issue is: "Can I do this?"
> > > > > > >  >
> > > > > > >  > From reading the HTML 5.2 and some IETF specifications, I MIGHT be
> > > > > > >  > able to use a three-character code, but it's not very clear IF I CAN.
> > > > > > >  > The specification appears to allow a code of 6 to 8 characters in length.
> > > > > > >  > This suggests a combination of language and region codes, including
> > > > > > >  > hyphens, might fit a three-character language code plus a
> > > > > > >  > two-character region code, but not much else.
> > > > > > >  >
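On the length question, RFC 5646 (the current core of BCP 47) limits each individual subtag to 8 characters; it does not cap the length of the whole tag. The Python sketch below covers a simplified subset of that grammar: a 2- or 3-letter primary language, an optional 4-letter script, and an optional region of 2 letters or 3 digits. Extended language subtags, variants, extensions, and private-use subtags are deliberately left out, and the names `LangTag` and `parse_tag` are hypothetical. It checks shape only, not whether a subtag is actually registered.

```python
import re
from typing import NamedTuple, Optional

# Simplified subset of the RFC 5646 "langtag" grammar:
# language ["-" script] ["-" region]. Everything else is intentionally omitted.
_TAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


class LangTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def parse_tag(tag: str) -> Optional[LangTag]:
    """Split a tag into subtags, or return None if it is outside this subset."""
    match = _TAG.fullmatch(tag)
    if match is None:
        return None
    script = match.group("script")
    region = match.group("region")
    # Apply the conventional casing: language lowercase, script title case,
    # region uppercase. Tags themselves are case-insensitive.
    return LangTag(
        match.group("language").lower(),
        script.title() if script else None,
        region.upper() if region else None,
    )
```

In this subset a three-letter code such as "ase" is well-formed on its own, and so is "ase-US"; a tag such as "zh-Hant-TW" already runs to 10 characters.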
> > > > > > >  > Resources on this include IETF's BCP47
> > > > > > >  > https://tools.ietf.org/html/bcp47
> > > > > > >  > and the HTML5.2 specification
> > > > > > >  > https://www.w3.org/TR/html52/dom.html#the-lang-and-xmllang-attributes
> > > > > > >  >
> > > > > > >  > The living specification discusses this at
> > > > > > >  > https://html.spec.whatwg.org/#the-lang-and-xml:lang-attributes
> > > > > > >  >
> > > > > > >  > The second issue is: "Will it work?"
> > > > > > >  >
> > > > > > >  > If a browser sees a three-character language code, will it know what
> > > > > > >  > to do with it? What about a media player? What about a screen reader?
> > > > > > >  >
> > > > > > >  > It's all well and good that I can specify my language, but not if it is
> > > > > > >  > not supported (i.e., my user agent won't be able to handle it).
> > > > > > >  >
> > > > > > >  > Setting aside <video>, I would also point out that this second issue
> > > > > > >  > applies to the browser in general. Is there full support for
> > > > > > >  > specifying the language of a document using a three-character code
> > > > > > >  > (e.g., <html lang="eng"> vs. <html lang="en">)?
> > > > > > >  >
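On the "eng" versus "en" comparison: under RFC 5646, when a language has both a two-letter ISO 639-1 code and a three-letter code, only the two-letter code is placed in the IANA Language Subtag Registry, so "en" is the expected form and three-letter codes are meant for languages such as American Sign Language ("ase") that have no two-letter code. The Python sketch below illustrates that rule with a small hand-written table. The table, its name, and the function name are assumptions for illustration; a real implementation would consult the registry itself.

```python
# A few illustrative pairs only. A real implementation would consult the
# IANA Language Subtag Registry rather than a hand-written table.
ALPHA3_TO_ALPHA2 = {
    "eng": "en",  # English
    "fra": "fr",  # French
    "deu": "de",  # German
    "nor": "no",  # Norwegian
}


def preferred_language_subtag(code: str) -> str:
    """Return the two-letter code where one exists, else the code unchanged."""
    code = code.lower()
    return ALPHA3_TO_ALPHA2.get(code, code)


document_lang = preferred_language_subtag("eng")
sign_video_lang = preferred_language_subtag("ase")
```

Here `document_lang` comes out as "en", while "ase" passes through unchanged because it has no shorter form.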
> > > > > > >  > As I mentioned in Ottawa, what we need the W3C to do is:
> > > > > > >  > 1. Confirm how large a language code can be used within the HTML lang
> > > > > > >  > attribute and determine if this length is large enough given the
> > > > > > >  > three-character codes of ISO 639-2 and the various region and script
> > > > > > >  > codes that can be appended to it.
> > > > > > >  >
> > > > > > >  > 2. Confirm that user agents are required to support long language
> > > > > > >  > codes (via the lang attribute), not just the two-character codes that
> > > > > > >  > are specified in ISO 639-1. This is important because, if the HTML
> > > > > > >  > specifications allow for rather long codes but the user agents do not,
> > > > > > >  > then using a long code will not work.
> > > > > > >  >
> > > > > > >  > To my mind, there should be no issue because it is just a language
> > > > > > >  > indication code. Most of the time user agents should just accept any
> > > > > > >  > code and do nothing further with it.
> > > > > > >  >
> > > > > > >  > This issue was the source of my concern only because you mentioned the
> > > > > > >  > demand to freeze ISO 639-1 from 20+ years ago. The freeze request
> > > > > > >  > suggests to me that user agents only support a small number of codes
> > > > > > >  > and intend to act in some way on these codes.
> > > > > > >  >
> > > > > > >  > 3. Confirm that the lang attribute (of any length) can be used on any
> > > > > > >  > HTML element in a meaningful way, including the specification of the
> > > > > > >  > language of a video track (e.g., <source src="movie.mp4"
> > > > > > >  > type='video/mp4' lang='ase'>).
> > > > > > >  >
> > > > > > >  > Ultimately, the need is to determine if user agents support
> > > > > > >  > three-character codes so that the specification of a video or a
> > > > > > >  > document in a language that only has a three-character code will
> > > > > > >  > actually work. I would expect someone at W3C will know what
> > > > > > >  > support is (or is not) available.
> > > > > > >  >
> > > > > > >  > I hope that this explanation helps you. Please let me know if you have
> > > > > > >  > any questions.
> > > > > > >  >
> > > > > > >  > Thanks,
> > > > > > >  > David.
> > > > > > >  >
> > > > > > >  > On 2019-08-15 12:21 p.m., christian.galinski@chello.at wrote:
> > > > > > >  > > Hi, David,
> > > > > > >  > >
> > > > > > >  > > How are you doing?
> > > > > > >  > >
> > > > > > >  > > Further to our recent discussions I would like to ask you to clarify
> > > > > > >  > > one more technical question: concerning the use of the alpha-2 code
> > > > > > >  > > (acc. to ISO 639-1?) in HTML and/or XHTML and/or HTML5 which you
> > > > > > >  > > mentioned is hindering certain functions/features necessary for the
> > > > > > >  > > Deaf and hard of hearing. Is there a connection to CSS?
> > > > > > >  > >
> > > > > > >  > > Could you please elaborate a bit on this technical question?
> > > > > > >  > >
> > > > > > >  > > If there is an issue, how should it be presented to W3C/TCAP?
> > > > > > >  > >
> > > > > > >  > > Best regards
> > > > > > >  > >
> > > > > > >  > > Christian
> > > > > > >  > >
> > > > > > >  > > p.s.
> > > > >
> > > > > --
> > > > >
> > > > > Janina Sajka
> > > > >
> > > > > Linux Foundation Fellow
> > > > > Executive Chair, Accessibility Workgroup: http://a11y.org
> > > > >
> > > > > The World Wide Web Consortium (W3C), Web Accessibility Initiative (WAI)
> > > > > Chair, Accessible Platform Architectures http://www.w3.org/wai/apa

-- 

Janina Sajka

Linux Foundation Fellow
Executive Chair, Accessibility Workgroup: http://a11y.org

The World Wide Web Consortium (W3C), Web Accessibility Initiative (WAI)
Chair, Accessible Platform Architectures http://www.w3.org/wai/apa

Received on Wednesday, 18 September 2019 11:38:48 UTC