- From: r12a <ishida@w3.org>
- Date: Thu, 8 Mar 2018 17:05:03 +0000
- To: www International <www-international@w3.org>
https://www.w3.org/2018/03/08-i18n-minutes.html
text extract follows:
– DRAFT –
Internationalization Working Group Teleconference
08 March 2018
[2]Agenda [3]IRC log
[2] https://lists.w3.org/Archives/Member/member-i18n-core/2018Mar/0000.html
[3] https://www.w3.org/2018/03/08-i18n-irc
Attendees
Present
addison, Bert, Fuqiao, JcK, Katy, Nigel, pal, stpeter
Regrets
Chair
Addison Phillips
Scribe
stpeter
Contents
* [4]Meeting Minutes
1. [5]Agenda
2. [6]IMSC visiting us!
3. [7]What Time is This Meeting At?
* [8]Summary of Action Items
Meeting Minutes
Agenda
IMSC visiting us!
<nigel> [9]IMSC Issue 236
[9] https://github.com/w3c/imsc/issues/236
r12a: background ... IMSC uses Unicode characters; glyphs come
out of fonts; rendering algorithms/engines are needed for
complex scripts at times before glyphs are assigned; it is
important in this discussion to be clear on the terminology of
character/code point vs. glyph
JcK: are you talking about single code points or multiple that
might result in a single grapheme?
r12a: single code points for this discussion
<r12a> [10]https://www.w3.org/TR/ttml-imsc1.0.1/#recommended-unicode-code-points-per-language
[10] https://www.w3.org/TR/ttml-imsc1.0.1/#recommended-unicode-code-points-per-language
pal: purpose is to provide guidance regarding subtitles;
improve the chance that text an author chooses will be
supported by the user agent and properly rendered
pal: the intent is not to disallow certain code points or to
require a rendering engine to not render certain code points
addison: I think this is an extremely tricky thing to specify
addison: first, implementers might see this as a required set,
the only thing they have to support, etc.
addison: for example, you wouldn't necessarily have enough code
points to properly render Arabic
pal: actually we have the common code points
addison: doesn't deal with the need for more glyphs in your
font
pal: that's why it's worded in terms of code points, not glyphs
addison: a naive implementation would have one glyph per code
point
pal: should we add a note about that?
addison: when most people build a system, there's an instance
of it for Arabic users or whatever script is in play
addison: second point, CLDR has sets of characters like this by
language (exemplar sets)
addison: it might be helpful to reference CLDR instead of
defining your own
pal: we do reference CLDR - recommended set is a union of CLDR
and ???
r12a: I'm worried about implementers too, but this section is
about authors
r12a: my worry is that implementers won't see this as clearly
r12a: make it clear that this is a guide for a minimum set and
for real support you should go further
r12a: also make it clear that implementers need to enable the
display of the following sets of characters, not selecting
those sets of characters
pal: output document should only contain those characters
addison: output document is displayed somewhere and needs to be
displayed faithfully
addison: depends on how system that receives it is implemented
addison: shaping engine etc.
pal: annex is intended to be used by validator implementation
pal: validator that sees a character that's not in the
recommended character set can flag a warning
addison: is this really a good idea?
pal: what's a bad idea is showing unsupported characters
pal: realistically no implementation is going to support all
Unicode code points
addison: some implementations support everything but rather
obscure code points (plane 2 Chinese, ancient scripts, etc.)
addison: what I see happen is trying to legislate fairly narrow
character sets, whereas many rendering systems are more capable
pal: this is targeting not just browsers but embedded systems
like TVs
pal: also, this has already proved useful
addison: implementers do have font and space limitations, but
it's a slippery slope when recommending subsets of characters
r12a: I understand the intent, my concern is in how we describe
that to people
r12a: e.g., if we said "these are the safe characters to use"
makes more sense to me
r12a: this comes across as "these are the Hebrew (etc.)
characters you should support" but these sets tend to grow to
support new code points
pal: this is why we reference CLDR
r12a: unfortunately CLDR is not a panacea - it's missing things
pal: so let's fix CLDR
pal: not displaying a character is way worse
r12a: the crux is specifying a safe set of characters for
authors without implying that implementers should limit the
sets of characters they support
pal: what about starting the annex with that text?
r12a: that's the kind of thing I was looking for
<Zakim> nigel, you wanted to ask what action we can take to
address the remaining concerns.
nigel: the struggle here is understanding exactly what the
concern is and coming up with a proposal to address the concern
nigel: this discussion is helping
nigel: any other concerns we can surface here?
JcK: I'm concerned about where this might be leading;
displaying the wrong character is much worse than displaying
parts of a string and not other parts (for instance)
JcK: part of the concern is that there are many edge cases
which can't be handled by this kind of approach
JcK: e.g., if you get text in Hebrew script but another
language then you might not have the right code points to
display things properly
JcK: there are traps here about writing this particular
language with this particular script, but not other languages
pal: I captured another concern earlier about cautioning
implementers that one code point != one glyph
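A minimal sketch of the "one code point != one glyph" point, not taken from the discussion itself. It assumes the legacy Arabic presentation-form code points are an acceptable stand-in for the contextual shapes of a letter; real rendering engines select those shapes through font shaping rather than through these code points. The helper name shapes_of is hypothetical.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, a single code point

# Legacy presentation-form code points, one per contextual shape of BEH.
# Used here only to make the shapes visible; authors normally write U+0628
# and the shaping engine picks the glyph.
PRESENTATION_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]


def shapes_of(base, forms):
    """Return the names of the forms that NFKC-normalize back to `base`."""
    return [
        unicodedata.name(form)
        for form in forms
        if unicodedata.normalize("NFKC", form) == base
    ]


shapes = shapes_of(BEH, PRESENTATION_FORMS)
for name in shapes:
    print(name)
```

Each listed form normalizes back to the same single code point, so a font that carries only one glyph for U+0628 cannot render connected Arabic text properly.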
r12a: if you're dealing with a complex script like Myanmar,
there are more difficulties
addison: when people go font shopping, they can be satisfied
with an inferior font and the rendering engine doesn't have the
glyph that's necessary
pal: that's true regardless
r12a: that's part of my concern - we shouldn't let implementers
off the hook and stymie forward progress (yes, these are
embedded systems that aren't updated often)
pal: hard to phrase this in a technical document
addison: these things tend to ossify into a lowest common
denominator or institutionalize some particular set of
characters
pal: I think we're safe in the sense that systems support all
of Unicode - we're not trying to create a chokepoint for code
points
addison: not at document level but at the validator and
authoring tool levels
pal: that's why we don't reference a particular version of CLDR
for instance
JcK: the fact that CLDR exists does not imply that CLDR is
correct
Katy: even defining a list of safe characters can vary quite
wildly
Katy: to clarify, managing author expectations is difficult
here
Katy: not just glyph display but processing and the like
nigel: maybe clarify for authors that you can't just get a
glyph but there is more complexity - there might be fallback
fonts and such (not just safe characters)
nigel: is there a document we can reference?
nigel: an informative document about rendering different
characters correctly?
addison: a different place to look might be the various font
standards, which have introduced language codes that are
supported
<nigel> I heard r12a and katy express support for adding a note
to explain that correct rendering of scripts goes beyond
mapping code points to glyphs in a font
addison: there might be standardization there to look at - a
different way of accomplishing the goal here
r12a: two questions: (1) the safe list here is presumably based
on lowest common denominator for various devices?
pal: tables were built using a study of TV and motion picture
content
pal: collecting all code points that were used in that context
r12a: (2) why are we not just referencing CLDR?
pal: there is a longstanding issue against CLDR to add a flag
for text commonly appearing in subtitles
r12a: I think what would help is to add some text cautioning
against ossification
pal: [summarizes feedback received so far]
pal: we can try to formulate text along those lines and come
back for further feedback
stpeter: why not attack the problem at the CLDR level if they
aren't properly supporting text needed in subtitles?
pal: everyone's goal is to move this to CLDR
addison: we'd be happy to support that as well
addison: we do have a liaison agreement
pal: subtitles and captions are becoming a global requirement
and there are unique needs here; a great example is the musical
note character
<Zakim> nigel, you wanted to note that ossification is not a
feature of the list of characters but a wider issue
nigel: this point about ossification is a tricky one; e.g., if
you deploy player code to a device, updates might not be
available
nigel: e.g., a downloadable font could be possible, but more
work is needed to support the right characters
nigel: how do we phrase this?
addison: good question
<r12a> [11]https://github.com/w3c/imsc/issues/236#issuecomment-367713408
[11] https://github.com/w3c/imsc/issues/236#issuecomment-367713408
r12a: that link has some suggested text but it might not be
exactly what we need here - encourage folks to re-read
pal: I'll try to craft text based on the terms we used in this
call today
addison: would you like us to say something to the CLDR folks?
pal: +1
nigel: +1
pal: I plan to propose text soon for review by folks here
addison: any concerns about supporting the CLDR trac?
JcK: I'm nervous because it would be great to get down to one
standard instead of two; at the same time, CLDR has been
criticized for being opaque to folks with actual language
expertise and not just character coding expertise
addison: I'll take an action to focus it on the issue at hand
Action: addison: write to cldr on WG behalf about Trac 8915
including wording about getting exemplars right
<trackbot> Created ACTION-699 - Write to cldr on wg behalf
about trac 8915 including wording about getting exemplars right
[on Addison Phillips - due 2018-03-15].
pal: I will let you know when the proposed text is ready
Action: addison: make pal's new draft part of homework
<trackbot> Created ACTION-700 - Make pal's new draft part of
homework [on Addison Phillips - due 2018-03-15].
addison: anything else on this topic?
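The validator behaviour pal described in this topic (flag a warning for a character outside the recommended set, without rejecting the document) might look roughly like the sketch below. The set RECOMMENDED and the function check_text are hypothetical, chosen for illustration; the actual IMSC annex tables are per-language and much larger.

```python
import unicodedata

# Hypothetical recommended set for illustration only: printable Basic Latin
# plus U+266A EIGHTH NOTE (a musical note character, as mentioned for subtitles).
RECOMMENDED = set(range(0x0020, 0x007F)) | {0x266A}


def check_text(text, recommended=RECOMMENDED):
    """Return one warning string per code point outside `recommended`.

    The text is never rejected or altered; warnings are advisory only.
    """
    warnings = []
    for index, ch in enumerate(text):
        if ord(ch) not in recommended:
            name = unicodedata.name(ch, "<unnamed>")
            warnings.append(
                f"warning: U+{ord(ch):04X} {name} at index {index} "
                "is outside the hypothetical recommended set"
            )
    return warnings


for line in check_text("Caf\u00e9 \u266a"):
    print(line)
```

Returning warnings rather than raising an error reflects the stated intent that the annex guides authors and does not disallow code points or limit what a renderer supports.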
What Time is This Meeting At?
<Katy> +1
r12a: typically don't change time until UK changes to Summer
Time
addison: in favor
<Bert> (So no change for me then? That's good :-) )
Summary of Action Items
1. [12]addison: write to cldr on WG behalf about Trac 8915
including wording about getting exemplars right
2. [13]addison: make pal's new draft part of homework
Received on Thursday, 8 March 2018 17:05:11 UTC