[minutes] Internationalization telecon 2018-03-08

https://www.w3.org/2018/03/08-i18n-minutes.html




text extract follows:


– DRAFT –
            Internationalization Working Group Teleconference

08 March 2018

    [2]Agenda [3]IRC log

       [2] https://lists.w3.org/Archives/Member/member-i18n-core/2018Mar/0000.html
       [3] https://www.w3.org/2018/03/08-i18n-irc

Attendees

    Present
           addison, Bert, Fuqiao, JcK, Katy, Nigel, pal, stpeter

    Regrets

    Chair
           Addison Phillips

    Scribe
           stpeter

Contents

      * [4]Meeting Minutes
          1. [5]Agenda
          2. [6]IMSC visiting us!
          3. [7]What Time is This Meeting At?
      * [8]Summary of Action Items

Meeting Minutes

Agenda


IMSC visiting us!

    <nigel> [9]IMSC Issue 236

       [9] https://github.com/w3c/imsc/issues/236

    r12a: background ... IMSC uses Unicode characters, glyphs come
    out of fonts, rendering algos/engines are needed for complex
    scripts at times before glyphs are assigned; important in this
    discussion to be clear on terminology of character/code point
    vs. glyphs

    JcK: are you talking about single code points or multiple that
    might result in a single grapheme?

    r12a: single code points for this discussion

    <r12a> [10]https://www.w3.org/TR/ttml-imsc1.0.1/#recommended-unicode-code-points-per-language

      [10] https://www.w3.org/TR/ttml-imsc1.0.1/#recommended-unicode-code-points-per-language

    pal: purpose is to provide guidance regarding subtitles;
    enhance chance that if author chooses text it will be supported
    by the user agent and properly rendered

    pal: the intent is not to disallow certain code points or to
    require a rendering engine to not render certain code points

    addison: I think this is an extremely tricky thing to specify

    addison: first, implementers might see this as a required set,
    the only thing they have to support, etc.

    addison: for example, you wouldn't necessarily have enough code
    points to properly render Arabic

    pal: actually we have the common code points

    addison: doesn't deal with the need for more glyphs in your
    font

    pal: that's why worded in terms of code points, not glyphs

    addison: naive implementation would have glyph per code point

    pal: should we add a note about that?

    addison: when most people build a system, there's an instance of
    it for Arabic users or whatever script is in play

    addison: second point, CLDR has sets of characters like this by
    language (exemplar sets)

    addison: it might be helpful to reference CLDR instead of
    defining your own

    pal: we do reference CLDR - recommended set is a union of CLDR
    and ???

    r12a: I'm worried about implementers too, but this section is
    about authors

    r12a: my worry is that implementers won't see this as clearly

    r12a: make it clear that this is a guide for a minimum set and
    for real support you should go further

    r12a: also make it clear that implementers need to enable the
    display of the following sets of characters, not selecting
    those sets of characters

    pal: output document should only contain those characters

    addison: output document is displayed somewhere and needs to be
    displayed faithfully

    addison: depends on how system that receives it is implemented

    addison: shaping engine etc.

    pal: annex is intended to be used by validator implementation

    pal: validator that sees a character that's not in the
    recommended character set can flag a warning
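
    Illustrative sketch: the validator behaviour pal describes
    could look roughly like the following. This is a minimal sketch
    under stated assumptions, not the IMSC validator: RECOMMENDED is
    a made-up stand-in for the per-language tables in the annex, and
    find_unrecommended is a hypothetical helper name. It only warns,
    in line with the stated intent not to disallow code points.

```python
import unicodedata

# Hypothetical recommended set, for illustration only. The real
# per-language tables are in the IMSC annex and are not reproduced here.
RECOMMENDED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789 .,!?'"
    "\u266a"  # EIGHTH NOTE, the musical note often seen in subtitles
)

def find_unrecommended(text, recommended=RECOMMENDED):
    """Return (offset, char, name) for each code point outside the set."""
    warnings = []
    for offset, char in enumerate(text):
        if char not in recommended:
            name = unicodedata.name(char, "<unnamed>")
            warnings.append((offset, char, name))
    return warnings

# The text is still accepted; out-of-set code points only produce warnings.
for offset, char, name in find_unrecommended("Caf\u00e9 \u266a"):
    print(f"warning: U+{ord(char):04X} {name} at offset {offset} "
          "is outside the recommended set")
```

    A check like this says nothing about whether a renderer can
    actually shape and display the text; it only compares code
    points against a list.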

    addison: is this really a good idea?

    pal: what's a bad idea is showing unsupported characters

    pal: realistically no implementation is going to support all
    Unicode code points

    addison: some implementations support everything but rather
    obscure code points (plane 2 Chinese, ancient scripts, etc.)

    addison: what I see happen is trying to legislate fairly narrow
    character sets, whereas many rendering systems are more capable

    pal: this is targeting not just browsers but embedded systems
    like TVs

    pal: also, this has already proved useful

    addison: implementers do have font and space limitations, but
    it's a slippery slope when recommending subsets of characters

    r12a: I understand the intent, my concern is in how we describe
    that to people

    r12a: e.g., if we said "these are the safe characters to use"
    makes more sense to me

    r12a: this comes across as "these are the Hebrew (etc.)
    characters you should support" but these sets tend to grow to
    support new code points

    pal: this is why we reference CLDR

    r12a: unfortunately CLDR is not a panacea - it's missing things

    pal: so let's fix CLDR

    pal: not displaying a character is way worse

    r12a: the crux is specifying a safe set of characters for
    authors without implying that implementers should limit the
    sets of characters they support

    pal: what about starting the annex with that text?

    r12a: that's the kind of thing I was looking for

    <Zakim> nigel, you wanted to ask what action we can take to
    address the remaining concerns.

    nigel: the struggle here is understanding exactly what the
    concern is and coming up with a proposal to address the concern

    nigel: this discussion is helping

    nigel: any other concerns we can surface here?

    JcK: I'm concerned about where this might be leading;
    displaying the wrong character is much worse than displaying
    parts of a string and not other parts (for instance)

    JcK: part of the concern is that there are many edge cases
    which can't be handled by this kind of approach

    JcK: e.g., if you get text in Hebrew script but another
    language then you might not have the right code points to
    display things properly

    JcK: there are traps here about writing this particular
    language with this particular script, but not other languages

    pal: I captured another concern earlier about cautioning
    implementers that one code point != one glyph
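
    Illustrative sketch: one way to see that one code point does
    not mean one glyph is to look at Arabic BEH (U+0628). Unicode
    carries four compatibility code points in the Arabic
    Presentation Forms-B block that record its isolated, final,
    initial and medial shapes. The helper name contextual_forms is
    hypothetical, and a real renderer selects these shapes through
    font shaping rather than through these compatibility code
    points; the lookup below only makes the one-to-many relationship
    visible.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH

def contextual_forms(base):
    """List presentation-form code points that decompose to `base`."""
    target = f"{ord(base):04X}"
    forms = []
    # Arabic Presentation Forms-B occupies U+FE70..U+FEFF.
    for cp in range(0xFE70, 0xFF00):
        # Decomposition looks like "<initial> 0628"; empty if none.
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and parts[1] == target:
            forms.append((parts[0], f"U+{cp:04X}"))
    return forms

for tag, code_point in contextual_forms(BEH):
    print(tag, code_point)
```

    A font that holds only a single glyph for U+0628 would therefore
    not be enough to render connected Arabic text correctly.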

    r12a: if you're dealing with a complex script like Myanmar,
    there are more difficulties

    addison: when people go font shopping, they can be satisfied
    with an inferior font and the rendering engine doesn't have the
    glyph that's necessary

    pal: that's true regardless

    r12a: that's part of my concern - we shouldn't let implementers
    off the hook and stymie forward progress (yes, these are
    embedded systems that aren't updated often)

    pal: hard to phrase this in a technical document

    addison: these things tend to ossify into a lowest common
    denominator or institutionalize some particular set of
    characters

    pal: I think we're safe in the sense that systems support all
    of Unicode - we're not trying to create a chokepoint for code
    points

    addison: not at document level but at the validator and
    authoring tool levels

    pal: that's why we don't reference a particular version of CLDR
    for instance

    JcK: the fact that CLDR exists does not imply that CLDR is
    correct

    Katy: even defining a list of safe characters can vary quite
    wildly

    Katy: to clarify, managing author expectations is difficult
    here

    Katy: not just glyph display but processing and the like

    nigel: maybe clarify for authors that you can't just get a
    glyph but there is more complexity - there might be fallback
    fonts and such (not just safe characters)

    nigel: is there a document we can reference?

    nigel: an informative document about rendering different
    characters correctly?

    addison: a different place to look might be the various font
    standards, which have introduced language codes that are
    supported

    <nigel> I heard r12a and katy express support for adding a note
    to explain that correct rendering of scripts goes beyond
    mapping code points to glyphs in a font

    addison: there might be standardization there to look at - a
    different way of accomplishing the goal here

    r12a: two questions: (1) the safe list here is presumably based
    on lowest common denominator for various devices?

    pal: tables were built using a study of TV and motion picture
    content

    pal: collecting all code points that were used in that context
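
    Illustrative sketch: collecting the code points used in a body
    of subtitle text, as pal describes, can be approximated as
    below. This is only a sketch of the general idea, not the method
    used for the IMSC tables: the sample lines are invented, and
    collect_code_points is a hypothetical helper name.

```python
from collections import Counter

def collect_code_points(subtitle_lines):
    """Count how often each code point occurs across the given lines."""
    counts = Counter()
    for line in subtitle_lines:
        # Iterating a str yields one code point at a time.
        counts.update(line)
    return counts

# Made-up sample lines standing in for a real subtitle corpus.
sample = ["\u266a La la la \u266a", "Hello!"]

for char, count in sorted(collect_code_points(sample).items()):
    print(f"U+{ord(char):04X} occurs {count} time(s)")
```

    The resulting set reflects only what the sampled content
    happened to use, which is one reason such a list can lag behind
    what a language actually needs.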

    r12a: (2) why are we not just referencing CLDR?

    pal: there is a longstanding issue against CLDR to add a flag
    for text commonly appearing in subtitles

    r12a: I think what would help is to add some text cautioning
    against ossification

    pal: [summarizes feedback received so far]

    pal: we can try to formulate text along those lines and come
    back for further feedback

    stpeter: why not attack the problem at the CLDR level if they
    aren't properly supporting text needed in subtitles?

    pal: everyone's goal is to move this to CLDR

    addison: we'd be happy to support that as well

    addison: we do have a liaison agreement

    pal: subtitles and captions are becoming a global requirement
    and there are unique needs here; great example is musical note
    character

    <Zakim> nigel, you wanted to note that ossification is not a
    feature of the list of characters but a wider issue

    nigel: this point about ossification is a tricky one; e.g., if
    you deploy player code to a device, updates might not be
    available

    nigel: e.g., a downloadable font could be possible, but more
    work is needed to support the right characters

    nigel: how do we phrase this?

    addison: good question

    <r12a> [11]https://github.com/w3c/imsc/issues/236#issuecomment-367713408

      [11] https://github.com/w3c/imsc/issues/236#issuecomment-367713408

    r12a: that link has some suggested text but it might not be
    exactly what we need here - encourage folks to re-read

    pal: I'll try to craft text based on the terms we used in this
    call today

    addison: would you like us to say something to the CLDR folks?

    pal: +1

    nigel: +1

    pal: I plan to propose text soon for review by folks here

    addison: any concerns about supporting the CLDR trac?

    JcK: I'm nervous because it would be great to get down to one
    standard instead of two; at the same time, CLDR has been
    criticized for being opaque to folks with actual language
    expertise and not just character coding expertise

    addison: I'll take an action to focus it on the issue at hand

    Action: addison: write to cldr on WG behalf about Trac 8915
    including wording about getting exemplars right

    <trackbot> Created ACTION-699 - Write to cldr on wg behalf
    about trac 8915 including wording about getting exemplars right
    [on Addison Phillips - due 2018-03-15].

    pal: I will let you know when the proposed text is ready

    Action: addison: make pal's new draft part of homework

    <trackbot> Created ACTION-700 - Make pal's new draft part of
    homework [on Addison Phillips - due 2018-03-15].

    addison: anything else on this topic?

What Time is This Meeting At?

    <Katy> +1

    r12a: typically don't change time until UK changes to Summer
    Time

    addison: in favor

    <Bert> (So no change for me then? That's good :-) )


Summary of Action Items

     1. [12]addison: write to cldr on WG behalf about Trac 8915
        including wording about getting exemplars right
     2. [13]addison: make pal's new draft part of homework

Received on Thursday, 8 March 2018 17:05:11 UTC