- From: Fuqiao Xue <xfq@w3.org>
- Date: Tue, 20 Sep 2022 19:36:25 +0800
- To: www-international@w3.org
https://www.w3.org/2022/09/12-i18n-minutes.html

– DRAFT –

TPAC 2022: Internationalization Working Group

12 September 2022

[2]Agenda. [3]IRC log.

[2] https://www.w3.org/events/meetings/121afb09-553e-4b68-854d-2ba64111c34b
[3] https://www.w3.org/2022/09/12-i18n-irc

Attendees

Present
(guest), (virtual), Addison, Atsushi, Atsushi (virtual), Bert, css), David Singer (guest, David-Clarke, fantasai, Florian, Florian (guest), Francois, Francois (guest), Fuqiao, Fuqiao (virtual), Greg, Greg (guest), mathml), Paul, Paul (guest, PeterR, Richard, Richard (virtual), xfq

Regrets
-

Chair
Addison Phillips

Scribe
addison, fantasai, xfq

Contents

1. [4]Introductions
2. [5]CSS backlog
3. [6]Ruby markup status
4. [7]~* Break *~
5. [8]MathML
6. [9]CSS issues
7. [10]https://github.com/w3c/csswg-drafts/issues/7183
8. [11]~* Lunch *~
9. [12]Intros
10. [13]CSS Stuff
11. [14]Triage
12. [15]AOB?
13. [16]Summary of action items

Meeting minutes

Meeting Info: [17]https://www.w3.org/events/meetings/121afb09-553e-4b68-854d-2ba64111c34b#agenda

[17] https://www.w3.org/events/meetings/121afb09-553e-4b68-854d-2ba64111c34b#agenda

FIND US IN 'FINBACK', 3rd Floor (same as registration but in the far corner)

Use this link: [18]https://us02web.zoom.us/j/85856632124?pwd=TzdnYzZTbUZNTkNGLzBkMG1rbDdEdz09

[18] https://us02web.zoom.us/j/85856632124?pwd=TzdnYzZTbUZNTkNGLzBkMG1rbDdEdz09

Introductions

atsushi: JL-TF is preparing two documents, one is simple ruby, one is ruby-t2s-req
r12a: Richard Ishida
xfq: Fuqiao Xue
Bert: Bert Bos
florian: Florian Rivoal
<fremy> François Remy, Invited Expert in the CSS WG
<r12a> Paul Libbrecht, MathML WG
<florian> Florian Rivoal, Invited Expert, CSS WG, i18n WG, Advisory Board
Greg Whitworth
atsushi: team contact for i18n, timed text, immersive web
<r12a> Elika Etemad, CSS, i18n, ex-AB

CSS backlog

[19]https://github.com/w3c/csswg-drafts/issues/5421

[19] https://github.com/w3c/csswg-drafts/issues/5421

florian: we need someone from Apple to discuss this effectively
r12a: maybe
there's one or even two other issues that need to be read in conjunction with this one
<r12a> see also [20]https://github.com/w3c/csswg-drafts/issues/4497#issuecomment-763459971

[20] https://github.com/w3c/csswg-drafts/issues/4497#issuecomment-763459971

[21]https://github.com/w3c/csswg-drafts/issues?q=is%3Aissue+is%3Aopen+label%3A%22Agenda%2B+TPAC%22+label%3Ai18n-tracker

[21] https://github.com/w3c/csswg-drafts/issues?q=is:issue+is:open+label:"Agenda++TPAC"+label:i18n-tracker

[22]https://github.com/w3c/csswg-drafts/issues/6848

[22] https://github.com/w3c/csswg-drafts/issues/6848

addison: Backslash & Yen sign behavior
fantasai: last time I looked at the issue there's no good solution for it
addison: this is the famous issue that existed forever
… probably we need to have someone from WebKit to discuss it

[23]https://github.com/w3c/csswg-drafts/issues/4606

[23] https://github.com/w3c/csswg-drafts/issues/4606

addison: Kaiti & cursive
florian: it's probably good to discuss this in the joint session
… we have some CSS people here, but we're missing some

[24]https://github.com/w3c/csswg-drafts/issues/6730

[24] https://github.com/w3c/csswg-drafts/issues/6730

florian: we have specified in CSS Text L4
… a pair of properties
… if you have wbr in the markup
… @@ which is typically used in titles
… this is also useful because allowing line breaks
… especially in children's books or for people with dyslexia
… for line breaking you can use the usual
… here it's trying to tackle the same problem differently
… whether we reuse the existing machinery for line breaking or whether we make a new one
… do we just have a giant pile of AI and it figures out where to put the breaks on its own
… or do we need to be more strict on this and say it's Chinese
… from an author's point of view
… it's going to be important to know what it's going to do
… you need to be able to ask not just about line breaking (with @supports), but line breaking in Japanese
… the property we have in CSS,
specify which language you want to line break
… let the browser figure it out
… those are the two general approaches to try and tackle the same problem
… CSS line breaking properties are already extraordinarily complicated, with so many properties interacting with each other
… I'm not excited about adding one magic switch that says ignore everything and do new line breaking
addison: I hear a lot from Japanese designers
… a lot of unfortunate line breaks
… do you think it's related to that?
addison: I'm trying to understand what people think the problem is
florian: I gave a talk about many line breaking things, including this one
<florian> [25]https://florian.rivoal.net/talks/line-breaking/#ja-titles

[25] https://florian.rivoal.net/talks/line-breaking/#ja-titles

florian: I don't think I can take over and project, but I just dropped a link on IRC ^
… if you press space and shift+space repeatedly to move forward
[florian shows the slides]
florian: we have existing properties that let you switch between two different modes
addison: you can mark all the boundaries
florian: if you mark all the boundaries, it's just like English
… as Richard mentions, there is more than one way to do this
… there are varying opinions
… the new approach is don't add wbr, ignore the line break properties in CSS
… I'm concerned about the inability to specify which language it is
Francois: why not 'word-break: avoid'?
fremy: @@
… if you can, fall back to 'word-break: normal'
… if it doesn't work, it's the normal behaviour
florian: there's word-boundary-detection and word-boundary-expansion
… word-spacing is for where there is already a space and makes it bigger
… this is for inserting a space
florian: we could add a second keyword
<myles> hello
florian: for languages like Thai, word-boundary-detection has three values
fremy: interesting to see if anybody cares about this
r12a: they do
… we're talking about the language Thai
… there are many languages using the Thai script
… like Northern Thai
florian: you could switch out auto
addison: if you don't know Northern Thai, don't use Thai
… with proper language tagging
florian: not language for the context, but language for the algorithm
… maybe browsers don't know how to do line breaking for Cantonese, but they know how to do line breaking for Chinese
<r12a> (languages using the Thai script: [26]http://r12a.github.io/scripts/thai/index.html#languages)

[26] http://r12a.github.io/scripts/thai/index.html#languages

florian: there's something dealing with the normalization of languages
r12a: we have another issue about that topic as well
<Bert> (Example of an unfortunate line break in English: ‘Hi, My daughter had an accident and now we need body parts to fix her / car’, from [27]https://freediculous.blogspot.com/2006/02/unfortunate-line-break.html )

[27] https://freediculous.blogspot.com/2006/02/unfortunate-line-break.html

r12a: it is a little difficult to hear you with your masks on
florian: combination of word-boundary-detection and word-break: keep-all
… go to look at the content
… when you language tag things, if you language tag your content properly, the browser has an exact algorithm for this, it will do the right thing, otherwise it will fall back to normal
fantasai: I would not classify that as normal vs strict
… kinsoku rules are independent
florian: at some point, if you're extremely picky about how to line break, you can go and
add wbr
fantasai: if you do word-based breaking and suddenly switch to phrase-based line breaking
Bert: there are other things you want, like length of a line
florian: I don't want a complete new thing, a new magic mode that completely ignores the line breaking properties
… I think I got useful ideas
<fantasai> [current discussion is adding 'words' and 'phrases' keywords or something to 'word-break' property]
<fantasai> [28]https://github.com/w3c/csswg-drafts/issues?q=is%3Aopen+is%3Aissue+label%3A%22Agenda%2B+TPAC%22+label%3Ai18n-tracker

[28] https://github.com/w3c/csswg-drafts/issues?q=is:open+is:issue+label:"Agenda++TPAC"+label:i18n-tracker

<fantasai> [29]https://github.com/w3c/csswg-drafts/issues/5995

[29] https://github.com/w3c/csswg-drafts/issues/5995

r12a: you started by saying you wanted to talk about the needs-resolution issues
… but these are tracker issues, addison
<r12a> [30]https://w3c.github.io/i18n-activity/reviews/#

[30] https://w3c.github.io/i18n-activity/reviews/

r12a: we don't have a single CSS label
<r12a> filter on needs-resolution
r12a: if you go here ^ and filter on needs-resolution
<fantasai> [31]https://github.com/w3c/csswg-drafts/issues?q=is%3Aopen+is%3Aissue+label%3Ai18n-needs-resolution+

[31] https://github.com/w3c/csswg-drafts/issues?q=is:open+is:issue+label:i18n-needs-resolution+

[32]https://github.com/w3c/csswg-drafts/issues?q=is%3Aopen+is%3Aissue++label%3Ai18n-needs-resolution

[32] https://github.com/w3c/csswg-drafts/issues?q=is:open+is:issue++label:i18n-needs-resolution

<fantasai> [33]https://github.com/w3c/csswg-drafts/issues/771#issuecomment-1182339573

[33] https://github.com/w3c/csswg-drafts/issues/771#issuecomment-1182339573

fantasai: proposal is to get the WG to resolve it
<fantasai> if i18n agrees with the proposal, I'll get CSSWG to resolve on it
addison: atsushi and r12a seem to agree with fantasai's comment
fantasai: should we do this to handle justification for ruby annotations?
r12a: what we discovered is that it's more likely the Latin text is centered
<fantasai> [34]https://github.com/w3c/csswg-drafts/issues/5995

[34] https://github.com/w3c/csswg-drafts/issues/5995

addison: doesn't sound controversial
fantasai: Should auto-hide match use NFKC or other normalization?
addison: NFKC is usually not a good idea
… there's a lot of things, it's kind of an uncontrolled @@
r12a: I think it's NFK mapping
addison: I suppose we need to do some research
fantasai: if it's too aggressive we can do some custom normalization
fantasai: I think it's legitimate using different representations
r12a: would not be a good idea to normalize stuff that people have typed
… automatically annotate your Japanese or Chinese
… e.g., you could have katakana, and the kuten marks decomposed
… if it's a different kanji character, you probably don't want to unify them in any way
… if it's real normalization, maybe it's useful
fantasai: we should probably do NFC for auto-hiding
<fantasai> [35]https://www.w3.org/TR/css-ruby-1/#hiding

[35] https://www.w3.org/TR/css-ruby-1/#hiding

r12a: if it matches, it removes the annotation
addison: the comments are all pointing to things like whitespace normalization
… possibly normalize internal whitespace
… two spaces become one
r12a: if you inline, you're not removing anything from view
… we're talking about real edge cases here
florian: Xian and Xi'an example in Chinese
<David-Clarke> Should half-width kana match full-width in this case?
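
[Editorial illustration, not part of the meeting record.] The difference between the normalization forms discussed above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: `normalize_for_match` and `would_auto_hide` are hypothetical names invented for the example, and the comparison is not the matching algorithm of css-ruby-1. It only shows that NFC unifies decomposed and precomposed kana while leaving half-width kana distinct, whereas NFKC also folds the half-width forms.

```python
import unicodedata


def normalize_for_match(text: str, form: str = "NFC") -> str:
    """Hypothetical helper: Unicode-normalize, then trim leading/trailing white space."""
    return unicodedata.normalize(form, text).strip()


def would_auto_hide(base: str, annotation: str, form: str = "NFC") -> bool:
    """Hypothetical check: do base and annotation compare equal after normalization?"""
    return normalize_for_match(base, form) == normalize_for_match(annotation, form)


precomposed_ga = "\u30ac"       # KATAKANA LETTER GA
decomposed_ga = "\u30ab\u3099"  # KATAKANA LETTER KA + COMBINING VOICED SOUND MARK
fullwidth_ka = "\u30ab"         # KATAKANA LETTER KA
halfwidth_ka = "\uff76"         # HALFWIDTH KATAKANA LETTER KA

# NFC treats decomposed and precomposed forms as the same text.
print(would_auto_hide(precomposed_ga, decomposed_ga))

# NFC keeps half-width and full-width kana distinct ...
print(would_auto_hide(fullwidth_ka, halfwidth_ka))

# ... while NFKC folds the half-width form into the full-width one.
print(would_auto_hide(fullwidth_ka, halfwidth_ka, form="NFKC"))

# Leading and trailing white space is ignored by the trim step.
print(would_auto_hide(fullwidth_ka, " \u30ab "))
```

Under these assumptions the half-width question comes down to whether width folding is applied on top of NFC, which is a separate decision from choosing NFC over NFKC.
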
fantasai: I feel that is a different issue
<fantasai> * whitespace normalization
<fantasai> * NFC normalization
<fantasai> * East Asian Width folding
florian: you're using half-width katakana because it's tiny
… [explains the half-width katakana example]
fantasai: whitespacing is sometimes accidentally introduced
… you have trailing/leading whitespace
… I think what we should do is whitespace and NFC normalization for auto-hiding
addison: sounds like a reasonable starting point
… need to think about the edge cases
florian: I suspect if we start with NFC, it's safe
… if it's rare enough edge case, we probably shouldn't do anything by default
addison: choose your code point carefully
<fantasai> Proposal for normalization of base and annotation text before auto-hiding:
<fantasai> - Use NFC normalization (not NFKC)
<fantasai> - Trim white space
<fantasai> Anything else, authors should adjust manually using `visibility: collapse`.
addison: it's not asking people to store or something like that
fantasai: my suggestion going forward is to ask the WG to see how they feel with NFC matching
r12a: you're not actually displaying different things, you're just matching
… if you got 2 or 3 spaces in between two words
… it's not really relevant here
addison: we've just finished discussing wbr
<fantasai> Commentary on why we have the current spec text [36]https://github.com/w3c/csswg-drafts/commit/0c972dc6d3a3bd34ee9ce63bfd5babc55f0afb14

[36] https://github.com/w3c/csswg-drafts/commit/0c972dc6d3a3bd34ee9ce63bfd5babc55f0afb14

Ruby markup status

florian: extremely long discussion to try and make it possible to write the ruby markup
… a pull request against the HTML spec
… as soon as we find time to actually do it we should be very quick to make a FPWD
… we actually have two impls
… Firefox and Amazon Kindle

~* Break *~

MathML

polx: MathWG is working on v4 of MathML
… MathML is an XML format for writing math notations
… v4 the biggest novelty is trying to get speakout to work
… so that a11y
tools can read math out loud
… MathML is known to sort-of work on that, but we want this to become proper
… current development means adding an attr, intent, that describes how parts of tree will be spoken out
… should be combined with default knowledge of how to speak things, which is currently the fuzzy part of the spec
addison: intent is structured data?
fremy: is it fixed options or freetext?
polx: freetext, but has some placeholders that allow you to delegate the rest of the speaking to inside
fremy: so templates?
polx: template language being developed, part of more unclear part
florian: It's freetext in a human language?
polx: Yes, that's why i18n aspect is interesting
… I believe MathML lives in lang-tagged trees
… and I think the voice that is used to speak this, depends on user
… not sure if i18n has special concerns?
addison: You're touching on some hot buttons
… one is that putting natural language text into an attribute makes it hard to localize / translate
… can't be lang-tagged
polx: it's mono-language, so whole subtree of language
… there are alternative representations but
addison: common thing to want to do, if you want to localize something you have the intent ...
can have multiple ones with different lang tags
… can localize sub-parts
polx: would translate the whole subtree
addison: also case that there are things like structure of natural language is not, you can just add words together to make sentence
… so if you structure things that way, then when you put into another language, then you don't have attributes in correct order
… need to rearrange to make it sensible
polx: There is a dynamic to how intents are combined
… pull things out and make one single sentence
… makes it non-navigable, a11y tools like to navigate sub-elements
… but [missed]
addison: I'm not an a11y expert, but usually what you want is you want a stream of words
… that you're feeding to the TTS engine
… might feed language tags to get it to pronounce correctly
polx: The whole world around it would be Russian, expectedly
addison: what I'm saying is, there's a stream of text that would go to processor that will read it out
… your problem is that to generate the stream of text
… you're providing a way for ppl to mark up their math with the content such that it generates that string of text
… and if you only ever had a document in one language at a time, that would be maybe possible to do
… but different languages have different requirements
… you'd have to reformulate your content to be in a different language
… e.g. Japanese has very different word order than English
… so you would need to set it up so that stream of text would be in the target language
florian: If I've understood correctly, idea is not that each subtree has a piece of text that gets added together
… but idea of having a language template thing
… can have subtrees invoke the grammar in the right way
… so concatenation order wouldn't be a problem
polx: Sometimes can't, so in "...", you'd have different needs to put things on the parent alone
… because you cannot use the templates in a reasonable manner
addison: Harder than it looks, because you have agreement issues, e.g.
don't have just zero/one/plural
… and math of course has lots of numbers
<r12a> it would be very useful to see an example !
addison: interesting project at Unicode, next generation XXX format, to describe localizable structures
… for inserting runtime formatted strings
… called the Message Format WG of CLDR
<florian> fantasai: have you heard of l20n project at Mozilla?
<florian> fantasai: it was a templating system that Mozilla was working on maybe a decade ago
<florian> fantasai: that was to deal with agreement or inflexions and other grammatical things, in order to deal with these in the Mozilla UI
addison: I think that evolved into Fluent, which is a format that does that
… other groups doing similar things
… and all those groups working on this message format also
… to build a system
polx: Problem with math is that there's extremely big variation on abstraction
… predicting contents, can depend on resolution you might not come t
… speaking in an abstract way
addison: I have illustrations of why doesn't work but
polx: Send it around, it would be useful
addison: My first reaction is, I think I understand what you want to do, super common to want to build a templating language
… trick is it's hard to do well from i18n pov, build it for one language, and have to rewrite for other languages
… there are better mechanisms to support doing these things
… I think it would be a good idea for us to look at your proposals and help make connections early on
<Bert> [37]Example of ‘intent’ in MathML4

[37] https://w3c.github.io/mathml/spec.html#mixing_intent

addison: from a very high level your description sounds like it would be problematic
florian: another thing to mention, since you put text in attributes, is limiting because you can't have markup
… since trying to display text, often need some extra markup
… if speaking rather than displaying, could be different
… but also have other things in CSS, that are supposed to let you style how things are spoken
fremy: feedback I got
is ppl don't want that
florian: if you stick things into an attr, you can't extend it
… if you can use regular elements, that opens up more possibilities
… maybe more than you want, but won't run into problem of less than you need
polx: There's an element in MathML called <semantics> for alternate representations, e.g. LaTeX representation alongside MathML
… these things are all there, but known to be too complex to be of use
… hard to make it simpler, honestly
… because in the end what you want is parallel trees
… and need to hook them up with IDs, and it works, but it's art
fremy: I think what you're trying to do, it seems you're trying to create a text representation for elements in the a11y tree
… very close to concept of aria-label and aria-description
polx: it is
fremy: why not use the existing system that they use?
… they already have this concept
… if you use aria-labelledby, can have a list of things
polx: There are guys who are in the aria groups, and aria-label is considered part of this scenario, but different impl possibilities
… offer in ways that are independent
… not sufficient for our formatting
fremy: I would like to understand why it's insufficient, because it would help us understand what needs to be worked on
r12a: This discussion would be a lot easier if we had an example
r12a: The other possibility occurs to me, you have a templating language which creates something in English and you translate it to other languages
… rather than trying to have a templating language that serves every language
polx: What do you mean by translating?
r12a: As I understand it, you're coming out with a sentence that says "The third root of 64"
… and that represents the formula that will appear on the screen
polx: right
r12a: you've got all those bits in that formula which you can assign words to, and then you have to understand relationships among them and how to create syntax for that
… then need to figure out agreements e.g.
pluralization
… and do that for every language
… but another possibility is that you generate a string in English
… then only have to build all that complex stuff in English, and then you use translation mechanisms
polx: This is a support aspect. You could do that, and you could do that at the authoring level or in the browsers
… but at some point you want control over that, and this is the space we're creating with the intent
… we want author to control how it is spoken
fremy: also translation isn't cheap
… can't run it on client side every time you have a math equation, it doesn't scale
addison: if you have a true machine-translation engine, then it's not cheap to create but maybe can do that
fremy: I work in machine learning, and machine translation is multiple gigabytes in memory
… very few programs translate things correctly on a computer
… that's why they use servers
… small machine translation is very low quality, helps if you are stuck without internet
… but not something that can be relied on
polx: Sometimes can do wonders with automatic translation, and can help author
… but whether author wants to render everything into a string, and then get that translated, and then get it checked by a math expert is one thing
… [missed]
… as soon as formula becomes really big, becomes essential
<fremy> fantasai: I have concerns about having natural language in attributes
<addison> fantasai: some concern about natural language in an attribute, because we often markup
<fremy> fantasai: not all accessibility engines use a speech engine
<fremy> fantasai: sometimes braille output for example
polx: Braille has a special math pattern
… One Hungarian guy I forget has a system for this
… I don't remember
… but I know that many ppl are feeling that this standard for Braille math is limited, but it is what everyone uses
addison: Also don't get wrapped up in saying it's just a11y, lots of documents are read aloud these days
… so general purpose TTS is more prevalent than it has been
<Bert> Math in Braille often uses Nemeth Braille.
addison: so is it good enough to serve a11y audiences? Maybe, and that's where tech has been driven from historically
… but it is expanding quite a bit
polx: Yes, all wondering about listening to math in the car while driving
addison: "Alexa, read this paper"
… that's when you have some piece of MathML embedded, and needs to become...
fremy: aspect to keep in mind, MathML has 2 standards, and the one that is used is a presentational format
… it says how it looks on the screen, but same notation can be used for multiple things
… that's why you need intent
… need this on a letter can mean multiple things, that's part of the intent aspect right?
polx: you can use MathML Content to do better, but it's too expensive
… math professionals are more comfortable thinking about how to write things rather than how you mean them
fremy: Content is intended to be a middle ground, still describing presentation but with more info
… I think it does make sense to me, looking at the example
… one thing that maybe I am wondering is removing the idea of the string representation
… I would argue that this is not a good idea to put in intent
… I would limit it to things that can be understood
… If you want to express something outside of intent, should use aria-label
… for example in the spec you have x power to the ?
… suppose you want to read as position
… then this should be done at the aria-label level
… so you have the x arg, you have x aria-label = position
… and then you can compose the sentence
… but I would refrain from using intent to scope things outside the scope of intent
… I think it misses the bar
… because it becomes very confusing if you can rename things
… if you go deeper in the hierarchy, these renames won't be consistent
… So want to see if can remove the freetext option from intent
… and if you need freetext, use aria-label on the functions to give the freetext you need
… and that is a tech that already exists
… and get those translated
… it flows into natural localization pipeline for HTML
… and enforces the idea that 'intent' is something the computer can understand
… freetext is something the computer cannot understand
polx: What do you mean computer can understand
… being able to understand more of the intent of the expression
… my experience is this an extremely American point of view
… as soon as you go farther
… the bigger problem when you do this understanding, you want to understand in a semantic world that is well-defined
… and mathematicians have been creating math for centuries, and many things are not encoded
fremy: Not saying computer understands the equation, but understand each piece
… intent should be structured, but if you need a name, should use a name from the markup
… still rely on existing tech, but compose [missed]
… this seems more reasonable I think
polx: This is interesting, we'll be meeting on Wednesday
… these are interesting thoughts
<fantasai> +1
polx: Wondering if we should consider, if single-language is safe enough
… or should be safe enough
florian: One of the beauties of math notation is that it is not natural language
… in translation description can be different, but the equation will be the same
… to a large extent
florian: It's shared
… and if we could enable those formulas that are not strongly tied to a natural language to
be re-used as-is in a bunch of different language documents
… would be nice, but certainly more complicated
polx: There's a will in the intent definition that trying to make it as simple as possible is a most important quality
… and might be reason why all these templating languages feel inappropriate
addison: integrates well with other tech stack pieces then makes sense
… the more different special things ppl invent, there's less likely to have widespread support
… e.g. re-using aria-label insofar as possible, already widespread
polx: One thing unsure about is how to encode defaults
… so that a11y tools don't need intent as much as possible
… probably this is doable for basic math, for English language
… if you go to any European language, there is no complete tool with these defaults
… if you go further away, then this will be almost impossible
… to use, for every different language, can stick the English name and translate, seems doable but not sure
… and then things like i is used as root of -1, but understood to be something else in different fields
… e.g.
H2 is hydrogen or 2nd homology group
… currently we seem to avoid being able to speak a proper domain name
… this is crystallography or organic chemistry
… we don't know whether there's a way to model this kind of subdomain things
… because at the end you end up very scattered
addison: I understand what you're saying
r12a: We're talking about describing an expression, why don't we have something like alt attr
polx: 2 reasons
polx: This is one string for whole subtree, which is what aria-label/descri can do
… but this is not enough for navigating through the subtree
… as you move in the subtree
… take out some parts and re-use other things
r12a: Thanks
… point I wanted to make, before I joined W3C I worked at Xerox as global design consultant and helped develop the i18n aspects of the corporate engineering process
… if you're developing a product, principle of develop it in at least 2 languages
… My recommendation to you, because sounds extremely complicated, is that you develop it in English and another language e.g.
Arabic or Japanese, which are substantially different in syntax
… and try to concurrently develop the tech in all those languages at the same time
… you'll have a better idea of how to develop
addison: Danger is that Western-European types can assemble something that works, but breaks down as you move to other language sets
polx: Exactly the problem we have right now in non-standardized software
addison: Get proof of existence, and then encounter problem of like Japanese having very different word order
… or different agreements with numbered
… and that's where you discovered have features, but can't go there
… if you can make it work for an array of languages then you can sneak up on some aspects of the problems
… as you see from earlier discussion in CSS, still corner cases that are hard
… don't have "well it worked in English" and then get stuck
r12a: I chose Arabic and Japanese on purpose
… Japanese has a SOV word order
… but also has very little agreement and very vague language
… Arabic on the other hand has lots of agreement, and VSO order
… and also has single tense, dual tense, and multiple tense in terms of plurality
… so those two languages cover a lot of range in the problems you're likely to run into
polx: Unfortunately both those languages are colonized in terms of math notation. They write in French notation
… and I believe that the Japanese have been taking math notation since 1920s from Americans with almost no difference
florian: From notation, yes, but from the way they speak it
polx: you're right
r12a: You know about our note about MathML?
polx: ??
is the author, but is unfortunately not involved anymore
… we had one guy who has just left recently, might come back, is Bulgarian and has a bit more exotic math formulae formatting
… so we have French, German, and English in the group
… and Dutch with Bert :)
addison: Point though is not the math notation that's different, it's the natural language aspect
polx: You're right that Bulgarian might not differ as much from grammar
fremy: Right now the spec doesn't include list of templates
polx: working in Google sheets
fremy: Exercise that seems worthwhile is to sample 100 equations from Wikipedia, and ask people to write how they would read these formulas in their own language
<Bert> [38]https://www.w3.org/TR/2006/NOTE-arabic-math-20060131/

[38] https://www.w3.org/TR/2006/NOTE-arabic-math-20060131/

fremy: it's difficult to imagine without this sampling
… it will tell you which patterns are most often recurring, which will tell you the focus of intents
… and will also tell you the different ways these are described in different languages,
… will show whether your strategy will work
… and if so do you need more, e.g. you realize you need singular/plural.
or male/female
… for some of the letters
… maybe then you need to say this is an attribute we may want to consider
… you will not be able to solve all the challenges in the first version
… but it would get you idea of what are the major issues
… Consider how can you cover with simplest possible approach these cases
… It's a survey also that's not too hard to run
… this will help a lot in shaping your design
polx: Also within a language, ppl will speak things differently
<fremy> fantasai: I think we could probably run the survey at a Math conference
<fremy> fantasai: and some would think of this exercise as "fun"
<fremy> fantasai: compare how they would voice a formula vs friends
<fremy> fantasai: and there would be people from all over the world in these conferences
florian: When I was in engineering school, Vietnamese students and us understood each other better in math than anything else
… they had learned to speak the notation in Vietnamese, and also learned in French
r12a: You also have to be careful, I spent 6-7 years teaching globalization
… and I would be teaching developers who spoke those languages how to develop i18n
… and they'd never apply the idea of "oh, we do this differently to how it's being implemented here"
… so you can ask them, but they might not have ever thought about it
fantasai: The advantage of fremy's question is it's very simple, don't have to think deeply about it just write down how you would read it
<fremy> fantasai: reading in your own language is easier because participants don't need to think about it
addison: There are common patterns to this, this is similar to other things that ppl have done
… so maybe we can connect you with some resources
… and have some guiding discussion to show you the kinds of things that you can
polx: One thing done in MathML 3 introducing long division
… you have an amount of ppl, asking "how do you write long division in your country"
… and found 17 different ways
… and it differs
addison: There are
styles even within languages … many ways to do the same thing, all of which are valid, just stylistic or preferential … so have to account for those differences r12a: Have to account for, whatever you come up with should be understood by everyone addison: Myles will join in 5 minutes, any other things on Math? polx: If you can send me links to experiments, would be very helpful … indeed the design seems like it is something ppl have been doing addison: where are you in the cycle? polx: This is the FPWD … so enough time to inform the design … really a big trade-off between simplicity and explicitness [discussion of possible survey] fremy: If you have this presentation, how do you read it? … might not be the preferred presentation but how do you read it <addison> [39]https://github.com/w3c/csswg-drafts/issues?q=is%3Aissue+is%3Aopen+label%3Ai18n-tracker [39] https://github.com/w3c/csswg-drafts/issues?q=is:issue+is:open+label:i18n-tracker <addison> [40]https://github.com/w3c/csswg-drafts/issues?q=is%3Aissue+is%3Aopen+label%3Ai18n-tracker+label%3A%22Agenda%2B+TPAC%22 [40] https://github.com/w3c/csswg-drafts/issues?q=is:issue+is:open+label:i18n-tracker+label:"Agenda++TPAC" <r12a> [41]https://github.com/w3c/csswg-drafts/issues?q=is%3Aopen+is%3Aissue++label%3Ai18n-needs-resolution [41] https://github.com/w3c/csswg-drafts/issues?q=is:open+is:issue++label:i18n-needs-resolution <addison> [42]https://github.com/w3c/csswg-drafts/issues/6848 [42] https://github.com/w3c/csswg-drafts/issues/6848 CSS issues myles: on windows certain fonts display backslash as yen sign, so people use backslash where they mean yen … so on macos we have to do something to make these fonts display to the user intention … only certain fonts or certain encodings fantasai: do we have an idea of the best way forward r12a: kida-san provided some recommendations <r12a> [43]https://github.com/w3c/csswg-drafts/issues/6848#issuecomment-1226798241 [43] 
https://github.com/w3c/csswg-drafts/issues/6848#issuecomment-1226798241 florian: can take a shortcut to talk about yen, but korean has won sign … they appear in file paths for windows … in asia, very familiar … makes me wonder if kida-san's recommendation is correct, since any webpage will use Unicode 5C but expect to show yen or won or yuan … normally characters should be different for a reason addison: this is holdover from DOS days … see ppl use \ as currency symbol … I don't think modern APIs generate that often florian: keyboards do addison: I agree with Myles we need to solve this in a consistent manner … because will be tricky … because intention is lost myles: Is there a key on Japanese keyboards for Yen or Won sign florian: I think answer is no, you press just the one key … backslash/yen key … how software converts that to Unicode is maybe they do 5C ¥ florian: but there's just one key atsushi: Some keyboards have both … my keyboard has both myles: Do you know if those keyboards are common? addison: just switching to IME doesn't get you yen sign until you switch out of directed mode … but in command shell you'll see paths displayed consistently in those locales with those symbols r12a: What about escape codes? Do they all start with yen sign? 
florian: I guess so … but not sure, not on Windows for too long … and this really is a Windows-ism … it's not a Linux thing and not a Mac thing addison: You could wish to start to repair the world <Bert> [44]Some photos of Japanese keyboards [44] https://en.wikipedia.org/wiki/Japanese_input_method addison: certainly backslashes as backslashes outside a path context florian: If you're thinking about a Mac author writing an article about Windows, it would be fine if you don't get it automatically <atsushi> [45]keyboard map examples [45] http://qa.elecom.co.jp/faq_detail.html?id=5262 florian: and have to work to find char for Windows users … but if you have a machine where the font renders \ as Yen … then won't notice the oddness … Kida-san's advice, does it work if we can't fix the font? … Removing tricks from fonts is nice, but fonts are already out there. Too late to fix myles: interesting observation is that if you use ICU to convert the byte 5C from Shift-JIS encoding … e.g. say this sequence of bytes is a Shift-JIS encoded string, and that byte is 5C … if you then take this string as 1 byte and ask convert to UTF-8, the result that ICU produces is also 5C … so ICU at least seems to be thinking that the encoded byte 5C in Shift-JIS is backslash rather than meaning yen sign addison: it absolutely has to, because underneath the hood the OS expects a backslash in the path … just a thing in East Asian OSes that the DOS fonts and later presentational fonts show paths as having the symbol in them … I don't think it was shift-JIS, I think it was the single-byte national code sets that had yen sign in them … so I think that's the right behavior for a converter … but what's happened is that everyone got used to path separators looking like currency symbol … even though underneath the hood they're really 5C … which is horrifying fantasai: So what do we want to do here? Do we want other browsers to adopt WebKit behavior or something else? 
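[Editorial sketch, not part of the scribed discussion] The conversion myles describes can be tried without ICU. The snippet below is a minimal sketch that uses Python's built-in codecs instead, on the assumption that they treat the byte 5C the same way ICU is reported to; the helper name decode_byte_5c is hypothetical. It decodes the single byte 0x5C as Shift-JIS and as Windows code page 932, reports the resulting code point, and re-encodes it as UTF-8.

```python
def decode_byte_5c(encoding: str) -> str:
    """Decode the single byte 0x5C using the named legacy encoding."""
    return b"\x5c".decode(encoding)


# Code point each codec produces for the byte 0x5C.
code_points = {
    enc: f"U+{ord(decode_byte_5c(enc)):04X}"
    for enc in ("shift_jis", "cp932")
}

# Round trip to UTF-8, mirroring the "convert to UTF-8" step in the minutes.
utf8_bytes = decode_byte_5c("shift_jis").encode("utf-8")

if __name__ == "__main__":
    print(code_points)
    print(utf8_bytes)
```

If the codecs behave as assumed, both entries should report the backslash code point rather than the yen sign (U+00A5), which is consistent with addison's point that the yen glyph comes from the font and not from the converter.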
myles: not a mode, just any time you have a particular encoding OR certain fonts, we will automatically swap out the two characters addison: My question is, is this something one could style on or off myles: not with a CSS property. That's one potential option, could control with a CSS property florian: Should we have in @font-face some descriptors to tell what the font is doing? … currently triggering WebKit behavior on several famous fonts, but could be non-famous fonts myles: sounds reasonable myles: also this list of names is heuristic … if you make @font-face rule with same name, but source is a different font, that will still trigger fantasai: I think at that point you're asking for trouble florian: Maybe initial value of descriptor can be auto … [missed] and trigger the right behavior myles: This code is older than WebKit-Blink fork, and Blink doesn't have it so must have intentionally removed it fantasai: They also aren't as focused on Mac, so maybe not as focused on that? florian: These fonts are not on Android either addison: These fonts are named in the stylesheet and subbed in OS … but taking the behavior florian: Chrome on Android should be having the same problems as WebKit on MacOS … but Chrome removed it, possibly on purpose <florian> fantasai: the two options we have are <florian> fantasai: 1: remove this special behavior from webkit, and just let the font do what it does <florian> fantasai: this will result in pages rendering very differently on windows vs other OS <florian> fantasai: option 2: encode this behavior in all browsers, and possibly add some css to control it <florian> myles: we could change our heuristic <florian> fantasai: but something more or less like it <florian> fantasai: we should probably take that to the CSSWG Bert: This might also occur to other languages florian: It happens for sure in Japan and Korea addison: Also affects simplified Chinese, maybe also traditional fantasai: If we standardize this, should expand to other affected 
languages addison: I think limited to East Asian at least Bert: WebKit only does Yen sign, right? florian: Do you have equivalent heuristic for Korean, or don't do it for Korean? myles: I've exhaustively listed our criteria polx: Is there special behavior for French francs? florian: There were symbols, but never intermingled with backslash in encodings ACTION: fantasai to summarize into issue, for discussion in CSSWG <trackbot> Created ACTION-1194 - Summarize into issue, for discussion in csswg [on Elika Etemad - due 2022-09-19]. myles: If other browsers refuse to implement, this makes our decision for us fantasai: That's why we need to discuss on Friday <r12a> [46]https://github.com/w3c/csswg-drafts/issues/7183 [46] https://github.com/w3c/csswg-drafts/issues/7183 <r12a> [css-text-4] Make autospace a property, rather than a value of text-spacing #7183 [47]https://github.com/w3c/csswg-drafts/issues/7183 [47] https://github.com/w3c/csswg-drafts/issues/7183 r12a: I think there are advantages of splitting these two apart … and may even be able to do some additional stuff, such as replacing normal spaces with autospacing myles: When you say autospacing, can you describe? r12a: in Japanese, there's typically a little bit of extra space between Japanese chars and numbers … or between Japanese chars and Latin … and that's something that if you put in an actual space before/after … those spaces are too big … and don't really belong there … so the autospace property applies that extra spacing without having to add that spacing … which everyone wants that … whereas text-spacing is stretching gaps myles: I'm confused, what's the difference? 
r12a: text-spacing applies equal amount of space … autospacing is particular to context … and another question of applying lots of these spaces across range of text … about surrounding text with a bit of space on either side … often fixed-size space <r12a> myles, see this (read the whole section) [48]https://r12a.github.io/scripts/jpan/#letterspace [48] https://r12a.github.io/scripts/jpan/#letterspace [fantasai explains what text-spacing does] myles: transform spaces in source? fantasai: either transform or to insert where not already there r12a: also includes reduction of space around punctuation … everything to do with space, rather than different types addison: so could split different classes of mechanical spacing … for CJK autospacing would give you for runs of non-native text … and not affect any other spacing r12a: Splitting it out allows you to be more specific … apply to certain cases and not others r12a: I wanted to throw this out there because I think there's been no movement on it fantasai: haven't been working on Text 4 lately myles: I don't want to comment on property split … but our native text engine CoreText has a similar feature for Chinese and Japanese text … where it inserts spacing … in various places between different kana, punctuation, for Chinese and Japanese … and it has specific rules about where that happens … text-spacing property in Text L4 has a bunch of values which are fairly prescriptive about where space goes … so for us, the reason that we like the auto value here is it's a way for CSS text to match the native text engine … to get equal fidelity with native apps and webapps r12a: i'm not arguing against an auto value myles: If we have auto value in its own property, what would be the meaning if you specify "do autospacing" which for us would mean match platform *and* you supply different value to text-spacing in conjunction r12a: have a read of this stuff and the description I pasted into IRC … what I'm saying is that 
these are different things that involve gaps … for different reasons and in different ways myles: Question is what does it mean "do autospacing" and also say "text-spacing: trim-start" r12a: I'm not sure that there's a clash there … you're just offering content author ability to handle independently … I don't think they overlap fantasai: different ways of splitting the control … text-spacing could shorthand two properties … one for punct. vs. script boundaries … or could have an independent property for controlling the space replacement vs. how much … set for whole doc "how much it is" vs. turning on and off … think about what is more ergonomic for authors … want to control how much spacing … could go in another level, was originally in L3, could consider in L4 … for example, underline position is separate from whether it is on or off myles: When reviewing r12a's document, I see text about letter-spacing, initial-line punctuation, text-indent, … want to make sure I'm not missing autospacing r12a: autospacing I'm talking about is the spacing around alphabetic or numeric phrases … separately is spacing around punctuation … felt it easier to split up that way for readers <br type=lunch duration=50min> ~* Lunch *~ New Dial-in: [49]https://us02web.zoom.us/j/85205096646?pwd=Z0tIVk1PdHlPZ20vQlBVRmVqSG1RZz09 [49] https://us02web.zoom.us/j/85205096646?pwd=Z0tIVk1PdHlPZ20vQlBVRmVqSG1RZz09 Intros PeterR: Peter Rushforth PeterR: interested because of indigenous languages and making them happen in browsers … maps is my focus, but fact finding David: connect with Andrew Cunningham perhaps? 
CSS Stuff fantasai: color contrast discussions [50]https://github.com/w3c/csswg-drafts/issues/6319 … to be aware of [50] https://github.com/w3c/csswg-drafts/issues/6319 fantasai: question about whether color contrast values are affected by writing system, and how to have algos account for this fantasai: Another unsolved issue is top metrics for non-Western scripts [51]https://github.com/w3c/csswg-drafts/issues/5244 [51] https://github.com/w3c/csswg-drafts/issues/5244 fantasai: Related to Kaiti issue is fangsong issue [52]https://github.com/w3c/csswg-drafts/issues/4425 [52] https://github.com/w3c/csswg-drafts/issues/4425 [discussion of what styles fall back to what] florian: Define grasscript only over the CJK range … because don't want it to fall back to children's handwriting font dsinger: Maybe look at semantics of what the styles convey … e.g. if about emphasis, translate it florian: but what's the Khmer equivalent to writing German in Fraktur? dsinger: That would be archaic, that's the semantic, roughly … but this is how you emphasize in Chinese, so go to bold or italic in English florian: we sort of used to try to do, either serif or sans-serif or cursive, but moving away from that because mapping is too hard addison: semantically different and not 1-1 mapping … Japanese emphasis might have bg color difference, or emphasis marks, not bold or italic … can style em or strong to be these things … but really different things addison: drop-cap thing, if you try to smash everything into Western typographic, that doesn't match how fonts are structured or how the script works <florian> fantasai: what we need to consider is that we're not going to be able to map every style of font, even in western typography <florian> fantasai: it should not be our goal to be exhaustive <florian> fantasai: the reason to create a new generic style is if you were using that to convey semantic differences or contrast <florian> fantasai: we need to have css be able to fall back when the 
font is missing to something else that would express the same semantic contrast <florian> fantasai: in English text, you wouldn't switch between Times and Palatino to express anything, but you might switch between italics or monospace or something. That needs preservation <florian> fantasai: same logic should apply to chinese: if the text switches from something to grass styles to express a distinction, then we'd need it, but I suspect you won't actually find text… <florian> fantasai: …where that is the only difference. Using a different style for a heading isn't strong enough, as there's other things that distinguish the heading. addison: are generics about "give me a font with this type of styling generally" or ... fantasai: They were added originally for that, but that's not what we need … fantasy or cursive are useless because their purpose is to convey a feeling and they can't do that because such a wide variety of fonts in each category … fantasai: you can use lang tags to tweak generic choices dsinger: ... 
dsinger: We don't have a place to put information about shaping of certain language/writing systems addison: If you look at Urdu vs Arabic, they have different stylistic variations … not serif vs sans-serif … you can look at them and say they're not really serif or sans-serif … can smash into those buckets, or do we recognize that without changing language there are different font styles … is it semantic thing … I can argue both sides, it's really hard to add generics … shaping engines work in specific ways with info we have … some token to pass, this is what I intend … without being able to know what fonts are installed on a machine dsinger: I'm hearing it's inappropriate to talk about generics in Latin terminology … so we should have names for things they do in those scripts … but then we have a problem of translation, what does it mean in other script florian: don't necessarily have a problem, can apply :lang() selectors to choose fonts differently … but if we say that Kaiti is not cursive, but new, then how many such new things should we have? … do we want to go as far as fantasai said? … or go further, e.g. I want Humanist typeface? … not just about adding keywords, but also browser needs to have access to the fonts *and* know which fonts map to each keyword fantasai: I think we have two criteria … one is what Florian mentioned, which is can we reasonably implement this generic … other is do we need the generic in order to ensure the semantic preservation … e.g. if these two fall back to the same font, will the text be less understandable … nevermind whether it feels appropriate florian: typical example would be italics in English … if you lose the italics, you lose the fact that there was emphasis … if you have a document which uses italics for emphasis and you fall back to normal text instead of italics, you lose information … what are the cases in other languages? 
<Bert> CSS font classification isn't based on Vox, but showing that others are struggling with classifications, too: [53]ATypI abandoned the Vox classification and is working on new one. [53] https://en.wikipedia.org/wiki/Vox-ATypI_classification <florian> fantasai: the reason for the change should be in the markup, and then you style it however you want. But it often happens that the way you style things is that the only difference is between the font face, then we need generics to be able to preserve that distinction addison: I think I agree with you, but your test may be incorrect … if you suppose that someone used that as the only distinction … e.g. I've seen serif vs sans-serif, e.g. as a form of emphasis … but you could imagine that a document that would pass your test and still say, well, the fact that the browser smashed these two styles together is because of a limitation of our ability to express in generics <florian> fantasai: we should not introduce generics to deal with that problem just because some one-off document made a distinction, but if it is one commonly made in the language, then it calls for generics fantasai: ... florian: I would like to see generics for more things, but if we are going to get a more limited set, the criteria you mention are the minimum we should aim for … I think it would still be nice to pick from general categories for preferences … e.g. nastaliq … But regardless, we can't just make nice keywords in specs. 
The browsers need to be able to map them … if we create 500 generics in all known languages, it's not going to have good coverage and not going to be helpful addison: it's like counter styles … I know I want certain things, but to force everyone to implement … if you're styling documents can use these keywords in this way, and it will do a good job of getting fonts that match florian: maybe can provide premade style sheets for this … but even though I would like all to be covered by browser, if we have smaller set … fantasai is hinting at the minimum necessary for international text to work … nice to go beyond, but should at least start there addison: it's about where font management is taking place <florian> fantasai: where it gets implemented is a bit more of an open question addison: not necessary to spec … up to implementations <florian> fantasai: it often happens that introducing it in CSS puts the pressure on the rest of the ecosystem to make it happen <florian> fantasai: as the i18n WG, what we need to do is to identify the critical things that need to exist so that the earlier criteria can be handled <florian> fantasai: just like western designers may wish to get the distinction between serif and slab serif, Arabic designers might wish for many distinctions, but that's not a priority dsinger: if writing a document [gives example of switching font styles] florian: if it's a one-off, that's one thing. If it's a regular pattern, need to build into CSS <florian> fantasai: but if the common type of document wouldn't make sense on a phone because the phone doesn't show the right distinction, then that's a problem. addison: [...] <florian> fantasai: css should be designed in a way that as you fall back through fonts, you may lose some styling, but you shouldn't lose meaning. 
Whichever generics are needed to make that happen should exist florian: Imagine we were not all familiar with Latin, and only had distinction relevant to our own language … discussing about adding generic keywords … as i18n, and they explain italics … if you miss that, you'll have difficulty understanding … if you can't preserve that you will miss information … they have many different font styles, which is nice, but need italics vs non italics … CSSWG knows how to introduce generics … but doesn't know what's needed to add … if i18n can say, in language X you will use font face changes to distinguish these different uses … functions similar to switching to monospace or switching to italics in Latin … if i18nwg comes back and says these 7 keywords would solve these problems, CSSWG can add them … but i18nwg needs to find these cases addison: Can identify here's a group of languages, and here's what they do … forgetting about the outside world, this is how they classify fonts fantasai: but we don't care how they classify fonts if they're not using those classifications to make distinctions within the same document addison: there are mental classifications … for emphasis, we've introduced different ways to style emphasis … because obliquing things is not the way to do it … We can describe what those all are … but can show what the cases are and have a discussion of where the bar should be … before we take the plunge and introduce a new generic … or should we do interstitial work that's separate <florian> fantasai: nastaliq vs kufi is not going to be a distinction used within a document to contrast things. 
Would be nice to have, but not critical for understanding [discussion of kaiti vs non-kaiti being used similar to italic vs non-italic] fantasai: I think the problem with classifying kaiti as cursive would be that if you ask for kaiti, you might get grasscript which would be totally inappropriate florian: Would be like asking for monospace to express code and fell back to Zapfino … the contrast would be there, but what it means is lost … falling back to monospace would be better dsinger: in this document, use the font as distinction, and in other as stylistic … what do we do in that case … want both documents to be readable at least florian: problem of mapping fonts to categories, browser can do it if we introduce 3 new keywords; but not if we introduce 50 … a handful (worldwide), they can do it and it will be usable … if instead of 3 (ignoring the 2 useless ones) we had 8 or 9, would be manageable … if we are asking for 50, will not be implemented … so what are the few extra ones that are critical for understandability? florian: can we action i18n to find the cases where font face category switches are needed for understanding common documents? addison: other challenge is we'll not find a global generic … we'll find a set of traditions over here with Kaiti, over there with another one, etc. … will find islands of variations florian: that's fine David-Clarke: Would things like old-fashioned/modern/etc be types of categories to look fantasai: no, because that's just a stylistic preference florian: The distinction here is critical to have for understanding documents, vs stylistic preferences ACTION: addison: follow up with r12a and others about gap analysis for font generics <trackbot> Created ACTION-1195 - Follow up with r12a and others about gap analysis for font generics [on Addison Phillips - due 2022-09-19]. 
<florian> Florian: the distinction between old style or modern isn't a wrong one, but it isn't a critical one in the sense that both aren't commonly used in the same document to contrast two pieces of text Triage [54]https://github.com/w3c/csswg-drafts/issues?page=2&q=is%3Aopen+is%3Aissue+label%3Ai18n-tracker [54] https://github.com/w3c/csswg-drafts/issues?page=2&q=is:open+is:issue+label:i18n-tracker [55]https://github.com/w3c/i18n-request/projects/1 [55] https://github.com/w3c/i18n-request/projects/1 [56]https://github.com/w3c/csswg-drafts/issues/1790 [56] https://github.com/w3c/csswg-drafts/issues/1790 fantasai: some kind of overview might make sense to me … lot of details handled in there, not just baseline alignment not just in one writing system, but when mixed … and this section tries to account for all of that overview of baselines in CSS at [57]https://www.w3.org/TR/css-inline-3/#css-metrics [57] https://www.w3.org/TR/css-inline-3/#css-metrics fantasai: discussion of text-spacing and adding rules to handle non-fullwidth punctuation [58]https://github.com/w3c/csswg-drafts/issues/6091 [58] https://github.com/w3c/csswg-drafts/issues/6091 ACTION: atsushi: follow up with jlreq on csswg#6091 to see if non-CJK enclosing punctuation should be included in space-trimming <trackbot> Created ACTION-1196 - Follow up with jlreq on csswg#6091 to see if non-cjk enclosing punctuation should be included in space-trimming [on Atsushi Shimono - due 2022-09-19]. 
[59]https://github.com/w3c/csswg-drafts/issues/1282 [59] https://github.com/w3c/csswg-drafts/issues/1282 [60]https://github.com/w3c/csswg-drafts/issues/1282#issuecomment-952428897 [60] https://github.com/w3c/csswg-drafts/issues/1282#issuecomment-952428897 fantasai: review miriam's comment linked above and convince csswg about direction [61]https://github.com/w3c/csswg-drafts/issues/6915 [61] https://github.com/w3c/csswg-drafts/issues/6915 [addison explains how lang tags for undetermined language work] <florian> conclusion 1: :lang("") matches lang="" <florian> conclusion 2: :lang("*") matches everything but lang="" <florian> conclusion 3: maybe add a note about lang="und" and lang="" being treated distinctly, despite having similar semantics ACTION: florian to reread issue, and if conclusions still make sense in the end, post as the proposal <trackbot> Created ACTION-1197 - Reread issue, and if conclusions still make sense in the end, post as the proposal [on Florian Rivoal - due 2022-09-19]. AOB? <Meeting adjourned for the day at 15:40> Summary of action items 1. [62]fantasai to summarize into issue, for discussion in CSSWG 2. [63]addison: follow up with r12a and others about gap analysis for font generics 3. [64]atsushi: follow up with jlreq on csswg#6091 to see if non-CJK enclosing punctuation should be included in space-trimming 4. [65]florian to reread issue, and if conclusions still make sense in the end, post as the proposal
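[Editorial sketch, not part of the scribed discussion] The three :lang() conclusions recorded above for issue 6915 can be written down as a small truth function. The snippet below is a minimal sketch under stated assumptions: the function name matches_lang is hypothetical, an element's language is represented as a plain string with "" standing for lang="", and ordinary ranges use a simplified case-insensitive prefix match rather than the full matching rules in the Selectors specification.

```python
def matches_lang(selector_range: str, element_lang: str) -> bool:
    """Model the draft conclusions for :lang("") and :lang("*").

    selector_range: the argument of :lang(), e.g. "", "*", "en".
    element_lang:   the element's language; "" models lang="".
    """
    rng = selector_range.lower()
    lang = element_lang.lower()
    if rng == "":
        # Conclusion 1: :lang("") matches only lang="".
        return lang == ""
    if rng == "*":
        # Conclusion 2: :lang("*") matches everything but lang="".
        return lang != ""
    # Simplified prefix matching for all other ranges (an assumption).
    return lang == rng or lang.startswith(rng + "-")


if __name__ == "__main__":
    for rng, lang in [("", ""), ("*", ""), ("*", "und"), ("", "und"), ("en", "en-US")]:
        print(repr(rng), repr(lang), matches_lang(rng, lang))
```

Under this model lang="und" and lang="" come out differently for both :lang("") and :lang("*"), which is the distinction conclusion 3 suggests calling out in a note.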
Received on Tuesday, 20 September 2022 11:36:34 UTC