Fwd: [minutes] Internationalization telecon 2016-08-04 with Social Web WG

Forwarding to this list for information, since the whole meeting (joint
with Social Web WG) was about support for base direction in JSON-based
formats.

If you want to discuss these topics, see the Social Web WG issue list at 
https://github.com/w3c/activitystreams/issues

ri


-------- Forwarded Message --------
Subject: [minutes] Internationalization telecon 2016-08-04 with Social 
Web WG
Resent-Date: Fri, 05 Aug 2016 12:31:45 +0000
Resent-From: www-international@w3.org
Date: Fri, 5 Aug 2016 13:31:28 +0100
From: ishida@w3.org
To: www International <www-international@w3.org>, public-socialweb@w3.org

https://www.w3.org/2016/08/04-i18n-minutes.html




text version follows:


Internationalization Working Group Teleconference

04 Aug 2016

     [2]Agenda

        [2] 
https://lists.w3.org/Archives/Member/member-i18n-core/2016Aug/0001.html

     See also: [3]IRC log

        [3] http://www.w3.org/2016/08/04-i18n-irc

Attendees

     Present
            Addison, eprodrom, Steven, aaronpk, tantek, JcK,
            jasnell, rhiaro, cwebber2, r12a, Francesco, Sandro

     Regrets
     Chair
            Addison Phillips

     Scribe
            aphillip, rhiaro

Contents

       * [4]Topics
           1. [5]Agenda
           2. [6]Discussion of ActivityStreams with Social WG
           3. [7]AOB?
       * [8]Summary of Action Items
       * [9]Summary of Resolutions
       __________________________________________________________

Agenda

Discussion of ActivityStreams with Social WG

     r12a: One of the most urgent questions at the moment is how to
     go about ensuring that directionality works. I think we should
     not talk about language yet, but focus on text direction. There
     are significant differences between how language and direction
     work

     <aaronpk> +1

     r12a: The key question is do we need a direction property to
     capture the base direction of the text
     ... There are two aspects to specifying the base direction
     ... The overall default base for a paragraph or sequence of
     paragraphs
     ... The other is directional changes inline

     <Francesco> finally I can hear you - sorry

     r12a: They're slightly different. certainly if we had a
     direction property it would only be capable of describing the
     default base direction for the paragraph as a whole and you'd
     still need other mechanisms to indicate inline changes in
     direction
     ... The AS2 spec, and micropub and webmention have the same
     problem
     ... What we say for AS2 is probably relevant for them as well
     ... We'll talk about AS2 to keep it simple
     ... Allows two types of message. One can take html markup and
     one cannot
     ... There is another question, whether anything can be done
     about that
     ... I think we should talk about that after the direction thing
     ... Should we capture the default base direction for the
     paragraph in a separate property, or should we just rely on the
     data in order to obtain that information?
     ... One way that you can do that by relying on the data is by
     testing the first strong character in the text
     ... You miss out any weak or neutral characters until you find
     a strong one, and if it's a rtl character you say the overall
     direction for the paragraph is rtl
     ... That works a lot of the time and if you look at people's
     twitter streams you see that most of the time it works okay,
     because they tend to have just arabic text, or they have arabic
     with some embedded latin, which either is handled specially by
     twitter or is a simple embedding which doesn't produce any
     problems
     ... But there are situations where that first-strong rule can
     be duped
     ... Which is when you need to say explicitly no this is not
     actually a ltr phrase even though it starts with a ltr strong
     character
     ... Twitter actually doesn't use first strong character to
     determine
     ... It looks at the number of characters in one direction and
     the number in another
     ... And the results due to that are unpredictable
     ... But that's the basic idea. If you use the text you can most
     of the time figure out the direction, and the rest you need to
     find a way to indicate it
     ... If you're dealing with markup you can add it there
     ... If you're dealing with name, you can add a lrm or rlm
     character at the beginning of the text
     ... (control characters)
     ... Unfortunately things are not quite so simple as that. The
     main question here is whether there is a value in having a
     separate direction property
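
The first-strong rule and the character-counting approach described above can be sketched roughly as follows. This is an illustration only, not code shown in the meeting: the function names are made up, the counting function is a guess at the general idea rather than Twitter's actual algorithm, and the "ltr" default for strings with no strong characters is an assumption.

```python
import unicodedata

# Unicode bidi classes that count as "strong" characters.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def first_strong_direction(text, default="ltr"):
    """Skip weak and neutral characters; the first strong one decides."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_LTR:
            return "ltr"
        if bidi_class in STRONG_RTL:
            return "rtl"
    return default  # no strong character found (assumed fallback)


def majority_direction(text, default="ltr"):
    """Hypothetical counting heuristic: compare strong ltr vs rtl totals."""
    ltr = sum(unicodedata.bidirectional(ch) in STRONG_LTR for ch in text)
    rtl = sum(unicodedata.bidirectional(ch) in STRONG_RTL for ch in text)
    if ltr == rtl:
        return default
    return "ltr" if ltr > rtl else "rtl"


arabic = "\u0645\u0631\u062d\u0628\u0627"  # an Arabic word
print(first_strong_direction(arabic))              # rtl
print(first_strong_direction("@w3c " + arabic))    # ltr: the rule is duped
print(first_strong_direction("\u200f@w3c " + arabic))  # rtl: RLM comes first
```

The last call shows why a leading RLM helps in plain text: the mark is itself a strong rtl character, so a first-strong consumer sees it before the Latin handle.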

     <Zakim> sandro, you wanted to ask about why direction first,
     since it seems derived

     sandro: Does direction have to be managed separately from
     language? I would naively assume that if I knew the primary
     language of a text I'd know the primary direction of the text?

     aphillip: direction has a weak relation to the language. And
     language information isn't always available or authoritative

     sandro: The order of solving these is surprising to me. If we
     solve the language problem we solve the direction problem?

     various: no

     <cwebber2> are there any languages that are both rtl and ltr?

     aphillip: Sometimes you can use language information to help
     infer the direction, but the direction is something you need in
     order to process the text for display. It has its own structure
     and needs to be managed in a particular way. Language does have
     some impact on display, but that generally involves processes
     that are done separately from the bidi
     ... With language you're only inferring what the direction is
     likely to be for a particular paragraph

     <jasnell> there are fairly successful heuristic approaches to
     guessing the directionality from language, but it's not
     foolproof by any means

     r12a: there are many languages written in both ltr and rtl
     scripts

     sandro: that makes sense

     <aaronpk>
     [10]https://www.w3.org/International/questions/qa-bidi-unicode-controls

       [10] 
https://www.w3.org/International/questions/qa-bidi-unicode-controls

     <aaronpk> "how these control characters"

     aaronpk: I was reading the w3 guide on unicode controls and
     from this, unfortunately there's no anchor, but if you search
     for ^ you'll see the paragraph
     ... Specifically about the title attribute in html
     ... There's obviously no mechanism for base direction of an
     attribute in html

     Addison: You can set the base direction. You can't put markup
     inside the title

     aaronpk: The example given here is the example of mixed rtl and
     ltr text where there isn't one being dominant because the text
     is so short
     ... This seems to solve it
     ... I'm wondering why we can't just use this to solve the
     problem everywhere?

     aphillip: If you want proper complete bidi layout then you need
     to do other things in order to make that happen. That can
     include using control characters
     ... A challenge is that in annotation cases you're going to be
     taking text that doesn't necessarily include that, or includes
     markup for directionality, and trying to get things to do the
     right thing

     jasnell: There are a number of considerations. If you have a
     name that's plain text and have these control characters, not
     every implementation is going to understand these control
     characters

     aaronpk: Having an extra property will need to be understood
     too

     jasnell: Control characters can work they just add extra
     complexity

     aaronpk: They have to support control characters anyway to
     support bidi?

     aphillip: They have to support control characters to support
     unicode
     ... But again the question is where you get information in
     order to do an implementation when you're constructing text, we
     find that mostly markup generally works better than the
     invisible controls for helping people with authoring content

     aaronpk: Examples of those?

     aphillip: Some in some articles.. the challenge is the controls
     are invisible
     ... Whereas markup is visible to people trying to get the right
     direction
     ... When somebody is authoring a tweet or an annotation on a
     document, they're not authoring markup the control characters
     are generated on the fly to get the display to look correct
     ... What the text direction property is meant to do is capture
     the context of the text that's entered or selected
     ... If you snippet a piece of text from an html document, the
     base direction might be declared as far back as the html
     element on a webpage
     ... The DOM structure knows what the base direction is and
     could populate a text direction property, even though there's
     no markup nearby on the text that's being snippeted

     aaronpk: the browser could also embed the control character in
     the text?

     aphillip: That's possible, but a more likely or simpler
     implementation is to take a piece of data you already have and
     apply it as metadata rather than having to mutate the text
     that's being clipped
     ... Or similarly if you write an android application you can
     know that your runtime environment is set to rtl and therefore
     the input control used to enter the text is a rtl base
     direction context
     ... not to say that you couldn't try to manage the control
     characters for the user, but you're interfering with their text
     by inserting or removing control characters based on the
     runtime context
     ... That's a reason why we might want to have a separate
     property
     ... That isn't to say that we couldn't solve it by writing
     instructions instead that say your implementation must or must
     not include control characters in a particular way

     jasnell: Just to provide more context wrt the property
     approach. AS2 is a JSON based format. It is written to be
     compatible with or aligned with JSON-LD. While we can have
     objects, embedded and nested objects, there really is no
     concept of inheritance
     ... outside of the JSON-LD context within a document
     ... So if you have an object nested 3 or 4 levels deep it
     doesn't actually inherit the properties of its parent
     ... And these individual objects can be from different sources,
     different authors
     ... In those cases declaring a base document level direction
     may not necessarily work, and we'd have to put the direction
     metadata at each object within that document
     ... So you potentially end up with multiple default direction
     properties throughout a single document

     aphillip: our best practice is to recommend that language and
     base direction information is associated with each object that
     could contain it, so each can be set separately
     ... And it's also useful to have a document level way of saying
     the default to have a fallback so you don't have to set it on
     every single thing
     ... JSON-LD itself doesn't provide any of this structuring
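
The combination discussed here (direction set per object, with a document-level fallback and no inheritance from a parent object) might look something like the sketch below. The "dir" property name and the helper function are hypothetical; AS2 defines no such property, and nothing like this was agreed in the meeting.

```python
def collect_directions(obj, document_default="ltr"):
    """Return (name, direction) pairs for an object and its nested objects.

    Each object uses its own hypothetical "dir" property if present,
    otherwise the document-level default. A nested object never inherits
    the direction of the object that contains it.
    """
    results = []
    if "name" in obj:
        results.append((obj["name"], obj.get("dir", document_default)))
    for value in obj.values():
        children = value if isinstance(value, list) else [value]
        for child in children:
            if isinstance(child, dict):
                results.extend(collect_directions(child, document_default))
    return results


activity = {
    "name": "Sally liked a note",
    "object": {
        "name": "\u0645\u0631\u062d\u0628\u0627",
        "dir": "rtl",  # hypothetical per-object direction
        "inReplyTo": {"name": "A reply target"},
    },
}

for name, direction in collect_directions(activity):
    print(direction, repr(name))
```

In this example the innermost object falls back to the document default rather than picking up "rtl" from the object around it, which is the behaviour jasnell describes for nested objects from different sources.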

     jasnell: The point there is that adding.. I have no quarrel
     with adding this information as properties. The catch though
     is that it does have a fairly significant complexity tradeoff
     for implementors. There's also a backwards compat concern with
     AS1
     ... Existing implementations, display name is a simple string
     as plain text without any language tagging or directionality
     ... We have made breaking changes from AS1, so less of a
     concern now than it was before, but if we are going to provide
     this metadata we need to do it in a way that causes as little
     disruption and complexity as possible

     aphillip: I think these are all optional properties, that's
     less intrusive than requiring implementations to do control
     character insertion?

     jasnell: It is less, we just need to be careful with the
     wording
     ... If we strongly recommend, it sends a signal. Implicit MUST

     r12a: I'm for the idea of putting the information in the text
     itself, and I've been trying hard to think of scenarios where
     having a separate direction property would be advantageous and
     I haven't come up with a lot, however there are a couple of
     situations that are worth mentioning
     ... james, you mentioned increased implementor complexity
     ... If you had typed the text in a field, and it knew it's rtl
     because of the context from the html, the user wouldn't type in
     any information to say this is rtl
     ... If you're working with first-strong heuristics you wouldn't
     need that
     ... But if you had started with @mention which is in latin,
     then unless you have some very special handling in the target
     to say that's a twitter handle and you should ignore it, then
     you're going to get a situation where the first-strong char is
     ltr when the rest of the message is rtl
     ... I don't think you could expect the users to say 'this is
     going to be wrong if it goes somewhere else'
     ... The user isn't going to think about or want to add control
     characters to do that
     ... You want to get the data that the DOM knows about and apply
     that in some way to the text so it comes out appropriately
     ... Whether you do that by putting the data into a property
     value or by changing the text I'm not sure which is best, but
     they both would involve some additional complexity in terms of
     making sure that when that piece of text finally comes out
     somewhere there's information about what the default
     directionality is expected to be
     ... The other issue that we have in AS2 which we didn't have in
     WA, in WA each leaf in the object only has one text property,
     and therefore we had a direction property and a text property
     which were closely related
     ... in AS2 you can have map property with translations, summary
     and content in same object with only one direction property
     which would give a default for all of those strings which may
     be wrong
     ... that's an additional problem with having a direction
     property

     cwebber: You were just saying that you weren't sure where it
     would be a difficulty to have markup
     ... I definitely want to support i18n. The clear case where
     it's problematic is titles, which are supposed to be just text,
     and possible to be rendered out of band, but very simply
     rendered. We don't want bold, we don't want links... just text
     ... We have one language to parse which is JSON and then you
     have another to parse which is HTML
     ... If you put HTML in a title element, it's difficult to parse
     in the first place. But it's also broad. If we permit links and
     bold and CSS in there, that's a lot more stuff to be concerned
     about than ... maybe we could reduce and say it's just <span>
     and that's all you're allowed to have
     ... Maybe that would work. It would reduce the scope, but would
     still be much more complex
     ... I have seen myself, sometimes people embed in RSS and atom
     readers you end up looking at blog entries and there are angle
     brackets rendered on the title, and I'm pretty sure that that's
     what will happen in our implementations
     ... I would like to support rtl stuff correctly, but that's why
     I feel this inkling that having the control characters would be
     nicer
     ... But I have this itching feeling that we're going to end up
     with a lot of trouble if we permit html in this element

     tantek: +1 to what chris said. The only experience we have with
     formats that are not html but then try to do embedded markup
     have basically all been failures in terms of implementation
     support, interop, and dependability by anyone trying to use
     those
     ... the hypothesis that using nested html markup in json works:
     the only data we have from when that hypothesis has been tested
     has shown that it is false
     ... That that solution does not work
     ... we have zero examples of that working
     ... I would go so far as to say we MUST NOT add markup in these
     elements
     ... the control character approach I'm not as familiar with
     ... but that seems to be a simpler solution to try
     ... Has anyone tried that and what are the results? I would
     defer to i18n for research on that

     r12a: Most people just want to type the text. Even fairly
     technical people who write in arabic or hebrew hate the control
     characters. They are hard to use. One of the problems is that
     they're invisible and you can never quite know whether you got
     it right
     ... If you try to edit something with the embedding, you need
     start and end, and it gets really complicated

     tantek: my understanding is the same tools the user is using to
     input text would be generating the embedded markup
     ... No user would ever type in or see any control codes
     ... Nobody is advocating users typing control codes

     <r12a>
     [11]https://www.w3.org/International/wiki/Bidi_in_social_media

       [11] https://www.w3.org/International/wiki/Bidi_in_social_media

     r12a: most people don't have access to these control characters
     on their keyboard either. I did some testing ^
     ... If you put the rlm at the beginning and then try to make
     that work on twitter or facebook it doesn't actually work. They
     strip them out before posting the message

     <aphillip> most is probably too strong, but certainly mobile
     users

     r12a: There are all those disadvantages with control codes.
     What I wanted to understand was that there are properties like
     summary and content that can hold html. Where does that html
     come from? How do they end up with html in them?
     ... Maybe one answer is what you just said tantek, maybe it's
     created during the process of creating the text
     ... I was trying to understand.. people are not going to type
     in html either

     jasnell: If you look at like blog software, for the authoring
     UI they provide a plain text title field and a rich text or
     markup editor that allows the user to format the content
     ... The editing tool itself is providing the markup for those
     values
     ... The title tends to be plain text, and that's what would end
     up in the name property
     ... Whereas a rich text editor would provide the values for
     summary and content

     r12a: I wonder how we would manage direction in that sort of
     context

     jasnell: I'm not aware of any rich text editors that have
     directionality as default option. If they do they would be
     markup oriented not control characters

     aphillip: there are a number. In Arabic and Hebrew contexts.
     Yahoo mail has controls for that
     ... Not necessarily obvious
     ... particularly to non-users of them

     jasnell: And they operate in terms of markup, setting the
     directional spans rather than using control characters

     aphillip: that's my understanding

     <KevinMarks> Hebrew and Arabic keyboards often have the
     relevant chars

     <aaronpk>
     [12]https://github.com/w3c/activitystreams/issues/338#issuecomment-237570361

       [12] 
https://github.com/w3c/activitystreams/issues/338#issuecomment-237570361

     tantek: I left a long but clear comment on the AS2 github
     ... that's my last point, I have to leave

     jasnell: on the point of markup in name, and I made this in 338
     too, one of the primary points in use cases, the whole semantic
     of the name property, is to provide a reliably readable label
     for the object
     ... If some implementation for instance doesn't understand the
     object type, it would still have a reliable fallback to use the
     label
     ... Allowing markup of any kind makes it more problematic and
     complicated
     ... We have to retain that ability in order for the open
     extensibility model to continue working as it has been
     ... That's something we cannot lose
     ... That was the point, part of the earlier discussion

     aphillip: It's very hard to only permit limited forms of markup
     as well
     ... Once you kind of let some html in then you're kind of
     inviting a whole bunch of other html
     ... I don't think there's a lot of success in trying to limit
     what markup is applied
     ... It's not just bs and is and ems and strongs

     cwebber: I think, building off what James said, and what you
     just said, we have to assume that it's not possible to embed
     html in that name element. So what can we do given that it's
     really not possible?
     ... There's a real semantic need to have a plain text name for
     that object which won't work if we have markup
     ... It seems the control characters, or an additional property.
     Are there any other options?

     <KevinMarks_> ‏the creating user agent can embed the control
     chars

     cwebber: We definitely want to support that, everybody wants
     this to work
     ... If we assume that markup is not possible, what can we do at
     this point?
     ... Can we simplify the conversation if we acknowledge that?

     aphillip: A property is supplying a base direction, I made that
     distinction early
     ... The base direction is not the same as providing inline
     controls to fix.. Richard has a whole bunch of examples.. text
     that needs help with multiple directions
     ... That's why we'd additionally need to look for control
     characters inside the text
     ... If you're going to have a plaintext string, you're still
     going to need control characters for perfect bidi

     jasnell: if we're not going to allow markup, to properly
     support bidi the only way is to support control characters
     ... We do have the option right now in the json format to say
     name is an object, as an option, that has a direction and
     language property, and a value
     ... It's more complicated for implementors and consumers, but
     it does give us the option of declaring on a per-field basis
     without having to rely on markup
     ... What is the complexity tradeoff?
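
To make the tradeoff concrete, here is a rough sketch of what a consumer would have to do if name could be either a plain string or an object. The key names "value", "direction" and "language" are assumptions for illustration; no such shape is defined in AS2, and the helper name is made up.

```python
def read_name(obj):
    """Normalise a name that may be a plain string or a hypothetical object.

    Returns a dict with "value", "direction" and "language" keys, or None
    if the object has no usable name.
    """
    name = obj.get("name")
    if isinstance(name, str):
        # Today's form: plain text with no direction or language metadata.
        return {"value": name, "direction": None, "language": None}
    if isinstance(name, dict):
        # Hypothetical per-field form discussed in the meeting.
        return {
            "value": name.get("value", ""),
            "direction": name.get("direction"),
            "language": name.get("language"),
        }
    return None


plain = {"name": "A simple note"}
rich = {
    "name": {
        "value": "\u0645\u0631\u062d\u0628\u0627",
        "direction": "rtl",
        "language": "ar",
    }
}

print(read_name(plain)["direction"])  # None
print(read_name(rich)["direction"])   # rtl
```

Every consumer that currently treats name as a string would need a branch like this one, which is the implementation cost being weighed.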

     <cwebber2> it would be possible, but a big headache to add that
     so late to all our activitystreams libraries

     aphillip: Can we describe rules for insertion and removal of
     control codes for the bidi
     ... Properties of the field... just the base direction that
     would be a property there... vs inline metadata

     <Zakim> aaronpk, you wanted to say I completely agree with
     tantek, and was never advocating that users type control
     characters themselves

     aaronpk: I'm not sure about the comment r12a made about me, I
     want to echo tantek earlier, I fully expect that the tools
     would be the ones adding the appropriate characters to the
     string, I'd never expect users to add that themselves
     ... My understanding is that the main reason html has a base
     dir property for elements is not so much so that the string
     itself is in the correct order, but that html elements can flow
     in the correct order

     aphillip: that's not correct
     ... It doesn't change the order in which the elements flow
     ... What it has to do with is how the text is processed for
     unicode base direction, but doesn't change what order the
     elements are presented in

     aaronpk: One reason that html needs the attribute is if you
     imagine a full width element, setting the base direction on
     that element means the text will appear on the right side of
     the screen. That won't happen in control characters..

     aphillip: that's not necessarily true

     aaronpk: html is describing the layout. In most of these json
     formats we're not describing the layout, just the string
     ... we don't know what format it will be presented in
     ... html is specifically describing the presentation

     aphillip: I think that's an invalid reading of the use of dir

     <Steve_Atkin> I have to drop the call now.

     aphillip: It's the case that the dir attribute causes that kind
     of rtl display that a rtl user would expect. But it's also an
     inherent property of the text. the reason it doesn't live in
     CSS is because it's an inherent property of the text

     aaronpk: that's absolutely my point
     ... outside of the context of html, the text does not have an
     inherent presentation

     aphillip: we're not talking about presentation
     ... We're talking about if I get a piece of text, I'm going to
     assume a base direction generally of ltr, and that will cause
     rtl text to display incorrectly

     r12a: aaron, there are two aspects of rendering
     ... One aspect is that if you know the base direction is rtl
     and you have "{arabic} w3c" that would determine where "w3c"
     goes in relation to the arabic text
     ... And another aspect is where the entire line of text appears
     on the page, against the left margin or against the right
     ... Sometimes you might want to sequence things rtl but keep
     them on the lefthand side
     ... If you look at twitter and facebook, they deal simply with
     strings: they detect rtl direction and they move the text to
     the right side of the box. That's some processing their
     application does

     aaronpk: what I'm actually trying to say is that while html is
     describing the presentation of the whole rendering of the page,
     AS2 does not talk about presentation at all. The
     presentation is left up to the consuming application. It feels
     wrong to use a mechanism that exists in a presentation format
     in a spec that does not talk about presentation

     aphillip: I think you're missing the point. There are two kinds
     of presentation
     ... One is what you're talking about, layouts and that sort of
     thing
     ... What html is concerned with
     ... But the data itself has a direction.. the example Richard
     gave is which side of the arabic the letters "w3c" go on; that
     depends on the base direction of that text regardless of where
     you present it
     ... That's a property of the text, not a property of the
     presentation of the text
     ... same on a teletype, html, etc

     aaronpk: that's why I'm so interested in it being actually in
     the text, not as a property on the text

     r12a: aaron, I wanted to get some background information out.
     The problems we have with control characters may be something
     we have to deal with in applications rather than AS2
     ... I wanted to go back to the question chris asked, what are
     the options here
     ... It seems to me that the options we are looking at currently
     are either if we know that the thing should be rtl that we
     stick a control char at the beginning of the string, or we
     stick it in an extra field
     ... I'm not sure that we're saying you should necessarily have a
     direction property partly because it's not specific enough when
     we have multiple strings within one object
     ... I'm just saying I think we have two options
     ... We change the string, or we put some metadata alongside
     each specific string where needed

     aphillip: I think that's what's necessary
     ... you can't have one text direction property that applies to
     six strings

     r12a: so which of those is the better approach

     cwebber: The control character at the start of the string will
     be fine, but having the additional metadata as a separate
     property... instead of having say name : "text" having name: {
     object } I think is going to screw up implementations just as
     much as having html in there
     ... Most of the fields in this can have html

     <cwebber2> {"name": "This is LTR", "nameDir": "ltr"}

     cwebber: The vast majority of the fields in which this applies
     is kind of a non-issue. Only name you can't
     ... So what if ^

     <KevinMarks_> ‮ inline works in reverse without implementers
     knowing

     cwebber: Just solve this for name or a few small fields where
     html is not permitted
     ... if an implementation doesn't know how to pay attention to
     nameDir they were going to fail anyway
     ... It will maybe hit the best middle ground
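
A consumer handling the nameDir idea pasted into IRC above might do something along these lines. nameDir is only the proposal floated in this discussion, not an AS2 property, and turning it into a leading LRM or RLM for display is one possible choice assumed here for illustration.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def name_for_display(obj):
    """Return the name, prefixed with a mark when nameDir declares one.

    Objects without the hypothetical nameDir property are returned
    unchanged, so existing plain-string names keep working.
    """
    name = obj.get("name", "")
    direction = obj.get("nameDir")
    if direction == "rtl":
        return RLM + name
    if direction == "ltr":
        return LRM + name
    return name


print(repr(name_for_display({"name": "This is LTR", "nameDir": "ltr"})))
print(repr(name_for_display({"name": "No direction given"})))
```

This only sets the base direction for the whole string. Names that change direction partway through would still need control characters inside the text itself.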

     <rhiaro> doesn't solve multiple directions in one name value
     though

     scribe: or stick a control character at the start

     <aaronpk> again you *need* to support control characters in
     strings in order to properly support bidirectional text (a
     string with text in both directions)

     KevinMarks: The advantage of doing it with injected control
     characters is that it should work for anyone who is correctly
     using utf8
     ... whereas with an extra property we're creating extra work
     for anyone creating and displaying
     ... in terms of most likely preservation of intent, putting it
     directly in the utf8 seems to be the strongest way to do that
     ... Maybe adding a note that creating user agents should do
     that
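
One way a creating user agent could follow this suggestion while touching the user's text as little as possible is to add a mark only when a first-strong reading of the text would give the wrong answer. The sketch below assumes the agent already knows the intended direction (for example from its input context); the function names are hypothetical.

```python
import unicodedata

MARKS = {"ltr": "\u200e", "rtl": "\u200f"}  # LRM and RLM


def detected_direction(text):
    """First-strong detection; returns None when no strong character exists."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


def ensure_direction(text, intended):
    """Prefix a direction mark only if the text alone would be misread."""
    if detected_direction(text) == intended:
        return text  # the text already speaks for itself
    return MARKS[intended] + text


arabic = "\u0645\u0631\u062d\u0628\u0627"
print(repr(ensure_direction(arabic, "rtl")))            # unchanged
print(repr(ensure_direction("@w3c " + arabic, "rtl")))  # RLM prepended
```

Because the inserted mark is itself a strong character, running the function twice leaves the string unchanged. As noted earlier in the discussion, this does not help if a service strips the marks before posting.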

     <r12a>
     [13]https://www.w3.org/International/wiki/Bidi_in_social_media

       [13] https://www.w3.org/International/wiki/Bidi_in_social_media

     r12a: The additional wrinkle here, third thing at the bottom of
     that url
     ... There's a two line text input
     ... The top line needs to be treated ltr and the second line is
     rtl
     ... If you don't do that then text is in the wrong place
     ... The rest of the stuff there shows that twitter and facebook
     don't manage this very well
     ... If the name property has multiple lines in it (haven't seen
     examples of that yet) then it's not just a question of sticking
     a control character at the beginning of the string, it's
     putting it at the beginning of each line
     ... Same applies with summary and content where you have html

     <aphillip> line == paragraph

     r12a: Perhaps it's more likely, where you have multiple
     paragraphs
     ... You probably ought to establish the basedir for each
     paragraph
     ... Or you could put a wrapper around the whole thing like <div
     dir="rtl">
     ... There are intricacies in there I'm not terribly clear about
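
For the plain-text case, putting a mark at the start of each line (each line being a paragraph for bidi purposes) could be sketched as below. The helper is hypothetical and assumes the authoring tool already knows the direction of every line; it does not try to guess.

```python
MARKS = {"ltr": "\u200e", "rtl": "\u200f"}  # LRM and RLM


def mark_paragraphs(text, directions):
    """Prefix every line of text with the mark for its known direction.

    directions holds one "ltr" or "rtl" entry per line, in order.
    """
    lines = text.split("\n")
    if len(lines) != len(directions):
        raise ValueError("need exactly one direction per line")
    return "\n".join(MARKS[d] + line for line, d in zip(lines, directions))


two_lines = "Hello world\n\u05e9\u05dc\u05d5\u05dd"
marked = mark_paragraphs(two_lines, ["ltr", "rtl"])
print(repr(marked))
```

A single mark at the very start of the string would only cover the first line here, which is the wrinkle being pointed out.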

     jasnell: Whatever we do with the metadata, however we indicate
     this base direction, there is definitely a tradeoff cost
     ... We already have some complexity of name and nameMap
     ... I'm suspecting that the property approach is probably going
     to be the most reliable for the base direction. Some
     combination of this property and the control codes
     ... But we need to take that time to balance the approach
     against existing complexity of name vs nameMap
     ... We should take our time, put together a proposal

     r12a: I'll try to provide some tests you can use

     jasnell: appreciate that

     aphillip: do you all want to come back next week? How shall we
     proceed?

     jasnell: works for me

     aphillip: I will reserve time next week to discuss language
     ... If there are proposals for how to discuss direction
     further, do we want to use a particular list or github issue
     for that discussion?
     ... Preferences?

     jasnell: if we can get a proposal in place by then we can
     discuss it then

     aphillip: it's taken years of our lives, so don't be
     surprised..

AOB?

     aphillip: thanks social, I'll reserve time next week

Summary of Action Items

Summary of Resolutions

     [End of minutes]

Received on Tuesday, 9 August 2016 12:20:39 UTC