- From: <ishida@w3.org>
- Date: Fri, 5 Aug 2016 13:31:28 +0100
- To: www International <www-international@w3.org>, public-socialweb@w3.org
https://www.w3.org/2016/08/04-i18n-minutes.html
text version follows:
Internationalization Working Group Teleconference
04 Aug 2016
[2]Agenda
[2]
https://lists.w3.org/Archives/Member/member-i18n-core/2016Aug/0001.html
See also: [3]IRC log
[3] http://www.w3.org/2016/08/04-i18n-irc
Attendees
Present
Addison, eprodrom, Steven, aaronpk, tantek, JcK,
jasnell, rhiaro, cwebber2, r12a, Francesco, Sandro
Regrets
Chair
Addison Phillips
Scribe
aphillip, rhiaro
Contents
* [4]Topics
1. [5]Agenda
2. [6]Discussion of ActivityStreams with Social WG
3. [7]AOB?
* [8]Summary of Action Items
* [9]Summary of Resolutions
__________________________________________________________
Agenda
Discussion of ActivityStreams with Social WG
r12a: One of the most urgent questions at the moment is how to
go about ensuring that directionality works. I think we should
not talk about language yet, but focus on text direction. There
are significant differences between how language and direction
work
<aaronpk> +1
r12a: The key question is do we need a direction property to
capture the base direction of the text
... There are two aspects to specifying the base direction
... The overall default base for a paragraph or sequence of
paragraphs
... The other is directional changes inline
<Francesco> finally I can hear you - sorry
r12a: They're slightly different. Certainly if we had a
direction property it would only be capable of describing the
default base direction for the paragraph as a whole and you'd
still need other mechanisms to indicate inline changes in
direction
... The AS2 spec, and micropub and webmention have the same
problem
... What we say for AS2 is probably relevant for them as well
... We'll talk about AS2 to keep it simple
... Allows two types of message. One can take html markup and
one cannot
... There is another question, whether anything can be done
about that
... I think we should talk about that after the direction thing
... Should we capture the default base direction for the
paragraph in a separate property, or should we just rely on the
data in order to obtain that information?
... One way that you can do that by relying on the data is by
testing the first strong character in the text
... You miss out any weak or neutral characters until you find
a strong one, and if it's a rtl character you say the overall
direction for the paragraph is rtl
... That works a lot of the time and if you look at people's
twitter streams you see that most of the time it works okay,
because they tend to have just arabic text, or they have arabic
with some embedded latin, which either is handled specially by
twitter or is a simple embedding which doesn't produce any
problems
... But there are situations where that first-strong rule can
be duped
... Which is when you need to say explicitly no this is not
actually a ltr phrase even though it starts with a ltr strong
character
... Twitter actually doesn't use the first strong character to
determine direction
... It looks at the number of characters in one direction and
the number in another
... And the results due to that are unpredictable
... But that's the basic idea. If you use the text you can most
of the time figure out the direction, and the rest you need to
find a way to indicate it
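As a rough illustration of the two heuristics just described, the
following Python sketch compares a first-strong check with a
character-counting check. The function names are made up for this
example, and the counting rule is only a guess at the general idea, not
Twitter's actual algorithm. It also ignores details of the Unicode
Bidirectional Algorithm such as isolates and embeddings.

```python
import unicodedata

# Bidi classes treated as strong: L is left-to-right, R and AL are right-to-left.
RTL_CLASSES = {"R", "AL"}
LTR_CLASSES = {"L"}


def first_strong_direction(text, default="ltr"):
    """Return the direction of the first strong character, skipping weak and neutral ones."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in RTL_CLASSES:
            return "rtl"
        if bidi_class in LTR_CLASSES:
            return "ltr"
    return default  # no strong character found


def majority_direction(text, default="ltr"):
    """Return whichever direction has more strong characters (a counting heuristic)."""
    rtl = sum(1 for ch in text if unicodedata.bidirectional(ch) in RTL_CLASSES)
    ltr = sum(1 for ch in text if unicodedata.bidirectional(ch) in LTR_CLASSES)
    if rtl == ltr:
        return default
    return "rtl" if rtl > ltr else "ltr"


# A message that opens with a Latin handle but is otherwise Arabic.
message = "@w3c \u0645\u0631\u062d\u0628\u0627 \u0628\u0627\u0644\u0639\u0627\u0644\u0645"
print(first_strong_direction(message))  # prints "ltr": fooled by the handle
print(majority_direction(message))      # prints "rtl"
```

The two checks disagree on this message, which is the kind of case where
the direction would have to be indicated explicitly.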
... If you're dealing with markup you can add it there
... If you're dealing with name, you can add a lrm or rlm
character at the beginning of the text
... (control characters)
... Unfortunately things are not quite so simple as that. The
main question here is whether there is a value in having a
separate direction property
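For the plain-text case, the idea of adding an lrm or rlm character at
the beginning of the text could look roughly like the sketch below. The
helper names are hypothetical. It only sets up the string so that a
first-strong reader sees the intended base direction, and does nothing
about inline direction changes.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def mark_base_direction(text, direction):
    """Prefix text with LRM or RLM so a first-strong reader sees the intended direction."""
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    mark = RLM if direction == "rtl" else LRM
    # Avoid stacking marks if the text already starts with the right one.
    if text.startswith(mark):
        return text
    return mark + text


def strip_leading_marks(text):
    """Remove any leading LRM/RLM characters, for example before editing."""
    return text.lstrip(LRM + RLM)


original = "@w3c \u0645\u0631\u062d\u0628\u0627"
marked = mark_base_direction(original, "rtl")
```

Because the marks are invisible, a tool that inserts them would probably
also need something like strip_leading_marks when the text is edited or
compared.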
<Zakim> sandro, you wanted to ask about why direction first,
since it seems derived
sandro: Does direction have to be managed separately from
language? I would naively assume that if I knew the primary
language of a text I'd know the primary direction of the text?
aphillip: direction has a weak relation to the language. And
language information isn't always available or authoritative
sandro: The order of solving these is surprising to me. If we
solve the language problem we solve the direction problem?
various: no
<cwebber2> are there any languages that are both rtl and ltr?
aphillip: Sometimes you can use language information to help
infer the direction, but the direction you need in order to
process it for display. It has its own structure and needs to
be managed in a particular way. Language does have some impact
on display, but that's generally processes that are done
separately from the bidi
... With language you're only inferring what the direction is
likely to be for a particular paragraph
<jasnell> there are fairly successful heuristic approaches to
guessing the directionality from language, but it's not
foolproof by any means
r12a: there are many languages written in both ltr and rtl
scripts
sandro: that makes sense
<aaronpk>
[10]https://www.w3.org/International/questions/qa-bidi-unicode-controls
[10]
https://www.w3.org/International/questions/qa-bidi-unicode-controls
<aaronpk> "how these control characters"
aaronpk: I was reading the w3 guide on unicode controls and
from this, unfortunately there's no anchor, but if you search
for ^ you'll see the paragraph
... Specifically about the title attribute in html
... There's obviously no mechanism for base direction of an
attribute in html
Addison: You can set the base direction. You can't put markup
inside the title
aaronpk: The example given here is the example of mixed rtl and
ltr text where there isn't one being dominant because the text
is so short
... This seems to solve it
... I'm wondering why we can't just use this to solve the
problem everywhere?
aphillip: If you want proper complete bidi layout then you need
to do other things in order to make that happen. That can
include using control characters
... A challenge is that in annotation cases you're going to be
taking text that doesn't necessarily include that, or includes
markup for directionality, and trying to get things to do the
right thing
jasnell: There are a number of considerations. If you have a
name that's plain text and have these control characters, not
every implementation is going to understand these control
characters
aaronpk: Having an extra property will need to be understood
too
jasnell: Control characters can work they just add extra
complexity
aaronpk: They have to support control characters anyway to
support bidi?
aphillip: They have to support control characters to support
unicode
... But again the question is where you get information in
order to do an implementation when you're constructing text, we
find that mostly markup generally works better than the
invisible controls for helping people with authoring content
aaronpk: Examples of those?
aphillip: Some in some articles.. the challenge is the controls
are invisible
... Whereas markup is visible to people trying to get the right
direction
... When somebody is authoring a tweet or an annotation on a
document, they're not authoring markup the control characters
are generated on the fly to get the display to look correct
... What the text direction property is to do is to capture the
context of the text that's entered or selected
... If you snippet a piece of text from an html document, the
base direction might be declared as far back as the html
element on a webpage
... The DOM structure knows what the base direction is and
could populate a text direction property, even though there's
no markup nearby on the text that's being snippeted
aaronpk: the browser could also embed the control character in
the text?
aphillip: That's possible, but a more likely or simpler
implementation is to take a piece of data you already have and
apply it as metadata rather than having to mutate the text
that's being clipped
... Or similarly if you write an android application you can
know that your runtime environment is set to rtl and therefore
the input control used to enter the text is a rtl base
direction context
... not to say that you couldn't try to manage the control
characters for the user, but you're interfering with their text
by inserting or removing control characters based on the
runtime context
... That's a reason why we might want to have a separate
property
... That isn't to say that we couldn't solve it by writing
instructions instead that say your implementation must or must
not include control characters in a particular way
jasnell: Just to provide more context wrt the property
approach. AS2 is a JSON based format. It is written to be
compatible with or aligned with JSON-LD. While we can have
objects, embedded and nested objects, there really is no
concept of inheritance
... outside of the JSON-LD context within a document
... So if you have an object nested 3 or 4 levels deep it
doesn't actually inherit the properties of its parent
... And these individual objects can be from different sources,
different authors
... In those cases declaring a base document level direction
may not necessarily work, and we'd have to put the direction
metadata at each object within that document
... So you potentially end up with multiple default direction
properties throughout a single document
aphillip: our best practice is to recommend that language and
base direction information is associated with each object that
could contain it, so each can be set separately
... And it's also useful to have a document level way of saying
the default to have a fallback so you don't have to set it on
every single thing
... JSON-LD itself doesn't provide any of this structuring
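To make the "per-object value plus document-level fallback" idea
concrete, here is a small Python sketch. The property names "dir" and
"defaultDir" are purely illustrative assumptions; neither AS2 nor
JSON-LD defines them. A nested object uses its own direction if it has
one, and otherwise falls back to the document default rather than to
its parent, since nested objects may come from different authors.

```python
# Hypothetical document: "dir" and "defaultDir" are illustrative names only.
doc = {
    "defaultDir": "ltr",
    "type": "Create",
    "name": "A note was created",
    "object": {
        "type": "Note",
        "dir": "rtl",
        "name": "\u05e9\u05dc\u05d5\u05dd",
        "inReplyTo": {"type": "Note", "name": "Hello"},
    },
}


def resolve_directions(obj, document_default, path="$"):
    """Yield (path, name, direction) for every nested object that has a name.

    Each object uses its own "dir" if present; otherwise it falls back to
    the document default, never to the enclosing object.
    """
    if isinstance(obj, dict):
        if "name" in obj:
            yield path, obj["name"], obj.get("dir", document_default)
        for key, value in obj.items():
            yield from resolve_directions(value, document_default, f"{path}.{key}")
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            yield from resolve_directions(item, document_default, f"{path}[{index}]")


results = list(resolve_directions(doc, doc.get("defaultDir", "ltr")))
for path, name, direction in results:
    print(path, direction)
```

Here the reply nested inside the rtl note still resolves to the document
default, which is why the metadata may end up repeated on each object.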
jasnell: The point there is that adding.. I have no quarrel
with adding this information as properties. The tradeoff though
is that it does have a fairly significant complexity tradeoff
for implementors. There's also a backwards compat concern with
as1
... Existing implementations, display name is a simple string
as plain text without any language tagging or directionality
... We have made breaking changes from as1, so less of a
concern now than it was before, but if we are going to provide
this metadata we need to do it in a way that causes as little
disruption and complexity as possible
aphillip: I think these are all optional properties, that's
less intrusive than requiring implementations to do control
character insertion?
jasnell: It is less, we just need to be careful with the
wording
... If we strongly recommend, it sends a signal. Implicit MUST
r12a: I'm for the idea of putting the information in the text
itself, and I've been trying hard to think of scenarios where
having a separate direction property would be advantageous and
I haven't come up with a lot, however there are a couple of
situations that are worth mentioning
... james, you mentioned increased implementor complexity
... If you had typed the text in a field, and it knew it's rtl
because of the context from the html, the user wouldn't type in
any information to say this is rtl
... If you're working with first-strong heuristics you wouldn't
need that
... But if you had started with @mention which is in latin,
then unless you have some very special handling in the target
to say that's a twitter handle and you should ignore it, then
you're going to get a situation where the first-strong char is
ltr when the rest of the message is rtl
... I don't think you could expect the users to say 'this is
going to be wrong if it goes somewhere else'
... The user isn't going to think about or want to add control
characters to do that
... You want to get the data that the DOM knows about and apply
that in some way to the text so it comes out appropriately
... Whether you do that by putting the data into a property
value or by changing the text I'm not sure which is best, but
they both would involve some additional complexity in terms of
making sure that when that piece of text finally comes out
somewhere there's information about what the default
directionality is expected to be
... The other issue that we have in AS2 which we didn't have in
WA, in WA each leaf in the object only has one text property,
and therefore we had a direction property and a text property
which were closely related
... in AS2 you can have map property with translations, summary
and content in same object with only one direction property
which would give a default for all of those strings which may
be wrong
... that's an additional problem with having a direction
property
cwebber: You were just saying that you weren't sure where it
would be a difficulty to have markup
... I definitely want to support i18n. The clear case where
it's problematic is titles, which are supposed to be just text,
and possible to be rendered out of band, but very simply
rendered. We don't want bold, we don't want links... just text
... We have one language to parse which is JSON and then you
have another to parse which is HTML
... If you put HTML in a title element, it's difficult to parse
in the first place. But it's also broad. If we permit links and
bold and CSS in there, that's a lot more stuff to be concerned
about than ... maybe we could reduce and say it's just <span>
and that's all you're allowed to have
... Maybe that would work. It would reduce the scope, but would
still be much more complex
... I have seen myself, sometimes people embed markup, and in RSS
and atom readers you end up looking at blog entries and there are angle
brackets rendered on the title, and I'm pretty sure that that's
what will happen in our implementations
... I would like to support rtl stuff correctly, but that's why
I feel this inkling that having the control characters would be
nicer
... But I have this itching feeling that we're going to end up
with a lot of trouble if we permit html in this element
tantek: +1 to what chris said. The only experience we have with
formats that are not html but then try to do embedded markup
have basically all been failures in terms of implementation
support, interop, and dependability by anyone trying to use
those
... the hypothesis that using nested html markup in json works:
the only data we have from when that hypothesis has been tested
has shown that it is false
... That that solution does not work
... we have zero examples of that working
... I would go so far as to say we MUST NOT add markup in these
elements
... the control character approach I'm not as familiar with
... but that seems to be a simpler solution to try
... Has anyone tried that and what are the results? I would
defer to i18n for research on that
r12a: Most people just want to type the text. Even fairly
technical people who write in arabic or hebrew hate the control
characters. They are hard to use. One of the problems is that
they're invisible and you can never quite know whether you got
it right
... If you try to edit something with the embedding, you need
start and end, and it gets really complicated
tantek: my understanding is the same tools the user is using to
input text would be generating the embedded markup
... No user would ever type in or see any control codes
... Nobody is advocating users typing control codes
<r12a>
[11]https://www.w3.org/International/wiki/Bidi_in_social_media
[11] https://www.w3.org/International/wiki/Bidi_in_social_media
r12a: most people don't have access to these control characters
on their keyboard either. I did some testing ^
... If you put the rlm at the beginning and then try to make
that work on twitter or facebook it doesn't actually work. They
strip them out before posting the message
<aphillip> most is probably too strong, but certainly mobile
users
r12a: There are all those disadvantages with control codes.
What I wanted to understand was that there are properties like
summary and content that can hold html. Where does that html
come from? How do they end up with html in them?
... Maybe one answer is what you just said tantek, maybe it's
created during the process of creating the text
... I was trying to understand.. people are not going to type
in html either
jasnell: If you look at like blog software, for the authoring
UI they provide a plain text title field and a rich text or
markup editor that allows the user to format the content
... The editing tool itself is providing the markup for those
values
... The title tends to be plain text, and that's what would end
up in the name property
... Whereas a rich text editor would provide the values for
summary and content
r12a: I wonder how we would manage direction in that sort of
context
jasnell: I'm not aware of any rich text editors that have
directionality as default option. If they do they would be
markup oriented not control characters
aphillip: there are a number, in Arabic and Hebrew contexts.
Yahoo mail has controls for that
... Not necessarily obvious
... particularly to non-users of them
jasnell: And they operate in terms of markup, setting the
directional spans rather than using control characters
aphillip: that's my understanding
<KevinMarks> Hebrew and Arabic keyboards often have the
relevant chars
<aaronpk>
[12]https://github.com/w3c/activitystreams/issues/338#issuecomment-237570361
[12]
https://github.com/w3c/activitystreams/issues/338#issuecomment-237570361
tantek: I left a long but clear comment on the AS2 github
... that's my last point, I have to leave
jasnell: on the point of markup in name, and I made this in 338
too, one of the primary points in use cases, the whole semantic
of the name property, is to provide a reliably readable label
for the object
... If some implementation for instance doesn't understand the
object type, it would still have a reliable fallback to use the
label
... Allowing markup of any kind makes it more problematic and
complicated
... We have to retain that ability in order for the open
extensibility model to continue working as it has been
... That's something we cannot lose
... That was the point, part of the earlier discussion
aphillip: It's very hard to only permit limited forms of markup
as well
... Once you kind of let some html in then you're kind of
inviting a whole bunch of other html
... I don't think there's a lot of success in trying to limit
what markup is applied
... It's not just bs and is and ems and strongs
cwebber: I think, building off what James said, and what you
just said, we have to assume that it's not possible to embed
html in that name element. So what can we do given that it's
really not possible?
... There's a real semantic need to have a plain text name for
that object which won't work if we have markup
... It seems the control characters, or an additional property.
Are there any other options?
<KevinMarks_> the creating user agent can embed the control
chars
cwebber: We definitely want to support that, everybody wants
this to work
... If we assume that markup is not possible, what can we do at
this point?
... Can we simplify the conversation if we acknowledge that?
aphillip: A property is supplying a base direction, I made that
distinction early
... The base direction is not the same as providing inline
controls to fix.. Richard has a whole bunch of examples.. text
that needs help with multiple directions
... That's why we'd additionally need to look for control
characters inside the text
... If you're going to have a plaintext string, you're still
going to need control characters for perfect bidi
jasnell: if we're not going to allow markup, to properly
support bidi the only way is to support control characters
... We do have the option right now in the json format to say
name is an object, as an option, that has a direction and
language property, and a value
... It's more complicated for implementors and consumers, but
it does give us the option of declaring on a per-field basis
without having to rely on markup
... What is the complexity tradeoff?
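One way to picture the complexity tradeoff of letting name be either a
plain string or an object: every consumer needs a normalising step
along the lines of the Python sketch below. The keys "value", "dir" and
"lang" are assumptions made for this illustration and are not taken
from the AS2 draft.

```python
from typing import NamedTuple, Optional, Union


class LabelledText(NamedTuple):
    value: str
    direction: Optional[str] = None  # "ltr", "rtl", or None when unknown
    language: Optional[str] = None   # a language tag, or None when unknown


def read_name(name: Union[str, dict]) -> LabelledText:
    """Accept either a plain string or a hypothetical {"value", "dir", "lang"} object."""
    if isinstance(name, str):
        return LabelledText(name)
    if isinstance(name, dict) and isinstance(name.get("value"), str):
        return LabelledText(name["value"], name.get("dir"), name.get("lang"))
    raise TypeError("name must be a string or an object with a string 'value'")


plain = read_name("Hello")
tagged = read_name({"value": "\u05e9\u05dc\u05d5\u05dd", "dir": "rtl", "lang": "he"})
```

A consumer that only ever expected a string would hit the object branch
unprepared, which is the backwards-compatibility cost raised in the
discussion.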
<cwebber2> it would be possible, but a big headache to add that
so late to all our activitystreams libraries
aphillip: Can we describe rules for insertion and removal of
control codes for the bidi
... Properties of the field... just the base direction that
would be a property there... vs inline metadata
<Zakim> aaronpk, you wanted to say I completely agree with
tantek, and was never advocating that users type control
characters themselves
aaronpk: I'm not sure about the comment r12a made about me, I
want to echo tantek earlier, I fully expect that the tools
would be the ones adding the appropriate characters to the
string, I'd never expect users to add that themselves
... My understanding is that the main reason html has a base
dir property for elements is not so much so that the string
itself is in the correct order, but that html elements can flow
in the correct order
aphillip: that's not correct
... It doesn't change the order in which the elements flow
... What it has to do with is how the text is processed for
unicode base direction, but doesn't change what order the
elements are presented in
aaronpk: One reason that html needs the attribute is if you
imagine a full width element, setting the base direction on
that element means the text will appear on the right side of
the screen. That won't happen in control characters..
aphillip: that's not necessarily true
aaronpk: html is describing the layout. In most of these json
format we're not describing the layout, just the string
... we don't know what format it will be presented in
... html is specifically describing the presentation
aphillip: I think that's an invalid reading of the use of dir
<Steve_Atkin> I have to drop the call now.
aphillip: It's the case that the dir attribute causes that kind
of rtl display that a rtl user would expect. But it's also an
inherent property of the text. The reason it doesn't live in
CSS is because it's an inherent property of the text
aaronpk: that's absolutely my point
... outside of the context of html, the text does not have an
inherent presentation
aphillip: we're not talking about presentation
... We're talking about if I get a piece of text, I'm going to
assume a base direction generally of ltr, and that will cause
rtl text to display incorrectly
r12a: aaron, there are two aspects of rendering
... One aspect is that if you know the base direction is rtl
and you have "{arabic} w3c" that would determine where "w3c"
goes in relation to the arabic text
... And another aspect is where the entire line of text appears
on the page, against the left margin or against the right
... Sometimes you might want to sequence things rtl but keep
them on the lefthand side
... If you look at twitter and facebook dealing simply with
strings and they detect rtl direction and they move it to the
right side of the box. That's some processing their application
does
aaronpk: what I'm actually trying to say is that while html is
describing the presentation of the whole rendering of the page,
AS2 does not talk about presentation at all. The
presentation is left up to the consuming application. It feels
wrong to use a mechanism that exists in a presentation format
in a spec that does not talk about presentation
aphillip: I think you're missing the point. There are two kinds
of presentation
... One is what you're talking about, layouts and that sort of
thing
... What html is concerned with
... But the data itself has a direction.. the example Richard
gave is which side of the arabic the letters "w3c" go on. That
depends on the base direction of that text regardless
of where you present it
... That's a property of the text, not a property of the
presentation of the text
... same on a teletype, html, etc
aaronpk: that's why I'm so interested in it being actually in
the text, not as a property on the text
r12a: aaron, I wanted to get some background information out.
The problems we have with control characters may be something
we have to deal with in applications rather than AS2
... I wanted to go back to the question chris said, what are
the options here
... It seems to me that the options we are looking at currently
are either if we know that the thing should be rtl that we
stick a control char at the beginning of the string, or we
stick it in an extra field
... I'm not sure that we're saying you should necessarily have a
direction property partly because it's not specific enough when
we have multiple strings within one object
... I'm just saying I think we have two options
... We change the string, or we put some metadata alongside
each specific string where needed
aphillip: I think that's what's necessary
... you can't have one text direction property that applies to
six strings
r12a: so which of those is the better approach
cwebber: The control character at the start of the string will
be fine, but having the additional metadata as a separate
property... instead of having say name : "text" having name: {
object } I think is going to screw up implementations just as
much as having html in there
... Most of the fields in this can have html
<cwebber2> {"name": "This is LTR", "nameDir": "ltr"}
cwebber: The vast majority of the fields in which this applies
is kind of a non-issue. Only name you can't
... So what if ^
<KevinMarks_> inline works in reverse without implementers
knowing
cwebber: Just solve this for name or a few small fields where
html is not permitted
... if an implementation doesn't know how to pay attention to
nameDir they were going to fail anyway
... It will maybe hit the best middle ground
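A consumer handling the nameDir idea floated above might look roughly
like this Python sketch. Falling back to a first-strong guess when
nameDir is missing is an assumption of this example, not something the
group agreed on, and the function name is made up.

```python
import unicodedata


def name_direction(obj, default="ltr"):
    """Prefer an explicit nameDir; otherwise guess from the first strong character."""
    declared = obj.get("nameDir")
    if declared in ("ltr", "rtl"):
        return declared
    for ch in obj.get("name", ""):
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    return default


with_hint = {"name": "@w3c \u0645\u0631\u062d\u0628\u0627", "nameDir": "rtl"}
without_hint = {"name": "@w3c \u0645\u0631\u062d\u0628\u0627"}
print(name_direction(with_hint))     # prints "rtl"
print(name_direction(without_hint))  # prints "ltr"
```

An implementation that ignores nameDir behaves like the second call, so
it is no worse off than it would be without the property.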
<rhiaro> doesn't solve multiple directions in one name value
though
scribe: or stick a control character at the start
<aaronpk> again you *need* to support control characters in
strings in order to properly support bidirectional text (a
string with text in both directions)
KevinMarks: The advantage of doing it with injected control
characters is that it should work for anyone who is correctly
using utf8
... whereas with an extra property we're creating extra work for
anyone creating and displaying
... in terms of most likely preservation of intent, putting it
directly in the utf8 seems to be the strongest way to do that
... Maybe adding a note that creating user agents should do
that
<r12a>
[13]https://www.w3.org/International/wiki/Bidi_in_social_media
[13] https://www.w3.org/International/wiki/Bidi_in_social_media
r12a: The additional wrinkle here, third thing at the bottom of
that url
... There's a two line text input
... The top line needs to be treated ltr and the second line is
rtl
... If you don't do that then text is in the wrong place
... The rest of the stuff there shows that twitter and facebook
don't manage this very well
... If the name property has multiple lines in it (haven't seen
examples of that yet) then it's not just a question of sticking
a control character at the beginning of the string, it's
putting it at the beginning of each line
... Same applies with summary and content where you have html
<aphillip> line == paragraph
r12a: Perhaps it's more likely, where you have multiple
paragraphs
... You probably ought to establish the basedir for each
paragraph
... Or you could put a wrapper around the whole thing like <div
dir="rtl">
... There are intricacies in there I'm not terribly clear about
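For the multi-line case, putting a mark at the beginning of each line
rather than only at the start of the string could be sketched as below.
It assumes that lines are separated by "\n", that each line is its own
paragraph, and that the caller already knows each line's direction; the
function name is hypothetical.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK
MARKS = {"ltr": LRM, "rtl": RLM}


def mark_each_paragraph(text, directions):
    """Prefix every line of text with the mark for its own base direction."""
    lines = text.split("\n")
    if len(lines) != len(directions):
        raise ValueError("need exactly one direction per line")
    for direction in directions:
        if direction not in MARKS:
            raise ValueError("directions must be 'ltr' or 'rtl'")
    return "\n".join(MARKS[d] + line for d, line in zip(directions, lines))


two_lines = "Hello\n\u05e9\u05dc\u05d5\u05dd"
marked = mark_each_paragraph(two_lines, ["ltr", "rtl"])
```

A single mark at the start of the string would only cover the first
line here, leaving the second to whatever default the display applies.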
jasnell: Whatever we do with the metadata, however we indicate
this base direction, there is definitely a tradeoff cost
... We already have some complexity of name and nameMap
... I'm suspecting that the property approach is probably going
to be the most reliable for the base direction. Some
combination of this property and the control codes
... But we need to take that time to balance the approach
against existing complexity of name vs nameMap
... We should take our time, put together a proposal
r12a: I'll try to provide some tests you can use
jasnell: appreciate that
aphillip: do you all want to come back next week? How shall we
proceed?
jasnell: works for me
aphillip: I will reserve time next week to discuss language
... If there are proposals for how to discuss direction
further, do we want to use a particular list or github issue
for that discussion?
... Preferences?
jasnell: if we can get a proposal in place by then we can
discuss it then
aphillip: it's taken years of our lives, so don't be
surprised..
AOB?
aphillip: thanks social, I'll reserve time next week
Summary of Action Items
Summary of Resolutions
[End of minutes]
Received on Friday, 5 August 2016 12:31:43 UTC