RE: Summary of bidi discussion with Social Web WG

Hi Richard,

I put a summary into the open issue in their GitHub repository, and I have to admit to being confused.

One of the things I pointed out in my summary is that this conversation does not match the items actually under discussion. The fields in question are the 'processingLanguage' and 'textDirection' attributes in the ActivityStreams spec. Those attributes relate to *external* files, including text files, rather than to text contained in the annotation itself. What confuses me is that there are now votes to close the issue based on my summary... yet the discussion you encountered had nothing to do with external resources or files.

I do agree that natural language text inside an annotation also needs language and base direction properties, of course. The problem that JSON doesn't provide content metadata for natural language strings (and thus document formats based on JSON have to define and provide this as [frequently quite ugly] additional attributes or extensions) is a real one. Many implementations have to deal with this or with the poor results that obtain when you don't have this information.
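To make the point concrete, here is a minimal sketch of the kind of wrapper a JSON-based format ends up needing. The field names "value", "lang" and "dir" and the helper localized_string are made up for illustration; they are not taken from ActivityStreams or any other spec.

```python
import json

ALLOWED_DIRECTIONS = ("ltr", "rtl", "auto")


def localized_string(value, lang, direction):
    """Wrap a natural language string with explicit metadata.

    The keys "value", "lang" and "dir" are hypothetical names chosen for
    this sketch, not fields defined by any particular specification.
    """
    if direction not in ALLOWED_DIRECTIONS:
        raise ValueError("direction must be one of %r" % (ALLOWED_DIRECTIONS,))
    return {"value": value, "lang": lang, "dir": direction}


# A bare JSON string carries no language or direction information at all.
bare = {"name": "\u05e9\u05dc\u05d5\u05dd"}

# The same field with the metadata carried alongside the text.
annotated = {"name": localized_string("\u05e9\u05dc\u05d5\u05dd", "he", "rtl")}

encoded = json.dumps(annotated, ensure_ascii=False)
decoded = json.loads(encoded)
```

The extra nesting is exactly the "quite ugly" part: every consumer has to know that "name" may be an object rather than a plain string.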

First-strong detection is certainly better than no analysis for bidi. Statistical detection of language is better than no language information. But declining to build capabilities into document formats because algorithms like these "are better than nothing" is rather like saying "there are ways to transcribe all scripts into the Latin script, so Unicode support is unnecessary". We don't really want a "separate direction property", but we do need direction (and language) metadata for natural language content in document formats (including JSON-based ones) lest we institutionalize a crappy experience.
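For anyone who wants to see where first-strong falls down, here is a simplified sketch in Python. It uses the standard unicodedata module and, as a simplification, ignores the rule in the Unicode Bidirectional Algorithm about skipping characters inside isolates. The function name first_strong_direction and the "ltr" default are my own choices for the example.

```python
import unicodedata


def first_strong_direction(text, default="ltr"):
    """Guess base direction from the first strongly directional character.

    Simplified sketch: scans for the first character whose bidi class is
    L (left-to-right) or R/AL (right-to-left). Characters inside isolates
    are not skipped, unlike in the full Unicode Bidirectional Algorithm.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    # No strong character found (digits, punctuation, empty string).
    return default


# Works when the string starts in its own script.
hebrew_first = first_strong_direction("\u05e9\u05dc\u05d5\u05dd world")

# Guesses "ltr" for an Arabic sentence that happens to open with a
# Latin-script name, which is the case that needs metadata or controls.
latin_first = first_strong_direction("W3C \u0647\u0648 \u0627\u062a\u062d\u0627\u062f")

# Falls back to the default when there is nothing strong to go on.
no_strong = first_strong_direction("123 !?")
```

The second case is the one that matters: the heuristic has no way to know the author's intent, so either the producer inserts control characters or the format carries the direction explicitly.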

My 2p.

Addison

Addison Phillips
Principal SDE, I18N Architect (Amazon)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.




> -----Original Message-----
> From: ishida@w3.org [mailto:ishida@w3.org]
> Sent: Wednesday, August 03, 2016 4:43 AM
> To: public-i18n-core@w3.org; public-i18n-bidi@w3.org
> Subject: Summary of bidi discussion with Social Web WG
> 
> In preparation for the discussion tomorrow (Thursday) during the i18n
> telecon, here are the minutes of the discussion held yesterday (Tuesday)
> with the Social Web group.
> 
> https://www.w3.org/2016/08/02-social-minutes.html#item07
> 
> I wasn't invited to speak until 5 minutes before the end, and the discussion
> wasn't as productive as i hoped.  A lot of the feedback centred around dislike
> of the idea of using a separate direction property to set the default base
> direction (which actually i wasn't recommending, it was just one possibility on
> the table).  There was strong preference for first-strong detection coupled
> with Unicode control codes for problem cases for plain text strings (eg.
> name), and presumably first-strong detection for default paragraph direction
> when using markup (i guess in the absence of markup to the contrary, but
> that wasn't discussed).  Grounds for pushback mainly centred on the
> supposition that there are no APIs out there that do that.
> 
> So a key question for Thursday is whether anyone sees any advantages in
> using a separate direction property.  Would first-strong detection coupled
> with control code/markup for tricky cases be sufficient?  To my mind, this
> may be ok for plain text, although there appears to be a problem that people
> can often not access the control codes (and when they can, not easily).  That
> may need fixes to keyboards, however, rather than to the model.
> 
> For marked up text, i suspect that the spec needs to be a little more careful
> in the way it indicates how the default direction should be established for
> paragraphs.  If the paragraph starts with <p dir="rtl"> then first-strong should
> not be used.
> 
> Btw, i put together some tests for Twitter and Facebook that look at various
> problem situations and show the results.  See
> https://github.com/w3c/i18n-activity/wiki/Bidi-handling-in-Facebook-and-Twitter
> 
> ri
> 
> 
> 
> 

Received on Wednesday, 3 August 2016 23:28:18 UTC