
Re: [backstage] Persian news IM bot now on Jabber (and other updates)

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Thu, 09 Mar 2006 12:36:51 +0900
Message-Id: <6.0.0.20.2.20060309114843.057f3920@localhost>
To: Dan Brickley <danbri@danbri.org>, www-international@w3.org
Cc: mmenti@gmail.com

This is interesting.

It shows very clearly that more thought should go into supporting
internationalization markup in all kinds of document and document-like
formats (document-like in the sense that they use free text rather
than data items).

The only blog format that got that right (sic!) from the start is
Atom (http://www.ietf.org/rfc/rfc4287.txt). Elements such as
<title> all allow for embedded XHTML markup, which then can take
a dir attribute. RSS 1.0 has a content module that could do the
same thing, but I'm not sure how well it is supported.
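
As a rough illustration of the Atom approach described above, here is a
minimal Python sketch that builds a <title type="xhtml"> element whose
XHTML <div> carries a dir attribute. The helper name make_title and the
sample text are made up for illustration; only the element names,
namespaces, and the type/dir attributes come from Atom and XHTML.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def make_title(text, direction):
    """Build an Atom title with type="xhtml" wrapping an XHTML div.

    make_title is a hypothetical helper, not part of any Atom library.
    The dir attribute sits on the XHTML div, inside the XHTML namespace.
    """
    title = ET.Element(f"{{{ATOM_NS}}}title", {"type": "xhtml"})
    div = ET.SubElement(title, f"{{{XHTML_NS}}}div", {"dir": direction})
    div.text = text
    return title


# Sample headline: a Persian word followed by the Latin letters "BBC".
title = make_title("\u0627\u062e\u0628\u0627\u0631 BBC", "rtl")
serialized = ET.tostring(title, encoding="unicode")
print(serialized)
```

The required <div> wrapper is the overhead mentioned in the next
paragraph: the direction cannot be put on <title> itself.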

Of course, the Atom way is not the only way of doing this;
it would be great if there were less overhead (e.g. not
requiring the <div> element). It might also be helpful
if the dir attribute could be inherited from an enclosing
element (e.g. for a feed that is entirely right-to-left (rtl)).
But inheriting an attribute from outside into the namespace
it actually belongs to may not be the best idea for robustness.

In Ian's article and in Mario's messages, there is also some
confusion with regard to bidi. If the text in a
line or paragraph contains only rtl characters, or neutral
characters such as punctuation, any application is supposed
to display it in the correct order. No attributes are necessary,
except for where to start the line (flush left or flush right),
which can be considered a matter of taste (in mixed English/
Farsi text, I wouldn't consider having all English messages
flush left and all Farsi messages flush right necessarily
always the best display) and which could be handled by a
switch in the user agent.
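
To make the "only rtl or neutral characters" case concrete, here is a
small Python sketch that uses the Unicode bidi classes from the standard
unicodedata module. The function names are hypothetical, and treating
only the classes L, R and AL as "strong" is a simplification for
illustration, not a full implementation of the bidi algorithm.

```python
import unicodedata

RTL_STRONG = {"R", "AL"}  # Hebrew-type and Arabic-type letters
LTR_STRONG = {"L"}        # Latin-type letters


def strong_directions(text):
    """Return the set of strong directions ("rtl", "ltr") found in text."""
    found = set()
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in RTL_STRONG:
            found.add("rtl")
        elif bidi_class in LTR_STRONG:
            found.add("ltr")
    return found


def is_pure_rtl(text):
    """True if text has rtl letters and no ltr letters (neutrals allowed)."""
    return strong_directions(text) == {"rtl"}


persian_only = "\u0633\u0644\u0627\u0645!"      # Persian letters plus "!"
persian_mixed = "\u0633\u0644\u0627\u0645 BBC"  # Persian letters plus Latin
print(is_pure_rtl(persian_only))
print(is_pure_rtl(persian_mixed))
```

For text where is_pure_rtl holds, the remaining choice is only the
alignment, which is the part that could be left to the user agent.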

It's only when a line or paragraph mixes both rtl and ltr
text that additional information becomes really
necessary, to indicate whether the text is, e.g., a Farsi
sentence with some English embedded or the other way
round (or an even more complicated structure).
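
The mixed case can be sketched in Python as follows. This is an
assumption-laden illustration, not what any of the clients mentioned
actually do: is_mixed, first_strong_direction and wrap_with_direction
are hypothetical names, the guess is a plain first-strong-character
heuristic, and the wrapping uses the Unicode embedding controls
(RLE U+202B, LRE U+202A, PDF U+202C) as one way to carry the direction
in a channel that has no dir attribute.

```python
import unicodedata

RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def is_mixed(text):
    """True if text contains both ltr and rtl strong characters."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    return "L" in classes and bool(classes & {"R", "AL"})


def first_strong_direction(text):
    """Guess the base direction from the first strong character, if any."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    return None


def wrap_with_direction(text, direction):
    """Mark text with an explicit base direction using embedding controls."""
    opener = RLE if direction == "rtl" else LRE
    return opener + text + PDF


# A message that starts with "BBC" but is meant as a Farsi sentence:
message = "BBC \u0633\u0644\u0627\u0645"
guess = first_strong_direction(message)      # the heuristic says "ltr"
marked = wrap_with_direction(message, "rtl")  # the sender knows better
```

The example message shows why a guess is not enough: the heuristic picks
ltr because the text happens to start with Latin letters, so only
information supplied by the sender can say that it is a Farsi sentence.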

Regards,     Martin.


At 09:01 06/03/09, Dan Brickley wrote:
 >
 >Forwarding from the BBC backstage list; some interesting I18N issues
 >around bidi content in IM tools. See
 >http://www.flickr.com/photos/danbri/109247073/ for screenshot of the
 >MSN version of Mario's bot, and links to his earlier announcement.
 >This is a tool that exposes BBC Persian newsfeeds via MSN and Jabber.
 >Exercises some of the issues Ian Forrester talked about in
 >http://www.idealliance.org/proceedings/xtech05/papers/02-08-04/
 >
 >Dan
 >
 >----- Forwarded message from Mario Menti <mmenti@gmail.com> -----
 >
 >From: Mario Menti <mmenti@gmail.com>
 >Date: Wed, 8 Mar 2006 23:33:40 +0000
 >To: backstage@lists.bbc.co.uk
 >Cc: hoder@hoder.com
 >Subject: [backstage] Persian news IM bot now on Jabber (and other updates)
 >Message-ID: <2dd7faef0603081533h4ad87a59w9d3b54ff2732076b@mail.gmail.com>
 >Reply-To: backstage@lists.bbc.co.uk
 >
 >Hi all,
 >
 >here's a brief update on recent conversations and developments on the
 >BBCPersian.com news bot.
 >
 >1. A Jabber version of the bot is online at " bbcpersian@menti.name ".
 >
 >I tried it with a number of different Jabber clients, and "proper" support
 >for Arabic/Farsi and BiDi rendering seems rather patchy (although I have to
 >admit there is the possibility that my bot is not sending the correct info
 >to make this work, in which case I am grateful for suggestions...).
 >Here's the summary of some tests I did with Windows Jabber clients:
 >
 >- the one client that works beautifully, and handles BiDi correctly even in
 >messages with mixed English/Farsi text (i.e. displays English LTR and Farsi
 >RTL):
 >       Gaim (I used v2.0 beta2)
 >
 >- clients that render the text ok, but don't orientate the overall message
 >RTL (so the text is OK, but not right-aligned):
 >       Psi, Exodus, Google Talk, Gajim, Pandion, meebo
 >
 >- clients that don't even seem to render the Farsi UTF-8 text properly
 >(although maybe there's some settings to fix this, I haven't spent too much
 >time with this):
 >      Miranda IM
 >
 >Haven't tried other clients or OSs yet.
 >
 >2. Support for other IM networks:
 >
 >Consensus appears to be that Yahoo IM is popular in Iran, so a Yahoo bot
 >would be useful. My current problem is that I can't find a working Yahoo
 >Messenger Protocol Perl module (the ones on CPAN seem out of date and not
 >functional). If anyone here knows of a working YIM Perl module, please let
 >me know (the bots are implemented in Perl).
 >
 >3. Push or Pull?
 >
 >When developing the previous (English) newsflash bot, I came across a
 >problem with the MSN switchboard when sending out a large number of
 >newsflash messages. I worked around it by staggering the sending of
 >messages, but the problem may well re-appear if large numbers of users make
 >use of the bot (on MSN at least). This problem is caused by the MSN bot
 >requesting too many active switchboards when it initiates the conversation
 >in order to send the newsflash, so is not an issue in a scenario where the
 >users initiate a chat with the bot (I had around 2000 users over a couple of
 >days on the BBC TV schedules bot after it was posted on Betanews and
 >Microsoft Watch, without any problems).
 >
 >In order to minimise potential issues like these (if there ever are large
 >numbers of users), it may make sense to turn the bot from "push" to "pull" -
 >i.e. you contact the bot to get the latest news, rather than the bot sending
 >you the news headlines on a regular basis. I'd be interested to hear other
 >people's opinions on this.
 >Thanks,
 >Mario.
 >
 >----- End forwarded message ----- 
Received on Thursday, 9 March 2006 04:58:23 GMT
