Re: I18N attributes in RSS 1.0

Ian Forrester wrote:
> 
> 
> On 12/12/05, Jeremy Carroll <jjc@hpl.hp.com> wrote:
> 
> 
>     Notionally, RSS 1.0 is RDF/XML and an example of XHTML in RDF/XML is
>     relevant.
> 
>     Here is one that I made earlier (as they say in Blue Peter - UK specific
>     joke - actually Martin helped make it):
> 
>     http://www.w3.org/TR/owl-test/misc-200-xmlliteral#misc-200-xmlliteral
> 
>     (the XHTML2 has Japanese text, so only displays if you have the fonts)
> 
>     Jeremy
> 
> 
> 
> Thank you Jeremy,
> 
> 
> So if I was to quickly write a subset of RSS based on what I've seen it 
> would look something like this?
> 
> <item>
> <title>a title in english</title>
> <link>http://www.cubicgarden.com/some/unique/link</link>
> <dc:date>2005-12-13</dc:date>
> <description rdf:parseType="Literal"><span xml:lang="en-uk">some british english
> text with some example persian text</span> <span xml:lang="fa">امام جمعه موقت جديد
> تهران عضو هيئت رئيسه مجلس خبرگان و از روحانيون مشهور به هوادارای از
> گروههای اصولگرای افراطی در ايران و همچنين جوانترين امام جمعه تهران
> است.</span></description>
> </item>
> 
> I think this is fine but maybe there are other solutions which do not
> require wrapping spans around all the text?

No. In RDF a literal is one of two things. Either it is a plain literal, 
which is a string with an optional language tag but with no way of 
indicating, for instance, a direction; or it is an XML literal, which has 
no language tag of its own, so any xml:lang value needs to be embedded 
within the literal. Hence the xhtml:span elements in the example.
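
To make the distinction concrete, here is a rough sketch in Python (standard
library only). The item fragment, the helper name languages_in_literal and the
choice of en-GB as the tag for British English are my own illustration, not
taken from any real feed; the dir attribute is the one discussed elsewhere in
this thread. The title is a plain literal carrying one language tag; the
description is an XML literal, so language and direction have to live on the
elements inside it.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
XHTML = "http://www.w3.org/1999/xhtml"
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical item fragment, written for this example only.
ITEM = f"""
<item xmlns="{RSS}" xmlns:rdf="{RDF}" xmlns:xhtml="{XHTML}">
  <title xml:lang="en">a plain literal: one string, one language tag</title>
  <description rdf:parseType="Literal"><xhtml:span xml:lang="en-GB">some British
  English text</xhtml:span> <xhtml:span xml:lang="fa" dir="rtl">&#x627;&#x645;&#x627;&#x645;</xhtml:span></description>
</item>
"""


def languages_in_literal(item_xml):
    """Return (xml:lang, dir) for every xhtml:span inside the description."""
    item = ET.fromstring(item_xml)
    description = item.find(f"{{{RSS}}}description")
    return [
        (span.get(XML_LANG), span.get("dir"))
        for span in description.iter(f"{{{XHTML}}}span")
    ]


if __name__ == "__main__":
    for lang, direction in languages_in_literal(ITEM):
        print(lang, direction)
```

Note that the description element itself carries no language information here;
everything comes from the spans inside the literal.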


> I'm also interested in how 
> we would indicate the right to left persian text if the feed is not 
> Unicode or has no Unicode control characters?
> 

The feed is Unicode.
It is XML (in all variants of RSS). An XML file may superficially be 
encoded in some encoding, maybe iso-8859-1 or utf-8, but that refers only 
to the bytes in the file. The XML document itself is the sequence of 
Unicode characters that those bytes represent under that encoding, i.e. 
every XML file is Unicode, though it may be encoded differently.
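
A small sketch of that point, again in Python with only the standard library.
The sample string and the helper as_xml_bytes are made up for the
illustration. The same one-element document is written out in two encodings;
the bytes differ, but a parser hands back the same Unicode characters either
way.

```python
import xml.etree.ElementTree as ET

TEXT = "caf\u00e9"  # sample text with one non-ASCII character (U+00E9)


def as_xml_bytes(text, encoding):
    """Serialise a one-element document in the given byte encoding."""
    declaration = f'<?xml version="1.0" encoding="{encoding}"?>'
    return (declaration + f"<title>{text}</title>").encode(encoding)


latin1_file = as_xml_bytes(TEXT, "iso-8859-1")
utf8_file = as_xml_bytes(TEXT, "utf-8")

# Different bytes on disk ...
latin1_text = ET.fromstring(latin1_file).text
utf8_text = ET.fromstring(utf8_file).text

if __name__ == "__main__":
    print(latin1_file != utf8_file)   # the files differ
    print(latin1_text == utf8_text)   # ... but the characters are the same
```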

So in an arbitrary XML file, a numeric character reference (NCR) can be 
used to represent the special bidi control characters. These NCRs are 
written with the Unicode character numbers. The other characters are 
converted into Unicode characters when transforming the XML file into, 
say, a DOM (obviously we might not do this step, but when an XML file is 
handled correctly in some other way, the behaviour is logically 
equivalent). So what we have in the XML is a sequence of Unicode 
characters, each of which maps onto one or more bytes of the file on 
disk.
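
As an illustration only (Python, standard library; the sample bytes and the
helper decoded_characters are invented for this message): a file declared as
US-ASCII can still carry Persian letters and bidi controls, because each is
written as an NCR. Here U+202B and U+202C are the embedding controls, wrapped
around four Arabic-script letters.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Every byte below is plain ASCII; the non-ASCII characters exist only as NCRs.
ASCII_FILE = (
    b'<?xml version="1.0" encoding="US-ASCII"?>'
    b"<description>&#x202B;&#x627;&#x645;&#x627;&#x645;&#x202C;</description>"
)


def decoded_characters(xml_bytes):
    """Parse the document and list each character as (code point, name)."""
    text = ET.fromstring(xml_bytes).text
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


if __name__ == "__main__":
    for code_point, name in decoded_characters(ASCII_FILE):
        print(code_point, name)
```

After parsing, nothing distinguishes a character that arrived as an NCR from
one that was stored directly as bytes in the file's encoding.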

Jeremy

> Thank you Felix, this wiki page was very useful - 
> 
> http://esw.w3.org/topic/its0505ReqBidi
> 
> It would seem using a combination of the dir and xml:lang attributes
> would be enough to specify all languages in RSS/ATOM?
> 
> Regards,
> 
> Ian Forrester
> 

Received on Tuesday, 13 December 2005 14:22:30 UTC