- From: Jeremy Carroll <jjc@hpl.hp.com>
- Date: Tue, 13 Dec 2005 14:21:52 +0000
- To: Ian Forrester <ian.a.forrester@googlemail.com>
- CC: Martin Duerst <duerst@it.aoyama.ac.jp>, "John.Cowan" <jcowan@reutershealth.com>, Misha Wolf <Misha.Wolf@reuters.com>, www-international@w3.org
Ian Forrester wrote:
> On 12/12/05, *Jeremy Carroll* <jjc@hpl.hp.com> wrote:
>
> > Notionally, RSS 1.0 is RDF/XML, and an example of XHTML in RDF/XML is
> > relevant.
> >
> > Here is one that I made earlier (as they say in Blue Peter - UK-specific
> > joke - actually Martin helped make it):
> >
> > http://www.w3.org/TR/owl-test/misc-200-xmlliteral#misc-200-xmlliteral
> >
> > (the XHTML2 has Japanese text, so it only displays if you have the fonts)
> >
> > Jeremy
>
> Thank you Jeremy,
>
> So if I were to quickly write a subset of RSS based on what I've seen, it
> would look something like this?
>
> <item>
>   <title>a title in english</title>
>   <link>http://www.cubicgarden.com/some/unique/link</link>
>   <dc:date>2005-12-13</dc:date>
>   <description rdf:parseType="Literal"><span xml:lang="en-uk">some british
> english text with some example persian text</span> <span xml:lang="fa">امام
> جمعه موقت جديد تهران عضو هيئت رئيسه مجلس خبرگان و از روحانيون مشهور به
> هوادارای از گروههای اصولگرای افراطی در ايران و همچنين جوانترين امام جمعه
> تهران است.</span></description>
> </item>
>
> I think this is fine, but maybe there are other solutions which do not
> require wrapping spans around all the text?

No. In RDF, a literal is either a plain literal, which is a string with an
optional language tag but no way, for instance, of indicating direction; or
an XML literal, which has no language tag, so any xml:lang value needs to be
embedded within the literal itself. Hence the xhtml:span elements in the
example.

> I'm also interested in how we would indicate the right-to-left Persian
> text if the feed is not Unicode or has no Unicode control characters?

The feed is Unicode. It is XML (in all variants of RSS). An XML file may
superficially be in some encoding, maybe iso-8859-1 or utf-8, but this
refers only to the bytes in the file. The XML document, though, is the
sequence of Unicode characters that these bytes represent using the
encoding mechanism, i.e.
every XML file is Unicode, though possibly encoded differently. So in any
XML file, a numeric character reference (NCR) can be used to represent the
special bidi control characters. NCRs are written with the Unicode character
numbers, e.g. &#x202B; for U+202B RIGHT-TO-LEFT EMBEDDING. The other
characters are converted into Unicode characters when, say, the XML file is
transformed into a DOM (we might not actually do this step, but any other
correct way of handling the XML file behaves in a logically equivalent way).
So what we have in the XML is a sequence of Unicode characters, each of
which maps onto one or more bytes of the file on disk.

Jeremy

> Thank you Felix, this wiki page was very useful -
>
> http://esw.w3.org/topic/its0505ReqBidi
>
> It would seem using a combination of the dir and xml:lang attributes
> would be enough to specify all languages in RSS/ATOM?
>
> Regards,
>
> Ian Forrester
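A minimal Python sketch of the encoding point made in the reply, under stated assumptions: the <description> fragment is hypothetical and uses only the opening words of the Persian sample, and make_item and spans are illustrative helpers, not part of any RSS or RDF library. The same fragment is serialized once as utf-8 and once as iso-8859-1. In the iso-8859-1 copy the Persian letters and the U+202B/U+202C bidi controls cannot be encoded directly, so they are written as NCRs. After parsing with the standard-library xml.etree.ElementTree, both copies yield the same Unicode characters and the same per-span xml:lang values.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs. ElementTree reports xml:lang under the fixed
# XML namespace, so we look it up by that expanded name.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Bidi control characters: U+202B RIGHT-TO-LEFT EMBEDDING and
# U+202C POP DIRECTIONAL FORMATTING.
RLE, PDF = "\u202b", "\u202c"

# Opening words of the Persian sample text from the thread.
PERSIAN = "امام جمعه موقت جديد تهران"


def make_item(encoding):
    """Serialize a hypothetical <description> fragment in `encoding`.

    Characters the encoding cannot represent (here the Persian letters
    and the bidi controls under iso-8859-1) are written as decimal NCRs.
    """
    body = (
        f'<description xmlns:rdf="{RDF_NS}" rdf:parseType="Literal">'
        # en-GB is the usual BCP 47 tag for British English.
        f'<span xmlns="{XHTML_NS}" xml:lang="en-GB">some British English text</span> '
        f'<span xmlns="{XHTML_NS}" xml:lang="fa">{RLE}{PERSIAN}{PDF}</span>'
        "</description>"
    )
    declaration = f'<?xml version="1.0" encoding="{encoding}"?>'
    return (declaration + body).encode(encoding, errors="xmlcharrefreplace")


def spans(document):
    """Parse the bytes and return (xml:lang, text) for each XHTML span."""
    root = ET.fromstring(document)
    return [(el.get(XML_LANG), el.text) for el in root.iter(f"{{{XHTML_NS}}}span")]


utf8_bytes = make_item("utf-8")
latin1_bytes = make_item("iso-8859-1")

# Different bytes on disk...
assert utf8_bytes != latin1_bytes
# ...but the same sequence of Unicode characters once parsed.
assert spans(utf8_bytes) == spans(latin1_bytes)

for lang, text in spans(latin1_bytes):
    print(lang, "starts with RLE:", text.startswith(RLE))
```

ElementTree does no RDF processing here; the sketch only shows the XML-level behaviour that the rdf:parseType="Literal" content is built on, namely that the byte encoding is invisible once the document is parsed.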
Received on Tuesday, 13 December 2005 14:22:30 UTC