Re: Specifying language direction in RDF from Lars Marius Garshol on 2002-03-02 (www-rdf-interest@w3.org from March 2002)

From: Lars Marius Garshol <larsga@garshol.priv.no>
Date: 02 Mar 2002 12:22:35 +0100
To: Chris Croome <chris@webarchitects.co.uk>
Cc: www-rdf-interest@w3.org
Message-ID: <m3u1rzfc3o.fsf@pc36.avidiaasen.online.no>
* Lars Marius Garshol
|
| So it seems that you have a property derived from the XHTML 'dir'
| attribute which associates the string "ltr" with your abstract.
| Formally that looks like perfectly OK RDF to me. The property you
| use also seems reasonable, though http://www.w3.org/1999/xhtml#dir
| might be better.

* Chris Croome
| 
| Yes, I did wonder about that; http://www.w3.org/1999/xhtmldir seems
| wrong, but can/should one use http://www.w3.org/1999/xhtml# for the
| XHTML namespace?

Well, you are translating between two different systems. In XML
Namespaces the xhtml:dir attribute has no defined URI, but in RDF you
need a URI in order to create a property for this. Putting in a hash
mark seems the most reasonable approach to me.
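For concreteness, here is a minimal Python sketch of that mapping. RDF/XML forms a property URI by simply concatenating the namespace URI and the local name, which is where http://www.w3.org/1999/xhtmldir comes from. The helper below (a hypothetical name, not part of any RDF library) inserts the hash mark unless the namespace already ends in '#' or '/'; treating a trailing '/' as a separator is an assumption of this sketch.

```python
XHTML_NS = "http://www.w3.org/1999/xhtml"

def attribute_property_uri(namespace: str, local_name: str) -> str:
    """Hypothetical helper: build an RDF property URI for an attribute
    in an XML namespace, joining the two parts with a hash mark."""
    # Assumption: a namespace ending in '#' or '/' already has a separator.
    if namespace.endswith(("#", "/")):
        return namespace + local_name
    return namespace + "#" + local_name

print(attribute_property_uri(XHTML_NS, "dir"))
# http://www.w3.org/1999/xhtml#dir
```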
 
* Lars Marius Garshol
|
| An obvious question is whether there's any point in doing this at all.
| English written in the Latin script is *always* LTR, so you are adding
| no useful information.

* Chris Croome
|
| The reason that I was having a play with ltr, rather than rtl, is
| that I can't write any rtl languages...

Even if you had written in an RTL script (languages have no
directionality, only the scripts used to write them), you wouldn't have
needed this, as RTL scripts are *always* RTL.

It's only when you mix scripts with different directionality that you
need to specify the base direction of the text. (As my other posting
explains in more detail.)
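To make the point concrete, here is a rough Python check for whether a string actually mixes directions. It assumes that only characters with a strong bidi class count: L for left-to-right, R and AL for right-to-left, as defined by the Unicode Bidirectional Algorithm (UAX #9). Digits, punctuation and spaces are weak or neutral and are ignored, so this is a simplification rather than a full bidi analysis. The function name is hypothetical.

```python
import unicodedata

# Strong bidi classes from the Unicode Bidirectional Algorithm.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}

def is_bidirectional(text: str) -> bool:
    """True if text contains both strong LTR and strong RTL characters,
    i.e. the case where a base direction actually matters."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    return bool(classes & STRONG_LTR) and bool(classes & STRONG_RTL)

print(is_bidirectional("Hello world"))   # False: Latin only
print(is_bidirectional("سلام"))          # False: Arabic only
print(is_bidirectional("RDF سلام"))      # True: mixed scripts
```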
 
| I'm anticipating that I'm going to be asked by a client to set up a
| web site with Arabic and Urdu content, and this will result in Dublin
| Core RDF metadata files that have a mixture of directionality. I
| wasn't sure how to implement the advice from the W3C and the Unicode
| Consortium [1]; that's why I posted to this list.

Then I understand. If this content only contains Arabic/Urdu and no
Latin or other LTR text, then you shouldn't need to do anything special
with the RDF.

If it does contain bidirectional text, then for this to work you will
need to capture any specifications of base direction for text that is
to become RDF literals, and also to turn any elements used to specify
embedding levels (that is, elements with xhtml:dir attributes below
paragraph level) into Unicode control codes.

Capturing the base direction can be done either with an RDF property,
or by inserting a right-to-left/left-to-right mark at the beginning of
the text.
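The mark option is the simpler one to sketch. LEFT-TO-RIGHT MARK (U+200E) has bidi class L and RIGHT-TO-LEFT MARK (U+200F) has class R, so a mark placed first becomes the first strong character of the literal without being displayed. Here is a minimal Python version; the function name is hypothetical, and the 'ltr'/'rtl' values are borrowed from xhtml:dir.

```python
LRM, RLM = "\u200e", "\u200f"

def mark_base_direction(text: str, direction: str) -> str:
    """Prefix text with LRM or RLM so that its first strong character
    records the intended base direction ("ltr" or "rtl")."""
    if direction not in ("ltr", "rtl"):
        raise ValueError(f"unknown direction: {direction!r}")
    if text.startswith((LRM, RLM)):
        text = text[1:]  # replace an existing leading mark, don't stack marks
    return (LRM if direction == "ltr" else RLM) + text

print(repr(mark_base_direction("RDF سلام", "rtl")))
```

Replacing an existing leading mark, rather than adding another, means the function can safely be applied more than once to the same literal.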
 
| The reason I wasn't sure that specifying the directionality was
| unnecessary was an experiment in which I took a Hebrew [2] and a
| Farsi [3] file from the Unicode web site, removed all HTML and CSS
| directionality markup, and then opened them in Mozilla: the Farsi one
| still displayed the text correctly, but the Hebrew one was backwards.

If you don't explicitly set the base direction in HTML/XHTML, browsers
assume that it is LTR. They don't do the text analysis that I discussed
in the other email. This is acceptable according to the Unicode
standard, but it's not obvious that RDF software will behave the same
way. In fact, I think RDF software shouldn't. It seems much more
reasonable for RDF software to analyze the literals and get the base
direction that way.
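The analysis meant here is the paragraph-level rule of the Unicode Bidirectional Algorithm (UAX #9, rules P2 and P3): the first character with a strong bidi class decides the base direction. Below is a simplified Python sketch with a hypothetical function name. It does not skip text inside directional isolates, which later versions of P2 require, and its default mirrors the browsers' LTR assumption when no strong character is found.

```python
import unicodedata

def base_direction(text: str, default: str = "ltr") -> str:
    """Return "ltr" or "rtl" from the first strong character
    (bidi class L, R or AL); fall back to default if there is none."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):
            return "rtl"
    return default  # only weak/neutral characters, e.g. digits and spaces

print(base_direction("שלום world"))   # rtl
print(base_direction("Hello שלום"))   # ltr
```

Because LRM and RLM are themselves strong characters (classes L and R), a leading mark of the kind mentioned earlier also determines the result.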

I suspect that most RDF software does nothing at all in order to
support bidi correctly, and in that case indicating base direction
using RLM/LRM codes seems safer than using a property. (It is also a
lot simpler, since you no longer have to reify anything.)
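To illustrate the simplicity argument: with the mark approach, the base direction travels inside the literal, so one ordinary triple carries everything and no statement about the statement is needed. The sketch below writes such a triple in N-Triples and escapes non-ASCII characters as \uXXXX, which also keeps the invisible RLM visible in the file. The subject URI is a hypothetical example, and dc:description is just a plausible choice of Dublin Core property.

```python
DC_DESCRIPTION = "http://purl.org/dc/elements/1.1/description"

def ntriples_literal(text: str) -> str:
    """Quote text as an N-Triples literal, escaping specials and
    writing every non-printable-ASCII character as \\uXXXX or \\UXXXXXXXX."""
    escapes = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    out = []
    for ch in text:
        code = ord(ch)
        if ch in escapes:
            out.append(escapes[ch])
        elif 0x20 <= code < 0x7F:
            out.append(ch)
        elif code <= 0xFFFF:
            out.append(f"\\u{code:04X}")
        else:
            out.append(f"\\U{code:08X}")
    return '"' + "".join(out) + '"'

def ntriple(subject: str, predicate: str, text: str) -> str:
    """One N-Triples line with a plain literal object."""
    return f"<{subject}> <{predicate}> {ntriples_literal(text)} ."

print(ntriple("http://example.org/doc", DC_DESCRIPTION, "\u200fسلام"))
# <http://example.org/doc> <http://purl.org/dc/elements/1.1/description> "\u200F\u0633\u0644\u0627\u0645" .
```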

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
ISO SC34/WG3, OASIS GeoLang TC        <URL: http://www.garshol.priv.no >
Received on Saturday, 2 March 2002 06:23:17 UTC