
Re: Possible issues with the CLDR quote marks info

From: Matitiahu Allouche <matial@il.ibm.com>
Date: Tue, 8 Nov 2011 15:59:58 +0200
To: Richard Ishida <ishida@w3.org>
Cc: www International <www-international@w3.org>
Message-ID: <OFC6A8F23D.370A9600-ONC2257942.004C1446-C2257942.004CEF16@il.ibm.com>
Here is feedback I received from two colleagues about the quotation marks
used in Hebrew.

=================================================================

According to recent discussions about the keyboard standard, Hebrew
quotation marks should be paired in a way that foreigners may find surprising:

201e 201d 201a 2019

(double and single low-9 quotes for opening, right quotes for closing)

The current row reads

:lang(he) { quotes: '\201c' '\201d' '\0022' '\0022'; }

The use of 0022 (the straight ASCII quotation mark) as the secondary
quotes in Hebrew is certainly wrong.


Shai.

=================================================================


The current definition for Hebrew is probably wrong.

The source is the CLDR. The CLDR listing for Hebrew marks the values for
quotationEnd and quotationStart as "draft=contributed", and I wonder who
contributed them.

To the best of my understanding, the Hebrew Language Academy
best-practice recommendation [1] is:
quotationStart: „ (201e)
quotationEnd: ” (201d)

The recommended characters for a quotation inside a quotation are:
opening: ‚ (201a)
closing: ’ (2019)
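Written in the same CSS form as the current :lang(he) row, the Academy
recommendation (the same pairing Shai describes) would look roughly like
this. It is only a sketch of a possible replacement row, not the value
currently generated from CLDR.

```css
/* Sketch only: the Hebrew Language Academy pairing as reported in this thread.
   Primary quotes:   „ (U+201E) opening, ” (U+201D) closing
   Secondary quotes: ‚ (U+201A) opening, ’ (U+2019) closing */
:lang(he) { quotes: '\201e' '\201d' '\201a' '\2019'; }
```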

(Sorry, I'm not sure of the correct CLDR names for these.)

That's what I suggested for the keyboard standard, too.

These quotation marks were actually used quite frequently in older
printed books in Hebrew and the Academy still defines this as the best
practice.

Of course, the most common practice in everyday writing (emails etc.) is
to use " and ' for both opening and closing quotation marks. Most
professionally printed books and journals probably use ” (201d) and ’
(2019) for both opening and closing, but occasionally I see lower
quotation marks, too (and not just in my own blog!). Using “ (201c)
anywhere is neither common nor recommended by the Academy.
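For comparison, the practice described for most professionally printed
books, with the same marks in the opening and closing positions, could be
expressed as the following sketch. It only illustrates that practice; it
is not being proposed here as a CLDR value.

```css
/* Sketch only: ” (U+201D) and ’ (U+2019) used for both opening and
   closing, the practice described above for most printed books and journals. */
:lang(he) { quotes: '\201d' '\201d' '\2019' '\2019'; }
```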

Another comment: Apple devices such as the iPhone use ׳ (geresh, 05f3)
and ״ (gershayim, 05f4) as quotation marks, which is interesting, but not
quite right. These characters should be used for acronyms and
pronunciation marks (as in ג׳ורג׳, "George", and מנכ״ל, the acronym for
"CEO"). Again, the most common practice on other consumer devices is to
use " and ', because that's what most keyboards have. I'm not even sure
that geresh and gershayim should be anywhere in the CLDR.

Please correct me if I'm mistaken about anything.

Richard also asked about Arabic typography. I'm quite sure that the
current data is wrong for Arabic, but an Arabic typography expert should
be consulted on that matter.

[1] http://hebrew-academy.huji.ac.il/hahlatot/Punctuation/Pages/P31.aspx


--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com


=================================================================

Shalom (Regards),  Mati
       Bidi Architect
       Globalization Center Of Competency - Bidirectional Scripts
       IBM Israel
       Mobile: +972 52 2554160


> 
> On 2011/11/05 0:33, Richard Ishida wrote:
>> Before I send a note to the Unicode Consortium, I thought I'd check for
>> feedback here. Looking through the list of quotation marks that Ian
>> Hickson {1} just added to the HTML5 spec I noticed one or two things
>> that look like anomalies (in the Unicode data). (That table is generated
>> automatically from the CLDR XML files.)
>> 
>> [1] A couple of locales have non-paired punctuation marks for secondary
>> quotes. They are af and tg. tg is not yet confirmed, but af is. Is this
>> really correct?
>> 
>> [2] The Arabic entry has the following:
>> 
>> '\201c' '\201d' '\2018' '\2019'
>> 
>> i.e.
>> 
>> “ U+201C LEFT DOUBLE QUOTATION MARK
>> ” U+201D RIGHT DOUBLE QUOTATION MARK
>> ‘ U+2018 LEFT SINGLE QUOTATION MARK
>> ’ U+2019 RIGHT SINGLE QUOTATION MARK
>> 
>> which corresponds to
>> 
>> quotationStart quotationEnd alternateQuotationStart alternateQuotationEnd
>> 
>> I think this is wrong. Since these are not mirrored characters in
>> Unicode, surely the order should be
>> 
>> ” U+201D RIGHT DOUBLE QUOTATION MARK
>> “ U+201C LEFT DOUBLE QUOTATION MARK
>> ’ U+2019 RIGHT SINGLE QUOTATION MARK
>> ‘ U+2018 LEFT SINGLE QUOTATION MARK
>> 
>> The same applies to Hebrew and, I assume, to other languages when they
>> are written in RTL scripts.
>> 
>> (Note, btw, that these assignments are only default settings. They can
>> be changed using CSS if desired, e.g. to substitute angle brackets for
>> quotes in Arabic text.)
>> 
>> Any thoughts on this?
>> 
>> RI
>> 
>> 
>> PS: (I guess I need to say ;-) Please keep replies to the questions
>> above, rather than moving the discussion (at least in this thread) to
>> whether the q element should or should not automatically apply quotation
>> marks and if so all the pitfalls that that may entail.
>> 
>> 
>> {1} http://dev.w3.org/html5/spec/rendering.html#quotes


Received on Tuesday, 8 November 2011 14:01:45 GMT
