Possible issues with the CLDR quote marks info

Before I send a note to the Unicode Consortium, I thought I'd check for 
feedback here. Looking through the list of quotation marks that Ian 
Hickson {1} just added to the HTML5 spec, I noticed two things that 
look like anomalies in the Unicode data. (That table is generated 
automatically from the CLDR XML files.)

[1] Two locales, af (Afrikaans) and tg (Tajik), have non-paired 
punctuation marks for their secondary quotes. The tg data is not yet 
confirmed, but the af data is. Is this really correct?

[2] The Arabic entry has the following:

'\201c' '\201d' '\2018' '\2019'

i.e.

“  ‎U+201C  LEFT DOUBLE QUOTATION MARK
”  ‎U+201D  RIGHT DOUBLE QUOTATION MARK
‘  ‎U+2018  LEFT SINGLE QUOTATION MARK
’  ‎U+2019  RIGHT SINGLE QUOTATION MARK

which corresponds to

quotationStart quotationEnd alternateQuotationStart alternateQuotationEnd

I think this is wrong. Since these are not mirrored characters in 
Unicode, surely the order should be

”  ‎U+201D  RIGHT DOUBLE QUOTATION MARK
“  ‎U+201C  LEFT DOUBLE QUOTATION MARK
’  ‎U+2019  RIGHT SINGLE QUOTATION MARK
‘  ‎U+2018  LEFT SINGLE QUOTATION MARK

The same applies to Hebrew and, I assume, to other languages when they 
are written in RTL scripts.
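
As a quick way to check the "not mirrored" premise, here is a minimal 
Python sketch. It assumes the standard-library unicodedata module 
reflects the Unicode Bidi_Mirrored property; the helper name 
decode_css_escapes is only for illustration and is not part of CLDR or 
the HTML5 spec.

```python
import unicodedata

# The CSS escapes quoted above for Arabic, in CLDR field order.
css_value = r"'\201c' '\201d' '\2018' '\2019'"
fields = ["quotationStart", "quotationEnd",
          "alternateQuotationStart", "alternateQuotationEnd"]

def decode_css_escapes(value):
    """Turn a string of quoted CSS hex escapes into a list of characters.

    Illustrative helper: handles only the simple '\\hhhh' form used here.
    """
    return [chr(int(tok.strip("'").lstrip("\\"), 16)) for tok in value.split()]

marks = decode_css_escapes(css_value)

for field, ch in zip(fields, marks):
    # unicodedata.mirrored() returns 1 for Bidi_Mirrored=Yes, otherwise 0.
    print(f"{field:24} U+{ord(ch):04X} {unicodedata.name(ch)} "
          f"mirrored={unicodedata.mirrored(ch)}")

# For contrast, a parenthesis is a mirrored character.
print("LEFT PARENTHESIS mirrored =", unicodedata.mirrored("("))
```

If the four quotation marks report mirrored=0 while the parenthesis 
reports 1, the glyphs will not be flipped automatically in RTL text, 
which is why the order of the entries matters.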

(Note, btw, that these assignments are only default settings. They can 
be changed using CSS if desired, e.g. to substitute angle brackets for 
quotes in Arabic text.)
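
To illustrate the kind of override meant here, the following Python 
sketch builds a CSS rule for the quotes property from four characters. 
The function name css_quotes_rule, the simple :lang() selector and the 
choice of guillemets are all assumptions made for the example; it is 
not the rule used in the spec, nor a recommendation for Arabic 
typography.

```python
def css_quotes_rule(lang, marks):
    """Build a CSS rule setting the `quotes` property for one language.

    `marks` is [open, close, nested open, nested close] in logical order.
    """
    # Write each character as a CSS hex escape inside single quotes.
    escaped = " ".join(f"'\\{ord(ch):x}'" for ch in marks)
    return f":lang({lang}) {{ quotes: {escaped} }}"

# Hypothetical override: guillemets for both quotation levels.
rule = css_quotes_rule("ar", ["\u00ab", "\u00bb", "\u2039", "\u203a"])
print(rule)
```

An author stylesheet containing such a rule would take precedence over 
the defaults in the table.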

Any thoughts on this?

RI


PS: (I guess I need to say ;-) Please keep replies to the questions 
above, rather than moving the discussion (at least in this thread) to 
whether the q element should or should not automatically apply quotation 
marks and if so all the pitfalls that that may entail.


{1} http://dev.w3.org/html5/spec/rendering.html#quotes


-- 
Richard Ishida
Internationalization Activity Lead
W3C (World Wide Web Consortium)

http://www.w3.org/International/
http://rishida.net/

Received on Friday, 4 November 2011 15:33:40 UTC