Re: [clreq] Document uses artificial attribute data-lang instead of the built-in lang attribute

It is true that we should mark up the content with the lang attribute 
(and I plan to do that at some point, but not before FPWD), but 
data-lang was chosen for the bilingual management on purpose, since 
the use of the lang attribute may not always coincide with the 
_structural_ usage for which data-lang is designed.  

The lang and data-lang attributes mean different things and are used 
in different ways. The lang attribute means just "the text in this 
element is in the declared language". The data-lang attribute means 
"the text in this element is part of a pair of elements that make up 
a unit of content".

This is most clearly seen where someone creates new content in 
Chinese but is unable to create the corresponding English text. The 
guidelines (which I have now prepared and will upload tomorrow) say 
that both the data-lang=zh and the data-lang=en fields must be created 
at the same time, and if there is no English available, the Chinese 
should be used in both. This ensures that all content is available when 
one of the language filters (top right) is applied, and that the 
structure is always properly balanced. It also makes it quick to 
identify items that are in need of translation, and will be a 
significant improvement over the process we've been using so far. If 
we used the lang attribute, the filters wouldn't work in the same way: 
the Chinese placeholder in the English slot would have to be marked 
lang=zh, so a lang-based English filter would hide it.
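A minimal sketch of the pairing guideline, in Python, assuming a hypothetical model in which each bilingual unit is reduced to the text of its data-lang=zh and data-lang=en elements. The Unit class and the add_unit, apply_filter and needs_translation helpers are illustrative names only, not part of the clreq tooling. It shows why creating both halves at once keeps each filter's view complete, and how a duplicated Chinese half doubles as a translation flag.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Unit:
    """One bilingual unit: the text of its data-lang=zh and data-lang=en elements."""
    zh: str
    en: str


def add_unit(units: list[Unit], zh: str, en: str | None = None) -> None:
    """Create both halves at the same time; if no English exists yet, reuse the Chinese."""
    units.append(Unit(zh=zh, en=en if en is not None else zh))


def apply_filter(units: list[Unit], lang: str) -> list[str]:
    """Return the text shown when the zh or en language filter is applied."""
    if lang not in ("zh", "en"):
        raise ValueError(f"unknown filter: {lang!r}")
    # Every unit has both halves, so each filter shows exactly one entry per unit.
    return [getattr(unit, lang) for unit in units]


def needs_translation(units: list[Unit]) -> list[Unit]:
    """Units whose English half is still a copy of the Chinese (a simple heuristic)."""
    return [unit for unit in units if unit.en == unit.zh]
```

For example, after add_unit(units, "汉字", "Han characters") and add_unit(units, "标点符号"), the English filter returns ['Han characters', '标点符号'] and needs_translation returns only the second unit. Treating identical halves as "untranslated" is a heuristic; content that is legitimately identical in both languages would also be flagged.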

Additionally, as the text and inline markup are developed (it's still 
at an early stage, esp. for the Chinese), there are likely to be 
additional inline ranges to which the lang attribute is 
applied to mark language differences. These are irrelevant to the 
structural view of the document, and would make the filtering process 
much more complicated, and perhaps less successful, if we relied only 
on lang attributes.
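To illustrate the point about inline ranges, here is a small Python sketch using the standard library's html.parser on a made-up fragment; the markup, the AttrSelector class and the select helper are assumptions for illustration, not the actual clreq source. Selecting by data-lang returns only the structural paragraph, whereas selecting by lang also picks up an inline span inside the English paragraph, which a lang-based filter would then hide in mid-sentence.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Hypothetical fragment: a zh/en pair of structural paragraphs, where the
# English paragraph also contains an inline range marked lang="zh".
SAMPLE = """
<p data-lang="zh" lang="zh">汉字</p>
<p data-lang="en" lang="en">The word <span lang="zh">汉字</span> means Han characters.</p>
"""


class AttrSelector(HTMLParser):
    """Collect the tag name of every start tag that has attr="value"."""

    def __init__(self, attr: str, value: str) -> None:
        super().__init__()
        self.attr = attr
        self.value = value
        self.matches: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        # HTMLParser passes attributes as (name, value) pairs with lowercased names.
        if (self.attr, self.value) in attrs:
            self.matches.append(tag)


def select(html: str, attr: str, value: str) -> list[str]:
    """Return the tag names of all elements in html carrying attr="value"."""
    selector = AttrSelector(attr, value)
    selector.feed(html)
    selector.close()
    return selector.matches


# Filtering on data-lang finds only the structural paragraph...
print(select(SAMPLE, "data-lang", "zh"))  # ['p']
# ...while filtering on lang also catches the inline span inside the English text.
print(select(SAMPLE, "lang", "zh"))  # ['p', 'span']
```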

Bottom line: the lang and data-lang attributes are doing different 
things, although there is likely to be some overlap, and the content 
actually is properly tagged. We do still need to add lang attributes 
for better font control, though. (I'd like to explore the possibility 
of doing some of that via JavaScript, btw, to save work for the 
editors.) For the time being we have mixed SC and TC (Simplified and 
Traditional Chinese), so font issues are a secondary concern just now.
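As a rough sketch of how a script might seed lang values from the existing data-lang pairs, here is the logic in Python rather than browser JavaScript, with a hypothetical Element class standing in for DOM nodes and a hypothetical fill_lang_for_pair function. The one trap is the untranslated placeholder: its slot is data-lang="en" but its text is Chinese, so copying data-lang blindly would mislabel it. Because SC and TC are mixed, a value derived from data-lang can only say "zh"; the sketch never overwrites a more specific lang (such as zh-Hant) that an editor has already set.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical stand-in for a DOM element: its text and its attributes."""
    text: str
    attrs: dict[str, str] = field(default_factory=dict)


def fill_lang_for_pair(zh: Element, en: Element) -> None:
    """Add lang to a data-lang="zh"/"en" pair wherever no lang is set yet.

    Existing lang values are never overwritten. If the English half is still
    a copy of the Chinese, it is tagged with the Chinese half's language,
    because that is the language its text is actually in.
    """
    zh.attrs.setdefault("lang", zh.attrs["data-lang"])
    if en.text == zh.text:
        # Untranslated placeholder: structurally English, but the text is Chinese.
        en.attrs.setdefault("lang", zh.attrs["lang"])
    else:
        en.attrs.setdefault("lang", en.attrs["data-lang"])


translated = (Element("漢字", {"data-lang": "zh", "lang": "zh-Hant"}),
              Element("Han characters", {"data-lang": "en"}))
fill_lang_for_pair(*translated)
print(translated[0].attrs["lang"], translated[1].attrs["lang"])  # zh-Hant en

placeholder = (Element("标点", {"data-lang": "zh"}), Element("标点", {"data-lang": "en"}))
fill_lang_for_pair(*placeholder)
print(placeholder[1].attrs["lang"])  # zh
```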

-- 
GitHub Notif of comment by r12a
See https://github.com/w3c/clreq/issues/71#issuecomment-122111170

Received on Thursday, 16 July 2015 21:41:53 UTC