
Issue with dcterms.language description

From: Richard Ishida <ishida@w3.org>
Date: Sat, 23 Jul 2011 09:26:54 +0100
Message-ID: <4E2A85CE.7070802@w3.org>
To: andy.powell@eduserv.org.uk, pete.johnston@eduserv.org.uk
CC: dc-usage@jiscmail.ac.uk, www International <www-international@w3.org>

I hope I'm addressing this to the right people. If not, please let me 
know where to send it.

While reviewing the HTML5 Metaextensions registry I came across the 
entry for dcterms.language.  There are two issues with it that I'd 
like to bring to your attention:

[1] The description "A language of the resource. Recommended best 
practice is to use a controlled vocabulary such as RFC 4646 [RFC4646]." 
refers to an out-of-date specification.  RFC 4646 was obsoleted by 
RFC 5646.

It would be much better to refer to BCP 47 
http://www.rfc-editor.org/rfc/bcp/bcp47.txt.  BCP 47 is an unchanging 
name created specifically to refer to the latest version of the specs 
related to tags for identifying languages.


[2] The 4th column contains the following text:

"Redundant with the lang attribute on the html element. (Browsers pay 
attention to the lang attribute but not dcterms.language)"

It's not clear to me who wrote that, but it appears to be misleading.

The lang (or xml:lang) attribute on the html element defines the default 
or primary language of the *text* inside the html element. It is used by 
text-processing applications (spell-checking, style choices, voice 
browser settings, etc.) which need a clear indication of which (one) 
language they are dealing with. An indication of the language of 'the 
resource', by contrast, is presumably intended to be metadata about the 
intended audience of the *resource as a whole*, as described in the 
HTTP specification's section on the Content-Language header 
(http://tools.ietf.org/html/rfc2616#section-14.12).

Note that the lang attribute can take only one language tag at a time as 
its value, since the text it is referring to can only be in one language 
at a time. The Content-Language header, however, can use as many 
language tags as are appropriate to describe the intended audience of 
the resource.
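
To make the difference concrete, here is a minimal Python sketch. The 
function name parse_content_language and the sample values are my own 
illustrations, not taken from any specification, and the splitting is 
deliberately naive (it does not validate the tags against BCP 47).

```python
def parse_content_language(value):
    """Split a Content-Language value into its individual language tags.

    Illustrative helper only: splits on commas and trims whitespace,
    without checking that each tag is well-formed.
    """
    return [tag.strip() for tag in value.split(",") if tag.strip()]


# Content-Language can describe several intended audiences at once.
audience_languages = parse_content_language("en, fr, mi")

# The lang attribute, by contrast, carries exactly one language tag.
text_language = "en"
```

The first value is a list describing who the resource is for; the second 
is a single tag describing what the text is written in.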

This makes the lang attribute and the Content-Language header like chalk 
and cheese.

Note also that the use of http-equiv=Content-Language on the meta 
element was recently declared non-conformant in HTML5, due to the 
confusion that has surrounded its use over the years.  I'd hate to 
revive that confusion with name=dcterms.language, and so I think it 
would be good to clarify the intended usage.

The loss of http-equiv=Content-Language of course means that there is no 
in-document way of signalling language metadata for the resource. I'm 
guessing that the intent of dcterms.language is to provide such a thing.

If so, I think its usage needs to be described more clearly as metadata 
about the intended audience of the resource, and linked to the HTTP 
Content-Language header.  It also has to allow for a comma-separated 
list of language tags (using BCP 47 rules).
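
As a rough sketch of how a consumer might read the two values side by 
side, assuming dcterms.language were defined to take a comma-separated 
list as suggested above (this is an assumption, not current registry 
text), something like the following Python would do. The class name 
LanguageMetaParser and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class LanguageMetaParser(HTMLParser):
    """Collect the html lang attribute and any dcterms.language meta values."""

    def __init__(self):
        super().__init__()
        self.text_language = None      # single tag from <html lang="...">
        self.audience_languages = []   # tags from <meta name="dcterms.language">

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html":
            self.text_language = attributes.get("lang")
        elif tag == "meta":
            name = (attributes.get("name") or "").lower()
            if name == "dcterms.language":
                content = attributes.get("content") or ""
                # Assumed syntax: a comma-separated list of language tags.
                self.audience_languages.extend(
                    t.strip() for t in content.split(",") if t.strip()
                )


sample = (
    '<html lang="en"><head>'
    '<meta name="dcterms.language" content="en, fr, es">'
    "</head><body></body></html>"
)
parser = LanguageMetaParser()
parser.feed(sample)
```

After feeding the sample, text_language holds the one language of the 
text and audience_languages holds the list describing the intended 
audience, which keeps the two notions separate.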

If my assumptions are incorrect, I think it should be removed from the 
metaextensions registry.

I hope this is of some help. Please let me know your thoughts.

Best regards,
Richard.


-- 
Richard Ishida
Internationalization Activity Lead
W3C (World Wide Web Consortium)

http://www.w3.org/International/
http://rishida.net/


Received on Saturday, 23 July 2011 08:27:22 GMT
