Re: Language of a node and HTML+RDFa JavaScript implementations

In that case, let me explain in more detail what I mean.

Section 3.3 of HTML+RDFa says:

     RDFa processors MUST use the mechanism described in The lang and
     xml:lang attributes section of the [HTML5] specification to determine
     the language of a node.

And section 3.2.3.3 of HTML5, "The lang and xml:lang attributes" [1],
reads, in part:

     To determine the language of a node, user agents must look at the
     nearest ancestor element ... that has a lang attribute in the XML
     namespace set or is an HTML element and has a lang in no namespace
     attribute set. [If there is none,] but there is a pragma-set default
     language set, then that is the language of the node. If there is no
     pragma-set default language set, then language information from a
     higher-level protocol (such as HTTP), if any, must be used as the
     final fallback language instead.

Accordingly, the process to determine the language of a node relies on a
higher-level protocol if no language information is specified in the
document itself. For example:

Assume that there is no HTTP Content-Language header set.  Then for this
document:

<html lang="en"><p>Document 1</p></html>

The language of the "p" element is "en".  And for this document:

<html lang="fr"><meta http-equiv="content-language" content="en"><p>Document 2</p></html>

The language of the "p" element is "fr". And for this document:

<html><meta http-equiv="content-language" content="en"><p>Document 3</p></html>

The language of the "p" element is "en".

Now assume that the HTTP headers include a Content-Language header with the
value "de". Then for the three documents above, the language of the "p"
element remains the same, since the language is given in the document
itself.

But for this document:

<html><p>Document 4</p></html>

The language of the "p" element would be "de", the value of the
Content-Language header, since no language information is given in the
document. (If there were no Content-Language header, the language of the
"p" element would be unknown.)

Thus, the language of a node relies on information not given in the
document itself.
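
To make the precedence concrete, here is a minimal JavaScript sketch of the
fallback order quoted above, applied to the four documents. The function
name "resolveLanguage" and its three parameters are hypothetical, used only
for illustration; null stands for "not present" and the empty string for
"unknown".

```javascript
// Hypothetical helper: applies the fallback order from the HTML5 text
// quoted above. Each argument is a language tag or null if absent.
function resolveLanguage(nearestLangAttr, pragmaDefault, httpContentLanguage) {
  if (nearestLangAttr !== null) return nearestLangAttr;         // lang / xml:lang on an ancestor
  if (pragmaDefault !== null) return pragmaDefault;             // <meta http-equiv="content-language">
  if (httpContentLanguage !== null) return httpContentLanguage; // higher-level protocol
  return "";                                                    // unknown
}

// The four documents, served with "Content-Language: de":
console.log(resolveLanguage("en", null, "de")); // Document 1: "en"
console.log(resolveLanguage("fr", "en", "de")); // Document 2: "fr"
console.log(resolveLanguage(null, "en", "de")); // Document 3: "en"
console.log(resolveLanguage(null, null, "de")); // Document 4: "de"
```

Only the last call depends on the third argument, which is exactly the
value a script running in the page has no DOM attribute to read.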

Unfortunately, unlike for most other values, there is no DOM attribute to
get the "language of a node" within the meaning of HTML5.  In another
mailing list thread [2] I stated that "[w]hile there is a 'lang' DOM
attribute, it's inadequate because it's only affected by the element's
'lang' content attribute." The best one could do in this situation is
traverse the document tree manually, and even so there is no way to get the
value of the HTTP Content-Language header, which is needed if no language is
stated in the document itself. There is also the
"getComputedStyle(node).webkitLocale" attribute [3], but it isn't portable
and, moreover, it relies on stylesheets.
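
A manual traversal might look roughly like the sketch below. It is an
illustration under stated assumptions, not a conforming implementation: the
function name "languageOfNode" is hypothetical, it does not check that an
element carrying a plain "lang" attribute is an HTML element, and the
pragma-set default and the HTTP header value are passed in by the caller
precisely because the script cannot read the header from the DOM.

```javascript
const XML_NS = "http://www.w3.org/XML/1998/namespace";

// Hypothetical sketch: walk from the node up through its ancestors looking
// for xml:lang or lang. pragmaDefault and httpContentLanguage are assumed
// to be supplied by the caller (null if unavailable).
function languageOfNode(node, pragmaDefault, httpContentLanguage) {
  for (let el = node; el; el = el.parentNode) {
    if (el.nodeType !== 1) continue; // skip anything that is not an element
    // xml:lang is checked first, then lang in no namespace
    if (el.hasAttributeNS(XML_NS, "lang")) return el.getAttributeNS(XML_NS, "lang");
    if (el.hasAttribute("lang")) return el.getAttribute("lang");
  }
  if (pragmaDefault) return pragmaDefault;             // pragma-set default language
  if (httpContentLanguage) return httpContentLanguage; // not obtainable from the DOM
  return "";                                           // unknown
}
```

In a browser the third argument would in practice always be null, so the
function returns "unknown" for a document with no language information of
its own even when the server sent a Content-Language header.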

This is why (most) browser-based JavaScript implementations cannot fully
conform to HTML+RDFa: finding the language of a node relies on information
outside of the document tree.  They will handle the first three documents
correctly, but they will fail on the fourth.

--Peter

[1]: http://www.w3.org/TR/html5/dom.html#the-lang-and-xml:lang-attributes
[2]: http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2013-April/039417.html
[3]: http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2013-May/039480.html

-----Original Message----- 
From: Ivan Herman
Sent: Sunday, May 26, 2013 4:39 AM
To: Peter Occil
Cc: public-rdfa-wg@w3.org
Subject: Re: Language of a node and HTML+RDFa JavaScript implementations

Dear Peter,

(to avoid misunderstanding, commenting here as a technical person, not
as an official W3C staff member!)

First of all, thanks for the comment. I must admit I do not remember
this issue coming up before. However... I believe that the fundamental
approach to RDFa has always been that, conceptually, RDFa data
extraction operates on the DOM tree, ie, whatever the DOM is supposed to
give us. I have not checked the specs yet but I believe the specs that
we refer to, and therefore RDFa itself, are in the clear in this sense...

Thanks

Ivan

Peter Occil wrote:
> Section 3.3 of HTML+RDFa says:
> RDFa processors MUST use the mechanism described in The lang and xml:lang
> attributes section of the [HTML5] specification to determine the
> language of a node.
> But a node's language depends not only on the lang and xml:lang
> attributes, but also on the HTTP Content-Language header.
> This may be problematic for JavaScript browser implementations since
> there is no current way to get the Content-Language header of a DOM
> Document object. That means those implementations will not be fully
> conforming, unless the rule is changed so as to rely on only the lang
> and xml:lang attributes, or a new DOM attribute is added that retrieves
> the Content-Language header, or in some other way.
> Moreover, I don't believe this requirement is fully reflected in the
> current test suite, though I'm not sure.
> --Peter

-- 
Ivan Herman, W3C
Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
http://www.ivan-herman.net/foaf#me 

Received on Thursday, 30 May 2013 03:04:54 UTC