Re: Language of a node and HTML+RDFa JavaScript implementations

Peter,

I *think* I understand the issue and, coincidentally, we will have a call in
half an hour where this issue may be discussed. Again speaking as an individual,
I believe that the only way we can handle this in RDFa is for the generated RDF
to use whatever the markup gives us (which indeed means that the current section
3.3 may not be precise enough). I.e., to use the examples below, in the case of
Document 4:

<html><p>Document 4</p></html>

the generated RDF literal will _not_ include a language tag. Actually, that
would also be the case for

<html><meta http-equiv="content-language" content="en"><p>Document 3</p></html>

because RDFa tries to be language neutral. AFAIK, all current RDFa processors
work this way.
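
As a rough illustration of the difference between the two behaviours, here is
a minimal sketch in Python. It is not taken from any RDFa processor: the
function names, the DOCUMENTS table and the way a document is reduced to a
(lang attribute, pragma) pair are all assumptions made for this example. It
compares the HTML5 fallback chain quoted below with a processor that only
looks at the lang/xml:lang attributes in the markup.

```python
from typing import Optional


def html5_language(ancestor_lang: Optional[str],
                   pragma_lang: Optional[str],
                   http_lang: Optional[str]) -> Optional[str]:
    """HTML5 fallback chain as quoted in this thread: nearest ancestor
    lang/xml:lang attribute, then the pragma-set default language, then the
    higher-level protocol (HTTP). An explicitly empty lang="" is not modelled."""
    for candidate in (ancestor_lang, pragma_lang, http_lang):
        if candidate:
            return candidate
    return None  # language unknown


def markup_only_language(ancestor_lang: Optional[str]) -> Optional[str]:
    """Hypothetical processor that uses only the lang/xml:lang attributes."""
    return ancestor_lang or None


# The four example documents, reduced to (lang attribute, meta pragma).
DOCUMENTS = {
    "Document 1": ("en", None),
    "Document 2": ("fr", "en"),
    "Document 3": (None, "en"),
    "Document 4": (None, None),
}
HTTP_CONTENT_LANGUAGE = "de"  # the second scenario in Peter's mail

for name, (lang_attr, pragma) in DOCUMENTS.items():
    print(name,
          html5_language(lang_attr, pragma, HTTP_CONTENT_LANGUAGE),
          markup_only_language(lang_attr))
```

Under these assumptions the two functions agree on Documents 1 and 2 and
differ on Documents 3 and 4, which is exactly where the meta pragma and the
HTTP header come into play.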

I think the important point is that RDF distinguishes between plain
literals and literals with language tags. I.e., an RDFa processor is free
to generate a plain literal if no language tag has been assigned.
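
To make the distinction concrete, here is a small Python sketch of how the
two kinds of literal look when written out in N-Triples/Turtle syntax. The
helper name to_literal is made up for this example, and the escaping is
deliberately minimal (backslash and double quote only), so treat it as an
illustration rather than a serializer.

```python
from typing import Optional


def to_literal(text: str, lang: Optional[str] = None) -> str:
    """Format text as a plain literal, or as a language-tagged literal
    when a language tag is supplied."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    if lang:
        return f'"{escaped}"@{lang}'
    return f'"{escaped}"'


print(to_literal("Document 1", "en"))  # language taken from lang="en"
print(to_literal("Document 4"))        # no language in the markup
```

The first call yields a literal carrying the tag "en"; the second yields a
plain literal with no tag at all, which is what I would expect for Document 4.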

Thanks!

Ivan


Peter Occil wrote:
> In that case, I'll just explain more what I mean.
> 
> Section 3.3 of HTML+RDFa says:
> 
>     RDFa processors MUST use the mechanism described in The lang and
>     xml:lang attributes section of the [HTML5] specification to determine
>     the language of a node.
> 
> And section 3.2.3.3, The lang and xml:lang attributes section [1] of HTML5
> reads, in part:
> 
>     To determine the language of a node, user agents must look at the nearest
>     ancestor element ... that has a lang attribute in the XML namespace set or
>     is an HTML element and has a lang in no namespace attribute set. [If there
>     is none,] but there is a pragma-set default language set, then that is the
>     language of the node. If there is no pragma-set default language set,
>     then language information from a higher-level protocol (such as HTTP),
>     if any, must be used as the final fallback language instead.
> 
> Accordingly, the process to determine the language of a node relies on a
> higher-level protocol if no language information is specified in the document
> itself. For example:
> 
> Assume that there is no HTTP Content-Language header set.  Then for this
> document:
> 
> <html lang="en"><p>Document 1</p></html>
> 
> the language of the "p" element is "en".  And for this document:
> 
> <html lang="fr"><meta http-equiv="content-language" content="en"><p>Document 2</p></html>
> 
> The language of the "p" element  is "fr". And for this document:
> 
> <html><meta http-equiv="content-language" content="en"><p>Document 3</p></html>
> 
> The language of the "p" element is "en".
> 
> Now assume that the HTTP headers include a Content-Language header with the
> value "de". Then for the three documents above, the language of the "p"
> element remains the same, since the language is given in the document itself.
> 
> But for this document:
> 
> <html><p>Document 4</p></html>
> 
> The language of the "p" element would be "de", the value of the Content-Language
> header, since no language information is given in the document. (If there were no
> Content-Language header, the language of the "p" element would be unknown.)
> 
> Thus, the language of a node relies on information not given in the
> document itself.
> 
> Unfortunately, unlike for most other values, there is no DOM attribute to
> get the "language of a node" within the meaning of HTML5.  In another mailing
> list thread [2] I stated that "[w]hile there is a 'lang' DOM attribute, it's
> inadequate because it's only affected by the element's 'lang' content
> attribute." The best one could do in this situation is traverse the document
> tree manually, and even so there is no way to get the value of the HTTP
> Content-Language header, which is needed if no language is stated in the
> document itself. There is also the "getComputedStyle(node).webkitLocale"
> attribute [3], but it isn't portable and, moreover, it relies on stylesheets.
> 
> This is why (most) browser-based JavaScript implementations cannot fully conform
> to HTML+RDFa, since finding the language of a node relies on information outside
> of the document tree.  While they will pass on the first three documents, they
> will fail on the fourth document.
> 
> --Peter
> 
> [1]: http://www.w3.org/TR/html5/dom.html#the-lang-and-xml:lang-attributes
> [2]: http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2013-April/039417.html
> [3]: http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2013-May/039480.html
> 
> -----Original Message-----
> From: Ivan Herman
> Sent: Sunday, May 26, 2013 4:39 AM
> To: Peter Occil
> Cc: public-rdfa-wg@w3.org
> Subject: Re: Language of a node and HTML+RDFa JavaScript implementations
> 
> Dear Peter,
> 
> (to avoid misunderstanding, commenting here as a technical person, not
> as an official W3C staff member!)
> 
> First of all, thanks for the comment. I must admit I do not remember
> this issue coming up before. However... I believe that the fundamental
> approach to RDFa has always been that, conceptually, RDFa data
> extraction operates on the DOM tree, ie, whatever the DOM is supposed to
> give us. I have not checked the specs yet but I believe the specs that
> we refer to, and therefore RDFa itself, are in the clear in this sense...
> 
> Thanks
> 
> Ivan
> 
> Peter Occil wrote:
>> Section 3.3 of HTML+RDFa says:
>>
>>     RDFa processors MUST use the mechanism described in The lang and
>>     xml:lang attributes section of the [HTML5] specification to determine
>>     the language of a node.
>>
>> But a node's language depends not only on the lang and xml:lang attributes,
>> but also on the HTTP Content-Language header.
>>
>> This may be problematic for JavaScript browser implementations since there
>> is no current way to get the Content-Language header of a DOM Document
>> object. That means those implementations will not be fully conforming,
>> unless the rule is changed so as to rely on only the lang and xml:lang
>> attributes, or a new DOM attribute is added that retrieves the
>> Content-Language header, or the issue is resolved in some other way.
>>
>> Moreover, I don't believe this requirement is fully reflected in the
>> current test suite, though I'm not sure.
>>
>> --Peter
> 

-- 
Ivan Herman, W3C
Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
http://www.ivan-herman.net/foaf#me

Received on Thursday, 30 May 2013 13:34:43 UTC