Re: Language of a node and HTML+RDFa JavaScript implementations

On 05/31/2013 01:10 PM, Alex Milowski wrote:
> I dug into this a bit and essentially, as Peter points out, the 
> "lang" property on any element node in the DOM is mostly useless for
>  determining the language of a node.  The language is defined in
> HTML5 as the nearest ancestor with a lang/xml:lang attribute [1]. If
> there is no such ancestor, the "pragma-set default language" is used
> (i.e. the "meta" element with http-equiv="content-language").

I had a chat with the guys working on the HTML5 spec and DOM spec about
this particular topic. The answer is a bit more complicated than we'd
like it to be, but I think the logic on it is clear. Here's a link to
the discussion:

http://krijnhoetmer.nl/irc-logs/whatwg/20130606#l-554

and a link to the DOM bug on this particular issue:

https://www.w3.org/Bugs/Public/show_bug.cgi?id=16489

Just to be clear, this issue occurs when Content-Language is set via
an HTTP header and the document itself contains no language
information. When processing such an HTML5+RDFa document, a
JavaScript-based processor would conclude that there is no language,
whereas a processor that has access to the HTTP response would set the
language correctly.
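
To make the edge case concrete, here is a rough TypeScript sketch of the
lookup as it reads from the HTML5 rules: walk up to the nearest ancestor
carrying lang/xml:lang, then fall back to the pragma-set default, then to
Content-Language from the HTTP response. The ElementNode and
DocumentContext types and the languageOf function are hypothetical
stand-ins invented for illustration, not DOM or RDFa API names.

```typescript
// Hypothetical stand-in for a DOM element; not a real DOM interface.
interface ElementNode {
  attrs: { lang?: string; "xml:lang"?: string };
  parent: ElementNode | null;
}

// Hypothetical bundle of document-level language hints.
interface DocumentContext {
  // From <meta http-equiv="content-language">, if present.
  pragmaLanguage: string | null;
  // From the HTTP Content-Language header; null when the processor
  // cannot see the HTTP response (the JavaScript-in-the-page case).
  httpContentLanguage: string | null;
}

function languageOf(node: ElementNode, ctx: DocumentContext): string | null {
  // Nearest ancestor (including the node itself) with a language attribute.
  for (let n: ElementNode | null = node; n !== null; n = n.parent) {
    // Assumption: xml:lang wins when both attributes are present.
    const lang = n.attrs["xml:lang"] ?? n.attrs.lang;
    if (lang !== undefined) {
      // An empty value means the language is explicitly unknown.
      return lang === "" ? null : lang;
    }
  }
  // No attribute found: pragma-set default, then the HTTP header.
  return ctx.pragmaLanguage ?? ctx.httpContentLanguage;
}

// A document with no language information of its own.
const root: ElementNode = { attrs: {}, parent: null };
const para: ElementNode = { attrs: {}, parent: root };

// Processor that sees the HTTP response.
const withHeader = languageOf(para, {
  pragmaLanguage: null,
  httpContentLanguage: "fr",
}); // "fr"

// JavaScript-based processor with no access to the header.
const withoutHeader = languageOf(para, {
  pragmaLanguage: null,
  httpContentLanguage: null,
}); // null
```

The two processors disagree only because the header is invisible to one
of them. Once the language is stated on an element in the document, both
calls return the same value regardless of the header.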

We have a couple of options moving forward:

1. Create a language discovery mechanism unique to HTML+RDFa.
2. Ignore Content-Language.
3. Follow the HTML5 spec wrt. language discovery.

Options #1 and #2 amount to the same thing: we end up creating our own
language discovery mechanism in RDFa, one that is incompatible with
all of the host languages. The editors of the HTML5 and DOM specs
thought that this would be a bad direction and I agree with them.

The third option seems to be the most logical way forward. The
specification would stay as it is right now. Implementations must follow
the HTML5 processing rules wrt. setting an element's language. The
downside of this approach is that until the DOM bug above is fixed,
JavaScript-based processors will be non-conforming in this particular
edge case (when the document's language is set only via
Content-Language). The HTML5 and DOM spec authors agreed that this is
the correct way forward.

To mitigate this issue, we can make a minor editorial change to the
specification that states that authors SHOULD specify the language of
their document in the document if they want to ensure that all RDFa
processors will be capable of discovering the correct language for the
document.

The premise is that only a small percentage of documents out there
are served in this way (it's a corner case), and that addressing
the issue is fairly trivial (specify the document language in the
document itself).

Peter, Alex - does that solution work for each of you?

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny, G+: +Manu Sporny)
Founder/CEO - Digital Bazaar, Inc.
blog: Meritora - Web payments commercial launch
http://blog.meritora.com/launch/

Received on Thursday, 6 June 2013 16:15:34 UTC