Re: Language of a node and HTML+RDFa JavaScript implementations

On Jun 6, 2013, at 9:15 AM, Manu Sporny <msporny@digitalbazaar.com> wrote:

> On 05/31/2013 01:10 PM, Alex Milowski wrote:
>> I dug into this a bit and essentially, as Peter points out, the 
>> "lang" property on any element node in the DOM is mostly useless for
>> determining the language of a node.  The language is defined in
>> HTML5 as the nearest ancestor with a lang/xml:lang attribute [1]. If
>> there is no such ancestor, the "pragma-set default language" is used
>> (i.e. the "meta" element with http-equiv="content-language").
> 
> I had a chat with the guys working on the HTML5 spec and DOM spec about
> this particular topic. The answer is a bit more complicated than we'd
> like it to be, but I think the logic on it is clear. Here's a link to
> the discussion:
> 
> http://krijnhoetmer.nl/irc-logs/whatwg/20130606#l-554
> 
> and a link to the DOM bug on this particular issue:
> 
> https://www.w3.org/Bugs/Public/show_bug.cgi?id=16489
> 
> Just to be clear, this issue occurs when a Content-Language is set via
> an HTTP header and the document does not contain any language
> information. When processing an HTML5+RDFa document, a JavaScript-based
> processor would think there was no language, whereas a processor that
> has access to the HTTP response would set the language correctly.
> 
> We have a couple of options moving forward:
> 
> 1. Create a language discovery mechanism unique to HTML+RDFa.
> 2. Ignore Content-Language.
> 3. Follow the HTML5 spec wrt. language discovery.
> 
> Options #1 and #2 basically amount to the same thing. We end up creating
> our own language discovery mechanism in RDFa and it's incompatible with
> all of the host languages. The editors of the HTML5 and DOM specs
> thought that this would be a bad direction and I agree with them.
> 
> The third option seems to be the most logical approach forward. The
> specification would stay as it is right now. Implementations must follow
> the HTML5 processing rules wrt. setting an element's language. The
> downside with this approach is that until the DOM bug above is fixed,
> JavaScript-based processors will be non-conforming in this particular
> edge case (when the document's language is only set using
> Content-Language). The HTML5 and DOM spec authors agreed that this is
> the correct approach forward.

+1. Since HTML+RDFa is an HTML extension, the third option is the only one that makes sense.
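
For illustration, here is a rough sketch of the lookup order being discussed: nearest lang/xml:lang attribute first, then the pragma-set default language, then Content-Language from the HTTP response. It is not taken from any existing processor. The `Node` class and the `node_language` function are hypothetical names standing in for a real DOM, and the sketch skips details such as language tag validation and headers that list more than one language.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element: attributes plus a parent link."""
    name: str
    attrs: dict = field(default_factory=dict)
    parent: Optional["Node"] = None


def node_language(node, pragma_default=None, http_content_language=None):
    """Return the language of `node`, or None if it is unknown.

    pragma_default: value of <meta http-equiv="content-language">, if any.
    http_content_language: Content-Language HTTP header value, if the
    processor has access to the response (assumed to be a single tag).
    """
    # 1. Walk from the node up through its ancestors and stop at the first
    #    element carrying xml:lang or lang (xml:lang checked first).
    current = node
    while current is not None:
        for attr in ("xml:lang", "lang"):
            if attr in current.attrs:
                # An empty attribute value means "unknown language".
                return current.attrs[attr] or None
        current = current.parent
    # 2. No attribute found anywhere: use the pragma-set default language.
    if pragma_default:
        return pragma_default
    # 3. Final fallback: language information from the HTTP response.
    return http_content_language or None


# The edge case from this thread: no language information in the document.
html = Node("html")
p = Node("p", parent=html)
span = Node("span", {"lang": "de"}, parent=p)

in_browser = node_language(p)                               # header not visible
with_http = node_language(p, http_content_language="fr")    # header visible
in_document = node_language(span, http_content_language="fr")
```

In this sketch `in_browser` is None while `with_http` is "fr", which is the mismatch between JavaScript-based processors and processors that see the HTTP response. `in_document` is "de" either way, which is why declaring the language in the document itself sidesteps the problem.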

> To mitigate this issue, we can make a minor editorial change to the
> specification that states that authors SHOULD specify the language of
> their document in the document if they want to ensure that all RDFa
> processors will be capable of discovering the correct language for the
> document.

It seems like this note only serves a transient need, until the HTML/DOM specs are fixed, and there's really no evidence from the wild that this is causing any problems, so I don't really see the need to add such wording. However, I will go along with this if that's the consensus.

Gregg

> The premise is that there is not a large percentage of documents out
> there that are served in this way (it's a corner case), and addressing
> the issue is fairly trivial (specify the document language in the
> document itself).
> 
> Peter, Alex - does that solution work for each of you?
> 
> -- manu
> 
> -- 
> Manu Sporny (skype: msporny, twitter: manusporny, G+: +Manu Sporny)
> Founder/CEO - Digital Bazaar, Inc.
> blog: Meritora - Web payments commercial launch
> http://blog.meritora.com/launch/
> 

Received on Thursday, 6 June 2013 16:48:56 UTC