Re: Language of a node and HTML+RDFa JavaScript implementations

I dug into this a bit and essentially, as Peter points out, the "lang"
property on any element node in the DOM is mostly useless for determining
the language of a node, since it only reflects that element's own attribute.
The language is defined in HTML5 as that of the nearest ancestor (or the
element itself) with a lang/xml:lang attribute [1].  If there is no such
element, the "pragma-set default language" [2] is used (i.e. the "meta"
element with http-equiv="content-language").
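
To make the lookup described above concrete, here is a minimal sketch of
how a processor might resolve the language instead of reading the "lang"
property.  The function names and the fallbackLanguage parameter are my own
and hypothetical; the sketch treats lang and xml:lang uniformly, as this
message does, and only assumes that nodes expose getAttribute() and
parentElement.

```javascript
// Sketch only: walk from the element up through its ancestors and return
// the first lang/xml:lang value found, rather than trusting element.lang.
function nearestLangAttribute(element) {
  for (let el = element; el; el = el.parentElement) {
    // When both attributes are present on one element, xml:lang is used.
    for (const name of ["xml:lang", "lang"]) {
      const value = el.getAttribute(name);
      if (value !== null) {
        return value; // may be "", i.e. language explicitly unknown
      }
    }
  }
  return null; // nothing declared on the element or any ancestor
}

// fallbackLanguage is a hypothetical parameter standing in for the
// pragma-set default language (or some other environment default).
function languageOf(element, fallbackLanguage = null) {
  const declared = nearestLangAttribute(element);
  return declared !== null ? declared : fallbackLanguage;
}
```

In a browser this could be called as
languageOf(document.querySelector("p")); a null result would correspond to
generating a literal without a language tag.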

...and then there is this document [3] that says the "meta" element should
not be used.

I think it would be a good idea, as a future option, to allow the language to
be set in the initial context.  This would allow processors to properly pick
up the Content-Language header from the environment.  This would require a
change to section 7.2.  It would also be a quality-of-implementation detail
because the Content-Language header isn't necessarily exposed in a standard
way in all environments (e.g. the browser vs. otherwise).
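
As a rough illustration of that option (not anything the specification
defines today), a processor could accept a default language when its initial
context is built.  Both function names and the shape of the context object
are hypothetical, and the rule of ignoring a header that lists more than one
language is my own assumption.

```javascript
// Hypothetical helper: derive a default language from a Content-Language
// header value. Assumption: a header naming several languages gives no
// usable single default, so null is returned in that case.
function defaultLanguageFromHeader(headerValue) {
  if (typeof headerValue !== "string") {
    return null; // header not available in this environment
  }
  const tags = headerValue
    .split(",")
    .map((tag) => tag.trim())
    .filter((tag) => tag !== "");
  return tags.length === 1 ? tags[0] : null;
}

// Hypothetical shape for a processor's initial context; only the
// language member matters for this discussion.
function createInitialContext(options = {}) {
  return {
    base: options.base || null,
    language: options.language || null,
  };
}

// Example: the caller obtained "en" from the environment somehow.
const exampleContext = createInitialContext({
  language: defaultLanguageFromHeader("en"),
});
```

How the header value is obtained is deliberately left out, since that is
exactly the part that differs between environments.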

The good news is that the algorithm will correctly calculate the language
for an element in an HTML document as specified as long as the user does not
use the meta element or expect the Content-Language header to set the
default.  Those are somewhat big caveats for random HTML "in the wild" but
not necessarily a show-stopper for people publishing new documents using
RDFa.  That is, the recommended path forward is to use lang/xml:lang
attributes as appropriate.

[1] http://www.w3.org/TR/html5/dom.html#the-lang-and-xml:lang-attributes
[2] http://www.w3.org/TR/html5/document-metadata.html#pragma-set-default-language
[3] http://www.w3.org/International/questions/qa-html-language-declarations




On Fri, May 31, 2013 at 2:58 AM, Ivan Herman <ivan@w3.org> wrote:

> It was discussed yesterday on the call and this is certainly the general
> direction. Manu will check with the HTML5 experts on what happens in the
> DOM...
>
> Thanks
>
> Ivan
>
> Peter Occil wrote:
> > Your suggestion would be fine by me if it's accepted by the working group.
> >
> > --Peter
> >
> > -----Original Message----- From: Ivan Herman
> > Sent: Thursday, May 30, 2013 9:34 AM
> > To: Peter Occil
> > Cc: public-rdfa-wg@w3.org
> > Subject: Re: Language of a node and HTML+RDFa JavaScript implementations
> >
> > Peter,
> >
> > I *think* I understand the issue and, coincidentally, we will have a call in
> > half an hour where this issue may be discussed. Again as an individual, I
> > believe that the only way we can handle that in RDFa is that the generated
> > RDF uses whatever the markup gives us (which indeed means that the current
> > section 3.3. may not be precise enough). Ie, to use the example below, in
> > the case of Document 4:
> >
> > <html><p>Document 4</p></html>
> >
> > the generated RDF literal will _not_ include a language tag. Actually, that
> > would be the case for
> >
> > <html><meta http-equiv="content-language" content="en"><p>Document 3</p></html>
> >
> > because RDFa tries to be language neutral. AFAIK, all current RDFa
> > processors work this way.
> >
> > I think the important point is that RDF makes a difference between plain
> > literals and literals with language tags. Ie, the generated RDF from RDFa
> > has the freedom to generate a plain literal if no language tag has been
> > assigned.
> >
> > Thanks!
> >
> > Ivan
> >
>
> --
> Ivan Herman, W3C
> Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> http://www.ivan-herman.net/foaf#me
>



-- 
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics

Received on Friday, 31 May 2013 17:10:52 UTC